Dark silicon and the end of multicore scaling

TLDR

Since 2005, processor designers have increased core counts to exploit Moore's Law, but the failure of Dennard scaling threatens to limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to estimate the speedup potential of parallel workloads over the next five technology generations. It uses ITRS projections alongside more conservative device scaling parameters, Pareto-optimal frontiers derived from measurements of more than 150 processors, and a detailed model of upper-bound performance and lower-bound core power. With these, the study evaluates single-threaded CPU-like and massively threaded GPU-like multicore designs across symmetric, asymmetric, dynamic, and composed topologies. The analysis shows that multicore scaling is power limited: 21% of a fixed-size chip must be powered off at 22 nm and more than 50% at 8 nm. Through 2024, this yields only a 7.9x average speedup, leaving a nearly 24-fold gap from the target of doubling performance per generation.

Abstract

Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
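The abstract does not spell out how the "nearly 24-fold gap" is computed. One reading consistent with the stated figures, offered here as an assumption rather than as the paper's own derivation, is that doubling performance in each of the five generations gives a target of 2^5 = 32x, and the gap is the difference between that target and the projected 7.9x. The short sketch below works through that arithmetic; the function and variable names are illustrative only.

```python
# Sketch of the headline arithmetic, under the assumption that the "gap" is
# the difference between a compounded doubling target and the projected speedup.

def target_speedup(generations: int, per_generation: float = 2.0) -> float:
    """Speedup expected if performance multiplies by `per_generation` each generation."""
    return per_generation ** generations


GENERATIONS = 5          # "the next five technology generations" (from the abstract)
PROJECTED_SPEEDUP = 7.9  # average speedup through 2024 (from the abstract)

target = target_speedup(GENERATIONS)       # 2 ** 5
shortfall = target - PROJECTED_SPEEDUP     # difference reading of the gap (assumption)
ratio = target / PROJECTED_SPEEDUP         # alternative ratio reading, for comparison

print(f"target speedup:    {target:.1f}x")
print(f"shortfall:         {shortfall:.1f}x")
print(f"target/projected:  {ratio:.2f}")
```

Under the difference reading, the shortfall comes out close to the quoted 24-fold figure. The ratio reading gives roughly 4, which does not match, so the difference interpretation appears to be the intended one.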
