Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling

Abstract

Optimizing, porting and developing GPGPU applications is not a trivial task, since the capabilities and organization of GPU processing elements and the memory subsystem differ greatly from traditional CPU concepts and also vary among GPU architectures. This work aids that process by delivering a set of visual models that GPU programmers can use to analyze and improve application performance and energy-efficiency across a range of GPU devices. For the first time, state-of-the-art Cache-aware Roofline Modeling principles are applied to model GPU upper bounds for performance, power consumption and energy-efficiency. The proposed models are built from extensive GPU micro-benchmarking designed to fully exercise the GPU functional units and each level of the memory hierarchy. The models are experimentally validated on 8 GPU devices from 3 different NVIDIA generations, and their benefits are explored by characterizing the behavior of 23 real-world applications from 5 different benchmark suites. Furthermore, the effects of dynamic voltage and frequency scaling (DVFS) on the GPU performance upper bounds are analyzed by scaling both core and memory frequencies.
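As a rough illustration of the roofline principle the abstract refers to, the following Python sketch computes a performance upper bound as the minimum of peak compute throughput and memory bandwidth times arithmetic intensity, with one bandwidth roof per memory level. The function names, memory-level labels and all numeric values are hypothetical and are not taken from the paper; the power and energy-efficiency models are not sketched here.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity      -- arithmetic intensity in FLOP per byte
    peak_gflops    -- peak compute throughput in GFLOP/s
    bandwidth_gbs  -- sustained bandwidth of one memory level in GB/s
    """
    if intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, bandwidth_gbs * intensity)


def roofline_per_level(intensity, peak_gflops, bandwidths_gbs):
    """Evaluate the bound once per memory level (one roof per level)."""
    return {
        level: attainable_gflops(intensity, peak_gflops, bw)
        for level, bw in bandwidths_gbs.items()
    }


# Hypothetical device numbers, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 4000.0
BANDWIDTHS_GBS = {"shared": 2000.0, "L2": 600.0, "DRAM": 200.0}

# A kernel performing 0.5 FLOP per byte is memory-bound at every level here.
bounds = roofline_per_level(0.5, PEAK_GFLOPS, BANDWIDTHS_GBS)
for level, gflops in bounds.items():
    print(f"{level}: {gflops:.1f} GFLOP/s")
```

In practice the peak and bandwidth values would come from micro-benchmarks on the target device rather than from constants like these, and they would need to be measured again for each core and memory frequency setting when studying DVFS.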
