Program optimization space pruning for a multithreaded GPU

TLDR

Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. Inexpensive, single-chip, massively parallel platforms have brought in many developers who lack the experience needed to maximize performance. The paper seeks more structured optimization methods whose performance effects can be estimated and which most programmers can understand. The authors illustrate the complexity of optimizing applications for one such system and propose a relatively simple methodology to reduce the workload involved in the optimization process.

Abstract

Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers who lack the substantial experience and knowledge needed to maximize performance will be creating highly parallel applications for these platforms. This creates a need for more structured optimization methods with means to estimate their performance effects. Furthermore, these methods need to be understandable by most programmers. This paper shows the complexity involved in optimizing applications for one such system and presents one relatively simple methodology for reducing the workload involved in the optimization process.
