
Publication | Closed Access

Performance portable GPU code generation for matrix multiplication

Citations: 22

References: 18

Year: 2016

Abstract

Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left to ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device.
