Publication | Closed Access
Performance portable GPU code generation for matrix multiplication
Citations: 22
References: 18
Year: 2016
Unknown Venue
Engineering, GPU Benchmarking, New Device, Compiler Technology, Computer Architecture, Matrix Multiplication, Software Engineering, GPU Computing, Hardware Security, Parallel Accelerators, Parallel Computing, Compilers, Programming Languages, Parallelizing Compiler, Compiler Support, Computer Engineering, Computer Science, Optimizing Compiler, GPU Architecture, Hardware Acceleration, Program Analysis, Parallel Programming, Ninja Programmers
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left to ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device.