Publication | Open Access
CUDA: Compiling and optimizing for a GPU platform
Citations: 21 | References: 29 | Year: 2012
Keywords: Hardware Security, GPU Architecture, Engineering, CUDA Architecture, Compute Kernel, GPU Benchmarking, Program Analysis, Computer Engineering, Computer Architecture, High-Level Language, Parallel Programming, Computer Science, GPU Platform, Compilers, Parallel Computing, GPU Cluster, GPU Computing, CUDA C
Graphics processing units (GPUs) have evolved to handle throughput-oriented workloads in which a large number of parallel threads must make progress. These threads are organized around shared memory, making it possible to synchronize and cooperate on shared data. Current GPUs can run tens of thousands of hardware threads and have been optimized for graphics workloads. Several high-level languages have been developed to make it easy to program GPUs for general-purpose computing problems, and their use creates a need for highly optimizing compilers that target the parallel GPU device. In this paper, we present our experiences in developing compilation techniques for a high-level language called CUDA C. We explain the CUDA architecture and programming model, and provide insights into why certain optimizations are important for achieving high performance on a GPU. In addition to classical optimizations, we present optimizations developed specifically for the CUDA architecture. We evaluate these techniques and present performance results that show significant improvements on hundreds of kernels as well as applications.
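To make the programming model described above concrete, the following is a minimal sketch (not from the paper) of a CUDA C kernel in which the threads of a block cooperate through `__shared__` memory and synchronize with `__syncthreads()`; the kernel name `blockReduce` and the fixed block size of 256 are illustrative assumptions.

```cuda
// Hypothetical example: a block-level sum reduction. Each block loads a
// tile of the input into shared memory, synchronizes so all loads are
// visible, then halves the active thread count each step until one
// partial sum per block remains. Assumes blockDim.x == 256.
__global__ void blockReduce(const float *in, float *out, int n) {
    __shared__ float buf[256];              // memory shared by the block's threads
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;      // guard against out-of-range reads
    __syncthreads();                        // all loads complete before reducing

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            buf[tid] += buf[tid + s];       // cooperate on shared data
        __syncthreads();                    // each step sees the previous one
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];           // one partial sum per block
}
```

Patterns like this, where correctness hinges on barrier placement and performance on shared-memory access patterns, are exactly where a GPU-aware compiler's optimizations matter.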