Publication | Closed Access
CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application
Citations: 102
References: 5
Year: 2013
Unknown Venue
Performance Case Studies, Engineering, GPU Benchmarking, OpenACC Specification, Computer Architecture, GPU Computing, Hardware Security, Compute Kernel, High-performance Architecture, Modeling And Simulation, Parallel Computing, Programming Interface, CUDA Code, Kernel Benchmarks, Computer Engineering, Computer Science, GPU Architecture, Hardware Acceleration, Parallel Programming, CUDA Vs OpenACC
OpenACC is a new accelerator programming interface that provides a set of OpenMP-like loop directives for programming accelerators in an implicit and portable way. It allows the programmer to express the offloading of data and computations to accelerators, so that porting legacy CPU-based applications is significantly simplified. This paper focuses on the performance aspects of OpenACC, using two microbenchmarks and one real-world computational fluid dynamics application. Both evaluations show that, in general, OpenACC performance is approximately 50% lower than that of CUDA. However, for some applications it can reach up to 98% of CUDA performance with careful manual optimizations. The results also indicate several limitations of the OpenACC specification that hamper full use of the GPU hardware resources, resulting in a significant performance gap when compared with fully tuned CUDA code. In particular, the lack of a programming interface for shared memory results in performance as much as three times lower.
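To make the directive-based style concrete, the following is a minimal sketch of an OpenACC-annotated loop in C. It is not taken from the paper: the `saxpy` function, its parameter names, and the choice of data clauses are assumptions made purely for illustration. The `copyin` and `copy` clauses express the data offloading, and `parallel loop` expresses the offloading of the computation.

```c
/* Hypothetical example (not from the paper): y = a * x + y. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is transferred to the accelerator only.
     * copy:   y is transferred to the accelerator and back to the host.
     * A compiler without OpenACC support ignores the pragma and runs
     * the loop serially on the CPU. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from the CPU version, which is what makes porting simple. The sketch also shows the kind of limitation the abstract describes: the directive offers no way to place data explicitly in GPU shared memory, whereas a CUDA kernel can declare `__shared__` buffers by hand.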