Publication | Closed Access
CGPredict
Citations: 17
References: 15
Year: 2017
Engineering, GPU Benchmarking, Computer Architecture, Embedded Systems, Embedded GPU Architecture, GPU Computing, Hardware Security, Compute Kernel, Embedded GPUs, Compilers, Parallel Computing, Analytical Framework CGPredict, Computer Engineering, Heterogeneous Systems, Computer Science, GPU Architecture, Hardware Acceleration, Program Analysis, Parallel Programming
Heterogeneous multiprocessor system-on-chip architectures include accelerators, such as embedded GPUs and FPGAs, that are capable of general-purpose computation. Application developers for such platforms need to carefully choose the accelerator that offers the maximum performance benefit. For a given application, the reference code is usually specified in a high-level, single-threaded programming language such as C. The performance of an application kernel on an accelerator depends on a complex interplay among the exposed parallelism, the compiler, and the accelerator architecture. Thus, determining the performance of a kernel requires redeveloping it in each accelerator-specific language, which wastes substantial time and effort. To aid the developer in this early design decision, we present CGPredict, an analytical framework that predicts the performance of a computational kernel on an embedded GPU architecture from unoptimized, single-threaded C code. The analytical approach also provides insights into application characteristics, which suggest further application-specific optimizations. The estimation error is as low as 2.66% (9% on average) compared to the performance of the same kernel written in native CUDA code and running on an NVIDIA Kepler embedded GPU. This low estimation error enables CGPredict to provide an early design recommendation of the accelerator starting from C code.
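To make the abstract's terms concrete, the following is a minimal sketch, not taken from the paper. It shows the kind of unoptimized, single-threaded C kernel that could serve as reference code, and one common way to express an estimation error as a percentage. The names `saxpy_reference` and `estimation_error_pct` are hypothetical, and the relative-error formula is an assumption, since the abstract does not state how the error is defined.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference kernel: plain single-threaded C with no
 * accelerator-specific constructs. Each iteration is independent,
 * which is the kind of parallelism a GPU port could expose. */
void saxpy_reference(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Assumed error metric: absolute difference between the predicted and
 * the measured execution time, relative to the measured time, in percent.
 * Both times must use the same unit; measured must be positive. */
double estimation_error_pct(double predicted, double measured)
{
    assert(measured > 0.0);
    return fabs(predicted - measured) / measured * 100.0;
}
```

Under this assumed definition, a predicted time of 109 units against a measured time of 100 units corresponds to an error of about 9%.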