Publication | Closed Access
Effective sampling-driven performance tools for GPU-accelerated supercomputers
Citations: 24
References: 17
Year: 2013
Venue: Unknown
Cluster Computing, Engineering, GPU Benchmarking, Computer Architecture, GPU Computing, Hardware Security, Compute Kernel, Systems Engineering, Parallel Computing, Computer Engineering, GPU Components, Computer Science, GPU Cluster, GPU Architecture, Performance Analysis, GPU-accelerated Supercomputers, Program Analysis, Parallel Programming, CUDA Initialization
Performance analysis of GPU-accelerated systems requires a system-wide view that considers both CPU and GPU components. In this paper, we describe how to extend system-wide, sampling-based performance analysis methods to GPU-accelerated systems. Since current GPUs do not support sampling, our implementation required careful coordination of instrumentation-based performance data collection on GPUs with the sampling-based methods employed on CPUs. We also introduce a novel technique for analyzing systemic idleness in CPU/GPU systems. We demonstrate the effectiveness of our techniques with application case studies on Titan and Keeneland. Highlights of these case studies include: 1) we improved the performance of LULESH 1.0 by 30%, 2) we identified a hardware performance problem on Keeneland, 3) we identified a scaling problem in LAMMPS that stems from CUDA initialization, and 4) we identified a performance problem in which GPU synchronization operations are delayed by blocking system calls.
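The abstract does not spell out how CPU samples and GPU instrumentation data are combined, so the following is only an illustrative sketch of one way such data could be correlated to surface GPU idleness. It assumes GPU activity is available as instrumented kernel intervals `(start, end)` and CPU activity as timestamped samples `(time, context)`; the function names, the data layout, and the attribution rule are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def blame_gpu_idleness(cpu_samples, kernel_intervals):
    """Count, per CPU context, the samples taken while no GPU kernel was running.

    cpu_samples: iterable of (timestamp, context) pairs from CPU sampling (assumed format).
    kernel_intervals: iterable of (start, end) pairs from GPU instrumentation (assumed format).
    A kernel is treated as active on the half-open range [start, end).
    """
    merged = merge_intervals(kernel_intervals)
    starts = [start for start, _ in merged]
    blame = Counter()
    for timestamp, context in cpu_samples:
        # Find the last merged interval that starts at or before the sample.
        i = bisect_right(starts, timestamp) - 1
        gpu_active = i >= 0 and timestamp < merged[i][1]
        if not gpu_active:
            blame[context] += 1
    return blame


if __name__ == "__main__":
    # Hypothetical trace: two overlapping kernels, a gap, then a third kernel.
    kernels = [(10, 20), (15, 30), (50, 60)]
    samples = [(5, "setup"), (12, "compute"), (35, "mpi_wait"),
               (40, "mpi_wait"), (55, "compute"), (60, "teardown")]
    for context, count in blame_gpu_idleness(samples, kernels).most_common():
        print(context, count)
```

In this sketch, CPU contexts that accumulate many samples while the GPU is idle are candidates for closer inspection; a real tool would also need to align CPU and GPU clocks and handle multiple streams and devices.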