Publication | Closed Access
Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores
Citations: 41
References: 37
Year: 2021
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, High Performance Computing, Supercomputer Architecture, Hardware Security, Compute Kernel, High-performance Architecture, Parallel Computing, Adequate Parallelism, Manycore Processor, Performance Optimization Techniques, Massively-parallel Computing, Open Source Supercomputing, Computer Engineering, Computer Science, HPCG Benchmark, Heterogeneous Cores, Parallel Performance Evaluation, Cloud Computing, Many-core Architecture, Parallel Programming
We study and evaluate performance optimization techniques for the HPCG benchmark on the newest generation Sunway supercomputer. Specifically, a two-level blocking scheme is proposed to expose adequate parallelism in the symmetric Gauss-Seidel kernel while keeping a fast convergence rate, a fine-grained kernel fusion technique is developed to alleviate the bandwidth load on local storage with small capacity, and a low overhead thread collaboration method is presented to efficiently move data between threads and hide its cost with data transfer operations. Test results show that the optimized HPCG code is able to exploit 73.0% of the theoretical memory bandwidth, and scale to over 42 million heterogeneous cores with 95.5% weak-scaling efficiency and 5.91 Pflop/s performance. We also study how the performance can be improved if the specific rules of HPCG are not fully obeyed, and design dependency preserving parallelization and vectorization methods, further boosting performance to 27.6 Pflop/s.
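To make concrete why the symmetric Gauss-Seidel kernel needs a blocking scheme to expose parallelism, the following is a minimal sequential sketch of one symmetric Gauss-Seidel sweep. It is not the authors' implementation: the dense list-of-lists matrix, the helper names `symgs_sweep`, `residual_norm`, and `tridiag`, and the small tridiagonal test matrix are all assumptions chosen for illustration. The point it shows is the loop-carried dependency: each row update reads values that were written earlier in the same sweep, so the rows cannot simply be processed in parallel.

```python
def symgs_sweep(A, b, x):
    """Apply one symmetric Gauss-Seidel sweep to x in place and return it."""
    n = len(b)
    # Forward sweep: row i reads x[0..i-1] as already updated in this sweep.
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    # Backward sweep: row i reads x[i+1..n-1] as already updated in this pass.
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Return the max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))


def tridiag(n):
    """Build a small diagonally dominant tridiagonal matrix (illustrative only)."""
    return [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = tridiag(8)
    b = [1.0] * 8
    x = [0.0] * 8
    for sweep in range(5):
        symgs_sweep(A, b, x)
        print(sweep + 1, residual_norm(A, b, x))
```

Reordering or blocking the rows breaks some of these dependencies and generally changes the convergence behavior, which is the trade-off the abstract refers to when it says the blocking scheme exposes parallelism "while keeping a fast convergence rate".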