Concepedia

Publication | Closed Access

Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores

Citations: 41

References: 37

Year: 2021

Abstract

We study and evaluate performance optimization techniques for the HPCG benchmark on the newest-generation Sunway supercomputer. Specifically, a two-level blocking scheme is proposed to expose adequate parallelism in the symmetric Gauss-Seidel kernel while keeping a fast convergence rate; a fine-grained kernel fusion technique is developed to alleviate the bandwidth load on the small-capacity local storage; and a low-overhead thread collaboration method is presented to efficiently move data between threads and hide its cost with data transfer operations. Test results show that the optimized HPCG code exploits 73.0% of the theoretical memory bandwidth and scales to over 42 million heterogeneous cores with 95.5% weak-scaling efficiency and 5.91 Pflop/s performance. We also study how performance can be improved if the specific rules of HPCG are not fully obeyed, and design dependency-preserving parallelization and vectorization methods that further boost performance to 27.6 Pflop/s.
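For readers unfamiliar with the kernel named in the abstract, the following is a minimal Python sketch of a plain symmetric Gauss-Seidel sweep on a small dense matrix. It is not the authors' two-level blocking scheme, and the function names (`symgs_sweep`, `residual_norm`, `laplacian_1d`) and the 1D Laplacian test matrix are assumptions made for illustration only. The sketch shows why the kernel is hard to parallelize: each row update reads values that were written earlier in the same pass.

```python
def symgs_sweep(A, b, x):
    """One symmetric Gauss-Seidel sweep (forward pass, then backward pass), in place."""
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            # Row i uses x[j] values already updated earlier in this pass.
            # This sequential dependency is what blocking or reordering
            # schemes must work around to expose parallelism.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def laplacian_1d(n):
    """Dense 1D Laplacian (2 on the diagonal, -1 on the off-diagonals); illustrative only."""
    return [
        [2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    A = laplacian_1d(8)
    b = [1.0] * 8
    x = [0.0] * 8
    print("initial residual:", residual_norm(A, b, x))
    for _ in range(50):
        symgs_sweep(A, b, x)
    print("residual after 50 sweeps:", residual_norm(A, b, x))
```

In HPCG itself the matrix is sparse and the sweep serves as the smoother inside a multigrid preconditioner; the dense toy matrix here only keeps the dependency pattern easy to see.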
