Performance and Power Analysis of High-Density Multi-GPGPU Architectures: A Preliminary Case Study

TLDR

High-density GPGPU architectures are emerging as a high-performance, power-efficient option for building cluster supercomputers, with raw compute power that greatly exceeds that of the prevailing homogeneous systems. Because these heterogeneous systems make parallel applications harder to develop, this study measures the compute performance and power consumption of common benchmarks and scientific applications on them. Using a Dell C4130 server equipped with up to four NVIDIA K80 GPGPU cards, the authors ran the High Performance Linpack (HPL) benchmark and the molecular dynamics simulators NAMD, LAMMPS, and GROMACS. Compared to a dual-Xeon E5-2690 v3 system, the 4-GPU server delivered up to 7 TFLOPS in HPL (9 times faster) at 4 GFLOPS per watt, and achieved speedups of 7.8, 16, and 3.3 times for NAMD, LAMMPS, and GROMACS respectively, while drawing 2.3, 2.6, and 2.6 times the power. These preliminary results indicate high performance and superior power efficiency in a space-efficient design.

Abstract

System architectures built around high-density general-purpose graphics processing units (GPGPUs) are emerging as a promising way to deliver high compute performance and high performance per watt when building cluster supercomputers. The raw compute power of these heterogeneous systems greatly exceeds that of the currently prevailing homogeneous systems, which motivates their rapid adoption. Heterogeneous systems do, however, increase the complexity of developing parallel applications, so there is a need to investigate the compute performance and associated power consumption of common benchmarks and scientific computing applications. In this paper, we present performance and power studies on the Dell C4130 server, which integrates up to 4 GPGPU cards; the cards used are NVIDIA K80s. We test the High Performance Linpack (HPL) benchmark and the molecular dynamics (MD) simulators NAMD, LAMMPS, and GROMACS. Comparing a 4-K80 system with a 2-Xeon E5-2690 v3 system, we show that: (1) for the HPL tests, the 4-GPU server delivers up to 7 TFLOPS, which is 9 times faster than the 2-CPU system, at a power efficiency of 4 GFLOPS per watt; (2) for the MD tests, NAMD on the 4-GPU server achieves a 7.8 times speedup while drawing 2.3 times the power of the 2-CPU system, LAMMPS achieves a 16 times speedup at 2.6 times the power, and GROMACS achieves a 3.3 times speedup at 2.6 times the power. These preliminary results demonstrate that the novel high-density multi-GPGPU architecture offers high performance for compute-intensive applications and molecular simulators, with superior power efficiency in a space-efficient design. In the future, such heterogeneous architectures could be a powerful alternative for next-generation supercomputer systems.
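The speedup and power figures above can be combined into a rough performance-per-watt comparison. The Python sketch below is illustrative only and is not taken from the paper. It assumes that each reported power ratio is the average draw during the same run that produced the speedup, so that the efficiency gain is simply speedup divided by power ratio. It also assumes that the rounded 7 TFLOPS and 4 GFLOPS per watt HPL figures describe the same run, which lets us back out an approximate power draw. The function names are hypothetical.

```python
# Speedup and power ratio of the 4x K80 server relative to the
# 2x Xeon E5-2690 v3 system, as reported in the abstract.
REPORTED = {
    "NAMD": (7.8, 2.3),
    "LAMMPS": (16.0, 2.6),
    "GROMACS": (3.3, 2.6),
}


def relative_efficiency(speedup, power_ratio):
    """Performance-per-watt gain over the CPU baseline.

    Assumption (not stated in the source): the power ratio is the
    average draw over the same run that produced the speedup.
    """
    return speedup / power_ratio


def implied_power_watts(gflops, gflops_per_watt):
    """Power draw implied by a throughput and an efficiency figure."""
    return gflops / gflops_per_watt


efficiency = {
    app: round(relative_efficiency(speedup, power), 2)
    for app, (speedup, power) in REPORTED.items()
}

# HPL: up to 7 TFLOPS (7000 GFLOPS) at 4 GFLOPS per watt.
# Both inputs are rounded, so this is only an estimate.
hpl_power = implied_power_watts(7000.0, 4.0)

for app, gain in efficiency.items():
    print(f"{app}: about {gain}x the performance per watt of the CPU system")
print(f"HPL: implied draw of roughly {hpl_power:.0f} W")
```

Under these assumptions the gain in performance per watt is about 3.4 times for NAMD, 6.2 times for LAMMPS, and 1.3 times for GROMACS, and the HPL run implies a draw of roughly 1750 W. The same ratios inverted give the relative energy to solution.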
