Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes

TLDR

HPC systems increasingly use hierarchical hardware: multi-core shared-memory nodes linked by a network. This work compares pure MPI, pure OpenMP (with distributed shared memory extensions), and hybrid MPI+OpenMP programming models on such architectures and gives an outlook on possible standardization goals. The authors describe the potentials and challenges of each model, which must combine distributed-memory parallelization across the node interconnect with shared-memory parallelization inside each node. Hybrid models can be the superior solution in specific cases because they reduce communication needs and memory consumption or improve load balance. Machine topology strongly influences performance for all parallelization strategies, so topology awareness should be built into all applications.

Abstract

Today, most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. Parallel programming must combine distributed-memory parallelization on the node interconnect with shared-memory parallelization inside each node. We describe the potentials and challenges of the dominant programming models on hierarchically structured hardware: pure MPI (Message Passing Interface), pure OpenMP (with distributed shared memory extensions), and hybrid MPI+OpenMP in several flavors. We pinpoint cases where a hybrid programming model can indeed be the superior solution because of reduced communication needs and memory consumption, or improved load balance. Furthermore, we show that machine topology has a significant impact on performance for all parallelization strategies and that topology awareness should be built into all applications in the future. Finally, we give an outlook on possible standardization goals and extensions that could make performance-conscious hybrid programming easier.
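
The claim about reduced communication and memory can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper: it assumes a cubic grid of n^3 cells split into equal cubic blocks, one block per MPI rank, with a one-cell halo (ghost) layer allocated on every face of every block. The function name halo_cells and the machine size (64 nodes with 8 cores each) are hypothetical choices made only for this illustration.

```python
def halo_cells(n, blocks_per_dim, width=1):
    """Total halo (ghost) cells over all blocks of an n^3 grid.

    Assumption for this sketch: the grid is split into blocks_per_dim^3
    equal cubic blocks, and each block allocates a halo of `width` cells
    on all six faces (including edges and corners).
    """
    if n % blocks_per_dim != 0:
        raise ValueError("n must be divisible by blocks_per_dim")
    local = n // blocks_per_dim                      # block edge length
    per_block = (local + 2 * width) ** 3 - local ** 3  # halo shell of one block
    return per_block * blocks_per_dim ** 3


# Hypothetical machine: 64 nodes x 8 cores = 512 cores, grid of 512^3 cells.
pure_mpi = halo_cells(512, 8)  # one rank per core: 8 x 8 x 8 = 512 blocks
hybrid = halo_cells(512, 4)    # one rank per node: 4 x 4 x 4 = 64 blocks

print(pure_mpi, hybrid, pure_mpi / hybrid)
```

Under these assumptions the pure MPI layout allocates 12,980,224 halo cells and the one-rank-per-node hybrid layout 6,390,272, roughly half. Fewer, larger blocks have a smaller surface-to-volume ratio, which is one way a hybrid decomposition can lower both halo memory and the data volume exchanged between ranks. Real codes differ in decomposition shape, halo width, and boundary handling, so the ratio is indicative only.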
