Understanding Off-Chip Memory Contention of Parallel Programs in Multicore Systems

Abstract

Memory contention is an important performance issue in current multicore architectures. In this paper, we focus on understanding how off-chip memory contention affects the performance of parallel applications. Using measurements conducted on state-of-the-art multicore systems, we observed that off-chip memory traffic is not always bursty, as it was previously reported in literature. Burstiness depends on the problem size. Small problem sizes lead to bursty memory traffic, and generate small off-chip contention. In contrast, when large program sizes cause memory contention, the memory traffic is non-bursty. Based on these observations, we propose an analytical model that relates the growth of memory contention to the number of active cores and to the problem size, for both uniform (UMA) and non-uniform memory access (NUMA) systems. Our model differs from measurements on average by less than 14\%. Contention for off-chip memory grows exponentially with the number of active cores, but adding additional memory controllers reduces the memory contention. For programs such as the penta diagonal solver SP from NPB benchmark, with a large matrix of $162^3$ elements (input size C), our analysis shows that memory contention increases the total number of processor cycles to execute the program by more than ten times on a machine with 24 cores.

References

Page 1

	Year	Citations

Page 1