Publication | Open Access
Managing GPU Concurrency in Heterogeneous Architectures
Citations: 129
References: 46
Year: 2014
Venue: Unknown
Keywords: Cluster Computing, Heterogeneous Computing, Engineering, GPU Concurrency, Homogeneous Architectures, Computer Architecture, GPU Computing, Heterogeneous Architectures, Parallel Computing, Computer Engineering, Heterogeneous Systems, Computer Science, GPU Cluster, GPU Architecture, Edge Computing, Cloud Computing, Many-core Architecture, Parallel Programming, System Software
Summary: Heterogeneous CPU-GPU systems are expected to dominate many computing domains, but designing them is complicated by the need to maximize resource utilization while avoiding interference between CPU and GPU workloads on shared resources. The authors show that GPU applications tend to monopolize these shared resources and propose an integrated concurrency management strategy that dynamically adjusts GPU thread-level parallelism based on GPU core state and system-wide memory and network congestion. Two schemes are offered: CM-CPU, which boosts CPU performance, and CM-BAL, which balances CPU and GPU performance. Evaluations show that CM-CPU raises average CPU performance by 24% at the cost of an 11% GPU slowdown, while CM-BAL improves both CPU and GPU performance by 7% on average and lets users trade off performance between the two.
Abstract: Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are projected to be the dominant computing platforms for many classes of applications. The design of such systems is more complex than that of homogeneous architectures because maximizing resource utilization while minimizing shared resource interference between CPU and GPU applications is difficult. We show that GPU applications tend to monopolize the shared hardware resources, such as memory and network, because of their high thread-level parallelism (TLP), and discuss the limitations of existing GPU-based concurrency management techniques when employed in heterogeneous systems. To solve this problem, we propose an integrated concurrency management strategy that modulates the TLP in GPUs to control the performance of both CPU and GPU applications. This mechanism considers both GPU core state and system-wide memory and network congestion information to dynamically decide on the level of GPU concurrency to maximize system performance. We propose and evaluate two schemes: one (CM-CPU) for boosting CPU performance in the presence of GPU interference, the other (CM-BAL) for improving both CPU and GPU performance in a balanced manner and thus overall system performance. Our evaluations show that the first scheme improves average CPU performance by 24%, while reducing average GPU performance by 11%. The second scheme provides 7% average performance improvement for both CPU and GPU applications. We also show that our solution allows the user to control performance trade-offs between CPUs and GPUs.
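The core idea described in the abstract, modulating GPU thread-level parallelism from congestion feedback, can be sketched in a few lines of Python. This is an illustrative model, not the paper's actual hardware mechanism: the function names, thresholds, warp limits, and the `gpu_stall_ratio`/`user_bias` knobs are all assumptions made for the sketch.

```python
def cm_cpu(active_warps, mem_congestion, net_congestion,
           high=0.7, low=0.3, min_warps=1, max_warps=48):
    """Sketch of a CM-CPU-style controller: throttle GPU TLP whenever
    shared memory or network congestion is high, freeing bandwidth for
    latency-sensitive CPU applications. Thresholds are assumptions."""
    congestion = max(mem_congestion, net_congestion)
    if congestion > high:
        # Shared resources are saturated: back off GPU concurrency.
        return max(min_warps, active_warps - 2)
    if congestion < low:
        # Congestion has subsided: gradually restore GPU TLP.
        return min(max_warps, active_warps + 1)
    return active_warps


def cm_bal(active_warps, mem_congestion, net_congestion,
           gpu_stall_ratio, user_bias=0.5):
    """Sketch of a CM-BAL-style controller: throttle only when GPU
    cores show latency tolerance (a high stall ratio), and expose a
    user knob (user_bias, an assumption) trading CPU vs. GPU speed."""
    throttled = cm_cpu(active_warps, mem_congestion, net_congestion)
    if throttled < active_warps and gpu_stall_ratio < user_bias:
        return active_warps  # GPU performance would suffer: keep TLP
    return throttled
```

Under this model, CM-CPU throttles unconditionally on congestion, while CM-BAL vetoes the throttle when GPU cores are not stalling enough to tolerate it, which mirrors the abstract's balanced scheme and its user-controlled trade-off.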