Chai: Collaborative heterogeneous applications for integrated-architectures

TLDR

Heterogeneous system architectures are moving toward tighter integration, with shared virtual memory, memory coherence, and system-wide atomics. Programming languages such as OpenCL 2.0, CUDA 8.0, and C++ AMP expose these features so that CPU and GPU threads can collaborate productively. To evaluate these architectures and languages, the paper provides a benchmark suite that lets researchers experiment with close CPU-GPU collaboration. The authors introduce Chai, a suite of 14 benchmarks covering three collaboration patterns: data partitioning, fine-grain task partitioning, and coarse-grain task partitioning. Each benchmark has seven implementations across OpenCL, C++ AMP, and CUDA, with and without the latest heterogeneous architecture features. The study characterizes each benchmark across input sizes and collaboration combinations, and evaluates how the emerging architecture features affect application performance.

Abstract

Heterogeneous system architectures are evolving towards tighter integration among devices, with emerging features such as shared virtual memory, memory coherence, and system-wide atomics. Languages, device architectures, system specifications, and applications are rapidly adapting to the challenges and opportunities of tightly integrated heterogeneous platforms. Programming languages such as OpenCL 2.0, CUDA 8.0, and C++ AMP allow programmers to exploit these architectures for productive collaboration between CPU and GPU threads. To evaluate these new architectures and programming languages, and to empower researchers to experiment with new ideas, a suite of benchmarks targeting these architectures with close CPU-GPU collaboration is needed. In this paper, we classify applications that target heterogeneous architectures into generic collaboration patterns including data partitioning, fine-grain task partitioning, and coarse-grain task partitioning. We present Chai, a new suite of 14 benchmarks that cover these patterns and exercise different features of heterogeneous architectures with varying intensity. Each benchmark in Chai has seven implementations spanning programming models such as OpenCL, C++ AMP, and CUDA, both with and without the latest heterogeneous architecture features. We characterize the behavior of each benchmark with respect to varying input sizes and collaboration combinations, and evaluate the impact of using the emerging features of heterogeneous architectures on application performance.
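
As an illustration of the data partitioning pattern named in the abstract, the following is a minimal sketch, not code from Chai. It assumes that two groups of host threads can stand in for CPU threads and GPU work-groups, and that a `std::atomic` counter can stand in for a system-wide atomic living in shared virtual memory. The names `scale_collaboratively` and `TileStats` are hypothetical. Every worker claims the next unprocessed tile from the shared counter, so the split between the two groups is decided dynamically at run time rather than fixed in advance.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type: how many tiles each group ended up processing.
struct TileStats {
    std::size_t cpu_tiles;
    std::size_t gpu_tiles;
};

// Sketch of dynamic data partitioning. Host threads play both roles here;
// on a real integrated platform the "gpu" workers would be device
// work-groups incrementing the same counter through system-wide atomics.
TileStats scale_collaboratively(std::vector<float>& data, float factor,
                                std::size_t tile_size,
                                unsigned cpu_workers, unsigned gpu_workers) {
    assert(tile_size > 0);
    const std::size_t num_tiles = (data.size() + tile_size - 1) / tile_size;

    std::atomic<std::size_t> next_tile{0};  // shared work counter
    std::atomic<std::size_t> cpu_done{0};
    std::atomic<std::size_t> gpu_done{0};

    auto worker = [&](std::atomic<std::size_t>& done) {
        for (;;) {
            // Claim one tile; fetch_add guarantees each index is handed out once.
            const std::size_t t = next_tile.fetch_add(1, std::memory_order_relaxed);
            if (t >= num_tiles) break;
            const std::size_t begin = t * tile_size;
            const std::size_t end = std::min(begin + tile_size, data.size());
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
            done.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < cpu_workers; ++i) pool.emplace_back(worker, std::ref(cpu_done));
    for (unsigned i = 0; i < gpu_workers; ++i) pool.emplace_back(worker, std::ref(gpu_done));
    for (auto& th : pool) th.join();  // join makes all writes visible to the caller

    return TileStats{cpu_done.load(), gpu_done.load()};
}
```

Calling `scale_collaboratively(v, 2.0f, 64, 2, 2)` on a 1000-element vector processes 16 tiles in total; how those tiles divide between the two groups varies from run to run, which is the point of dynamic partitioning.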
