Publication | Closed Access
Adaptive heterogeneous scheduling for integrated GPUs
Citations: 102
References: 37
Year: 2014
Unknown Venue
Cluster Computing, Heterogeneous Computing, Engineering, Computer Architecture, Asymmetric Scheduling, GPU Computing, Integrated GPUs, Compute Kernel, Data Science, Parallel Computing, Computer Engineering, Heterogeneous Systems, Computer Science, GPU Cluster, Multiple Kernels, GPU Architecture, Program Analysis, Edge Computing, Cloud Computing, Parallel Programming, Low-overhead Online Profiling
Integrated CPU-GPU processors share resources such as physical memory, which reduces communication costs and lets programmers use both the CPU and the GPU for a single application. This work introduces adaptive scheduling techniques for such processors. The authors propose two online profiling-based schedulers, naïve and asymmetric. The asymmetric variant automatically partitions data-parallel kernel work between the CPU and GPU using low-overhead profiling; adapts to load imbalance, to varying amounts of work per kernel call, and to multiple kernels with different characteristics; and requires no offline processing.
Many processors today integrate a CPU and a GPU on the same die, which allows them to share resources such as physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively use both the CPU and the GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors. We present two online profiling-based scheduling algorithms: naïve and asymmetric. Our asymmetric scheduling algorithm uses low-overhead online profiling to automatically partition the work of data-parallel kernels between the CPU and GPU without input from application developers. It profiles on both the CPU and the GPU in a way that does not penalize GPU-centric workloads, that is, workloads that run significantly faster on the GPU. It adapts to application characteristics by addressing: 1) load imbalance due to irregularity caused by, e.g., data-dependent control flow, 2) different amounts of work in each kernel call, and 3) multiple kernels with different characteristics. Unlike many existing approaches, which primarily target NVIDIA discrete GPUs, our scheduling algorithm does not require offline processing.
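To illustrate the general idea of profiling-based work partitioning, here is a minimal Python sketch. It is not the paper's naïve or asymmetric algorithm, whose details the abstract does not give. It assumes a simple throughput-proportional split: a small chunk is timed on each device, and the remaining items are divided in proportion to the measured rates. The names `measure_rate`, `partition_by_throughput`, and the simulated device functions are hypothetical and exist only for this example.

```python
import time


def measure_rate(run_chunk, chunk_size):
    """Time one call of run_chunk(chunk_size) and return items per second.

    run_chunk is a hypothetical callable standing in for launching a
    small profiling chunk of a data-parallel kernel on one device.
    """
    start = time.perf_counter()
    run_chunk(chunk_size)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    return chunk_size / max(elapsed, 1e-9)


def partition_by_throughput(total_items, cpu_rate, gpu_rate):
    """Split total_items into (cpu_items, gpu_items) proportional to the rates.

    Assumption for illustration only: a faster device should get a
    proportionally larger share of the remaining work.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if cpu_rate < 0 or gpu_rate < 0 or cpu_rate + gpu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    gpu_items = int(total_items * gpu_rate / (cpu_rate + gpu_rate))
    return total_items - gpu_items, gpu_items


if __name__ == "__main__":
    # Simulated devices: the "GPU" processes an item faster than the "CPU".
    def fake_cpu(n):
        time.sleep(n * 0.0004)

    def fake_gpu(n):
        time.sleep(n * 0.0001)

    cpu_rate = measure_rate(fake_cpu, 50)
    gpu_rate = measure_rate(fake_gpu, 50)
    cpu_items, gpu_items = partition_by_throughput(10_000, cpu_rate, gpu_rate)
    print(f"CPU gets {cpu_items} items, GPU gets {gpu_items} items")
```

A scheduler that adapts to irregular workloads or to varying work per kernel call, as the abstract describes, would need to repeat or refine such measurements over time; this sketch measures once and splits once.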