Publication | Closed Access
Orchestrated scheduling and prefetching for GPGPUs
Citations: 185
References: 39
Year: 2013
Venue: Unknown
Cluster Computing, GPU Architecture, Engineering, GPU Benchmarking, Edge Computing, Long Memory Latencies, High-performance Architecture, Computer Engineering, Computer Architecture, Parallel Programming, Computer Science, Thread Scheduling, GPGPU Architectures, Parallel Computing, GPU Cluster, GPU Computing
In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies. We demonstrate that existing warp scheduling policies in GPGPU architectures are unable to effectively incorporate data prefetching. The main reason is that they schedule consecutive warps, which are likely to access nearby cache blocks and thus prefetch accurately for one another, back-to-back in consecutive cycles. This either 1) causes prefetches to be generated by a warp too close to the time their corresponding addresses are actually demanded by another warp, or 2) requires sophisticated prefetcher designs to correctly predict the addresses required by a future "far-ahead" warp while executing the current warp.
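The timing problem described in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism or evaluation setup. It assumes one warp issues per cycle, warp `w` touches cache block `w`, and a simple next-line prefetcher requests block `w + 1` whenever warp `w` issues. The function name `prefetch_lead_times` and the interleaved ordering are hypothetical, chosen only to contrast with back-to-back scheduling of consecutive warps.

```python
def prefetch_lead_times(order):
    """Return, for each warp w, the number of cycles between w issuing
    (which triggers a prefetch for warp w + 1's block) and warp w + 1
    actually demanding that block.

    Assumptions (illustrative only): one warp issues per cycle, in the
    given order, and warp w accesses cache block w.
    """
    issue_cycle = {warp: cycle for cycle, warp in enumerate(order)}
    return {
        w: issue_cycle[w + 1] - issue_cycle[w]
        for w in order
        if w + 1 in issue_cycle
    }


NUM_WARPS = 8

# Consecutive warps scheduled back-to-back, as in the policies the abstract criticizes.
consecutive = list(range(NUM_WARPS))

# Hypothetical contrast: even-numbered warps first, then odd-numbered warps.
interleaved = list(range(0, NUM_WARPS, 2)) + list(range(1, NUM_WARPS, 2))

lead_consecutive = prefetch_lead_times(consecutive)
lead_interleaved = prefetch_lead_times(interleaved)

print("consecutive:", lead_consecutive)
print("interleaved:", lead_interleaved)
```

Under these assumptions, back-to-back scheduling gives every prefetch a lead of a single cycle, far too short to hide a long memory latency. In the interleaved order, prefetches issued by even-numbered warps lead their consumers by several cycles, while those issued by odd-numbered warps have a negative lead because the neighboring warp has already run. The toy model only shows that scheduling order determines prefetch timeliness. It says nothing about how the paper's techniques choose that order.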