Publication | Closed Access
DaCache
Citations: 35 | References: 34 | Year: 2015 | Venue: Unknown
Topics: Hardware Security, Cluster Computing, GPU Architecture, Divergence-Aware Cache, Engineering, GPU Benchmarking, Edge Computing, High-Performance Architecture, Cloud Computing, Computer Architecture, Computer Engineering, Caching, Warp Scheduling, Parallel Programming, Computer Science, GPU Cache Blocks, Parallel Computing, GPU Computing
The lock-step execution model of GPUs requires a warp to have the data blocks for all of its threads before execution. However, existing cache mechanisms do not manage GPU cache blocks at the warp level, which would increase the number of warps ready for execution. In addition, warp scheduling is very important for GPU-specific cache management to reduce both intra- and inter-warp conflicts and to maximize data locality. In this paper, we propose Divergence-Aware Cache (DaCache) management, which orchestrates L1D cache management and warp scheduling together for GPGPUs. In DaCache, the insertion position of an incoming data block depends on the fetching warp's scheduling priority: blocks of warps with lower priorities are inserted closer to the LRU position of the LRU chain, so that they have a shorter lifetime in the cache. This fine-grained insertion policy is extended to prioritize coherent loads over divergent loads, so that coherent loads are less vulnerable to both inter- and intra-warp thrashing. DaCache also adopts a constrained replacement policy with L1D bypassing to sustain a good supply of Fully Cached Warps (FCW), along with a dynamic mechanism to adjust the FCW count at runtime. Our experiments demonstrate that DaCache achieves a 40.4% performance improvement over the baseline GPU and outperforms two state-of-the-art thrashing-resistant techniques, RRIP and DIP, by 40% and 24.9%, respectively.
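The priority-scaled insertion policy the abstract describes can be illustrated with a short sketch. The following Python snippet is a minimal, illustrative model of the general idea only, not DaCache's actual implementation; the class name `CacheSet`, the methods `insert` and `access`, and the position formula are all assumptions made for this example.

```python
class CacheSet:
    """Toy model of one set in a recency-ordered (LRU-chain) cache."""

    def __init__(self, num_ways):
        self.num_ways = num_ways
        # chain[0] is the MRU position; chain[-1] is the LRU position.
        self.chain = []

    def insert(self, block, warp_priority, num_warps):
        """Insert a fetched block at a recency position scaled by the
        fetching warp's scheduling priority (0 = highest priority).
        The scaling formula is an illustrative assumption."""
        if len(self.chain) >= self.num_ways:
            self.chain.pop()  # evict the block at the LRU position
        # Higher-priority warps insert nearer MRU; lower-priority warps
        # insert nearer LRU, giving their blocks a shorter cache lifetime.
        pos = min(len(self.chain),
                  warp_priority * self.num_ways // num_warps)
        self.chain.insert(pos, block)

    def access(self, block):
        """On a hit, promote the block to the MRU position."""
        if block in self.chain:
            self.chain.remove(block)
            self.chain.insert(0, block)
            return True
        return False


# Example: with 4 ways and 8 warps, a block from the highest-priority
# warp lands at MRU, while one from the lowest-priority warp lands
# near LRU and is evicted first under pressure.
cache_set = CacheSet(num_ways=4)
cache_set.insert("block_A", warp_priority=0, num_warps=8)
cache_set.insert("block_B", warp_priority=7, num_warps=8)
```

The paper's full scheme additionally distinguishes coherent from divergent loads and constrains replacement (with L1D bypassing) to keep warps fully cached; those mechanisms are omitted here for brevity.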