Publication | Closed Access
Spare register aware prefetching for graph algorithms on GPUs
50
Citations
40
References
2014
Year
Unknown Venue
Cluster ComputingIrregular Control FlowGpu ArchitectureEngineeringGraph TheoryGraph AlgorithmsComputer ArchitectureComputer EngineeringMore Graph AlgorithmsParallel ProgrammingComputer ScienceParallel ComputingGpu ClusterGpu ComputingSpare Register
More and more graph algorithms are being GPU enabled. Graph algorithm implementations on GPUs have irregular control flow and are memory-intensive with many irregular/data-dependent memory accesses. Due to these factors graph algorithms on GPUs have low execution efficiency. In this work we propose a mechanism to improve the execution efficiency of graph algorithms by improving their memory access latency tolerance. We propose a mechanism for prefetching data for load pairs that have one load dependent on the other - such pairs are common in graph algorithms. Our mechanism detects the target loads in hardware and injects instructions into the pipeline to prefetch data into spare registers that are not being used by any active threads. By prefetching data into registers, early eviction of prefetched data can be eliminated. We also propose a mechanism that uses the compiler to identify the target loads. Our mechanism improves performance over no prefetching by 10% on average and upto 51% for nine memory intensive graph algorithm kernels.
| Year | Citations | |
|---|---|---|
Page 1
Page 1