Concepedia

Publication | Closed Access

Spare register aware prefetching for graph algorithms on GPUs

50

Citations

40

References

2014

Year

Abstract

More and more graph algorithms are being GPU enabled. Graph algorithm implementations on GPUs have irregular control flow and are memory-intensive with many irregular/data-dependent memory accesses. Due to these factors graph algorithms on GPUs have low execution efficiency. In this work we propose a mechanism to improve the execution efficiency of graph algorithms by improving their memory access latency tolerance. We propose a mechanism for prefetching data for load pairs that have one load dependent on the other - such pairs are common in graph algorithms. Our mechanism detects the target loads in hardware and injects instructions into the pipeline to prefetch data into spare registers that are not being used by any active threads. By prefetching data into registers, early eviction of prefetched data can be eliminated. We also propose a mechanism that uses the compiler to identify the target loads. Our mechanism improves performance over no prefetching by 10% on average and upto 51% for nine memory intensive graph algorithm kernels.

References

YearCitations

Page 1