Publication | Closed Access
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs
Citations: 71
References: 44
Year: 2015
Venue: Unknown
Keywords: Hardware Security, GPU Architecture, Engineering, Compute Kernel, GPU Benchmarking, Parallel Performance Evaluation, Computer Engineering, Computer Architecture, Maximum Thread-level Parallelism, Memory Access, Thread Switching, Parallel Programming, Computer Science, Parallel Computing, Thread-level Parallelism Optimization, GPU Cluster, GPU Computing
The key to high performance on GPUs lies in massive threading, which enables thread switching to hide functional-unit and memory-access latency. However, running at the maximum thread-level parallelism (TLP) does not necessarily yield optimal performance, because excessive threads contend for cache resources. As a result, thread throttling techniques limit the number of concurrently executing threads in order to preserve data locality. On the other hand, GPUs are equipped with a large register file to enable fast context switches between threads, and throttling techniques designed to mitigate cache contention leave these registers underutilized. Register allocation is therefore a significant factor for performance: it not only determines single-thread performance but also indirectly affects the TLP.
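The register/TLP tradeoff described above can be sketched with a simple occupancy-style calculation. The figures below (a 65,536-register file per streaming multiprocessor and a 2,048-thread residency cap, typical of Kepler-era NVIDIA GPUs from around 2015) are illustrative assumptions, not values from the paper; the point is only that raising per-thread register allocation shrinks the number of threads that can be resident at once.

```python
# Illustrative sketch of how per-thread register allocation caps TLP.
# Assumed hardware figures (Kepler-class SM, NOT from the paper):
REGISTER_FILE_SIZE = 65536   # 32-bit registers per SM
MAX_RESIDENT_THREADS = 2048  # hardware thread-residency limit per SM

def max_concurrent_threads(regs_per_thread: int) -> int:
    """Threads that can be resident on one SM given per-thread register demand."""
    return min(MAX_RESIDENT_THREADS, REGISTER_FILE_SIZE // regs_per_thread)

# More registers per thread -> fewer resident threads -> lower TLP.
for regs in (32, 64, 128):
    print(f"{regs:3d} regs/thread -> {max_concurrent_threads(regs)} threads")
```

Conversely, a throttling scheme that caps concurrency below these limits leaves part of the register file idle, which is the underutilization the abstract points to.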