Publication | Closed Access
Revisiting sorting for GPGPU stream architectures
Citations: 146
References: 41
Year: 2010
Unknown Venue
Cluster Computing, GPU Architecture, Engineering, Hardware Acceleration, GPGPU Stream Architectures, GPU Benchmarking, Efficient Strategies, Computer Engineering, Computer Architecture, Large Sequences, Parallel Programming, Computer Science, GPGPU Stream Processors, Parallel Computing, GPU Cluster, GPU Computing
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and values) using GPGPU stream processors. Compared to the state of the art, our radix sorting methods exhibit speedups of at least 2x on all generations of NVIDIA GPGPUs, and up to 3.7x on current GT200-based models. Our implementations demonstrate sorting rates of 482 million key-value pairs per second and 550 million keys per second (32-bit). For this domain of sorting problems, we believe our sorting primitive to be the fastest available for any fully-programmable microarchitecture.
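
For readers unfamiliar with radix sorting of fixed-length keys, the general technique can be sketched sequentially as below. This is a minimal illustration only, not the GPU implementation the poster describes: the function name `radix_sort_pairs` and the `digit_bits` parameter are hypothetical, and the sketch assumes unsigned integer keys. Each pass builds a histogram of one digit, turns it into starting offsets with an exclusive prefix sum, and scatters the pairs stably. Data-parallel radix sorts typically distribute these same three steps across threads.

```python
def radix_sort_pairs(keys, values, key_bits=32, digit_bits=4):
    """Stable least-significant-digit radix sort of (key, value) pairs.

    Assumes keys are unsigned integers that fit in key_bits bits.
    digit_bits is an illustrative choice, not a value from the poster.
    """
    assert len(keys) == len(values)
    radix = 1 << digit_bits
    mask = radix - 1
    keys, values = list(keys), list(values)

    for shift in range(0, key_bits, digit_bits):
        # Step 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Step 2: exclusive prefix sum gives each digit's first output slot.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # Step 3: stable scatter of keys together with their values.
        out_keys = [0] * len(keys)
        out_values = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            pos = offsets[d]
            out_keys[pos] = k
            out_values[pos] = v
            offsets[d] = pos + 1

        keys, values = out_keys, out_values

    return keys, values
```

Because every pass is stable, pairs with equal keys keep their input order, which is what allows the digit passes to be chained from least to most significant.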