Publication | Closed Access
Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit
20
Citations
10
References
2013
Year
Unknown Venue
Cluster ComputingEngineeringComputer ArchitectureParallel ImplementationSupercomputer ArchitectureGpu ComputingData ScienceHigh-performance ArchitectureFusion SortParallel ComputingMassively-parallel ComputingComputer EngineeringComputer ScienceParallel SortGpu ArchitectureEdge ComputingParallel ProcessingParallel Performance EvaluationMany-core ArchitectureParallel ProgrammingParallel Radix SortData-level Parallelism
We design, implement and evaluate a parallel radix sort that simultaneously utilizes the CPU and GPU devices on the AMD Fusion APU. The parallel sort, referred to as Fusion Sort, partitions the sort keys between the CPU and GPU devices and utilizes the integrated memory system of the APU to avoid data copying between the devices. We identify three design issues that impact overhead and performance: the granularity of sharing between the two devices, the scheme of data partitioning and the allocation of data in memory regions accessible by each device. We present three variants of Fusion Sort that share data at coarse and fine granularities and with fixed and variable data partitioning schemes. In each variant, data is allocated to minimize the overhead of non-preferred memory accesses of each device. Our evaluation shows that fine-grain sharing with variable data partitioning performs the best. Further, Fusion Sort outperforms CPU-only and GPU-only parallel radix sorts by up to 1.8X and 1.9X respectively. These results demonstrate the viability of the integrated memory system of the APU in the context of sorting.
| Year | Citations | |
|---|---|---|
Page 1
Page 1