Publication | Closed Access
Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction
Citations: 108
References: 28
Year: 2016
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Map-reduce, Efficient Data Reduction, Hardware Architecture, Sharp Technology, Data Science, Parallel Computing, Data Management, Massively-parallel Computing, Computer Engineering, Computer Science, Data-intensive Computing, System Parallelism, Scalable Computing, Collective Operation Processing, Edge Computing, Parallel Processing, Cloud Computing, Parallel Performance Evaluation, Parallel Programming, Distributed Data Store, Data-level Parallelism, Massive Data Processing
Increased system size and a greater reliance on system parallelism to meet computational needs require innovative system architectures to meet the simulation challenges. As a step towards a new class of network co-processors, intelligent network devices that manipulate data traversing the data-center network, this paper describes the SHArP technology, designed to offload collective operation processing to the network. SHArP is implemented in Mellanox's SwitchIB-2 ASIC and uses in-network trees to reduce data from a group of sources and to distribute the result. Multiple parallel jobs with several partially overlapping groups are supported, each with several reduction operations in flight. Large performance enhancements are obtained: the latency of an eight-byte MPI_Allreduce() operation on 128 hosts improves by a factor of 2.1, from 6.01 to 2.83 microseconds. Pipelining improves the latency of a 4096-byte MPI_Allreduce() operation by a factor of 3.24, from 46.93 to 14.48 microseconds.
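The tree-based reduce-then-distribute pattern described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: the function names `tree_allreduce` and `speedup`, the radix of 4, and the flat grouping of sources are all hypothetical and are not taken from the paper or from the SwitchIB-2 implementation. It models only the data flow (partial results combined level by level up a tree, then the final value handed back to every source), not the hardware, the pipelining, or the handling of overlapping groups.

```python
from functools import reduce
from operator import add


def tree_allreduce(values, radix=4, op=add):
    """Combine `values` level by level up a tree, then give every source the result.

    `radix` (children per aggregation node) is an illustrative assumption,
    not a SHArP parameter. Returns (per-source results, number of tree levels).
    """
    if not values:
        raise ValueError("need at least one source")
    if radix < 2:
        raise ValueError("radix must be at least 2")

    level = list(values)
    depth = 0
    # Reduction phase: each aggregation node combines up to `radix` children.
    while len(level) > 1:
        level = [reduce(op, level[i:i + radix]) for i in range(0, len(level), radix)]
        depth += 1

    # Distribution phase: the root's result is delivered to every source.
    return [level[0]] * len(values), depth


def speedup(before_us, after_us):
    """Latency improvement factor, as a plain ratio of the two latencies."""
    return before_us / after_us


# 128 sources, matching the host count quoted in the abstract.
results, depth = tree_allreduce(list(range(128)), radix=4)

# Ratios of the latencies quoted in the abstract.
small_msg = speedup(6.01, 2.83)
large_msg = speedup(46.93, 14.48)
```

With 128 sources and a radix of 4, the simulated tree has four aggregation levels (128, 32, 8, 2, 1). The two `speedup` calls reproduce the reported improvement factors of about 2.1 and 3.24 from the quoted latencies.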