Publication | Closed Access
Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers
92 Citations | 33 References | Year: 2019
Cluster Computing, Engineering, Machine Learning, Distributed Algorithms, Fault-tolerant Messaging, Round-robin Synchronization, Parallel Algorithms, Synchronization Protocol, Computing Systems, Parallel Computing, Distributed Model, GPU Clusters, Refinement Quality, Computer Engineering, Distributed Systems, Computer Science, Deep Learning, GPU Cluster, Distributed Computing, Parallel Processing, Parallel Learning, Parallel Programming, Asynchronous Systems
Deep learning is usually performed in GPU clusters, where each worker machine iteratively refines the model parameters by exchanging updates with a Parameter Server (PS). More often than not, workers communicate synchronously, so as to avoid using out-of-date parameters and to make a high-quality refinement in each iteration. However, because all workers synchronize with the PS simultaneously, communication becomes a severe bottleneck. To address this problem, in this paper we propose the Round-Robin Synchronous Parallel (R²SP) scheme, which coordinates workers to make updates in an evenly gapped, round-robin manner. In this way, R²SP can minimize network contention at minimal cost to refinement quality. We further extend R²SP to heterogeneous clusters by adaptively tuning the batch size of each worker based on its processing capability. We have implemented R²SP as a ready-to-use Python library for existing deep learning frameworks. EC2 deployments in GPU clusters show that R²SP effectively mitigates the communication bottlenecks, accelerating the training of popular image classification models by up to 25%.
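The abstract does not give R²SP's exact scheduling rule or batch-size tuning policy. The following is therefore only a simplified sketch under stated assumptions and not the authors' library. It assumes that every worker repeats a fixed-length period and holds the PS link for a fixed push time. Workers are staggered by `period / n` to illustrate the "evenly gapped, round-robin" idea. For the heterogeneous case, the sketch splits a global batch in proportion to each worker's measured throughput. That proportional split is a common heuristic chosen here for illustration, and the paper's actual tuning rule may differ. All function names and numbers are hypothetical.

```python
from typing import List

# Hypothetical model (not the R²SP API): each worker pushes once per `period`
# seconds and occupies the link to the PS for `comm_time` seconds per push.


def round_robin_offsets(num_workers: int, period: float) -> List[float]:
    """Evenly gapped offsets: worker i pushes i * period / num_workers into each period."""
    gap = period / num_workers
    return [i * gap for i in range(num_workers)]


def push_start_times(offsets: List[float], period: float, num_periods: int) -> List[float]:
    """All push start times when every worker repeats its push once per period."""
    return [k * period + off for k in range(num_periods) for off in offsets]


def peak_concurrent_pushes(starts: List[float], comm_time: float) -> int:
    """Largest number of push transfers to the PS that overlap in time."""
    events = [(t, +1) for t in starts] + [(t + comm_time, -1) for t in starts]
    # At equal timestamps, -1 sorts before +1, so a transfer that ends exactly
    # when another starts is not counted as overlapping.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def proportional_batch_sizes(throughputs: List[float], global_batch: int) -> List[int]:
    """Split global_batch so each worker's share is proportional to its samples/s.

    Assumed heuristic: faster workers get larger batches so that all workers
    take roughly the same time per iteration.
    """
    total = sum(throughputs)
    raw = [global_batch * r / total for r in throughputs]
    sizes = [int(x) for x in raw]
    # Give the leftover samples to the workers with the largest fractional parts.
    leftover = global_batch - sum(sizes)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    n, period, comm = 4, 1.0, 0.2  # hypothetical: 4 workers, 1 s period, 0.2 s push
    sync = push_start_times([0.0] * n, period, num_periods=3)
    rr = push_start_times(round_robin_offsets(n, period), period, num_periods=3)
    print("synchronous peak:", peak_concurrent_pushes(sync, comm))  # 4
    print("round-robin peak:", peak_concurrent_pushes(rr, comm))  # 1
    # Hypothetical heterogeneous cluster: measured samples/s per worker.
    print(proportional_batch_sizes([300.0, 200.0, 100.0], global_batch=256))  # [128, 85, 43]
```

Under these toy assumptions, staggering four 0.2 s pushes by 0.25 s lowers the peak number of simultaneous transfers at the PS from 4 to 1. In this sketch, tying each worker's batch to its throughput is what keeps per-iteration times, and therefore the round-robin gaps, roughly equal when workers differ in speed.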