
Publication | Closed Access

Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers

Citations: 92

References: 33

Year: 2019

Chen Chen, Wei Wang, Bo Li

Unknown Venue

Abstract

Deep learning is usually performed in GPU clusters, where each worker machine iteratively refines the model parameters by communicating its updates to the Parameter Server (PS). More often than not, workers communicate synchronously, so as to avoid using outdated parameters and to make a high-quality refinement in each iteration. However, because all workers synchronize with the PS simultaneously, communication becomes a severe bottleneck. To address this problem, we propose the Round-Robin Synchronous Parallel (R²SP) scheme, which coordinates workers to make updates in an evenly gapped, round-robin manner. This way, R²SP minimizes network contention at minimal cost to refinement quality. We further extend R²SP to heterogeneous clusters by adaptively tuning the batch size of each worker based on its processing capability. We have implemented R²SP as a ready-to-use Python library for existing deep learning frameworks. EC2 deployments on GPU clusters show that R²SP effectively mitigates the communication bottleneck, accelerating the training of popular image classification models by up to 25%.
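The two ideas in the abstract, evenly gapped round-robin updates and capability-based batch sizing, can be illustrated with a small sketch. This is not the authors' library or algorithm: the function names, the assumption that "evenly gapped" means offsets of one iteration time divided by the number of workers, and the proportional batch-split rule are all hypothetical choices made for illustration.

```python
from typing import List


def round_robin_offsets(num_workers: int, iteration_time: float) -> List[float]:
    """Return a start offset per worker so that synchronizations are evenly gapped.

    Assumption (not from the paper): worker i is delayed by i * T / n, where T is
    the time of one iteration and n is the number of workers, so that at most one
    worker talks to the parameter server at any given moment.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    gap = iteration_time / num_workers
    return [i * gap for i in range(num_workers)]


def proportional_batch_sizes(throughputs: List[float], global_batch: int) -> List[int]:
    """Split a global batch across workers in proportion to their throughput.

    Assumption (not from the paper): giving each worker a share proportional to
    its samples-per-second rate makes all workers take roughly the same time per
    iteration, which keeps the round-robin gaps even on a heterogeneous cluster.
    """
    total = sum(throughputs)
    if total <= 0:
        raise ValueError("throughputs must sum to a positive value")
    sizes = [int(global_batch * t / total) for t in throughputs]
    # Hand the samples lost to rounding down to the fastest workers first.
    remainder = global_batch - sum(sizes)
    fastest_first = sorted(range(len(throughputs)), key=lambda i: throughputs[i], reverse=True)
    for i in fastest_first[:remainder]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    print(round_robin_offsets(4, 2.0))
    print(proportional_batch_sizes([1.0, 2.0], 100))
```

With four workers and a two-second iteration, the first function staggers the workers half a second apart. The second keeps the global batch size fixed while giving faster workers a larger share, which is one simple way to read "tuning the batch size of each worker based on its processing capability".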
