Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers

Abstract

Deep learning is usually performed in GPU clusters, where each worker machine iteratively refines the model parameters by exchanging updates with a Parameter Server (PS). More often than not, workers communicate synchronously, so as to avoid using stale parameters and to make a high-quality refinement in each iteration. However, because all workers synchronize with the PS simultaneously, communication becomes a severe bottleneck. To address this problem, we propose the Round-Robin Synchronous Parallel (R²SP) scheme, which coordinates workers so that they make updates in an evenly gapped, round-robin manner. In this way, R²SP minimizes network contention while sacrificing as little refinement quality as possible. We further extend R²SP to heterogeneous clusters by adaptively tuning each worker's batch size according to its processing capability. We have implemented R²SP as a ready-to-use Python library for existing deep learning frameworks. EC2 deployments on GPU clusters show that R²SP effectively mitigates communication bottlenecks, accelerating the training of popular image classification models by up to 25%.
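The abstract does not give the exact scheduling or batch-sizing rules. Below is a minimal sketch of the two ideas under stated assumptions, and it is not the authors' library. It assumes every worker has the same per-iteration time T. With n workers, each worker's synchronization with the PS is offset by T/n from the previous one, so pushes and pulls are spread evenly instead of colliding. For heterogeneous clusters, it assumes a fixed global batch split in proportion to each worker's measured throughput, so every worker finishes an iteration in roughly the same time. The helper names `round_robin_offsets` and `proportional_batch_sizes` are hypothetical.

```python
from typing import List


def round_robin_offsets(num_workers: int, iteration_time: float) -> List[float]:
    """Hypothetical helper: stagger each worker's sync with the PS by an equal
    gap (iteration_time / num_workers), giving an evenly gapped round-robin order."""
    gap = iteration_time / num_workers
    return [i * gap for i in range(num_workers)]


def proportional_batch_sizes(throughputs: List[float], global_batch: int) -> List[int]:
    """Hypothetical helper: split a global batch across heterogeneous workers in
    proportion to their measured throughput (samples/second), so that faster
    workers get larger batches and all workers take about the same time."""
    total = sum(throughputs)
    raw = [global_batch * t / total for t in throughputs]
    sizes = [int(r) for r in raw]  # floor each share
    # Give the samples lost to flooring to the workers with the largest
    # fractional remainders, so the sizes sum exactly to global_batch.
    remainder = global_batch - sum(sizes)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Four homogeneous workers, 1.0 s per iteration: offsets 0.0, 0.25, 0.5, 0.75 s.
    print(round_robin_offsets(4, 1.0))
    # One worker twice as fast as the other two gets twice the batch.
    print(proportional_batch_sizes([100.0, 100.0, 200.0], 256))
```

With equal iteration times, the offsets keep at most one worker's transfer in flight at each evenly spaced slot. Equalizing iteration times through batch sizing is what lets the same even spacing hold when workers differ in speed.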
