Publication | Open Access
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Citations: 198
Year: 2018
This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients. The resultant gradient-based algorithms are termed Lazily Aggregated Gradient --- justifying our acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex smooth cases; and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
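
To make the mechanism concrete, the following Python sketch illustrates the lazy-aggregation idea on a toy least-squares problem split across M workers: each worker reports a fresh gradient only when it has changed noticeably since its last upload, and the server otherwise reuses the stale copy. The trigger rule, the constant xi, and all other names and values below are illustrative assumptions, not the exact LAG conditions or constants analyzed in the paper.

    import numpy as np

    # Toy setup: a least-squares objective split across M workers. Illustrative
    # sketch of lazy gradient aggregation; the skip rule and xi are assumptions.
    rng = np.random.default_rng(0)
    M, n_per, dim = 5, 40, 10                      # workers, samples per worker, dimension
    A = [rng.normal(size=(n_per, dim)) for _ in range(M)]
    b = [Am @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_per) for Am in A]

    def local_grad(m, theta):
        # Gradient of worker m's share of 0.5 * sum_m ||A_m theta - b_m||^2 / (M * n_per).
        return A[m].T @ (A[m] @ theta - b[m]) / (M * n_per)

    theta = np.zeros(dim)
    step = 0.05
    stale = [local_grad(m, theta) for m in range(M)]   # last gradient each worker uploaded
    lazy_sum = sum(stale)                              # server-side running aggregate
    theta_prev = theta.copy()
    uploads, rounds = 0, 200
    xi = 0.5                                           # hypothetical skip threshold (trades accuracy for communication)

    for k in range(rounds):
        for m in range(M):
            g = local_grad(m, theta)
            # Upload only if this worker's gradient moved noticeably relative to
            # the most recent change in the iterate; otherwise the server keeps
            # reusing the outdated (lazily aggregated) copy.
            if np.sum((g - stale[m]) ** 2) > xi * np.sum((theta - theta_prev) ** 2):
                lazy_sum += g - stale[m]               # send only the innovation
                stale[m] = g
                uploads += 1
        theta_prev = theta.copy()
        theta = theta - step * lazy_sum                # descend along the lazy aggregate

    print(f"worker uploads: {uploads} of {rounds * M} possible round-trips")

In this sketch workers still compute their gradients locally and only the uploads are skipped; workers whose local data yield slowly-varying gradients trigger few uploads, which is the source of the communication savings described above.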