Communication Efficient Distributed Machine Learning with the Parameter Server

TLDR

The paper introduces a third‑generation parameter‑server framework and a new algorithm that exploits it to solve non‑convex, non‑smooth problems with convergence guarantees. The framework incorporates two relaxations that balance system performance and algorithm efficiency, and is evaluated on large‑scale l1‑regularized logistic regression and ICA tasks using 636 TB of real data. Experiments show the framework scales machine learning to larger problems and systems than previously achieved.

Abstract

This paper describes a third-generation parameter server framework for distributed machine learning. This framework offers two relaxations to balance system performance and algorithm efficiency. We propose a new algorithm that takes advantage of this framework to solve non-convex non-smooth problems with convergence guarantees. We present an in-depth analysis of two large scale machine learning problems ranging from l1-regularized logistic regression on CPUs to reconstruction ICA on GPUs, using 636TB of real data with hundreds of billions of samples and dimensions. We demonstrate using these examples that the parameter server framework is an effective and straightforward way to scale machine learning to larger problems and systems than have been previously achieved.
