Publication | Open Access
Byzantine Stochastic Gradient Descent
104
Citations
22
References
2018
Year
Byzantine FailuresMachine LearningEngineeringStochastic OptimizationDistributed AlgorithmsPerformance GuaranteeStochastic GradientsByzantine FaultDistributed OptimizationComputer ScienceDistributed Learning
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $α$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{α^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
| Year | Citations | |
|---|---|---|
Page 1
Page 1