Publication | Open Access
Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates
Citations: 542 | References: 27 | Year: 2018
Topics: Mathematical Programming, Byzantine-robust Distributed Learning, Large-scale Distributed Learning, Machine Learning, Data Science, Distributed Algorithms, Edge Computing, Engineering, Federated Learning, Convex Optimization, Large Scale Optimization, Distributed AI System, Probability Theory, Distributed Learning, Computer Science, Security Issues, Gradient Descent Algorithms
In large-scale distributed learning, security concerns are growing, especially in decentralized settings where some computing units may exhibit Byzantine failures, that is, arbitrary and potentially adversarial behavior. The paper develops distributed learning algorithms that are provably robust to such failures, with a focus on optimal statistical performance. The authors analyze two robust distributed gradient descent algorithms, one based on the median and one on the trimmed mean, and prove statistical error rates for strongly convex, non-strongly convex, and smooth non-convex population losses; the rates are shown to be order-optimal in the strongly convex case. For better communication efficiency, they also propose a median-based algorithm that uses only one communication round and, for strongly convex quadratic loss, achieves the same optimal error rate.
In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures -- arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.
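The median and trimmed mean operations mentioned in the abstract can be illustrated with a minimal sketch. It assumes the aggregation is applied coordinate-wise to the gradients reported by the worker machines, which is a common reading of such schemes rather than something the abstract spells out. The least-squares loss, the toy adversary that reports a huge constant gradient, the step size, and all function names (`coordinate_median`, `coordinate_trimmed_mean`, `robust_gd`) are assumptions made for illustration only.

```python
import random
import statistics


def coordinate_median(vectors):
    """Coordinate-wise median of a list of equal-length vectors."""
    return [statistics.median(col) for col in zip(*vectors)]


def coordinate_trimmed_mean(vectors, beta):
    """Coordinate-wise trimmed mean: in each coordinate, drop the largest
    and smallest beta-fraction of values, then average the remainder."""
    m = len(vectors)
    k = int(beta * m)  # number of values trimmed from each end
    if 2 * k >= m:
        raise ValueError("beta too large: nothing left after trimming")
    out = []
    for col in zip(*vectors):
        kept = sorted(col)[k:m - k]
        out.append(sum(kept) / len(kept))
    return out


def local_gradient(w, data):
    """Gradient of the average loss 0.5 * (w.x - y)^2 on one machine's data."""
    g = [0.0] * len(w)
    for x, y in data:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / len(data)
    return g


def robust_gd(shards, byzantine, aggregate, steps=200, lr=0.1):
    """Gradient descent where the master combines worker gradients with a
    robust aggregator. Workers listed in `byzantine` send garbage."""
    d = len(shards[0][0][0])
    w = [0.0] * d
    for _ in range(steps):
        grads = []
        for i, shard in enumerate(shards):
            if i in byzantine:
                grads.append([1e6] * d)  # toy adversary (assumption)
            else:
                grads.append(local_gradient(w, shard))
        g = aggregate(grads)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Synthetic linear-regression data split across 10 machines (assumption).
random.seed(0)
w_true = [2.0, -1.0]
shards = []
for _ in range(10):
    shard = []
    for _ in range(50):
        x = [random.gauss(0, 1), random.gauss(0, 1)]
        y = sum(a * b for a, b in zip(w_true, x)) + random.gauss(0, 0.1)
        shard.append((x, y))
    shards.append(shard)

byzantine = {0, 1}  # 2 of the 10 machines misbehave
w_median = robust_gd(shards, byzantine, coordinate_median)
w_trimmed = robust_gd(
    shards, byzantine, lambda g: coordinate_trimmed_mean(g, beta=0.2)
)
print("median aggregation:      ", w_median)
print("trimmed-mean aggregation:", w_trimmed)
```

With a plain average in place of either aggregator, the two corrupted gradients would dominate every update. The trimming fraction `beta=0.2` is chosen here so that it is at least as large as the fraction of misbehaving machines in the toy setup.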