Publication | Closed Access
On the Convergence of Decentralized Gradient Descent
613
Citations
35
References
2016
Year
Decentralized Machine LearningNetwork ScienceEngineeringDistributed AlgorithmsDecentralized Gradient DescentNetwork AnalysisData PrivacyConsensus ProblemDistributed Problem SolvingDistributed Ai SystemComputer ScienceDistributed LearningLocal Negative-gradient StepConvergence AnalysisDecentralised System
Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$, where $x\in{\mathbb{R}}^p$ and each $f_i$ is only known to the individual agent $i$ in a connected network of $n$ agents. To solve this problem and obtain the solution, all the agents collaborate with their neighbors through information exchange. This type of decentralized computation does not need a fusion center, offers better network load balance, and improves data privacy. This paper studies the decentralized gradient descent method [A. Nedic and A. Ozdaglar, IEEE Trans. Automat. Control, 54 (2009), pp. 48--61], in which each agent $i$ updates its local variable $x_{(i)}\in{\mathbb{R}}^n$ by combining the average of its neighbors' with a local negative-gradient step $-\alpha \nabla f_i(x_{(i)})$. The method is described by the iteration $ x_{(i)}(k+1) \gets \sum_{j=1}^n w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k)),\ \text{for each agent}~i, $ where $w_{ij}$ is nonzero only if $i$ and $j$ are neighbors or $i=j$ and the matrix $W=[w_{ij}] \in \mathbb{R}^{n \times n}$ is symmetric and doubly stochastic. This paper analyzes the convergence of this iteration and derives its rate of convergence under the assumption that each $f_i$ is proper closed convex and lower bounded, $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i}>0$, and the stepsize $\alpha$ is fixed. Provided that $\alpha \le \min\{(1+\lambda_n(W))/L_h,1/L_{\bar{f}}\}$, where $L_h=\max_i\{L_{f_i}\}$ and $L_{\bar{f}}=\frac{1}{n}\sum_{i=1}^{n}L_{f_i}$, the objective errors of all the local solutions and the networkwide mean solution reduce at rates of $O(1/k)$ until they reach a level of $O(\alpha)$. If $f_i$ are strongly convex with modulus $\mu_{f_i}$ and $\alpha \le \min\{(1+\lambda_n(W))/L_h,1/(L_{\bar{f}}+\mu_{\bar{f}})\}$, where $\mu_{\bar{f}}=\frac{1}{n}\sum_{i=1}^{n}\mu_{f_i}$, then all the local solutions and the mean solution converge to the global minimizer $x^*$ at a linear rate until reaching an $O(\alpha)$-neighborhood of $x^*$. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(\alpha)$-neighborhood of the true sparse signal. This analysis reveals how the convergence of $ x_{(i)}(k+1) \gets \sum_{j=1}^n w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k)),\ \text{for each agent}~i, $ depends on the stepsize, function convexity, and network spectrum.
| Year | Citations | |
|---|---|---|
Page 1
Page 1