On the Convergence of Decentralized Gradient Descent

Abstract

Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$, where $x\in{\mathbb{R}}^p$ and each $f_i$ is only known to the individual agent $i$ in a connected network of $n$ agents. To solve this problem and obtain the solution, all the agents collaborate with their neighbors through information exchange. This type of decentralized computation does not need a fusion center, offers better network load balance, and improves data privacy. This paper studies the decentralized gradient descent method [A. Nedic and A. Ozdaglar, IEEE Trans. Automat. Control, 54 (2009), pp. 48--61], in which each agent $i$ updates its local variable $x_{(i)}\in{\mathbb{R}}^n$ by combining the average of its neighbors' with a local negative-gradient step $-\alpha \nabla f_i(x_{(i)})$. The method is described by the iteration $ x_{(i)}(k+1) \gets \sum_{j=1}^n w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k)),\ \text{for each agent}~i, $ where $w_{ij}$ is nonzero only if $i$ and $j$ are neighbors or $i=j$ and the matrix $W=[w_{ij}] \in \mathbb{R}^{n \times n}$ is symmetric and doubly stochastic. This paper analyzes the convergence of this iteration and derives its rate of convergence under the assumption that each $f_i$ is proper closed convex and lower bounded, $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i}>0$, and the stepsize $\alpha$ is fixed. Provided that $\alpha \le \min\{(1+\lambda_n(W))/L_h,1/L_{\bar{f}}\}$, where $L_h=\max_i\{L_{f_i}\}$ and $L_{\bar{f}}=\frac{1}{n}\sum_{i=1}^{n}L_{f_i}$, the objective errors of all the local solutions and the networkwide mean solution reduce at rates of $O(1/k)$ until they reach a level of $O(\alpha)$. If $f_i$ are strongly convex with modulus $\mu_{f_i}$ and $\alpha \le \min\{(1+\lambda_n(W))/L_h,1/(L_{\bar{f}}+\mu_{\bar{f}})\}$, where $\mu_{\bar{f}}=\frac{1}{n}\sum_{i=1}^{n}\mu_{f_i}$, then all the local solutions and the mean solution converge to the global minimizer $x^*$ at a linear rate until reaching an $O(\alpha)$-neighborhood of $x^*$. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(\alpha)$-neighborhood of the true sparse signal. This analysis reveals how the convergence of $ x_{(i)}(k+1) \gets \sum_{j=1}^n w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k)),\ \text{for each agent}~i, $ depends on the stepsize, function convexity, and network spectrum.

References

Page 1

	Year	Citations

Page 1