Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TLDR

Stochastic gradient descent (SGD) is popular for large-scale optimization, but the inherent variance of its gradient estimates makes its asymptotic convergence slow. The authors introduce stochastic variance reduced gradient (SVRG), an explicit variance-reduction method for SGD. For smooth and strongly convex functions, SVRG provably matches the fast convergence rates of SDCA and SAG, with a simpler and more intuitive analysis. Because it does not require storing gradients, it is more easily applicable to complex problems such as some structured prediction problems and neural-network learning.

Abstract

Stochastic gradient descent is popular for large-scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG). However, our analysis is significantly simpler and more intuitive. Moreover, unlike SDCA or SAG, our method does not require the storage of gradients, and thus is more easily applicable to complex problems such as some structured prediction problems and neural network learning.
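
The variance-reduction idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a ridge-regression objective (smooth and strongly convex), and the function name `svrg_ridge`, the step size, the inner-loop length of 2n, and the choice to use the last inner iterate as the next snapshot are all assumptions made for the example. Each outer pass computes one full gradient at a snapshot point; each inner step then corrects a single-sample gradient by subtracting the same sample's gradient at the snapshot and adding the full gradient back. Only the snapshot and its full gradient are kept, so no per-sample gradients are stored.

```python
import numpy as np

def svrg_ridge(X, y, lam=0.1, step=0.005, epochs=30, inner_steps=None, seed=0):
    """SVRG sketch for (1/n) * sum_i 0.5*(x_i.w - y_i)^2 + 0.5*lam*||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Inner-loop length is a tunable choice; 2n is assumed here.
    m = inner_steps if inner_steps is not None else 2 * n
    w_snap = np.zeros(d)

    def grad_i(w, i):
        # Gradient of the i-th component function at w.
        return (X[i] @ w - y[i]) * X[i] + lam * w

    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer pass.
        mu = X.T @ (X @ w_snap - y) / n + lam * w_snap
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased, and its variance
            # shrinks as w and w_snap both approach the optimum.
            w -= step * (grad_i(w, i) - grad_i(w_snap, i) + mu)
        # Assumed option: the last inner iterate becomes the next snapshot.
        w_snap = w
    return w_snap
```

Because the corrected gradient's variance vanishes near the optimum, a constant step size can be used, whereas plain SGD needs a decaying one to converge. The memory cost is one extra parameter vector and one full gradient, independent of the number of samples.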
