Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

TLDR

We consider regularized stochastic learning and online optimization problems, where the objective is the sum of a convex loss and a simple regularization term such as an l1‑norm to promote sparsity. The authors develop extensions of Nesterov's dual averaging method that exploit regularization structure in online learning. The methods adjust variables by solving a minimization over the running average of past subgradients plus the full regularization term, and an accelerated variant is provided for Lipschitz‑smooth losses. For l1‑regularization, the method yields sparse solutions and attains optimal convergence rates or regret bounds typical of stochastic and online convex optimization.

Abstract

We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as l1-norm for promoting sparsity. We develop extensions of Nesterov's dual averaging method, that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of l1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.

References

Page 1

	Year	Citations

Page 1