
Publication | Closed Access

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

Citations: 8.6K | References: 39 | Year: 2010

Abstract

Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods’ popularity and appeal are largely due to their simplicity, as they largely follow predetermined procedural schemes. However, most common subgradient approaches are oblivious to the characteristics of the data being observed. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. In a companion paper, we validate experimentally our theoretical analysis and show that the adaptive subgradient approach outperforms state-of-the-art, but non-adaptive, subgradient algorithms.
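The per-coordinate adaptation the abstract describes can be sketched with the widely known diagonal form of this update: each coordinate's effective step size shrinks with its own accumulated squared gradients, so rarely updated but predictive coordinates retain larger learning rates. The function names and toy objective below are our own illustration, not the paper's pseudocode.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal adaptive-subgradient step.

    accum holds the running sum of squared gradients per coordinate;
    the update is x <- x - lr * g / (sqrt(accum) + eps), so coordinates
    with a small gradient history take proportionally larger steps.
    """
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Toy quadratic f(x) = 0.5 * (x1^2 + 100 * x2^2), gradient = (x1, 100*x2).
# The badly scaled second coordinate is handled without hand-tuning lr.
x = np.array([1.0, 1.0])
accum = np.zeros_like(x)
for _ in range(2000):
    grad = np.array([1.0, 100.0]) * x
    x, accum = adagrad_step(x, grad, accum)
```

Because the step sizes are scale-invariant per coordinate, both the well-scaled and the badly scaled coordinate converge at comparable rates under a single global `lr`.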
