No More Pesky Learning Rates
Year: 2012 | Citations: 52 | References: 8
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
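The abstract says only that the rates are set from local gradient variations across samples. As an illustration, here is a minimal sketch, assuming a per-parameter rate of the form η_i = ḡ_i² / (h̄_i · v̄_i), where ḡ_i and v̄_i are running averages of the gradient and the squared gradient, h̄_i is a running curvature estimate, and each average's memory length τ_i is adapted on the fly. This rate form, the function name `vsgd_sketch`, and the toy noisy quadratic with exactly known curvature are assumptions made for illustration, not details taken from the text above.

```python
import random


def vsgd_sketch(h, mu_schedule, steps, sigma=1.0, n_init=10, eps=1e-12, seed=0):
    """Per-parameter adaptive learning rates on a separable noisy quadratic.

    Hypothetical toy setup (an assumption, not from the abstract):
    sample loss 0.5 * h[i] * (theta[i] - x[i])**2 with x[i] ~ Normal(mu[i], sigma**2),
    so the per-sample gradient is h[i] * (theta[i] - x[i]) and the curvature is h[i].
    mu_schedule(t) returns the target means at step t, so the problem may drift.
    """
    rng = random.Random(seed)
    d = len(h)
    theta = [0.0] * d

    def sample_grad(t):
        mu = mu_schedule(t)
        return [h[i] * (theta[i] - rng.gauss(mu[i], sigma)) for i in range(d)]

    # Initialise the running averages from a few samples, so v_bar >= g_bar**2.
    init = [sample_grad(0) for _ in range(n_init)]
    g_bar = [sum(g[i] for g in init) / n_init for i in range(d)]
    v_bar = [sum(g[i] ** 2 for g in init) / n_init for i in range(d)]
    h_bar = list(h)  # curvature is exact in this toy, so h_bar stays equal to h
    tau = [float(n_init)] * d

    etas = []  # learning-rate history: one list of d rates per step
    for t in range(steps):
        g = sample_grad(t)
        eta = [0.0] * d
        for i in range(d):
            w = 1.0 / tau[i]
            # Exponential moving averages with adaptive memory length tau[i].
            g_bar[i] = (1 - w) * g_bar[i] + w * g[i]
            v_bar[i] = (1 - w) * v_bar[i] + w * g[i] ** 2
            h_bar[i] = (1 - w) * h_bar[i] + w * h[i]
            # Rate shrinks when the averaged gradient is small relative to its noise.
            eta[i] = g_bar[i] ** 2 / (h_bar[i] * v_bar[i] + eps)
            # Memory grows when gradients look like noise, shrinks when they agree.
            tau[i] = (1 - g_bar[i] ** 2 / (v_bar[i] + eps)) * tau[i] + 1
            theta[i] -= eta[i] * g[i]
        etas.append(eta)
    return theta, etas


# Usage: two parameters with different curvatures; the optimum jumps at step 10,000.
h = [1.0, 10.0]
shift = 10_000


def mu_schedule(t):
    return [0.0, 0.0] if t < shift else [5.0, -3.0]


theta, etas = vsgd_sketch(h, mu_schedule, steps=20_000)
print([round(x, 2) for x in theta])
```

Because each pair of running averages uses the same convex weights, ḡ_i² ≤ v̄_i, so every rate stays between 0 and 1/h̄_i and each update is a convex combination of the current parameter and a noisy sample. When the target jumps, ḡ_i² approaches v̄_i, τ_i collapses, and the rates rise again before re-annealing, mirroring the abstract's point that rates can increase as well as decrease.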