
Publication | Open Access

No More Pesky Learning Rates

2012 | 52 citations | 8 references

Abstract

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
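The abstract says only that the rates are driven by "local gradient variations across samples". As a hedged illustration, the sketch below assumes one rule of that kind, recalled from the full paper rather than stated in this abstract: for each parameter, keep moving averages g_bar of the gradient and v_bar of the squared gradient, set the rate to eta = g_bar^2 / (h * v_bar) with h the curvature, and adapt the averaging window as tau <- (1 - g_bar^2 / v_bar) * tau + 1. Everything else is an assumption made for illustration: the hypothetical function name `adaptive_rate_sgd`, the one-parameter noisy quadratic, the exactly known curvature (a real method would have to estimate it), and the jump in the target that mimics a non-stationary problem.

```python
import random


def adaptive_rate_sgd(curvature, target_before, target_after, shift_step,
                      steps, noise=1.0, init_samples=5, seed=0):
    """Hypothetical sketch of an adaptive per-parameter learning rate.

    Assumed toy problem: one parameter theta with per-sample loss
    0.5 * curvature * (theta - c)**2, where c ~ N(target, noise**2).
    The target jumps from target_before to target_after at shift_step,
    making the problem non-stationary. Curvature is assumed known exactly.
    Returns the final theta and the learning rate used at every step.
    """
    rng = random.Random(seed)

    def sample_gradient(theta, target):
        # Gradient of the per-sample loss at a freshly drawn noisy optimum c.
        c = rng.gauss(target, noise)
        return curvature * (theta - c)

    theta = 0.0
    # Initialise the running averages from a few samples at the start point
    # and set the memory size tau to the number of samples used.
    grads = [sample_gradient(theta, target_before) for _ in range(init_samples)]
    g_bar = sum(grads) / init_samples
    v_bar = sum(g * g for g in grads) / init_samples
    tau = float(init_samples)

    rates = []
    for t in range(steps):
        target = target_before if t < shift_step else target_after
        g = sample_gradient(theta, target)
        # Moving averages of the gradient and the squared gradient, both with
        # the same weight 1/tau, so g_bar**2 <= v_bar always holds.
        g_bar = (1.0 - 1.0 / tau) * g_bar + g / tau
        v_bar = (1.0 - 1.0 / tau) * v_bar + g * g / tau
        # Assumed rule: eta = g_bar**2 / (h * v_bar), which lies in [0, 1/h].
        # It shrinks when gradients look like noise and grows again when
        # they consistently point one way.
        ratio = min(1.0, g_bar * g_bar / v_bar)
        eta = ratio / curvature
        theta -= eta * g
        # Shorten the memory when the signal-to-noise ratio is high,
        # lengthen it (by about one step) when it is low.
        tau = (1.0 - ratio) * tau + 1.0
        rates.append(eta)
    return theta, rates


theta, rates = adaptive_rate_sgd(curvature=1.0, target_before=3.0,
                                 target_after=-2.0, shift_step=1000, steps=2000)
print(f"final theta: {theta:.3f}")
print(f"rate just before the shift: {rates[999]:.5f}")
print(f"largest rate after the shift: {max(rates[1000:]):.3f}")
```

With several parameters, each coordinate would keep its own g_bar, v_bar and tau, which is what gives "multiple learning rates". In this sketch the rate anneals while the target is fixed and rises again after the jump, matching the abstract's point that rates can increase as well as decrease.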
