No More Pesky Learning Rates

Abstract

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
