Concepedia

Publication | Open Access

Stability and Generalization of Learning Algorithms that Converge to Global Optima

Citations: 62
References: 0
Year: 2017

Abstract

We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions. We further show that these conditions arise for some neural networks with linear activations. We use our black-box results to establish the stability of optimization algorithms such as stochastic gradient descent (SGD), gradient descent (GD), randomized coordinate descent (RCD), and the stochastic variance reduced gradient method (SVRG), in both the PL and the strongly convex setting. Our results match or improve state-of-the-art generalization bounds and can easily be extended to similar optimization algorithms. Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.
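For reference, the PL and QG conditions mentioned in the abstract are commonly stated as follows (the constant $\mu > 0$ and the notation below are standard conventions, not taken from this paper). With $f^\* = \min_x f(x)$ and $x_p$ the projection of $x$ onto the set of minimizers:

```latex
% Polyak-Lojasiewicz (PL) condition: the gradient norm
% dominates the suboptimality gap everywhere.
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
\qquad \forall x.

% Quadratic growth (QG) condition: the suboptimality gap
% grows at least quadratically in the distance to the
% nearest minimizer x_p.
f(x) - f^* \;\ge\; \frac{\mu}{2}\,\|x - x_p\|^2
\qquad \forall x.
```

PL implies QG (with the same $\mu$, up to constants), and strongly convex functions satisfy both; neither condition requires convexity, which is why they are useful for the nonconvex losses treated in this work.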