Fine-Grained Analysis of Stability and Generalization for Stochastic\n Gradient Descent

Abstract

Recently there are a considerable amount of work devoted to the study of the\nalgorithmic stability and generalization for stochastic gradient descent (SGD).\nHowever, the existing stability analysis requires to impose restrictive\nassumptions on the boundedness of gradients, strong smoothness and convexity of\nloss functions. In this paper, we provide a fine-grained analysis of stability\nand generalization for SGD by substantially relaxing these assumptions.\nFirstly, we establish stability and generalization for SGD by removing the\nexisting bounded gradient assumptions. The key idea is the introduction of a\nnew stability measure called on-average model stability, for which we develop\nnovel bounds controlled by the risks of SGD iterates. This yields\ngeneralization bounds depending on the behavior of the best model, and leads to\nthe first-ever-known fast bounds in the low-noise setting using stability\napproach. Secondly, the smoothness assumption is relaxed by considering loss\nfunctions with Holder continuous (sub)gradients for which we show that optimal\nbounds are still achieved by balancing computation and stability. To our best\nknowledge, this gives the first-ever-known stability and generalization bounds\nfor SGD with even non-differentiable loss functions. Finally, we study learning\nproblems with (strongly) convex objectives but non-convex loss functions.\n