Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Abstract

Neural networks have many successful applications, but their theoretical understanding lags far behind. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides insights into several aspects of learning neural networks, and these can be verified by empirical studies on synthetic data and on the MNIST dataset.
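The setting described above can be illustrated with a minimal NumPy sketch. It is not the paper's construction or experimental code. It rests on several assumptions made only for illustration: the "well-separated mixture" is modeled as Gaussian clusters around orthogonal centers, the loss is cross-entropy, only the hidden layer is trained while the output layer stays fixed at its random initialization, and all function names and hyperparameters (`make_mixture`, `train_two_layer_relu`, `predict`, the width, the learning rate) are hypothetical.

```python
import numpy as np


def make_mixture(n, centers, labels, noise, rng):
    """Sample n points from a mixture of Gaussian clusters (illustrative data model).

    Each cluster has a fixed class label; a class may own several clusters.
    """
    idx = rng.integers(0, len(centers), size=n)
    x = centers[idx] + noise * rng.standard_normal((n, centers.shape[1]))
    return x, labels[idx]


def train_two_layer_relu(x, y, num_classes, width=512, lr=0.1, epochs=5, seed=0):
    """Train the hidden layer of a wide two-layer ReLU network with plain SGD.

    Assumption for this sketch: the output weights `a` are random signs that
    stay fixed, and the loss is softmax cross-entropy.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.standard_normal((width, d)) / np.sqrt(d)  # random initialization
    a = rng.choice([-1.0, 1.0], size=(num_classes, width)) / np.sqrt(width)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):  # one sample per SGD step
            h = np.maximum(w @ x[i], 0.0)  # hidden ReLU activations
            logits = a @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y[i]] -= 1.0  # gradient of cross-entropy w.r.t. the logits
            grad_h = a.T @ p
            # ReLU passes gradient only through active units
            w -= lr * np.outer(grad_h * (h > 0), x[i])
    return w, a


def predict(w, a, x):
    """Return the predicted class index for each row of x."""
    return np.argmax(np.maximum(x @ w.T, 0.0) @ a.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.eye(10)[:4]          # four orthogonal cluster centers
    labels = np.array([0, 1, 0, 1])   # two clusters per class
    x_train, y_train = make_mixture(400, centers, labels, 0.1, rng)
    x_test, y_test = make_mixture(200, centers, labels, 0.1, rng)
    w, a = train_two_layer_relu(x_train, y_train, num_classes=2)
    print("test accuracy:", np.mean(predict(w, a, x_test) == y_test))
```

The hidden width (512) is far larger than the number of clusters, so the network could in principle fit arbitrary labels on the training set; the sketch only lets a reader check that, on clustered data of this kind, SGD from random initialization also classifies held-out samples well.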
