Concepedia

Publication | Open Access

Understanding deep learning requires rethinking generalization

Citations: 1.1K | References: 20 | Year: 2016

TLDR

Deep neural networks, despite their large size, often exhibit a very small difference between training and test performance, a phenomenon traditionally attributed to properties of the model family or to regularization during training. The study shows that these conventional explanations fail to account for the strong generalization observed in large neural networks. To test them, the authors performed extensive systematic experiments on modern convolutional networks trained with stochastic gradient methods. The experiments revealed that these networks can perfectly fit random labels or completely random noise even under explicit regularization, and a theoretical construction demonstrates that depth-two networks with more parameters than data points possess perfect finite-sample expressivity; together, these results imply that neither limited model capacity nor explicit regularization explains why such networks generalize.
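
The randomization test at the core of these experiments is easy to reproduce in miniature. The sketch below is ours, not the authors' code: it assumes PyTorch and torchvision are available, and the small CNN and hyperparameters are illustrative choices. It replaces every CIFAR-10 training label with a uniformly random class and trains with plain SGD and no explicit regularization; training accuracy nonetheless climbs toward 100%.

```python
# Minimal randomization-test sketch (assumes PyTorch + torchvision;
# the architecture and hyperparameters are illustrative, not the authors').
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

torch.manual_seed(0)

# CIFAR-10 with every label replaced by a uniformly random class:
# any training accuracy above the 10% chance level is pure memorization.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
train_set.targets = torch.randint(0, 10, (len(train_set),)).tolist()
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# A small CNN, overparameterized relative to the 50,000 training examples.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 8 * 8, 512), nn.ReLU(),
    nn.Linear(512, 10))

# Plain SGD: no weight decay, no dropout, no data augmentation.
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    correct, total = 0, 0
    for x, y in loader:
        opt.zero_grad()
        out = model(x)
        loss = loss_fn(out, y)
        loss.backward()
        opt.step()
        correct += (out.argmax(1) == y).sum().item()
        total += y.size(0)
    print(f"epoch {epoch}: train accuracy on random labels = {correct / total:.3f}")
```

On the true labels the same pipeline generalizes reasonably; on random labels test accuracy stays at chance, so the gap between training and test error is determined by the data, not by the architecture or the regularizer.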

Abstract

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
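
The "perfect finite sample expressivity" claim corresponds to the paper's Theorem 1: a depth-two ReLU network with 2n + d weights can represent any function on a sample of size n in d dimensions. A paraphrased sketch of the construction (our summary in LaTeX, not a verbatim excerpt):

```latex
% Paraphrase of Theorem 1: depth-two ReLU interpolation of any n points.
Given a sample $\{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$,
consider the width-$n$ network
\[
  c(x) = \sum_{j=1}^{n} w_j \max\bigl(\langle a, x \rangle - b_j,\, 0\bigr),
  \qquad w, b \in \mathbb{R}^n,\ a \in \mathbb{R}^d,
\]
which has $2n + d$ parameters in total. Choose $a$ so that the
projections $z_i = \langle a, x_i \rangle$ are all distinct, and
interleave the biases: $b_1 < z_1 < b_2 < z_2 < \dots < b_n < z_n$.
The interpolation constraints
\[
  y_i = \sum_{j \le i} w_j \,(z_i - b_j), \qquad i = 1, \dots, n,
\]
then form a lower-triangular linear system with nonzero diagonal
entries, so a solution $w$ exists and $c(x_i) = y_i$ for every $i$.
```

Since the number of parameters in practical networks typically exceeds the number of training points by a wide margin, expressivity alone places no barrier to memorizing the training set, random labels included.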
