Publication | Open Access
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
Citations: 45
References: 27
Year: 2019
Abstract

Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other hand, simple regularization methods like early stopping can often achieve highly nontrivial performance on clean test data in these scenarios, a phenomenon not theoretically understood. This paper proposes and analyzes two simple and intuitive regularization methods: (i) regularization by the distance of the network parameters from their initialization, and (ii) adding a trainable auxiliary variable to the network output for each training example. Theoretically, we prove that gradient descent training with either of these two methods leads to a generalization guarantee on the clean data distribution despite being trained on noisy labels. Our generalization analysis relies on the connection between wide neural networks and the neural tangent kernel (NTK). The generalization bound is independent of the network size, and is comparable to the bound one can get when there is no label noise. Experimental results verify the effectiveness of these methods on noisily labeled datasets.
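To make method (i) concrete, here is a minimal PyTorch-style sketch that penalizes the squared L2 distance between the current parameters and a snapshot taken at initialization. The architecture, the penalty weight `lam`, and the training-step structure are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def distance_to_init_penalty(model, init_params, lam):
    """Squared L2 distance between current and initial parameters, scaled by lam."""
    penalty = 0.0
    for p, p0 in zip(model.parameters(), init_params):
        penalty = penalty + (p - p0).pow(2).sum()
    return lam * penalty

# Illustrative two-layer network; width and lam are placeholder choices.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
init_params = [p.detach().clone() for p in model.parameters()]  # snapshot at init
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, lam=1e-3):
    opt.zero_grad()
    loss = loss_fn(model(x), y) + distance_to_init_penalty(model, init_params, lam)
    loss.backward()
    opt.step()
    return loss.item()
```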
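Method (ii) can be sketched similarly: each training example i gets a trainable auxiliary variable b_i that is added to the network output f(x_i) during training, so the auxiliary variables can absorb label noise; prediction at test time uses the network alone. All names and hyperparameters below are assumptions for illustration.

```python
import torch
import torch.nn as nn

n_train, n_classes = 50_000, 10
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, n_classes))
aux = nn.Parameter(torch.zeros(n_train, n_classes))  # one auxiliary b_i per example
opt = torch.optim.SGD(list(model.parameters()) + [aux], lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(idx, x, y):
    """idx: indices of the batch examples, used to look up each example's b_i."""
    opt.zero_grad()
    logits = model(x) + aux[idx]  # f(x_i) + b_i: the b_i can absorb label noise
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    return loss.item()

# At test time, predict with the network alone: model(x), dropping the b_i term.
```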