Publication | Open Access
SIGUA: Forgetting May Make Learning with Noisy Labels More Robust
Citations: 64 | References: 0 | Year: 2018
Topics: Artificial Intelligence, Machine Learning, Deep Learning, Computer Science, Data Science, Cognitive Science, Gradient Descent, Supervised Learning, Semi-supervised Learning, Learning with Noisy Labels, Noisy Labels, Over-parameterized Deep Networks, Memory, Structured Prediction, Natural Language Processing, Language Learning, LLM Fine-tuning, Engineering, Social Sciences
Over-parameterized deep networks trained on data with noisy labels tend to gradually memorize the noise, causing overfitting even when label-correction techniques are applied. The study proposes stochastic integrated gradient underweighted ascent (SIGUA), a strategy to mitigate memorization of noisy labels in deep learning. Within each mini-batch, SIGUA applies standard gradient descent to samples deemed good and learning-rate-reduced gradient ascent to samples deemed bad, pulling optimization back toward generalization and reinforcing desired memorization. Experiments show that SIGUA robustifies two baseline methods, yielding significant performance gains.
Given data with noisy labels, over-parameterized deep networks can gradually memorize the data and eventually fit everything. Although equipped with corrections for noisy labels, many learning methods in this area still suffer from overfitting due to undesired memorization. In this paper, to relieve this issue, we propose stochastic integrated gradient underweighted ascent (SIGUA): in a mini-batch, we adopt gradient descent on good data as usual, and learning-rate-reduced gradient ascent on bad data. The proposal is versatile: what counts as good or bad data is defined with respect to the desired or undesired memorization of a given base learning method. Technically, SIGUA pulls optimization back toward generalization when the two goals conflict; philosophically, SIGUA shows that forgetting undesired memorization can reinforce desired memorization. Experiments demonstrate that SIGUA successfully robustifies two typical base learning methods, often significantly improving their performance.
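To make the update rule concrete, below is a minimal PyTorch-style sketch of one SIGUA mini-batch step. It assumes a small-loss criterion is used to split the batch into good and bad samples; the function name sigua_step and the parameters good_fraction and gamma are illustrative choices, not the authors' reference implementation.

import torch
import torch.nn.functional as F

def sigua_step(model, optimizer, inputs, noisy_labels,
               good_fraction=0.7, gamma=0.1):
    """One stochastic update: gradient descent on 'good' samples and
    underweighted (learning-rate-reduced) gradient ascent on 'bad' ones."""
    optimizer.zero_grad()
    logits = model(inputs)
    per_sample_loss = F.cross_entropy(logits, noisy_labels, reduction="none")

    # Assumption: treat small-loss samples as "good" (likely clean labels)
    # and the remaining large-loss samples as "bad" (likely noisy labels).
    batch_size = per_sample_loss.size(0)
    num_good = max(1, int(good_fraction * batch_size))
    sorted_idx = torch.argsort(per_sample_loss)
    good_idx, bad_idx = sorted_idx[:num_good], sorted_idx[num_good:]

    # Descent on good data; ascent on bad data is realized by subtracting
    # its loss, scaled by gamma < 1 to mimic a reduced learning rate.
    loss = per_sample_loss[good_idx].mean()
    if bad_idx.numel() > 0:
        loss = loss - gamma * per_sample_loss[bad_idx].mean()

    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch, a single scalar objective encodes both directions: minimizing it descends on the good samples while ascending on the bad ones, which is how the "forgetting" of undesired memorization is applied within an otherwise standard training loop.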