Publication | Open Access
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
Citations: 1.5K | References: 32 | Year: 2018
Artificial Intelligence, Convolutional Neural Network, Engineering, Machine Learning, Cross Entropy Loss, Autoencoders, DNN Architecture, Data Science, Pattern Recognition, Semi-supervised Learning, Neural Scaling Law, Supervised Learning, Data Augmentation, Loss Functions, Machine Learning Model, Computer Science, Deep Learning, Deep Neural Networks, Transfer Learning, Noisy Labels
Deep neural networks excel across many domains but rely on large, accurately labeled datasets, and label noise can severely degrade their performance, prompting the recent use of mean absolute error as a noise‑robust alternative. This work proposes a family of theoretically grounded loss functions that generalize MAE and categorical cross‑entropy to mitigate the poor performance of MAE on deep networks with noisy labels. The proposed losses integrate seamlessly into any existing DNN architecture and training algorithm, offering robustness across a wide range of noisy‑label scenarios. Experiments on CIFAR‑10, CIFAR‑100, and Fashion‑MNIST with synthetically generated noisy labels demonstrate that the new losses achieve strong performance in these settings.
Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet their superior performance comes at the high cost of requiring large-scale, correctly annotated datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. The proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with the CIFAR-10, CIFAR-100 and Fashion-MNIST datasets and synthetically generated noisy labels.
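
The abstract does not spell out the loss itself. As a minimal sketch, assuming the L_q form defined in the full paper, L_q(f(x), e_j) = (1 - f_j(x)^q) / q with q in (0, 1], the following shows how a single hyperparameter interpolates between CCE and MAE. The function names (`lq_loss`, `cce_loss`, `mae_loss`) and the scalar, per-sample interface are illustrative assumptions, not the authors' code.

```python
import math


def lq_loss(p_true: float, q: float) -> float:
    """Assumed L_q (generalized cross entropy) loss for one sample.

    p_true: predicted softmax probability of the labeled class.
    q: interpolation hyperparameter in (0, 1].
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - p_true ** q) / q


def cce_loss(p_true: float) -> float:
    """Categorical cross entropy for a one-hot label."""
    return -math.log(p_true)


def mae_loss(p_true: float) -> float:
    """MAE between a one-hot label and a softmax output.

    The absolute errors sum to (1 - p_true) for the labeled class plus
    (1 - p_true) spread over the other classes, i.e. 2 * (1 - p_true).
    """
    return 2.0 * (1.0 - p_true)


if __name__ == "__main__":
    p = 0.3
    # q = 1 recovers MAE up to the constant factor 2.
    print(lq_loss(p, 1.0), mae_loss(p) / 2.0)
    # q close to 0 approaches CCE.
    print(lq_loss(p, 1e-6), cce_loss(p))
```

Under this form the loss is bounded above by 1/q even when the labeled class receives zero probability, whereas CCE grows without bound; that boundedness is what limits the influence of mislabeled samples as q moves toward 1.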