Publication | Open Access
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Citations: 226 | References: 50 | Year: 2019
Semi-supervised Learning, Data Augmentation, Engineering, Machine Learning, Data Science, Data Mining, Pattern Recognition, Heavy Class-imbalance, Class Imbalance, Knowledge Discovery, Adversarial Machine Learning, Good Generalization, Computer Science, Deep Learning, Statistics, Supervised Learning, Imbalanced Datasets, Margin-based Generalization Bound
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. The authors propose two methods to improve performance on heavily imbalanced datasets: a label-distribution-aware margin (LDAM) loss and a deferred re-weighting schedule. The LDAM loss replaces cross-entropy and can be combined with re-weighting or re-sampling. The deferred re-weighting schedule lets the model learn an initial representation before re-weighting is applied. Both methods are evaluated on benchmark vision tasks, including iNaturalist 2018. The experiments show that either method alone improves over existing techniques and that their combination yields further gains.
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
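The abstract names the two methods without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the margin form from the published paper, in which the margin for class j is proportional to n_j^(-1/4) (n_j being the number of training examples in class j), and it assumes inverse class frequency as the re-weighting rule applied after the deferral point. The function names, the `max_margin` default, and the `defer_until` parameter are hypothetical and chosen for illustration.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4).

    Assumption: margins are rescaled so the rarest class gets max_margin.
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def ldam_loss(logits, label, margins):
    """Cross-entropy computed after subtracting the true class's margin
    from the true-class logit only."""
    shifted = list(logits)
    shifted[label] -= margins[label]
    # Numerically stable log-sum-exp over the shifted logits.
    top = max(shifted)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in shifted))
    return log_sum_exp - shifted[label]


def deferred_weights(class_counts, epoch, defer_until):
    """Uniform class weights before defer_until; afterwards,
    inverse-frequency weights normalized to mean 1 (an assumed rule)."""
    k = len(class_counts)
    if epoch < defer_until:
        return [1.0] * k
    inverse = [1.0 / n for n in class_counts]
    total = sum(inverse)
    return [k * w / total for w in inverse]


# Illustrative usage with made-up class counts and logits.
counts = [5000, 500, 50]
margins = ldam_margins(counts)
example_loss = ldam_loss([2.0, 1.0, 0.5], label=2, margins=margins)
weights_early = deferred_weights(counts, epoch=10, defer_until=160)
weights_late = deferred_weights(counts, epoch=170, defer_until=160)
```

In this sketch, rarer classes receive larger margins, so a rare-class example must be classified with more slack before its loss becomes small. The per-class weights would multiply each example's loss, which is one way to combine the loss with the deferred schedule.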