Publication | Closed Access
Class-Balanced Loss Based on Effective Number of Samples
Citations: 2.4K | References: 44 | Year: 2019 | Venue: CVPR 2019
Topics: Mathematical Programming, Engineering, Machine Learning, Large-scale Datasets, Classification Method, Data Science, Data Mining, Class Imbalance, Statistical Computing, Long-tail Learning, Statistics, Re-weighting Scheme, Knowledge Discovery, Sampling Theory, Sampling (Statistics), Class-balanced Loss, Computer Science, Long-tailed CIFAR Datasets, Long-tailed Data Distribution, Statistical Inference
Long-tailed data distributions, in which a few classes dominate and many others are under-represented, pose a critical challenge in large-scale real-world datasets. The authors argue that the marginal benefit of each additional sample diminishes as a class's sample count grows. They quantify this with an effective number of samples, derived by associating each sample with a small neighboring region rather than a single point, which yields the closed-form expression \((1-\beta^n)/(1-\beta)\). Re-weighting the loss inversely by this quantity gives a class-balanced loss, evaluated on long-tailed CIFAR, ImageNet, and iNaturalist; training with it yields significant performance gains on long-tailed datasets.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
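To make the re-weighting concrete, below is a minimal PyTorch sketch, not the authors' implementation: it computes per-class weights inversely proportional to the effective number \((1-\beta^n)/(1-\beta)\), normalizes them to sum to the number of classes, and plugs them into a standard softmax cross-entropy. The function and variable names (`class_balanced_weights`, `samples_per_class`) are illustrative, and the paper also applies the same weighting to sigmoid and focal losses.

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class weights proportional to 1 / E_n, where
    E_n = (1 - beta^n) / (1 - beta) is the effective number of samples."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    effective_num = (1.0 - beta ** n) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * n.numel() / weights.sum()

def class_balanced_ce_loss(logits, targets, samples_per_class, beta=0.9999):
    """Softmax cross-entropy re-weighted by the effective number of samples."""
    weights = class_balanced_weights(samples_per_class, beta).to(logits.device)
    return F.cross_entropy(logits, targets, weight=weights)

# Example: 3 classes with a long-tailed distribution of 1000 / 100 / 10 samples.
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = class_balanced_ce_loss(logits, targets, [1000, 100, 10])
```

The hyperparameter interpolates between the two classical regimes: as \(\beta \to 0\) the effective number tends to 1 for every class and the weights become uniform (no re-weighting), while as \(\beta \to 1\) it tends to \(n\) and the scheme recovers inverse-class-frequency weighting.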
| Year | Citations |
|---|---|
| 2016 | 214.9K |
| 2017 | 75.5K |
| 2014 | 75.4K |
| 2009 | 60.2K |
| 2015 | 46.2K |
| 2015 | 39.5K |
| 2002 | 29.6K |
| 1997 | 19.8K |
| 2013 | 18.1K |
| 2018 | 9.3K |