Disentangling Label Distribution for Long-tailed Visual Recognition

TLDR

Long-tailed visual recognition is usually evaluated by training on a long-tailed source label distribution and testing on a uniform target, a protocol of questionable practicality because the target may also be long-tailed. The paper formulates long-tailed visual recognition as a label-shift problem and aims to disentangle the source label distribution from the model prediction. It first presents a baseline that post-processes the predictions of a model trained with cross-entropy loss to match the target label distribution, then proposes the LADE loss, which performs the disentangling during training and is based on the optimal bound of the Donsker-Varadhan representation. The baseline already outperforms prior methods on benchmark datasets; LADE achieves state-of-the-art results on CIFAR-100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018, and outperforms existing methods across various shifted target label distributions.

Abstract

The current evaluation protocol of long-tailed visual recognition trains the classification model on the long-tailed source label distribution and evaluates its performance on the uniform target label distribution. Such a protocol has questionable practicality since the target may also be long-tailed. Therefore, we formulate long-tailed visual recognition as a label shift problem where the target and source label distributions are different. One of the significant hurdles in dealing with the label shift problem is the entanglement between the source label distribution and the model prediction. In this paper, we focus on disentangling the source label distribution from the model prediction. We first introduce a simple but overlooked baseline method that matches the target label distribution by post-processing the model prediction trained by the cross-entropy loss and the Softmax function. Although this method surpasses state-of-the-art methods on benchmark datasets, it can be further improved by directly disentangling the source label distribution from the model prediction in the training phase. Thus, we propose a novel method, LAbel distribution DisEntangling (LADE) loss based on the optimal bound of Donsker-Varadhan representation. LADE achieves state-of-the-art performance on benchmark datasets such as CIFAR-100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018. Moreover, LADE outperforms existing methods on various shifted target label distributions, showing the general adaptability of our proposed method.
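To make the post-processing baseline concrete, here is a minimal sketch of one common way to match a target label distribution at inference time: subtract the log source prior from the logits and add the log target prior before the Softmax. The abstract does not spell out the exact rule, so this form is an assumption, and the function name `shift_adjusted_probs` and the three-class priors are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def shift_adjusted_probs(logits, source_prior, target_prior):
    """Re-weight predictions from the source label prior to the target prior.

    Assumed rule (not taken from the abstract): remove the source prior and
    apply the target prior in log space, then renormalize with softmax.
    """
    logits = np.asarray(logits, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    target_prior = np.asarray(target_prior, dtype=float)
    adjusted = logits - np.log(source_prior) + np.log(target_prior)
    return softmax(adjusted)


# Hypothetical 3-class setup: long-tailed source, uniform target.
source_prior = np.array([0.7, 0.2, 0.1])
target_prior = np.full(3, 1.0 / 3.0)

# A model whose logits only encode the source prior carries no evidence
# about the input, so after adjustment it should predict the target prior.
logits = np.log(source_prior)[None, :]
probs = shift_adjusted_probs(logits, source_prior, target_prior)
```

When the target prior equals the source prior, the two log terms cancel and the function returns the plain Softmax output, so the adjustment only matters under an actual label shift.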
