Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-balanced Samplings

TLDR

Long-tailed distributions are common in multi-label visual recognition, and training directly on such data yields poor performance on tail classes. Re-balanced sampling can improve tail accuracy but may harm head classes because of label co-occurrence. This work proposes a collaborative training framework that learns jointly from uniform and re-balanced samplings to improve performance on both head and tail classes. The method uses a two-branch network: one branch is trained on uniform sampling and the other on re-balanced sampling, each with a binary-cross-entropy loss with learnable logit compensation, and a cross-branch consistency loss aligns the two branches' outputs for the same input image. On the VOC-LT and COCO-LT datasets, the approach significantly outperforms prior state-of-the-art methods for long-tailed multi-label visual recognition.

Abstract

Long-tailed data distribution is common in many multi-label visual recognition tasks and the direct use of these data for training usually leads to relatively low performance on tail classes. While re-balanced data sampling can improve the performance on tail classes, it may also hurt the performance on head classes in training due to label co-occurrence. In this paper, we propose a new approach to train on both uniform and re-balanced samplings in a collaborative way, resulting in performance improvement on both head and tail classes. More specifically, we design a visual recognition network with two branches: one takes the uniform sampling as input while the other takes the re-balanced sampling as the input. For each branch, we conduct visual recognition using a binary-cross-entropy-based classification loss with learnable logit compensation. We further define a new cross-branch loss to enforce the consistency when the same input image goes through the two branches. We conduct extensive experiments on VOC-LT and COCO-LT datasets. The results show that the proposed method significantly outperforms previous state-of-the-art methods on long-tailed multi-label visual recognition.
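
The per-image objective described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the logit compensation or of the cross-branch loss, so the per-class affine adjustment (`scale`, `bias`), the squared-difference consistency term, and the `weight` parameter are all hypothetical choices made here for concreteness.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def compensated_bce(logits, targets, scale, bias):
    """Mean binary cross-entropy over classes after a per-class logit adjustment.

    `scale` and `bias` stand in for the learnable logit compensation; the
    affine form s * z + b is an assumption of this sketch.
    """
    total = 0.0
    for z, y, s, b in zip(logits, targets, scale, bias):
        p = sigmoid(s * z + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def consistency_loss(logits_a, logits_b):
    """Mean squared difference between the two branches' class probabilities.

    Assumed form of the cross-branch loss: both arguments are logits that the
    two branches produce for the same input image.
    """
    diffs = [(sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_a, logits_b)]
    return sum(diffs) / len(diffs)


def sample_loss(own_logits, other_logits, targets, scale, bias, weight=1.0):
    """Loss for one image: classification on its own branch plus consistency.

    `own_logits` come from the branch whose sampler drew the image (uniform or
    re-balanced); `other_logits` come from passing the same image through the
    other branch. `weight` is a hypothetical balancing factor.
    """
    cls = compensated_bce(own_logits, targets, scale, bias)
    return cls + weight * consistency_loss(own_logits, other_logits)
```

In a full training loop, each branch would draw its own mini-batch (one from the uniform sampler, one from the re-balanced sampler), every image would be passed through both branches, and `sample_loss` would be averaged over both batches before the gradient step.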
