Publication | Closed Access
MlTr: Multi-Label Classification with Transformer
Citations: 43
References: 38
Year: 2022
Convolutional Neural Network, Engineering, Machine Learning, Multi-label Image Classification, Image Classification, Image Analysis, Data Science, Pattern Recognition, Multi-label Transformer Architecture, Video Transformer, Unified Classification, Machine Vision, Object Detection, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Multi-label Classification, Object Labels
The task of multi-label image classification is to recognize all the object labels present in an image. Despite years of progress, small objects and objects with high conditional probability remain the main bottlenecks of previous convolutional neural network (CNN) based models, which are limited by the representational capacity of convolutional kernels. Recent vision transformer networks use the self-attention mechanism to extract features at pixel granularity; these features express richer local semantic information but are insufficient for mining global spatial dependence. In this paper, we point out the three crucial problems that CNN-based methods encounter and explore whether dedicated transformer modules can address them. We put forward a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention, which particularly improves performance on multi-label image classification tasks. The proposed MlTr achieves state-of-the-art results on prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE, with 88.8%, 95.8%, and 65.5%, respectively.
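The three components named in the abstract (window partitioning, in-window pixel attention, cross-window attention) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the identity query/key/value projections, the mean-pooled window summary, and the additive fusion are all assumptions made for illustration, and the names `window_partition`, `window_merge`, `self_attention`, and `toy_block` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: identity Q/K/V projections (no learned weights).
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def window_partition(feat, win):
    # Split an (H, W, C) feature map into (num_windows, win*win, C).
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, win * win, C)


def window_merge(windows, win, H, W):
    # Inverse of window_partition: back to (H, W, C).
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def toy_block(feat, win):
    H, W, _ = feat.shape
    windows = window_partition(feat, win)   # (N, win*win, C)
    local = self_attention(windows)         # attention among pixels inside each window
    summaries = local.mean(axis=1)          # (N, C); mean pooling is an assumption
    mixed = self_attention(summaries)       # attention across window summaries
    out = local + mixed[:, None, :]         # broadcast window-level context to pixels
    return window_merge(out, win, H, W)


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))           # toy 8x8 feature map with 4 channels
out = toy_block(feat, win=4)
```

In this sketch the in-window stage only mixes pixels that share a window, while the cross-window stage lets every window attend to every other window through one summary token each, which is one simple way to combine local detail with global spatial context.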