Publication | Closed Access
Transformer-based Dual Relation Graph for Multi-label Image Recognition
Citations: 88
References: 40
Year: 2021
Simultaneous Recognition, Image Objects, Engineering, Machine Learning, Word Embeddings, Natural Language Processing, Image Analysis, Text-to-image Retrieval, Data Science, Pattern Recognition, Machine Vision, Feature Learning, Multiple Objects, Vision Language Model, Computer Science, Image Similarity, Deep Learning, Computer Vision, Multi-label Image Recognition
Recognizing multiple objects in a single image simultaneously remains a challenging task that involves several difficulties, such as varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent research mainly resorts to statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. Unlike these approaches, we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation: a structural relation graph and a semantic relation graph. The structural relation graph captures long-range correlations from object context through a cross-scale transformer-based architecture. The semantic relation graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationships into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, MS-COCO and VOC 2007.
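As a rough illustration of the idea behind the structural relation graph, the sketch below treats scaled dot-product self-attention weights over region features as a dense relation graph. It is not the authors' architecture: the function name `attention_relation_graph`, the toy `regions` data, and the omission of learned query/key/value projections and cross-scale fusion are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_relation_graph(features):
    """Build a relation graph from region features via self-attention.

    Assumption: queries, keys, and values are the raw features
    (no learned projections), which keeps the sketch minimal.

    Returns (adjacency, updated), where adjacency[i][j] is the
    attention weight from region i to region j, and updated[i] is
    the attention-weighted mix of all region features.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    adjacency = []
    for q in features:
        # Scaled dot product between this region and every region.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        adjacency.append(softmax(scores))
    # Each region aggregates features from all others, weighted by the graph.
    updated = [
        [sum(w * v[d] for w, v in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]
    return adjacency, updated


# Hypothetical toy input: three regions with 2-d features.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adjacency, updated = attention_relation_graph(regions)
```

Because every region attends to every other region regardless of spatial distance, each row of `adjacency` is a probability distribution over all regions, which is one way to read "long-range correlations from object context". In this toy input, region 0 is linked more strongly to the similar region 1 than to the dissimilar region 2.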