Publication | Closed Access
Task-Adaptive Attention for Image Captioning
Citations: 337
References: 23
Year: 2021
Topics: Natural Language Processing, Diversity Regularization, Multimodal LLM, Image Analysis, Machine Learning, Attention Mechanisms, Engineering, Text-to-image Retrieval, Visual Grounding, Task-adaptive Attention, Task-adaptive Attention Module, Vision Language Model, Visual Question Answering, Deep Learning, Computer Vision, Machine Translation
Attention mechanisms are now widely used in image captioning models. However, most attention models focus only on visual features. When generating syntax-related words, little visual information is needed, so these attention models can mislead word generation. In this paper, we propose a Task-Adaptive Attention module for image captioning, which alleviates this problem and learns implicit non-visual clues that help generate non-visual words. We further introduce a diversity regularization to enhance the expressive ability of the Task-Adaptive Attention module. Extensive experiments on the MSCOCO captioning dataset demonstrate that plugging our Task-Adaptive Attention module into a vanilla Transformer-based image captioning model improves performance.
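The abstract does not spell out the module's equations, so the following is only a minimal sketch of one way the idea could be read, not the authors' implementation. It assumes the non-visual clues are a small set of learnable vectors (called `task_vectors` here, a hypothetical name) that sit alongside the visual features as extra attention slots, and that the diversity regularization penalizes pairwise cosine similarity between those vectors. Both choices, the single-head form, and the use of the same vectors as keys and values are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def task_adaptive_attention(query, visual_feats, task_vectors):
    """Single-head scaled dot-product attention over visual features plus
    extra non-visual slots (assumption: keys and values are the same vectors).

    Returns the context vector, the weights on visual slots, and the
    weights on the non-visual (task) slots.
    """
    slots = visual_feats + task_vectors  # list concatenation
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in slots])
    context = [
        sum(w * s[i] for w, s in zip(weights, slots))
        for i in range(len(query))
    ]
    n_vis = len(visual_feats)
    return context, weights[:n_vis], weights[n_vis:]


def diversity_penalty(task_vectors):
    """Mean squared cosine similarity over distinct pairs of task vectors
    (an assumed form of diversity regularization; lower means more diverse)."""
    total, pairs = 0.0, 0
    for i in range(len(task_vectors)):
        for j in range(i + 1, len(task_vectors)):
            a, b = task_vectors[i], task_vectors[j]
            cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
            total += cos ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy example: a query that aligns with the task vector, not the image regions.
visual = [[1.0, 0.0], [0.0, 1.0]]
task = [[-1.0, -1.0]]
context, vis_w, task_w = task_adaptive_attention([-2.0, -2.0], visual, task)
```

In this toy setup the query points away from both visual features, so most of the attention mass moves to the non-visual slot instead of being forced onto image regions. In a trained model the penalty would be added to the captioning loss with a weighting coefficient, which is likewise an assumption here.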