Publication | Open Access
Multi-modal Sentence Summarization with Modality Attention and Image Filtering
Citations: 98
References: 24
Year: 2018
Venue: Unknown
Engineering, Sentence Summarization, Video Summarization, Corpus Linguistics, Automatic Summarization, Text Mining, Natural Language Processing, Visual Grounding, Text Summarization, Computational Linguistics, Visual Question Answering, Machine Translation, Modality Attention, Vision Language Model, Deep Learning, Sentence Summarization Task, Computer Vision, Multi-modal Summarization, Visual Information
In this paper, we introduce a multi-modal sentence summarization task that produces a short summary from a sentence paired with an image. This task is more challenging than text-only sentence summarization: a model must not only incorporate visual features effectively into a standard text summarization framework, but also avoid the noise that the image introduces. To this end, we propose a modality-based attention mechanism that pays different amounts of attention to image patches and text units, and we design image filters that selectively use visual information to enhance the semantics of the input sentence. We construct a multi-modal sentence summarization dataset, and extensive experiments on this dataset demonstrate that our models significantly outperform conventional models that employ only text as input. Further analyses suggest that the sentence summarization task can benefit from visually grounded representations in a variety of ways.
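
The abstract does not give the equations, so the following is only a minimal sketch of how modality attention and an image filter could fit together, not the authors' implementation. It assumes dot-product attention, a scalar sigmoid gate as the image filter, and a softmax over two modality scores; the names `attend`, `fuse`, and `filter_bias` are hypothetical and chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(query, items):
    """Dot-product attention (an assumption): weighted average of `items`."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return context, weights

def fuse(query, text_units, image_patches, filter_bias=0.0):
    """Combine text and image contexts for one decoding step (hypothetical)."""
    text_ctx, _ = attend(query, text_units)
    image_ctx, _ = attend(query, image_patches)

    # Image filter (assumed form): a scalar gate in (0, 1) that scales
    # down the visual context when it looks irrelevant to the query.
    gate = sigmoid(dot(query, image_ctx) + filter_bias)
    image_ctx = [gate * x for x in image_ctx]

    # Modality attention (assumed form): softmax over one score per modality.
    beta_text, beta_image = softmax([dot(query, text_ctx),
                                     dot(query, image_ctx)])
    fused = [beta_text * t + beta_image * i
             for t, i in zip(text_ctx, image_ctx)]
    return fused, (beta_text, beta_image), gate

# Toy inputs: two text units and two uninformative (all-zero) image patches.
fused, (beta_text, beta_image), gate = fuse(
    query=[1.0, 0.0],
    text_units=[[1.0, 0.0], [0.0, 1.0]],
    image_patches=[[0.0, 0.0], [0.0, 0.0]],
)
```

In this toy call the image patches carry no signal, so the modality weights favor the text context. In a trained model the scores and the gate would come from learned parameters rather than raw dot products.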