Publication | Open Access
Image Description using Visual Dependency Representations
Citations: 293 | References: 11 | Year: 2013 | Venue: Unknown
Keywords: Natural Language Processing, Multimodal LLM, Machine Vision, Image Analysis, Data Science, Engineering, Pattern Recognition, Text-to-image Retrieval, Visual Grounding, Main Event, Vision Language Model, Visual Question Answering, Image Description, Deep Learning, Visual Dependency Representations, Computer Vision
Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the objects in an image, and hypothesize that this representation can improve image description. We test this hypothesis using a new data set of region-annotated images, associated with visual dependency representations and gold-standard descriptions. We describe two template-based description generation models that operate over visual dependency representations. In an image description task, we find that these models outperform approaches that rely on object proximity or corpus information to generate descriptions, on both automatic measures and human judgements.
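The abstract describes a visual dependency representation as a structure linking labelled image regions by spatial relations, over which templates generate descriptions. A minimal sketch of that idea follows; the relation names, class names, and template wording are illustrative assumptions, not the paper's actual inventory or generation models.

```python
# Hypothetical sketch: a visual dependency representation (VDR) as a set
# of directed arcs between labelled image regions, each arc tagged with a
# spatial relation. Relation names here are assumptions for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    head: str       # governing region label (e.g. "table")
    relation: str   # spatial relation between the regions (e.g. "on")
    dependent: str  # dependent region label (e.g. "laptop")

def describe(arcs):
    """Render each arc with a simple subject-relation-object template."""
    return [f"The {a.dependent} is {a.relation} the {a.head}." for a in arcs]

vdr = [Arc("table", "on", "laptop"), Arc("laptop", "beside", "man")]
print(describe(vdr))
```

The point of the structured representation is visible even in this toy form: because each region pair carries an explicit relation, the generator can state how objects relate, which an unstructured bag of regions cannot support.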