Publication | Closed Access
Visual Dialog
Citations: 34
References: 53
Year: 2018
Natural Language Processing, Artificial Intelligence, AI Agent, Multimodal LLM, Engineering, Visual Grounding, COCO Dataset, Vision Language Model, Visual Question Answering, Computer Science, Conversation Analysis, Robot Learning, Spoken Dialog System, Deep Learning, Visual Dialog, Computer Vision
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and to benchmark progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of dialog question-answer pairs from 10-round, human-human dialogs grounded in images from the COCO dataset.
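The task inputs described in the abstract (an image, a dialog history, and a follow-up question) can be sketched as a small data structure. This is only an illustrative sketch, not the VisDial file format or the authors' code: the names `DialogState`, `agent_input`, `record`, and `image_id` are hypothetical, and the only detail taken from the abstract is the 10-round dialog length.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Dialogs in VisDial v0.9 have 10 rounds, per the abstract.
MAX_ROUNDS = 10


@dataclass
class DialogState:
    """Hypothetical container for one image-grounded dialog."""

    image_id: str  # assumed identifier for the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)

    def agent_input(self, question: str) -> Dict[str, object]:
        """Bundle what the agent sees at this round: image, history, question."""
        if len(self.history) >= MAX_ROUNDS:
            raise ValueError("dialog already has the maximum number of rounds")
        return {
            "image_id": self.image_id,
            "history": list(self.history),  # copy so callers cannot mutate state
            "question": question,
        }

    def record(self, question: str, answer: str) -> None:
        """Append a completed question-answer round to the history."""
        if len(self.history) >= MAX_ROUNDS:
            raise ValueError("dialog already has the maximum number of rounds")
        self.history.append((question, answer))
```

At each round an agent would receive the output of `agent_input`, and the chosen answer would then be stored with `record` so that later questions can refer back to it.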