Publication | Open Access
Answering visual questions with conversational crowd assistants
Citations: 70
References: 14
Year: 2013
Unknown Venue
Artificial Intelligence, Engineering, Communication, Computer Accessibility, Multimodal Interaction, Visual Question Answering, Conversation Analysis, Multimodal Human Computer Interface, Interactive Crowd Support, Assistive Technology, Blind Users, Accessibility Challenges, Vision Language Model, Computer Science, Mobile Accessibility, Computer Vision, Visual Questions, Visual Reasoning, Eye Tracking, Human-computer Interaction, Arts, Interactive Computing
Blind people face a range of accessibility challenges in their everyday lives, from reading the text on a package of food to traveling independently in a new place. Answering general questions about one's visual surroundings remains well beyond the capabilities of fully automated systems, but recent systems are showing the potential of engaging on-demand human workers (the crowd) to answer visual questions. The input to such systems has generally been either a single image, which can limit the interaction with a worker to one question, or a video stream, where systems have paired the end user with a single worker, limiting the benefits of the crowd. In this paper, we introduce Chorus:View, a system that assists users over the course of longer interactions by engaging workers in a continuous conversation with the user about a video stream from the user's mobile device. We demonstrate the benefit of using multiple crowd workers instead of just one in terms of both latency and accuracy, then conduct a study with 10 blind users that shows Chorus:View answers common visual questions more quickly and accurately than existing approaches. We conclude with a discussion of users' feedback and potential future work on interactive crowd support of blind users.