Publication | Closed Access
RegionSpeak
Citations: 66
References: 25
Year: 2015
Unknown Venue
Image Analysis, Engineering, Data Science, Visual Grounding, Visual Reasoning, Amazon Mechanical Turk, Vision Language Model, Multimodal Interaction, Visual Information, Human-computer Interaction, Visual Question Answering, Computer Science, Computer Vision, Visual Questions
Blind people often seek answers to their visual questions from remote sources; however, the commonly adopted single-image, single-response model does not always guarantee enough bandwidth between users and sources. This is especially true when questions concern large sets of information or spatial layout, e.g., where is there to sit in this area, what tools are on this work bench, or what do the buttons on this machine do? Our RegionSpeak system addresses this problem by providing an accessible way for blind users to (i) combine visual information across multiple photographs via image stitching, (ii) quickly collect labels from the crowd for all relevant objects contained within the resulting large visual area in parallel, and (iii) then interactively explore the spatial layout of the objects that were labeled. The regions and descriptions are displayed on an accessible touchscreen interface, which allows blind users to interactively explore their spatial layout. We demonstrate that workers from Amazon Mechanical Turk are able to quickly and accurately identify relevant regions, and that asking them to describe only one region at a time results in more comprehensive descriptions of complex images. RegionSpeak can be used to explore the spatial layout of the regions identified. It also demonstrates broad potential for helping blind users to answer difficult spatial layout questions.
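The interactive exploration step (iii) can be illustrated with a minimal sketch. The abstract does not say how regions are stored or how a touch is resolved, so everything here is an assumption made for illustration: the `Region` class, the rectangular shape of each region, the `describe_touch` function, and the rule that the smallest region under the finger wins when regions overlap are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    """A crowd-labeled rectangle in stitched-image pixel coordinates (hypothetical schema)."""
    x: float
    y: float
    width: float
    height: float
    description: str

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds so adjacent regions do not both claim a shared edge.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def area(self) -> float:
        return self.width * self.height


def describe_touch(regions: List[Region], px: float, py: float) -> Optional[str]:
    """Return the description of the smallest region under the touch point, or None.

    Preferring the smallest region is an assumption: it lets a small object
    ("hammer") take priority over a large surrounding one ("work bench").
    """
    hits = [r for r in regions if r.contains(px, py)]
    if not hits:
        return None
    return min(hits, key=Region.area).description


# Example labels, invented for this sketch.
regions = [
    Region(0, 0, 400, 300, "work bench"),
    Region(50, 60, 80, 40, "hammer"),
]
spoken = describe_touch(regions, 60, 70)
```

In an accessible interface, the returned string would be passed to a screen reader or speech synthesizer as the user's finger moves across the screen; that step is outside this sketch.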