Publication | Closed Access
Real-time captioning by groups of non-experts
Citations: 197
References: 28
Year: 2012
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Speech Recognition, Natural Language Processing, Multimodal LLM, Caption Speech, Data Science, Real-time Captioning, Computational Linguistics, Speech Interface, Voice Recognition, Language Studies, Real-time Language, Machine Translation, Vision Language Model, People Immediate Access, Computer Science, Speech Communication, Speech Technology, Speech Processing, Speech Input, Speech Perception, Deaf People, Linguistics
Real-time captioning provides deaf and hard of hearing people immediate access to spoken language and enables participation in dialogue with others. Low latency is critical because it allows speech to be paired with relevant visual cues. Currently, the only reliable sources of real-time captions are expensive stenographers who must be recruited in advance and who are trained to use specialized keyboards. Automatic speech recognition (ASR) is less expensive and available on demand, but its low accuracy, high noise sensitivity, and need for training beforehand render it unusable in real-world situations. In this paper, we introduce a new approach in which groups of non-expert captionists (people who can hear and type) collectively caption speech in real time and on demand. We present Legion:Scribe, an end-to-end system that allows deaf people to request captions at any time. We introduce an algorithm for merging partial captions into a single output stream in real time, and a captioning interface designed to encourage coverage of the entire audio stream. Evaluation with 20 local participants and 18 crowd workers shows that non-experts can provide an effective solution for captioning, accurately covering an average of 93.2% of an audio stream with only 10 workers and an average per-word latency of 2.9 seconds. More generally, our model, in which multiple workers contribute partial inputs that are automatically merged in real time, may be extended to allow dynamic groups to surpass constituent individuals (even experts) on a variety of human performance tasks.
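The abstract does not specify how partial captions are merged, so the following is only a minimal illustrative sketch of the general idea, not the Legion:Scribe algorithm. It assumes each typed word arrives with a timestamp and a worker identifier (`TimedWord`, `merge_partial_captions`, `coverage`, and the `window` parameter are all hypothetical names introduced here). Words are ordered by time, and a word is dropped as a duplicate when a different worker already contributed the same word within a short time window. Coverage is approximated as a simple bag-of-words overlap with a reference transcript.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    word: str    # text typed by a worker
    time: float  # seconds into the audio (assumed to be available)
    worker: str  # hypothetical worker identifier


def merge_partial_captions(streams, window=2.0):
    """Merge per-worker word streams into one time-ordered word list.

    A word is treated as a duplicate if the same word (case-insensitive)
    was already kept from a different worker within `window` seconds.
    This is a naive stand-in for a real alignment-based merge.
    """
    all_words = sorted((w for s in streams for w in s), key=lambda w: w.time)
    merged = []
    for w in all_words:
        key = w.word.lower()
        duplicate = any(
            m.word.lower() == key
            and m.worker != w.worker
            and w.time - m.time <= window
            for m in merged
        )
        if not duplicate:
            merged.append(w)
    return [w.word for w in merged]


def coverage(merged_words, reference_words):
    """Fraction of reference words present in the merged output (bag of words)."""
    ref = Counter(w.lower() for w in reference_words)
    out = Counter(w.lower() for w in merged_words)
    matched = sum((ref & out).values())  # multiset intersection
    return matched / sum(ref.values()) if ref else 0.0


# Two workers each caption an overlapping part of the same utterance.
worker_a = [TimedWord("the", 0.5, "a"), TimedWord("quick", 1.0, "a"),
            TimedWord("brown", 1.4, "a")]
worker_b = [TimedWord("brown", 1.6, "b"), TimedWord("fox", 2.0, "b"),
            TimedWord("jumps", 2.5, "b")]

merged = merge_partial_captions([worker_a, worker_b])
reference = "the quick brown fox jumps over".split()
print(merged)                       # ['the', 'quick', 'brown', 'fox', 'jumps']
print(coverage(merged, reference))  # 5 of 6 reference words are covered
```

In this toy input, neither worker captions the whole utterance, yet the merged stream covers five of the six reference words. A practical system would also need to handle typos, differing word order, and clock skew between workers, which this sketch ignores.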