Mutual disambiguation of recognition errors in a multimodal architecture

TLDR

Multimodal systems are being designed to integrate complementary modalities so that each mode can help resolve the other's recognition errors (mutual disambiguation), with the goals of more robust performance and of interfaces that support diverse real‑world usage. In this study, over 2,000 multimodal utterances from native and accented speakers of English were processed by a multimodal system, logged, and analyzed. The results show that the system achieved significant mutual disambiguation, with higher levels for accented speakers, whose multimodal recognition rates did not differ from those of native speakers. This suggests that future multimodal architectures can perform more robustly than individual recognition technologies.

Abstract

As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and even higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways and that function well under challenging real-world usage conditions.
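To make the idea of mutual disambiguation concrete, the following is a minimal sketch, not the architecture used in the study. It assumes that each recognizer returns a ranked n-best list of scored interpretations, that a hypothetical compatibility table encodes which spoken commands can combine with which gesture types, and that joint scores are a simple product of the two mode scores. All command names, scores, and the `integrate` and `md_rate` helpers are illustrative assumptions. MD shows up when the best compatible joint interpretation uses a hypothesis that its own recognizer did not rank first; `md_rate` counts such "pulled-up" cases among correctly recognized commands, which is an assumed operationalization of an MD rate.

```python
from itertools import product

# Hypothetical n-best lists: (interpretation, score) pairs, best first.
# Labels and scores are illustrative only, not data from the study.
SPEECH_NBEST = [
    ("zoom out", 0.50),
    ("create flood zone", 0.40),
    ("create fire zone", 0.10),
]
GESTURE_NBEST = [
    ("area", 0.70),
    ("point", 0.30),
]

# Hypothetical semantic constraints: which spoken commands can be
# combined with which gesture types.
COMPATIBLE = {
    ("create flood zone", "area"),
    ("create fire zone", "area"),
    ("zoom out", "point"),
}


def integrate(speech_nbest, gesture_nbest, compatible):
    """Return (joint_score, speech, speech_rank, gesture, gesture_rank) for the
    best-scoring compatible pair, or None if no pair is compatible."""
    best = None
    for (s_rank, (s, s_score)), (g_rank, (g, g_score)) in product(
        enumerate(speech_nbest), enumerate(gesture_nbest)
    ):
        if (s, g) not in compatible:
            continue  # semantically incompatible pairs are filtered out
        joint = s_score * g_score  # naive independence assumption
        if best is None or joint > best[0]:
            best = (joint, s, s_rank, g, g_rank)
    return best


def md_rate(outcomes):
    """outcomes: list of (correct, speech_rank, gesture_rank), one per command.
    Returns the share of correctly recognized commands in which at least one
    selected hypothesis was not ranked first by its own recognizer."""
    correct = [o for o in outcomes if o[0]]
    if not correct:
        return 0.0
    pulled_up = sum(1 for _, s, g in correct if s > 0 or g > 0)
    return pulled_up / len(correct)


result = integrate(SPEECH_NBEST, GESTURE_NBEST, COMPATIBLE)
joint, speech, s_rank, gesture, g_rank = result
print(f"{speech} + {gesture} (joint={joint:.2f})")
if s_rank > 0 or g_rank > 0:
    print("mutual disambiguation: a lower-ranked hypothesis was pulled up")
```

Here the speech recognizer's top choice ("zoom out") is wrong, but the area gesture rules it out, so the second-ranked "create flood zone" wins the joint ranking. Multiplying scores is the simplest fusion rule to illustrate; a real system could weight modes differently or use learned integration.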
