Concepedia

Publication | Open Access

Visual features for context-aware speech recognition

Citations: 38

References: 36

Year: 2017

Abstract

Automatic transcriptions of consumer-generated multimedia content such as “YouTube” videos still exhibit high word error rates. Such data typically spans a very broad domain, is recorded under challenging conditions with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.
