Visual features for context-aware speech recognition

Publication | Open Access

Year: 2017

Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze

Abstract

Automatic transcriptions of consumer-generated multimedia content such as “YouTube” videos still exhibit high word error rates. Such data typically spans a very broad domain, has been recorded in challenging conditions with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.
