Publication | Open Access
Visual features for context-aware speech recognition
38 Citations | 36 References | Year: 2017
Unknown Venue
Automatic transcriptions of consumer-generated multimedia content, such as YouTube videos, still exhibit high word error rates. Such data typically spans a very broad domain, is recorded in challenging conditions with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.