Publication | Closed Access
Improved speaker independent lip reading using speaker adaptive training and deep neural networks
Citations: 70
References: 22
Year: 2016
Venue: Unknown
Engineering, Machine Learning, Speech Recognition, Data Science, Phonetics, Speaker-dependent Lip-reading, Robust Speech Recognition, Voice Recognition, Health Sciences, Deep Learning, Speech Communication, Speech Technology, Speaker Independent Lip, Deep Neural Networks, Multi-speaker Speech Recognition, Speech Processing, Speaker Adaptive Training, Speech Input, Speech Perception, Medium Size Vocabulary, Speaker Recognition
Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech with a medium-size vocabulary (around 1000 words) is realistic. However, recognizing previously unseen speakers remains very challenging, because of the large variation in lip shapes across speakers and the lack of large, tracked databases of visual features, which are expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be significantly reduced. Furthermore, we show that error rates can be reduced still further by the additional use of deep neural networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
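The speaker adaptive training referenced here is, in speech recognition, typically realized with per-speaker affine feature transforms (e.g. CMLLR/fMLLR). As a minimal, purely illustrative sketch of the underlying idea (not the paper's actual method), the following applies per-speaker mean/variance normalization to remove speaker-specific offset and scale from visual features before training a shared model; all names and data are hypothetical.

```python
import numpy as np

def speaker_normalize(features_by_speaker):
    """Normalize each speaker's feature matrix to zero mean, unit variance.

    A simple stand-in for the per-speaker affine transforms (CMLLR/fMLLR)
    used in full speaker adaptive training.
    """
    normalized = {}
    for spk, feats in features_by_speaker.items():
        mu = feats.mean(axis=0)
        sigma = feats.std(axis=0) + 1e-8  # avoid division by zero
        normalized[spk] = (feats - mu) / sigma
    return normalized

# Toy visual features: two "speakers" with different offsets and scales,
# mimicking the large cross-speaker variation in lip shapes.
rng = np.random.default_rng(0)
data = {
    "spk1": rng.normal(loc=5.0, scale=2.0, size=(100, 3)),
    "spk2": rng.normal(loc=-3.0, scale=0.5, size=(100, 3)),
}
norm = speaker_normalize(data)
# After normalization, both speakers' features occupy the same scale,
# so a single speaker-independent model can be trained on the union.
```

Full SAT additionally re-estimates the acoustic (here, visual) model jointly with the per-speaker transforms; this sketch shows only the feature-side normalization step.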