Publication | Closed Access
Lipreading by neural networks: Visual preprocessing, learning, and sensory integration
Citations: 44
References: 10
Year: 1993
Unknown Venue
Engineering, Machine Learning, Social Sciences, Speech Recognition, Sensory Integration, Image Analysis, Sensory Neuroscience, Pattern Recognition, Visual Features, Robust Speech Recognition, Voice Recognition, Hybrid System, Cognitive Science, Visual Preprocessing Algorithms, Computer Science, Deep Learning, Speech Communication, Computer Vision, Speech Technology, Multi-speaker Speech Recognition, Speech Processing, Neuroscience, Speech Input, Speech Perception
We have developed visual preprocessing algorithms for extracting phonologically relevant features from the grayscale video image of a speaker, to provide speaker-independent inputs for an automatic lipreading (speechreading) system. Visual features such as mouth open/closed, tongue visible/not-visible, teeth visible/not-visible, and several shape descriptors of the mouth and its motion are all rapidly computable in a manner quite insensitive to lighting conditions. We formed a hybrid speechreading system consisting of two time delay neural networks (one video, one acoustic) and integrated their responses by means of independent opinion pooling, which is the Bayesian optimal method given conditional independence, an assumption that seems to hold for our data. This hybrid system had an error rate 25% lower than that of the acoustic subsystem alone on a five-utterance speaker-independent task, indicating that video can be used to improve speech recognition.
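The integration step named in the abstract can be illustrated with a minimal sketch. It assumes each network's output can be read as a posterior distribution over the same utterance classes, and that the two modalities are conditionally independent given the class, so that P(c | a, v) is proportional to P(c | a) P(c | v) / P(c). The function name `independent_opinion_pool`, the uniform prior default, and all numbers are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from math import isclose


def independent_opinion_pool(p_acoustic, p_video, prior=None):
    """Combine two posterior distributions over the same classes.

    Assumes the acoustic and video observations are conditionally
    independent given the class c, so that
        P(c | a, v)  is proportional to  P(c | a) * P(c | v) / P(c).
    """
    n = len(p_acoustic)
    if len(p_video) != n:
        raise ValueError("distributions must cover the same classes")
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform class prior
    if len(prior) != n:
        raise ValueError("prior must cover the same classes")

    # Multiply the two opinions and divide out the prior, which would
    # otherwise be counted twice.
    unnormalized = [a * v / p for a, v, p in zip(p_acoustic, p_video, prior)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("the two distributions share no support")
    return [u / total for u in unnormalized]


# Hypothetical network outputs for five utterance classes (made-up values).
acoustic = [0.40, 0.35, 0.10, 0.10, 0.05]
video = [0.20, 0.50, 0.10, 0.10, 0.10]

pooled = independent_opinion_pool(acoustic, video)
best = max(range(len(pooled)), key=pooled.__getitem__)
```

In this made-up case the acoustic output alone favors class 0, while the pooled distribution favors class 1, which shows how a confident video opinion can overturn a marginal acoustic decision. With a uniform prior the division by P(c) cancels in the normalization, so the rule reduces to a normalized product of the two outputs.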