Publication | Closed Access
Large vocabulary automatic speech recognition for children
Citations: 100
References: 27
Year: 2015
Venue: Unknown
Engineering, CLDNN Acoustic Model, Spoken Language Processing, Neural Network Classifier, Language Learning, Language Processing, Speech Recognition, Natural Language Processing, Child Language, YouTube Kids, Language Acquisition, Automatic Recognition, Spoken Language Understanding, Health Sciences, Deep Learning, Speech Communication, Speech Technology, Speech Analysis, Speech Acoustics, Speech Processing, Speech Input, Speech Perception, Linguistics
Recently, Google launched YouTube Kids, a mobile application for children that uses a speech recognizer built specifically for recognizing children’s speech. In this paper we present techniques we explored to build such a system. We describe the use of a neural network classifier to identify matched acoustic training data and the filtering of language modeling data to reduce the chance of producing offensive results. We also compare long short-term memory (LSTM) recurrent networks to convolutional, LSTM, deep neural networks (CLDNNs). We found that a CLDNN acoustic model outperforms an LSTM across a variety of conditions, but does not specifically model child speech relatively better than adult speech. Overall, these findings allow us to build a successful, state-of-the-art large vocabulary speech recognizer for both children and adults.