Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review

TLDR

Automatic speech recognition converts acoustic signals to text, yet most research has focused on adult speech. Children’s speech, which is challenging due to large articulatory, acoustic, physical, and linguistic variations, remains largely unexplored despite its importance in many real-world applications. This review investigates the current research focus, commonly used acoustic feature extraction methods, acoustic models, datasets, and toolkits for children’s speech recognition, and evaluates the potential of emerging techniques. The authors conducted a systematic literature review of 76 papers published between 2009 and 2020 on ASR for children.

Abstract

Automatic speech recognition (ASR) transforms acoustic speech signals into text. Over the last few decades, an enormous amount of research has been done on speech recognition (SR), but most studies have focused on building ASR systems for adult speech. The recognition of children’s speech was neglected for some time, which leaves the field of children’s SR wide open. Children’s SR is challenging because children’s articulatory, acoustic, physical, and linguistic characteristics vary widely compared to adult speech. The field has therefore become an attractive area of research, and it is important to understand where the main research focus lies and which acoustic feature extraction methods, acoustic models, speech datasets, and SR toolkits are most widely used. ASR systems and interfaces are extensively used and integrated into real-life applications such as search engines, healthcare, biometric analysis, car systems, the military, aids for people with disabilities, and mobile devices. This work presents a systematic literature review (SLR) that extracts the relevant information from 76 research papers on ASR for children published from 2009 to 2020. The objective of this review is to shed light on research trends in children’s speech recognition and to analyze the potential of trending techniques for recognizing children’s speech.
