Publication | Closed Access
Building transcribed speech corpora quickly and cheaply for many languages
Citations: 75
References: 2
Year: 2010
Venue: Unknown
Engineering, Speech Corpus, Spoken Language Processing, Corpus Linguistics, Speech Recognition, Natural Language Processing, Computational Linguistics, Speech Interface, Transcribed Audio, Automatic Recognition, Voice Recognition, Machine Translation, Health Sciences, Speech Corpora, Speech Synthesis, Speech Output, Computer Science, Text-to-speech, Speech Communication, Acoustic Conditions, Voice, Speech Translation, Speech Acoustics, Speech Processing, Speech Input, Voice Technology, Linguistics
We present a system for quickly and cheaply building transcribed speech corpora containing utterances from many speakers in a variety of acoustic conditions. The system consists of a client application running on an Android mobile device with an intermittent Internet connection to a server. The client application collects demographic information about the speaker, fetches textual prompts from the server for the speaker to read, records the speaker’s voice, and uploads the audio and associated metadata to the server. The system has so far been used to collect over 3000 hours of transcribed audio in 17 languages around the world.

Index Terms: speech corpora, speech recognition, internationalization
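
The client workflow described in the abstract (fetch prompts, record, upload over an intermittent connection) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class `CollectionClient`, the `Recording` type, and the `fetch_prompts`, `upload`, and `capture` callables are all hypothetical names, and the sketch assumes that a lost connection surfaces as a `ConnectionError`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Recording:
    """One recorded utterance plus the data uploaded with it (hypothetical shape)."""
    prompt: str
    audio: bytes
    metadata: Dict[str, str]


@dataclass
class CollectionClient:
    """Hypothetical client: caches prompts and queues uploads while offline."""
    fetch_prompts: Callable[[], List[str]]   # assumed server call, may raise ConnectionError
    upload: Callable[[Recording], None]      # assumed server call, may raise ConnectionError
    speaker_info: Dict[str, str]             # demographic information collected up front
    prompts: Deque[str] = field(default_factory=deque)
    pending: Deque[Recording] = field(default_factory=deque)

    def refill_prompts(self) -> bool:
        """Try to fetch more prompts; report failure instead of raising when offline."""
        try:
            self.prompts.extend(self.fetch_prompts())
            return True
        except ConnectionError:
            return False

    def record(self, capture: Callable[[str], bytes]) -> bool:
        """Record the next cached prompt and queue it locally; needs no network."""
        if not self.prompts:
            return False
        prompt = self.prompts.popleft()
        audio = capture(prompt)
        self.pending.append(Recording(prompt, audio, dict(self.speaker_info)))
        return True

    def flush(self) -> int:
        """Upload queued recordings in order; stop at the first connection failure."""
        sent = 0
        while self.pending:
            try:
                self.upload(self.pending[0])
            except ConnectionError:
                break  # keep the recording queued and retry on a later flush
            self.pending.popleft()
            sent += 1
        return sent
```

In this sketch a recording is removed from the local queue only after its upload succeeds, so a dropped connection delays an upload rather than losing the audio. The abstract does not say how the actual system handles this case.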