Publication | Closed Access
Macrophone: an American English telephone speech corpus for the Polyphone project
44 Citations | 3 References | Year: 2002 | Venue: Unknown
Speech Sciences, Speech Corpus, Spoken Language Processing, Communication, Phonology, Corpus Linguistics, Speech Recognition, Natural Language Processing, Language Documentation, Phonetics, Computational Linguistics, Speech Interface, Automatic Recognition, Voice Recognition, Corpus Analysis, Language Studies, Spoken Language Understanding, Health Sciences, Polyphone Project, Linguistic Materials, Telephone Speech Suitable, Speech Communication, Speech Technology, Voice, Telephone Network, Speech Acoustics, Speech Processing, Speech Input, Linguistics
Macrophone is a corpus of approximately 200,000 utterances recorded over the telephone from a broad sample of about 5,000 American speakers. Sponsored by the Linguistic Data Consortium (LDC), it is the first of a series of similar data sets to be collected for major languages of the world in a cooperative project called Polyphone. It is designed to provide telephone speech suitable for developing automatic voice-interactive telephone services. In particular, Macrophone contains training material for applications in transportation, scheduling, ticketing, database access, shopping, and other automated telephone interactions. In addition to being phonetically balanced, the spoken material refers to times, locations, monetary amounts, and interactive operations. Respondents speak into ordinary telephone handsets, and their utterances are recorded directly in 8-bit μ-law digital form through a T1 connection to the public switched telephone network. The paper describes the design of the linguistic materials in the corpus and the process of solicitation, collection, transcription, and file preparation for Macrophone.
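Because the utterances are stored as 8-bit μ-law samples, most downstream processing first expands them to linear PCM. The sketch below illustrates that step using the standard ITU-T G.711 μ-law decoding rule, following the widely circulated Sun `g711.c` reference. It is not taken from the paper: the function names `ulaw_to_linear` and `decode` are hypothetical, and the code assumes raw, headerless sample bytes (one byte per sample at the 8 kHz rate of a T1 voice channel). It does not handle any file headers the distributed corpus files may carry.

```python
# Hedged sketch: standard G.711 mu-law -> 16-bit linear PCM decoding.
# Assumption: input is raw, headerless mu-law bytes; no corpus-specific
# file format handling is attempted here.

BIAS = 0x84        # 132, bias added before companding in G.711
SIGN_BIT = 0x80    # sign bit of the (inverted) 8-bit code
QUANT_MASK = 0x0F  # 4-bit mantissa: step within a segment
SEG_MASK = 0x70    # 3-bit exponent: segment number
SEG_SHIFT = 4


def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit mu-law code to a signed sample in [-32124, 32124]."""
    u = ~code & 0xFF                       # mu-law codes are stored bit-inverted
    t = ((u & QUANT_MASK) << 3) + BIAS     # rebuild the biased magnitude
    t <<= (u & SEG_MASK) >> SEG_SHIFT      # scale by the segment exponent
    return BIAS - t if u & SIGN_BIT else t - BIAS


def decode(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes, one sample per byte."""
    return [ulaw_to_linear(b) for b in data]


if __name__ == "__main__":
    print(decode(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

Each byte expands to one signed 16-bit sample. The logarithmic companding gives fine amplitude steps near zero and coarse steps for loud signals, which is why 8 bits per sample suffice for telephone-band speech.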