An acoustic-phonetic data base

Abstract

DARPA has sponsored the design and collection of a large speech data base. Six hundred and thirty speakers read ten sentences each. Two sentences were constant for all speakers; the remaining eight were drawn from a set of 450 sentences designed at MIT and 1890 sentences selected at TI from text sources. The set of sentences is phonetically rich, balanced, and deep. Although all recordings were made in Dallas, we sampled as many varieties of American English as possible. Volunteer speakers were selected by childhood locality to give a balanced representation of geographical origins. The subject population is adult, 70% male, young (63% in their twenties), well educated (78% with a bachelor's degree), and predominantly white (96%). Recordings were made in a noise-reducing sound booth using a Sennheiser headset microphone and digitized at 20 kHz. A natural reading style was encouraged. The recordings are complete, and time-registered phonetic transcriptions are being added to the 6300 speech files at MIT. A version of the complete data base (16-kHz sample rate, with acoustic-phonetic transcriptions; approximately 50 megabytes of data) will be made available to researchers through the National Bureau of Standards. [Work supported by DARPA.]
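
As a quick consistency check on the counts stated above, the following minimal Python sketch tallies the corpus from the figures in the abstract. The function and variable names are illustrative only and are not part of the data base or its tools. How each speaker's eight variable sentences are divided between the MIT and TI sets is not stated here, so the sketch does not assume a split.

```python
# Figures taken directly from the abstract.
SPEAKERS = 630
SENTENCES_PER_SPEAKER = 10
CONSTANT_SENTENCES = 2   # read by every speaker
MIT_POOL = 450           # sentences designed at MIT
TI_POOL = 1890           # sentences selected at TI from text sources


def corpus_tally():
    """Return basic counts implied by the stated corpus design (hypothetical helper)."""
    variable_per_speaker = SENTENCES_PER_SPEAKER - CONSTANT_SENTENCES
    return {
        "total_files": SPEAKERS * SENTENCES_PER_SPEAKER,
        "constant_readings": SPEAKERS * CONSTANT_SENTENCES,
        "variable_readings": SPEAKERS * variable_per_speaker,
        "sentence_pool": MIT_POOL + TI_POOL,
    }


tally = corpus_tally()
for name, count in tally.items():
    print(f"{name}: {count}")
```

The total of 630 x 10 readings agrees with the 6300 speech files mentioned above; the remaining figures follow from the same numbers.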