Spoken Corpus Transcription
Year: 1994
Transcription Scheme, Speech Sciences, Large Corpus, Speech Corpus, British National Corpus, Spoken Language Processing, Spoken French, Corpus Linguistics, Speech Recognition, Applied Linguistics, Grammar, Corpus Analysis, Language Studies, Machine Translation, Health Sciences, Spoken Corpus Transcription, Speech Output, Prosody (Linguistics), Speech Communication, Orthography, Speech Acoustics, Language Corpus, Speech Processing, Speech Perception, Linguistics
In this second of two papers on the construction of a large corpus of spoken language, the processing and transcription of the spoken text component of the British National Corpus are described. The transcription scheme is termed enhanced orthographic. In general, standard written forms of words are used, together with certain non-standard written forms including contractions (e.g. can't, don't) and variant spellings (e.g. dunno, gonna, cos). A control list of permissible non-standard forms has been maintained. Pauses (vocalized and non-vocalized), hesitation, false starts, overlapping speech, and repetition are indicated together with some paralinguistic information. Example transcripts are included.
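The idea of a control list can be illustrated with a minimal sketch. It is not the procedure used for the British National Corpus: the function name `unlisted_forms`, the tokenizing pattern, and the tiny `STANDARD_FORMS` set are all assumptions made for illustration, and `CONTROL_LIST` holds only the five forms named in the abstract. The sketch flags any token that is neither a standard written form nor a permitted non-standard form.

```python
import re

# Hypothetical control list: only the non-standard forms named in the abstract.
CONTROL_LIST = {"can't", "don't", "dunno", "gonna", "cos"}

# Hypothetical stand-in for a dictionary of standard written forms.
STANDARD_FORMS = {"i", "know", "it", "is", "going", "to", "rain", "why"}

# Lowercase words, optionally with one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def unlisted_forms(utterance, standard=STANDARD_FORMS, control=CONTROL_LIST):
    """Return tokens that are neither standard forms nor on the control list."""
    # Normalize case and typographic apostrophes before tokenizing.
    normalized = utterance.lower().replace("\u2019", "'")
    tokens = TOKEN_RE.findall(normalized)
    return [t for t in tokens if t not in standard and t not in control]


if __name__ == "__main__":
    print(unlisted_forms("I dunno cos it is gonna rain"))
    print(unlisted_forms("I wanna know why"))
```

Under these assumptions, a transcriber's draft containing a form such as "wanna" would be flagged for review, while "dunno", "gonna", and "cos" would pass because they appear on the control list.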