Abstract

In this second of two papers on the construction of a large corpus of spoken language, the processing and transcription of the spoken text component of the British National Corpus are described. The transcription scheme is termed enhanced orthographic. In general, standard written forms of words are used, together with certain non-standard written forms including contractions (e.g. can't, don't) and variant spellings (e.g. dunno, gonna, cos). A control list of permissible non-standard forms has been maintained. Pauses (vocalized and non-vocalized), hesitation, false starts, overlapping speech, and repetition are indicated, together with some paralinguistic information. Example transcripts are included.
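As a rough illustration of how a control list of non-standard forms might be applied when checking a transcript, the sketch below flags any token that is neither a standard written form nor on the list. It is not the procedure used for the corpus: the function name `unlisted_forms`, the tokenizing pattern, and the tiny `STANDARD_FORMS` vocabulary are assumptions made for the example, and `CONTROL_LIST` holds only the five forms quoted in the abstract.

```python
import re

# Assumption: only the five non-standard forms quoted in the abstract.
# The real control list is not reproduced here.
CONTROL_LIST = {"can't", "don't", "dunno", "gonna", "cos"}

# Assumption: a toy stand-in for a dictionary of standard written forms.
STANDARD_FORMS = {"i", "know", "it", "is", "going", "to", "rain"}

# Lowercase words with an optional internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def unlisted_forms(utterance, standard=STANDARD_FORMS, control=CONTROL_LIST):
    """Return tokens that are neither standard forms nor on the control list."""
    tokens = TOKEN_RE.findall(utterance.lower())
    return [t for t in tokens if t not in standard and t not in control]


if __name__ == "__main__":
    # "dunno" and "gonna" are permitted; "wanna" is not on this toy list.
    print(unlisted_forms("I dunno, I wanna know"))
```

A transcriber or checking script could use such a report to decide whether a flagged form should be respelled in its standard form or proposed as an addition to the list.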