Assessing Manually Corrected Broad Phonetic Transcriptions in the Spoken Dutch Corpus

Abstract

For research and development in phonetics and speech technology, phonetically transcribed speech can be of great value. In the near future, the Spoken Dutch Corpus (CGN) will offer such transcriptions for about one thousand hours of spoken Dutch: 90% will be produced automatically and 10% manually. An advantage of automatic transcriptions is that they are maximally reliable, since the same procedure applied to the same speech always yields the same output. However, they are not necessarily maximally accurate. One way to make them more accurate is to have them checked and modified manually, but it is widely accepted that human transcriptions tend to be subjective and unreliable. The goal of this paper is to establish whether human CGN transcribers succeeded in producing accurate transcriptions by correcting automatic transcriptions while maintaining a high level of reliability.
