Publication | Closed Access
Word Segmentation in the Spoken Dutch Corpus
Citations: 12
References: 8
Year: 2002
Engineering, Speech Corpus, Part-of-speech Tagging, Spoken Language Processing, Word Segmentation, Corpus Linguistics, Speech Recognition, Applied Linguistics, Natural Language Processing, Generated Segmentation, Language Documentation, Text Segmentation, Computational Linguistics, Phonetics, Grammar, Language Studies, Automatic Segmentation, Machine Translation, Speech Communication, Speech Analysis, Language Recognition, Language Corpus, Speech Processing, Speech Perception, Linguistics
This paper describes the aims of the word segmentation in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN) and the procedures used to create it. For one million words, a manually verified segmentation will be created, whereas the remaining nine million words will come only with an automatically generated segmentation. We describe our efforts to create the best possible automatic word segmentation from an auditorily verified phonetic transcription, and the development of a protocol for the manual verification of that automatic segmentation. The paper also reports some figures on the manual verification of the first hundred thousand words.