Publication | Closed Access
Improving word segmentation by simultaneously learning phonotactics
Citations: 14
References: 22
Year: 2008
Unknown Venue
Syntactic Parsing, Engineering, Neurolinguistics, Simple Unigram Model, Spoken Language Processing, Word Segmentation, Phonology, Corpus Linguistics, MBDP-1 Algorithm, Speech Recognition, Natural Language Processing, Data Science, Text Segmentation, Computational Linguistics, Phonetics, Language Engineering, Language Studies, Machine Translation, Trigram Phonotactic Models, Language Recognition, Speech Processing, Linguistics
The most accurate unsupervised word segmentation systems currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore the utility of bigram and trigram phonotactic models by enhancing Brent's (1999) MBDP-1 algorithm. The results show that the improved MBDP-Phon model outperforms other unsupervised word segmentation systems (e.g., Brent, 1999; Venkataraman, 2001; Goldwater, 2007).
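To illustrate why the order of the phonotactic model matters, the following is a minimal sketch, not the MBDP-Phon implementation. It scores candidate words under a unigram and a bigram phoneme model trained on a toy lexicon. The lexicon, the `#` boundary symbol, the function names `train_ngram` and `log_prob`, and the add-alpha smoothing are all assumptions made for illustration; the abstract does not specify how the paper estimates or smooths its n-gram probabilities.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol (assumption)

def train_ngram(words, n):
    """Count phoneme n-grams and their (n-1)-symbol contexts over a word list."""
    ngrams, contexts = Counter(), Counter()
    for w in words:
        # Pad with n-1 boundary symbols on the left and one on the right.
        padded = [BOUNDARY] * (n - 1) + list(w) + [BOUNDARY]
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            ngrams[ctx + (padded[i],)] += 1
            contexts[ctx] += 1
    return ngrams, contexts

def log_prob(word, model, n, vocab_size, alpha=1.0):
    """Log-probability of a word's phoneme sequence with add-alpha smoothing
    (the smoothing scheme is an illustrative assumption)."""
    ngrams, contexts = model
    padded = [BOUNDARY] * (n - 1) + list(word) + [BOUNDARY]
    total = 0.0
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        num = ngrams[ctx + (padded[i],)] + alpha
        den = contexts[ctx] + alpha * vocab_size
        total += math.log(num / den)
    return total

# Toy lexicon: each character stands in for one phoneme (assumption).
LEXICON = ["kat", "kit", "tak", "pat"]
VOCAB_SIZE = len(set("".join(LEXICON))) + 1  # distinct phonemes plus boundary

uni = train_ngram(LEXICON, 1)
bi = train_ngram(LEXICON, 2)

for cand in ("kat", "tka"):
    print(cand,
          round(log_prob(cand, uni, 1, VOCAB_SIZE), 3),
          round(log_prob(cand, bi, 2, VOCAB_SIZE), 3))
```

In this sketch the unigram model cannot tell `kat` from `tka`, because both contain the same phonemes, while the bigram model penalizes the unattested sequence `tk`. A trigram model follows from calling `train_ngram` and `log_prob` with `n=3`.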