Publication | Open Access
Chinese segmentation and new word detection using conditional random fields
Citations: 467
References: 20
Year: 2004
Venue: Unknown
Engineering, Machine Learning, Multiple Lexicons, Domain Knowledge, Corpus Linguistics, Text Mining, Chinese Word Segmentation, Natural Language Processing, Speech Recognition, Data Science, Text Segmentation, Text Recognition, Computational Linguistics, Language Engineering, Language Studies, Character Recognition, Named-entity Recognition, Machine Translation, Chinese Segmentation, Text Processing, Linguistics, POS Tagging
Chinese word segmentation is a difficult, important and widely-studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
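As a rough illustration of how segmentation can be cast as a sequence labeling problem with lexicon-based features, the sketch below encodes a segmented sentence as per-character tags, decodes tags back into words, and builds binary lexicon-membership features for one position. It is a minimal sketch under stated assumptions, not the system described in the abstract: the `B`/`I` tag names, the function names, the feature strings, and the toy `place` lexicon are all hypothetical, and no CRF training or inference is shown.

```python
def words_to_tags(words):
    """Label each character 'B' (begins a word) or 'I' (continues a word).

    Assumes every word is a non-empty string.
    """
    tags = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild the word list from a character string and its per-character tags."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


def lexicon_features(chars, i, lexicons):
    """Binary features for position i: character identity plus lexicon hits.

    `lexicons` maps a (hypothetical) lexicon name to a set of entries.
    """
    feats = {"char=" + chars[i]}
    bigram = chars[i:i + 2]
    for name, entries in lexicons.items():
        if chars[i] in entries:
            feats.add(f"in_{name}:unigram")
        if len(bigram) == 2 and bigram in entries:
            feats.add(f"in_{name}:bigram")
    return feats


# Toy usage with an invented sentence and lexicon.
words = ["我们", "喜欢", "北京"]
sentence = "".join(words)
tags = words_to_tags(words)
restored = tags_to_words(sentence, tags)
feats = lexicon_features(sentence, 4, {"place": {"北京"}})
```

In a linear-chain CRF, feature sets like the one returned by `lexicon_features` would be paired with the current and previous tags, and the highest-scoring tag sequence would then be decoded back into words with something like `tags_to_words`.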