Publication | Open Access
A maximum entropy Chinese character-based parser
Citations: 47 | References: 15 | Year: 2003 | Venue: unknown
Keywords: Syntactic Parsing, Engineering, Part-of-speech Tagging, Maximum Entropy Parser, Word Segmentation, Corpus Linguistics, Text Mining, Natural Language Processing, Syntax, Computational Linguistics, Chinese Treebank, Grammar, Language Studies, Machine Translation, Semantic Parsing, Shallow Parsing, Parsing, Treebanks, Linguistics, POS Tagging
The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank ("CTB" henceforth). Word-based parse trees in CTB are first converted into character-based trees, in which word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from the word-level POS tags. A maximum entropy parser is then trained on the character-based corpus. The parser performs word segmentation, POS tagging, and parsing in a unified framework, achieving an average label F-measure of 81.4% and a word-segmentation F-measure of 96.0%. Our results show that word-level POS tags can significantly improve word segmentation, but higher-level syntactic structures are of little use to word segmentation in the maximum entropy parser. A word dictionary helps to improve both word-segmentation and parsing accuracy.
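The tree conversion described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the nested-tuple tree format, the hypothetical helper names (`char_tags`, `to_char_tree`, `leaves`, `words_from_chars`, `seg_f_measure`), and the `_b`/`_m`/`_e`/`_s` position suffixes used to derive character tags from a word's POS tag (the abstract does not state the exact tag scheme). The sketch also shows why word segmentation can be read directly off the character tags, and scores it with the standard span-matching word-segmentation F-measure, which may differ in detail from the paper's scoring.

```python
from typing import List, Tuple, Union

# Hypothetical tree format (assumption, not from the paper):
#   preterminal: (pos_tag, word) with word a str
#   phrase:      (label, [children])
Tree = Tuple[str, Union[str, list]]


def char_tags(pos: str, word: str) -> List[Tuple[str, str]]:
    """Derive character-level tags from a word-level POS tag.

    Assumed scheme: _s for a single-character word, otherwise
    _b (first char), _m (middle chars), _e (last char).
    """
    if len(word) == 1:
        return [(pos + "_s", word)]
    tags = [(pos + "_b", word[0])]
    tags += [(pos + "_m", ch) for ch in word[1:-1]]
    tags.append((pos + "_e", word[-1]))
    return tags


def to_char_tree(tree: Tree) -> Tree:
    """Convert a word-based tree into a character-based tree.

    Each POS preterminal becomes a constituent labelled with the word's
    POS tag, whose children are the individually tagged characters.
    """
    label, body = tree
    if isinstance(body, str):  # preterminal: (POS, word)
        return (label, char_tags(label, body))
    return (label, [to_char_tree(child) for child in body])


def leaves(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (tag, token) pairs at the bottom of a tree, left to right."""
    label, body = tree
    if isinstance(body, str):
        return [(label, body)]
    out: List[Tuple[str, str]] = []
    for child in body:
        out.extend(leaves(child))
    return out


def words_from_chars(tagged_chars: List[Tuple[str, str]]) -> List[str]:
    """Recover the word segmentation implied by the character tags."""
    words, current = [], ""
    for tag, ch in tagged_chars:
        current += ch
        if tag.endswith("_e") or tag.endswith("_s"):  # word boundary
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated final word
        words.append(current)
    return words


def spans(words: List[str]) -> set:
    """Map a segmentation to a set of (start, end) character offsets."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def seg_f_measure(gold: List[str], pred: List[str]) -> float:
    """Word-segmentation F-measure: harmonic mean of span precision/recall."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tree = ("IP", [("NP", [("NR", "中国")]), ("VP", [("VV", "发展")])])
    char_tree = to_char_tree(tree)
    print(words_from_chars(leaves(char_tree)))  # -> ['中国', '发展']
```

In this representation a character-based parser that predicts the tagged characters and the POS-labelled constituents above them has, as a side effect, produced a segmentation and a POS tagging, which is the sense in which the three tasks share one framework.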