Publication | Open Access

Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

Citations: 946
References: 10
Year: 2000

Abstract

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
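
To give a rough sense of what capitalization features for unknown words might look like, here is a minimal Python sketch. It is not the paper's feature set: the function name `capitalization_features`, the feature names, and the rule that features fire only for words outside a known vocabulary are all assumptions made for illustration. In a maximum entropy tagger, indicator features of this kind would typically be paired with candidate tags and weighted during training.

```python
def capitalization_features(tokens, i, known_words):
    """Return hypothetical capitalization feature names for tokens[i].

    Illustrative only: these feature names are assumptions,
    not the templates used in the paper.
    """
    word = tokens[i]
    feats = []
    # Assumption: the features fire only for words unseen in training.
    if word in known_words:
        return feats
    if word[:1].isupper():
        feats.append("init_cap")
        # Capitalization away from the sentence start is stronger
        # evidence of a proper noun than sentence-initial capitalization.
        if i > 0:
            feats.append("init_cap_not_sentence_start")
    if word.isupper() and word.isalpha():
        feats.append("all_caps")
    if any(c.isupper() for c in word[1:]) and not word.isupper():
        feats.append("mixed_case")
    return feats


tokens = ["The", "Zorblax", "report"]
known = {"The", "report"}
print(capitalization_features(tokens, 1, known))
```

Separating sentence-initial capitalization from capitalization elsewhere is one plausible reading of "more extensive treatment", since the first word of a sentence is capitalized regardless of its tag.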
