Publication | Open Access
Enriching the knowledge sources used in a maximum entropy part-of-speech tagger
Citations: 946 | References: 10 | Year: 2000 | Venue: Unknown
Topics: Syntactic Parsing, Engineering, Tagging, Knowledge Extraction, Part-of-speech Tagging, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Syntax, Information Retrieval, Data Science, Computational Linguistics, Penn Treebank, Language Engineering, Grammar, Language Studies, Machine Translation, NLP Task, Knowledge Discovery, Unseen Words, Information Extraction, Speech Tagger, Treebanks, Knowledge Sources, Linguistics, Po Tagging
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we obtain improved results by incorporating the following features: (i) more extensive treatment of capitalization for unknown words; (ii) features for disambiguating the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The tagger's best accuracy on the Penn Treebank is 96.86% overall and 86.91% on previously unseen words.
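As a rough illustration of how such features plug into a maximum entropy tagger, the sketch below scores candidate tags with the standard log-linear form p(t | h) = exp(Σ λ_i f_i(h, t)) / Z(h), where each binary feature f_i pairs a contextual predicate of the history h with a tag t. Everything specific here is a hypothetical assumption for illustration, not the paper's method: the helper names `extract_predicates` and `tag_distribution`, the predicate templates (one loosely matching each of the three feature groups above), the 3-word window, the `HAVE_BE` word list, and the hand-set toy weights in `WEIGHTS`. A real tagger would estimate the weights from training data and search over whole tag sequences rather than score one position in isolation.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical list of "have"/"be" forms used by the verb-form predicate.
HAVE_BE = {"have", "has", "had", "having", "am", "is", "are",
           "was", "were", "be", "been", "being"}


def extract_predicates(words: List[str], i: int, prev_tag: str,
                       known: Set[str]) -> List[str]:
    """Active contextual predicates for position i (hypothetical templates)."""
    w = words[i]
    preds = [f"prev_tag={prev_tag}"]
    if w.lower() in known:
        preds.append(f"word={w.lower()}")
    else:
        # (i) capitalization cues for unknown words, split by sentence position
        if w[:1].isupper():
            preds.append("unk_cap_initial" if i == 0 else "unk_cap_noninitial")
        if w.isupper():
            preds.append("unk_all_caps")
        preds.append(f"unk_suffix={w[-3:].lower()}")
    # (ii) verb-form cue: a have/be form among the previous (up to) 3 words
    if any(x.lower() in HAVE_BE for x in words[max(0, i - 3):i]):
        preds.append("have_or_be_in_window")
    # (iii) particle cue: the previous word was tagged as some verb form
    if prev_tag.startswith("VB"):
        preds.append("after_verb")
    return preds


def tag_distribution(preds: List[str], tags: List[str],
                     weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """p(t | h) = exp(sum of weights of active (predicate, t) features) / Z(h)."""
    scores = {t: sum(weights.get((p, t), 0.0) for p in preds) for t in tags}
    top = max(scores.values())  # subtract the max score for numerical stability
    unnorm = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(unnorm.values())  # normalizer Z(h)
    return {t: u / z for t, u in unnorm.items()}


TAGS = ["NN", "NNP", "VBD", "VBN", "RP", "IN"]
# Hand-set toy weights, NOT trained values.
WEIGHTS = {
    ("unk_cap_noninitial", "NNP"): 2.0,
    ("have_or_be_in_window", "VBN"): 1.5,
    ("unk_suffix=ted", "VBD"): 0.8,
    ("unk_suffix=ted", "VBN"): 0.8,
    ("after_verb", "RP"): 1.0,
}

if __name__ == "__main__":
    words = ["She", "has", "blorfted", "Zyxco"]  # two made-up unknown words
    known = {"she", "has"}
    for i, prev in [(2, "VBZ"), (3, "VBN")]:
        dist = tag_distribution(extract_predicates(words, i, prev, known), TAGS, WEIGHTS)
        print(words[i], max(dist, key=dist.get))
    # prints "blorfted VBN" then "Zyxco NNP"
```

With these toy weights, the have/be predicate pushes the unknown word "blorfted" toward VBN rather than VBD, and the non-initial capitalization predicate pushes "Zyxco" toward NNP. Keeping capitalization at the start of a sentence as a separate predicate reflects the intuition that a capital letter there carries much weaker evidence for a proper noun.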