Publication | Open Access
Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems
Year: 2001 | Citations: 186 | References: 38
Topics: Engineering, Tagging, Part-of-speech Tagging, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Grammar, Language Studies, Language Models, Combination Taggers, Machine Translation, Word Class Tagging, Automatic Classification, NLP Task, Knowledge Discovery, Language Technology, Computer Science, Same NLP Task, Semantic Tagging, Machine Learning Systems, Linguistics, POS Tagging
The study investigates whether combining the outputs of several data-driven taggers can surpass the accuracy of the best single tagger. The authors trained four tagger generators on the same corpus data and combined their outputs using voting strategies and second-stage classifiers. Every combined tagger achieved higher accuracy than its best component, with error rates reduced by up to 24.3% on the LOB corpus.
We examine how differences in language models, learned by different data-driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morphosyntactic word class tagging, on the basis of three different tagged corpora. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component. The reduction in error rate varies with the material in question, but can be as high as 24.3% with the LOB corpus.
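To make the combination idea concrete, the following is a minimal sketch of simple plurality voting over the outputs of four taggers, together with the relative error reduction measure that a figure such as 24.3% expresses. It is not the authors' implementation: the abstract does not specify the voting schemes, so the tie-breaking rule (defer to one designated tagger), the function names, the tagger labels, the toy tags, and the toy accuracies are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def plurality_vote(tags_by_tagger: Dict[str, Sequence[str]],
                   tie_breaker: str) -> List[str]:
    """Pick, for each token, the tag proposed by the most taggers.

    Assumption: on any tie, the tag of the designated `tie_breaker`
    tagger is used. This rule is illustrative, not taken from the paper.
    """
    n_tokens = len(tags_by_tagger[tie_breaker])
    if any(len(tags) != n_tokens for tags in tags_by_tagger.values()):
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for i in range(n_tokens):
        votes = Counter(tags[i] for tags in tags_by_tagger.values())
        top = max(votes.values())
        winners = [tag for tag, count in votes.items() if count == top]
        if len(winners) == 1:
            combined.append(winners[0])
        else:
            combined.append(tags_by_tagger[tie_breaker][i])
    return combined


def relative_error_reduction(best_accuracy: float,
                             combined_accuracy: float) -> float:
    """Fraction of the best single tagger's errors removed by combining."""
    best_error = 1.0 - best_accuracy
    combined_error = 1.0 - combined_accuracy
    return (best_error - combined_error) / best_error


# Toy outputs for a five-token sentence (hypothetical tags, not corpus data).
toy_outputs = {
    "hmm":    ["DT", "JJ", "VB", "DT", "NNS"],
    "memory": ["DT", "NN", "VB", "DT", "NNS"],
    "rules":  ["DT", "JJ", "VB", "DT", "NNS"],
    "maxent": ["DT", "NN", "NN", "DT", "NNS"],
}
combined_tags = plurality_vote(toy_outputs, tie_breaker="maxent")

# Toy accuracies (hypothetical, chosen only to show the arithmetic).
toy_reduction = relative_error_reduction(best_accuracy=0.97,
                                         combined_accuracy=0.976)
```

In this toy run the second token is a 2-2 tie and falls back to the designated tagger, while the third token is decided 3-1 against it. Deferring to a single tagger on ties is only one possible choice; weighting votes by each tagger's measured accuracy would be another.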