
TLDR

Named entity recognition (NER) systems are often said to require extensive gazetteers, and compiling such lists is sometimes cited as a bottleneck in NER design. This study presents an NER system that combines rule-based grammars with statistical (maximum entropy) models. The system's performance was evaluated with gazetteers of different types and sizes on test material from the MUC-7 competition. The results show that, for the text type and task of this competition, relatively small gazetteers of well-known names are sufficient, and that large gazetteers of low-frequency names are not needed. The authors close with observations on the domain independence of the competition and of their experiments.

Abstract

It is often claimed that Named Entity recognition systems need extensive gazetteers: lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.
