Publication | Closed Access
Transforming Wikipedia into Named Entity Training Data
Citations: 106
References: 14
Year: 2008
Unknown Venue
Statistical named entity recognisers require costly hand-labelled training data, so most existing corpora are small. The study uses Wikipedia to create a massive corpus of named entity annotated text. The authors transform Wikipedia links into named entity annotations by classifying each link's target article into an entity type such as person, organisation, or location. Compared with the MUC, CoNLL, and BBN corpora, the Wikipedia-derived corpus generally performs better than other cross-corpus train/test pairs.
Statistical named entity recognisers require costly hand-labelled training data and, as a result, most existing corpora are small. We exploit Wikipedia to create a massive corpus of named entity annotated text. We transform Wikipedia's links into named entity annotations by classifying the target articles into common entity types (e.g. person, organisation and location). Compared with the MUC, CoNLL and BBN corpora, Wikipedia generally performs better than other cross-corpus train/test pairs.
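The link-to-annotation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ARTICLE_TYPES` lookup, the `annotate` function, the tag names (`PER`, `ORG`, `LOC`, `O`) and the whitespace tokenisation are all assumptions made for illustration. In particular, the hard-coded dictionary stands in for the article classifier the abstract mentions, and the regular expression handles only simple `[[Target]]` and `[[Target|anchor]]` links.

```python
import re

# Hypothetical lookup from article title to entity type. In the approach the
# abstract describes, these labels would come from classifying each target
# article; here they are hard-coded purely for illustration.
ARTICLE_TYPES = {
    "Barack Obama": "PER",
    "Microsoft": "ORG",
    "Sydney": "LOC",
}

# Matches simple wiki links: [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def annotate(wikitext, article_types):
    """Return (token, tag) pairs; link anchor tokens get the target's type."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Plain text before the link lies outside any entity.
        tagged.extend((tok, "O") for tok in wikitext[pos:match.start()].split())
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        # Links to articles with no known entity type are left untagged.
        label = article_types.get(target, "O")
        tagged.extend((tok, label) for tok in anchor.split())
        pos = match.end()
    # Remaining text after the last link.
    tagged.extend((tok, "O") for tok in wikitext[pos:].split())
    return tagged


if __name__ == "__main__":
    sentence = "[[Barack Obama|Obama]] visited [[Sydney]] ."
    for token, tag in annotate(sentence, ARTICLE_TYPES):
        print(token, tag)
```

The sketch tags every token of a link's anchor text with the type of the linked article and marks all other tokens as `O`. A realistic pipeline would also need sentence splitting, proper tokenisation, and some treatment of entity mentions that are not linked.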