Deriving a large scale taxonomy from Wikipedia

TLDR

This work treats Wikipedia's category system as a conceptual network and derives a large-scale taxonomy from it. Semantic relations between categories are labeled using connectivity-based methods and lexicosyntactic matching, which yields a large number of subsumption (isa) relations. The quality of the taxonomy is evaluated by comparing it with ResearchCyc and by computing semantic similarity between words in benchmark datasets.

Abstract

We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result, we are able to derive a large-scale taxonomy containing a large number of subsumption (i.e. isa) relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, and by computing semantic similarity between words in benchmarking datasets.
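
To make the idea of lexicosyntactic matching concrete, here is a minimal sketch of one way a link between two category names could be labeled. It assumes a simple head-matching heuristic: a subcategory is taken to stand in an isa relation to its supercategory when both names share the same lexical head. The abstract does not spell out the actual rules, so the heuristic, the preposition list, and the function names `lexical_head` and `label_link` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical illustration of head matching between category names.
# The heuristic and all names here are assumptions, not the paper's method.

PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on"}


def lexical_head(category: str) -> str:
    """Crude head finder: the token just before the first preposition,
    or the last token when the name contains no preposition."""
    tokens = category.lower().split()
    for i, tok in enumerate(tokens):
        if tok in PREPOSITIONS and i > 0:
            return tokens[i - 1]
    return tokens[-1]


def label_link(sub: str, sup: str) -> str:
    """Label the link sub -> sup as 'isa' when both names share a head,
    otherwise leave it as 'unknown' for other methods to decide."""
    return "isa" if lexical_head(sub) == lexical_head(sup) else "unknown"


# Example category pairs (subcategory, supercategory), chosen for illustration.
links = [
    ("British computer scientists", "Computer scientists"),
    ("Islands of Greece", "Islands"),
    ("Crime comics", "Crime"),
]
labels = {link: label_link(*link) for link in links}
```

In this sketch the first two pairs share the heads "scientists" and "islands" and are labeled isa, while "Crime comics" and "Crime" have different heads and stay unlabeled. A realistic system would need proper parsing and lemmatization instead of whitespace tokenization, and would combine such matches with the connectivity-based evidence the abstract mentions.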
