Publication | Closed Access
Deriving a large scale taxonomy from Wikipedia
Citations: 447
References: 33
Year: 2007
Engineering, Knowledge Extraction, Large Scale Taxonomy, Semantic Web, Semantics, Corpus Linguistics, Semantic Wiki, Category System, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Ontology Learning, Language Studies, Ontology Alignment, Entity Disambiguation, Knowledge Discovery, Terminology Extraction, Semantic Network, Conceptual Network, Linguistics, Semantic Similarity
The study treats Wikipedia’s category system as a conceptual network and derives a large‑scale taxonomy from it. Semantic relations between categories are labeled using connectivity‑based methods and lexicosyntactic matching. The resulting taxonomy contains a large number of subsumption (isa) relations, and its quality is evaluated by comparison with ResearchCyc and by computing semantic similarity on benchmark datasets.
We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result, we are able to derive a large-scale taxonomy containing a large number of subsumption (i.e., isa) relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, and by computing semantic similarity between words in benchmarking datasets.
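To make the idea of lexicosyntactic matching between category names more concrete, here is a minimal sketch of one possible heuristic: two linked categories are labeled isa when their lexical heads match. This is an illustrative assumption, not the authors' actual procedure. The function names, the preposition list, and the example category names are all hypothetical.

```python
# Hypothetical sketch: label a subcategory -> supercategory link by comparing
# crude lexical heads. This is NOT the method from the paper, only an
# illustration of head matching on category names.

# Assumed (incomplete) list of prepositions that end the head noun phrase.
PREPOSITIONS = {"in", "of", "by", "from", "for", "at", "on"}


def lexical_head(category: str) -> str:
    """Return a crude head noun: the last token before the first preposition."""
    head_phrase = []
    for token in category.lower().split():
        if token in PREPOSITIONS:
            break
        head_phrase.append(token)
    return head_phrase[-1] if head_phrase else ""


def label_link(sub: str, sup: str) -> str:
    """Label a category link as 'isa' when both heads match, else 'unknown'."""
    head_sub = lexical_head(sub)
    head_sup = lexical_head(sup)
    if head_sub and head_sub == head_sup:
        return "isa"
    return "unknown"


if __name__ == "__main__":
    # Example category names are made up for illustration.
    print(label_link("British computer scientists", "Computer scientists"))
    print(label_link("Cities in Germany", "Germany"))
```

A heuristic like this only covers links whose names share a head. Links it leaves as unknown would need other evidence, such as the connectivity-based methods the abstract mentions.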