Semantic Annotation of Documents Based on Wikipedia Concepts
Citations: 15
References: 0
Year: 2018
Venue: unknown
Topics: Engineering, Knowledge Extraction, Semantic Web, Semantics, Corpus Linguistics, Semantic Wiki, Text Mining, Natural Language Processing, Large Wikipedia, Information Retrieval, Computational Linguistics, Ontology Learning, Language Studies, Semantic Annotation, Entity Disambiguation, Knowledge Discovery, Terminology Extraction, Semantic Tagging, Annotation Tool, Parallel Processing, Linguistics
Semantic annotation is the task of augmenting an unstructured textual document with semantic information, such as concepts from an ontology. In wikification, Wikipedia serves as the ontology, and its pages (articles) are treated as representations of concepts. We describe an efficient approach for annotating a document with relevant Wikipedia concepts. A global disambiguation method, which constructs a mention-concept graph and computes PageRank over it, identifies a coherent set of relevant concepts by considering the input document as a whole. The approach is suitable for parallel processing and can support any language for which a sufficiently large Wikipedia is available. We discuss several heuristics involved in disambiguating candidate annotations and present an experimental evaluation of their influence.
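The global disambiguation step can be illustrated with a minimal sketch. The abstract does not give the graph construction in detail, so everything below is an assumption made for illustration: the function names `pagerank` and `disambiguate`, the use of a prior probability as the mention-to-concept edge weight, the symmetric concept-to-concept relatedness scores, the damping factor, and the toy data. It is not the authors' implementation, only one plausible way to rank candidate concepts with PageRank over a mention-concept graph.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted directed graph.

    `edges` maps (source, target) -> positive weight.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = defaultdict(float)
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


def disambiguate(candidates, relatedness):
    """Pick one concept per mention by PageRank score.

    `candidates` maps mention -> {concept: prior} (hypothetical priors).
    `relatedness` maps (concept_a, concept_b) -> score (hypothetical),
    treated as symmetric.
    """
    edges = {}
    for mention, concepts in candidates.items():
        for concept, prior in concepts.items():
            edges[("m", mention), ("c", concept)] = prior
    for (a, b), score in relatedness.items():
        edges[("c", a), ("c", b)] = score
        edges[("c", b), ("c", a)] = score

    rank = pagerank(edges)
    return {
        mention: max(concepts, key=lambda c: rank[("c", c)])
        for mention, concepts in candidates.items()
    }


# Toy input: an ambiguous mention and an unambiguous one (made-up values).
candidates = {
    "jaguar": {"Jaguar (animal)": 0.5, "Jaguar Cars": 0.5},
    "engine": {"Internal combustion engine": 1.0},
}
relatedness = {("Jaguar Cars", "Internal combustion engine"): 1.0}

annotations = disambiguate(candidates, relatedness)
```

In this toy input both senses of "jaguar" have the same prior, so the choice is decided by the link between "Jaguar Cars" and "Internal combustion engine": the car sense receives additional rank from a concept supported elsewhere in the document, which is the sense in which the disambiguation is global.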