Concepedia

Publication | Closed Access

Named Entity Location Prediction Combining Twitter and Web

Citations: 10

References: 43

Year: 2020

Abstract

Knowledge bases are critical to many applications, yet they remain highly incomplete, so enriching them with new entities and new location attributes is increasingly important. Given a named entity together with the tweets and Web documents in which it appears, we aim to predict the entity's city-level location by combining the geographical knowledge embedded in both Twitter and the Web. This task is helpful for knowledge base enrichment and tweet location prediction. In this paper we propose NELPTW, the first unsupervised framework for Named Entity Location Prediction that leverages knowledge from Twitter and the Web. For each data source, NELPTW uses a linear function ranking model to generate several rankings of the candidate location set for each entity. Because the two sources differ in reliability and importance for location prediction, an unsupervised rank aggregation algorithm is developed that aggregates the multiple rankings for each entity into a single, better ranking. A learning algorithm based on the EM method is proposed to learn the parameters of the ranking model automatically, without requiring any training labels. Experimental results on a real-world Twitter and Web data set show that our framework significantly outperforms the baselines in terms of accuracy.
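The two-stage pipeline described in the abstract (per-source linear ranking, then rank aggregation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature values, the candidate cities, the function names, and the hand-set weights are hypothetical, and a weighted Borda count stands in for the paper's aggregation algorithm, which the abstract does not specify. The EM-based parameter learning is not shown; the weights that NELPTW would learn without labels are simply fixed by hand.

```python
from typing import Dict, List


def rank_by_linear_score(features: Dict[str, List[float]],
                         weights: List[float]) -> List[str]:
    """Rank candidate cities by a linear score w . x, highest score first."""
    scores = {
        city: sum(w * x for w, x in zip(weights, feats))
        for city, feats in features.items()
    }
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


def weighted_borda(rankings: List[List[str]],
                   source_weights: List[float]) -> List[str]:
    """Aggregate rankings with a weighted Borda count (a stand-in technique)."""
    totals: Dict[str, float] = {}
    for ranking, source_weight in zip(rankings, source_weights):
        n = len(ranking)
        for position, city in enumerate(ranking):
            # The top position earns n points, the last position earns 1.
            totals[city] = totals.get(city, 0.0) + source_weight * (n - position)
    return sorted(totals, key=lambda c: (-totals[c], c))


# Hypothetical per-source features for one entity's candidate cities.
twitter_features = {"Austin": [0.6, 0.2], "Boston": [0.3, 0.5], "Chicago": [0.1, 0.3]}
web_features = {"Austin": [0.2, 0.4], "Boston": [0.7, 0.1], "Chicago": [0.1, 0.5]}

# Hand-set weights; in NELPTW these would be learned without labels.
twitter_ranking = rank_by_linear_score(twitter_features, [1.0, 0.5])
web_ranking = rank_by_linear_score(web_features, [1.0, 0.5])

# The Web source is assumed (arbitrarily) to be somewhat more reliable.
combined = weighted_borda([twitter_ranking, web_ranking], [0.4, 0.6])
print(combined)
```

In this toy setup the two sources disagree on the top city, and the aggregated ranking follows the source given the larger weight, which is the behaviour that a reliability-aware aggregation is meant to capture.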
