Location Extraction from Social Media

TLDR

Location extraction, also called toponym extraction, covers geoparsing (extracting spatial representations from location mentions in text) and geotagging (assigning spatial coordinates to content items). The study evaluates five state-of-the-art location extraction algorithms, including the authors' own geoparsing method built on an OpenStreetMap database and their language-model-based geotagging approach. The methods are benchmarked on geoparsing tweets and geotagging Flickr posts alongside third-party approaches based on DBpedia, Geonames, and the Google Geocoder API, and are also compared in a qualitative evaluation recalling the top N location mentions from tweets during major news events. The OpenStreetMap approach was best for geoparsing English (F1 0.90+), the language model was best for geoparsing Turkish (F1 0.66) and for geotagging (F1@1km 0.49), and the map database was best in the qualitative evaluation (R@20 0.60+). The study also reports strengths, weaknesses, and a detailed failure analysis.

Abstract

Location extraction, also called “toponym extraction,” is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This article evaluates five “best-of-class” location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses, and a detailed failure analysis for the approaches and suggest concrete areas for further research.
