Concepedia

TLDR

Location extraction, also called toponym extraction, covers geoparsing (extracting spatial representations from location mentions in text) and geotagging (assigning spatial coordinates to content items). The study evaluates five best-of-class location extraction algorithms: the authors' own geoparsing method built on an OpenStreetMap database and their language-model-based geotagging approach, plus three third-party approaches based on DBpedia, Geonames, and the Google Geocoder API. All approaches were benchmarked quantitatively on geoparsing tweets and geotagging Flickr posts, and evaluated qualitatively on recalling the top N location mentions from tweets during major news events. The OpenStreetMap-based approach was best for geoparsing English (F1 of 0.90 or higher), the language model was best for Turkish (F1 0.66) and for geotagging (F1@1km 0.49), and the map database was best in the qualitative evaluation (R@20 of 0.60 or higher). The paper also reports strengths, weaknesses, and a detailed failure analysis of the approaches.

Abstract

Location extraction, also called “toponym extraction,” is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This article evaluates five “best-of-class” location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses, and a detailed failure analysis for the approaches and suggest concrete areas for further research.
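
The abstract reports F1@1km and R@20 without defining them. The sketch below shows one plausible reading, not the authors' evaluation code. It assumes F1@1km counts a geotag as correct when it lies within 1 km great-circle distance of the ground truth, with precision taken over items that received an estimate and recall over all items, and that R@20 is the fraction of ground-truth locations found among the top 20 ranked mentions. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Set, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def f1_at_radius(predicted: Sequence[Optional[Coord]],
                 truth: Sequence[Coord],
                 radius_km: float = 1.0) -> float:
    """F1 where a prediction is correct if within radius_km of the truth.

    Assumption: None means the system gave no estimate for that item.
    Precision is computed over attempted items, recall over all items.
    """
    attempted = sum(p is not None for p in predicted)
    correct = sum(
        p is not None and haversine_km(p, t) <= radius_km
        for p, t in zip(predicted, truth)
    )
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_at_n(ranked: Sequence[str], relevant: Set[str], n: int = 20) -> float:
    """Fraction of relevant location names found in the top n ranked mentions."""
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & relevant) / len(relevant)
```

For example, with four posts of which three receive an estimate and two of those fall within 1 km, precision is 2/3, recall is 2/4, and `f1_at_radius` returns 4/7 (about 0.57).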
