Publication | Closed Access
Web-a-where
Citations: 528
References: 17
Year: 2004
Unknown Venue
Web Mining, Semantic Tagging, Engineering, Tagging, Information Retrieval, Data Mining, Geographic Information Retrieval, Data Science, Geospatial Semantics, Geo/geo Ambiguity, Web Pages, Location-aware Social Medium, Semantic Web, Geographic Focus, Corpus Linguistics, Text Mining
Geotagging Web pages requires resolving geo/non-geo and geo/geo ambiguities, such as telling the city of Berlin from a person named Berlin, or the country Turkey from the common word. The authors describe a fast, scalable tagging process that addresses these challenges.
We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus, a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses.

Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario.

An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.
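To make the two ambiguity types and the notion of a page focus concrete, here is a minimal Python sketch. It is not the Web-a-Where algorithm, whose heuristics the abstract does not detail. The toy `GAZETTEER`, the `WEAK_NAMES` set, and the functions `tag_places` and `page_focus` are all hypothetical names introduced for illustration. The sketch assumes three simple rules: a name with a common non-geographic meaning is tagged only when another place name appears on the page, a geo/geo conflict is settled by a qualifier found elsewhere in the text (falling back to a default sense), and the focus is the most frequently tagged country.

```python
import re
from collections import Counter

# Toy gazetteer (illustrative assumption): name -> candidate (place, region, country).
# The first candidate listed is treated as the default sense.
GAZETTEER = {
    "London": [("London", "England", "UK"), ("London", "Ontario", "Canada")],
    "Ontario": [("Ontario", "Ontario", "Canada")],
    "Berlin": [("Berlin", "Berlin", "Germany")],
    "Turkey": [("Turkey", None, "Turkey")],
}

# Names assumed to be non-geographic more often than not (hypothetical list).
WEAK_NAMES = {"Turkey"}


def tag_places(text):
    """Return one resolved (place, region, country) tuple per place mention."""
    tokens = re.findall(r"[A-Za-z]+", text)
    token_set = set(tokens)
    geo_hits = [t for t in tokens if t in GAZETTEER]
    tags = []
    for name in geo_hits:
        # Geo/non-geo: skip a weak name unless some other place name is on the page.
        if name in WEAK_NAMES and all(hit == name for hit in geo_hits):
            continue
        candidates = GAZETTEER[name]
        chosen = candidates[0]  # default sense
        # Geo/geo: prefer a candidate whose region or country is also mentioned.
        for cand in candidates:
            _, region, country = cand
            qualifiers = {region, country} - {name, None}
            if qualifiers & token_set:
                chosen = cand
                break
        tags.append(chosen)
    return tags


def page_focus(tags):
    """Return the most frequently tagged country, or None if nothing was tagged."""
    if not tags:
        return None
    return Counter(country for _, _, country in tags).most_common(1)[0][0]


if __name__ == "__main__":
    for page in ("We flew to London, Ontario last week.",
                 "Roast the Turkey for three hours."):
        tags = tag_places(page)
        print(page, tags, page_focus(tags))
```

A real tagger would need a full gazetteer, multi-word names, and confidence scores. The sketch only shows where each kind of ambiguity enters the pipeline.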