Publication | Closed Access
Deduplicating a places database
Citations: 30
References: 11
Year: 2014
Unknown Venue
Engineering, Geographic Information Retrieval, Central Park, Localization, Location-based Service, Geographic Information Systems, Information Retrieval, Data Science, Data Mining, Places Database, Data Integration, Public Health, Database Construction, Physical Location, Data Management, Similarity Search, Knowledge Discovery, Computer Science, Urban Geography, Urban Design, Geospatial Semantics, Central Park Cafe, Location Information
We consider the problem of resolving duplicates in a database of places, where a place is defined as any entity that has a name and a physical location. When auxiliary attributes such as phone number and full address are not available, deduplication based solely on names and approximate location becomes an exceptionally challenging problem that requires both domain knowledge and local geographical knowledge. For example, the pair "Newpark Mall Gap Outlet" and "Newpark Mall Sears Outlet" has a high string similarity, but determining that the two are different requires the domain knowledge that they represent two different store names in the same mall. Similarly, in most parts of the world, a local business called "Central Park Cafe" might simply be referred to as "Central Park", except in New York, where the keyword "Cafe" in the name becomes important to differentiate it from the famous park in the city.
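To make the first example concrete, the following is a minimal sketch of why plain string similarity is misleading for place names. It is not the method of the paper: the character-level measure (Python's standard `difflib.SequenceMatcher`) and the helper names `name_similarity` and `differing_tokens` are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case (illustrative measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def differing_tokens(a: str, b: str) -> tuple[set[str], set[str]]:
    """Whitespace-separated tokens that appear in one name but not in the other."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return tokens_a - tokens_b, tokens_b - tokens_a


if __name__ == "__main__":
    gap = "Newpark Mall Gap Outlet"
    sears = "Newpark Mall Sears Outlet"

    # The two names share most of their characters, so the score is high ...
    print(f"similarity: {name_similarity(gap, sears):.3f}")

    # ... yet the only tokens that differ are the two store names, which is
    # exactly the part a human would use to tell the places apart.
    print("differing tokens:", differing_tokens(gap, sears))
```

The score alone suggests a duplicate, while the differing tokens are the store names themselves. Deciding whether a differing token is significant (a store name) or droppable (such as "Cafe" outside New York) is the kind of domain and local knowledge the abstract refers to, and this sketch does not attempt it.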