Publication | Closed Access
Automatically Extracting Structure from Free Text Addresses.
Citations: 318
References: 21
Year: 2000
Venue: Unknown
In this paper we present a novel way to automatically elementize postal addresses, seen as plain text strings, into atomic structured elements like "City" and "Street name". This is an essential step in all warehouse data cleaning activities. In spite of the practical importance of the problem and the technical challenges it offers, research effort on the topic has been limited. Existing commercial approaches rely on hand-tuned, rule-based methods that are brittle and require extensive manual effort when moved to a different postal system. We present a Hidden Markov Model based approach that can work with just about any address domain when seeded with a small training data set. Experiments on real-life datasets yield an accuracy of 89% on a heterogeneous nationwide database of Indian postal addresses and 99.6% on US addresses, which tend to be more templatized.
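To make the idea of HMM-based elementization concrete, the following is a minimal sketch, not the model described in the paper. It assumes a plain first-order HMM with add-one smoothing and Viterbi decoding. The element set (`HouseNo`, `Street`, `City`, `Zip`), the `symbol` token mapping, the function names, and the three fictional training addresses are all illustrative assumptions rather than details taken from the abstract.

```python
import math
from collections import Counter, defaultdict

STATES = ["HouseNo", "Street", "City", "Zip"]  # illustrative element set (assumption)
UNK = "<UNK>"  # symbol used for words never seen in training

def symbol(token, vocab=None):
    """Map a token to an observation symbol (hypothetical feature scheme)."""
    if token.isdigit():
        return "NUM5" if len(token) == 5 else "NUM"
    word = token.lower()
    if vocab is not None and word not in vocab:
        return UNK
    return word

def train(labeled):
    """Count start, transition, and emission events in labeled addresses."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in labeled:
        prev = None
        for token, state in seq:
            emit[state][symbol(token)] += 1
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            prev = state
    vocab = {s for counts in emit.values() for s in counts} | {UNK}
    return start, trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def segment(address, model):
    """Label each token of a non-empty address with its most likely element (Viterbi)."""
    start, trans, emit, vocab = model
    tokens = address.split()
    obs = [symbol(t, vocab) for t in tokens]
    n, v = len(STATES), len(vocab)
    score = {s: log_prob(start, s, n) + log_prob(emit[s], obs[0], v) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best = max(STATES, key=lambda p: score[p] + log_prob(trans[p], s, n))
            new_score[s] = (score[best] + log_prob(trans[best], s, n)
                            + log_prob(emit[s], o, v))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final state to recover the path.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return list(zip(tokens, path))

# Tiny, fictional seed set; real training data would be larger and more varied.
TRAINING = [
    [("12", "HouseNo"), ("Oak", "Street"), ("Street", "Street"),
     ("Springfield", "City"), ("62704", "Zip")],
    [("7", "HouseNo"), ("Elm", "Street"), ("Avenue", "Street"),
     ("Shelbyville", "City"), ("46176", "Zip")],
    [("450", "HouseNo"), ("Maple", "Street"), ("Road", "Street"),
     ("Springfield", "City"), ("62701", "Zip")],
]

model = train(TRAINING)
print(segment("98 Pine Street Shelbyville 12345", model))
```

In this sketch the unseen word "Pine" falls back to the `<UNK>` symbol, so its label is decided mostly by the learned transition structure. That is one simple way a small seed set can generalize to new addresses, and it is not necessarily how the paper handles unseen tokens.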