Concepedia

Abstract

This paper describes enhancements made to techniques currently used to search large databases of proper names. Improvements included use of a Hidden Markov Model (HMM) statistical classifier to identify the likely linguistic provenance of a surname, and application of language-specific rules to generate plausible spelling variations of names. These two components were incorporated into a prototype front-end system driving existing name search procedures. HMM models and sets of linguistic rules were constructed for Farsi, Spanish and Vietnamese surnames and tested on a database of over 11,000 entries. Preliminary evaluation indicates improved retrieval of 20--30% as measured by number of correct items retrieved.

References

YearCitations

Page 1