Publication | Closed Access
Name-ethnicity classification from open sources
Citations: 238
References: 14
Year: 2009
Unknown Venue
Keywords: Ethnicity, Engineering, Ethnicity Classifier, Education, Ethnic Group Relation, Corpus Linguistics, Text Mining, Race, Natural Language Processing, Classification Method, Information Retrieval, Data Science, Data Mining, Pattern Recognition, African American Studies, Document Classification, Racial Group, Biostatistics, Ethnic Studies, Named-entity Recognition, Statistics, Ethnicity Identification, Automatic Classification, Open Sources, Knowledge Discovery, Author Profiling, Intelligent Classification, Ethnic Identity, Hidden Markov Models
The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier in which all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups, with per-group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups.