Publication | Closed Access
Using term informativeness for named entity detection
Citations: 56
References: 15
Year: 2005
Unknown Venue
Engineering, Semantic Web, Entity Detection, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Language Studies, Content Analysis, Named-entity Recognition, Mixture Models, Entity Disambiguation, NLP Task, Linguistics, Knowledge Discovery, Terminology Extraction, Informal Communication, Information Extraction, Keyword Extraction, Mixture Model Likelihood
Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is necessary for tasks such as named entity detection. How topic-centric, or informative, a word is can be valuable information. It is well known that informative words are best modeled by "heavy-tailed" distributions, such as mixture models. However, existing informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly utilizes mixture model likelihood to identify informative words. We use the task of extracting restaurant names from bulletin board posts as a way to determine effectiveness. We find that our "mixture score" is weakly effective alone and highly effective when combined with Inverse Document Frequency. We compare against other informativeness criteria and find that only Residual IDF is competitive against our combined IDF/Mixture score.
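
The abstract compares the new score against two baselines, Inverse Document Frequency and Residual IDF. As a rough illustration of those baselines only, the sketch below computes both from a tokenized corpus using their commonly cited definitions: IDF as the negative log of the fraction of documents containing a word, and Residual IDF as observed IDF minus the IDF a Poisson model would predict from the word's total count. The function name `informativeness_scores` and the list-of-token-lists input are assumptions made for this example, not taken from the paper, and the paper's mixture score itself is not reproduced here.

```python
import math
from collections import Counter


def informativeness_scores(docs):
    """Return {word: (idf, residual_idf)} for a list of token lists.

    Hypothetical helper for illustration; not the paper's code.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing the word
    coll_freq = Counter()  # total occurrences across the corpus
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    scores = {}
    for word, df in doc_freq.items():
        # Observed IDF: rarer across documents means a higher score.
        idf = -math.log2(df / n_docs)
        # Poisson model: P(word appears in a document) = 1 - exp(-rate),
        # where rate is the average number of occurrences per document.
        rate = coll_freq[word] / n_docs
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        # Residual IDF: how much rarer the word is (per document) than
        # a Poisson model predicts from its overall count.
        scores[word] = (idf, idf - expected_idf)
    return scores
```

Words whose occurrences cluster in a few documents, as topic-centric terms such as restaurant names tend to, receive a positive Residual IDF under this definition, while words spread evenly across documents receive a value near or below zero.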