Publication | Closed Access
A maximum likelihood model for topic classification of broadcast news
Citations: 57
References: 6
Year: 1997
Venue: Unknown
Engineering, Communication, Corpus Linguistics, Journalism, Text Mining, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, News Analytics, Language Studies, News Semantics, Content Analysis, Accurate Topic Classification, Automatic Classification, Knowledge Discovery, Topic Classification, New Algorithm, Maximum Likelihood Model, Topic Model, Linguistics
We describe a new algorithm for topic classification that allows discrimination among thousands of topics. A mixture of topics explicitly models the fact that each story has multiple topics, that different words are related to different topics, and that most of the words are not related to any topic. The resulting model, trained by EM, has sharper distributions of words that result in more accurate topic classification. We tested the algorithm on transcribed broadcast news texts. When trained on one year of stories containing over 5,000 different topics and tested on new (later) stories, the first-choice topic was among the manually annotated choices 76% of the time.
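The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a per-story mixture of topic word distributions plus a general-language component, trained by EM. Everything specific here is an assumption made for illustration and not the authors' formulation: the `GENERAL` label, the equal mixture weights, the additive smoothing, the scoring rule in `classify`, and the toy stories in `DOCS`.

```python
import math
from collections import defaultdict

GENERAL = "<general>"  # hypothetical label for words tied to no topic

# Toy annotated stories (invented for illustration): (words, topics).
DOCS = [
    (["the", "market", "stocks", "fell"], ["finance"]),
    (["the", "team", "won", "the", "game"], ["sports"]),
    (["the", "stocks", "rose", "the", "market"], ["finance"]),
    (["the", "game", "ended", "the", "team"], ["sports"]),
]

def train_em(docs, iterations=20, smoothing=0.01):
    """Estimate p(word | topic) by EM from (words, topics) pairs."""
    vocab = sorted({w for words, _ in docs for w in words})
    topics = sorted({t for _, ts in docs for t in ts}) + [GENERAL]
    # Start every component from a uniform word distribution.
    p = {t: {w: 1.0 / len(vocab) for w in vocab} for t in topics}
    for _ in range(iterations):
        counts = {t: defaultdict(float) for t in topics}
        for words, doc_topics in docs:
            # A story's words may come from any of its annotated
            # topics or from the general-language component.
            components = list(doc_topics) + [GENERAL]
            for w in words:
                # E-step: posterior over components for this token
                # (equal mixture weights are a simplifying assumption).
                total = sum(p[c][w] for c in components)
                for c in components:
                    counts[c][w] += p[c][w] / total
        # M-step: re-estimate each distribution from expected counts,
        # with additive smoothing so no word gets zero probability.
        for t in topics:
            norm = sum(counts[t].values()) + smoothing * len(vocab)
            p[t] = {w: (counts[t][w] + smoothing) / norm for w in vocab}
    return p

def classify(model, words):
    """Return the topic that best explains the words, assuming each
    word comes from that topic or the general component (50/50)."""
    def score(topic):
        return sum(
            math.log(0.5 * model[topic][w] + 0.5 * model[GENERAL][w])
            for w in words
            if w in model[GENERAL]  # skip out-of-vocabulary words
        )
    candidates = [t for t in model if t != GENERAL]
    return max(candidates, key=score)

model = train_em(DOCS)
first_choice = classify(model, ["the", "stocks", "market"])
```

In this sketch the general component competes with the annotated topics for every word, so words that occur across all stories, such as "the", are shared with it and no longer dominate any single topic distribution. This is one way to read the "sharper distributions" the abstract mentions.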