Publication | Closed Access
A Topic Model Based on Poisson Decomposition
Citations: 10
References: 39
Year: 2017
Unknown Venue
Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, Latent Modeling, Information Retrieval, Data Science, Computational Linguistics, Poisson Decomposition Model, Document Classification, Language Studies, Content Analysis, Statistics, Document Clustering, Knowledge Discovery, Coherent Topics, Poisson Decomposition, Vector Space Model, Topic Model, Appropriate Statistical Distributions, Keyword Extraction, Linguistics
Determining appropriate statistical distributions for modeling text corpora is important for accurate estimation of numerical characteristics. Based on a test validating the claim that the data conform to a Poisson distribution, we propose the Poisson decomposition model (PDM), a statistical model for the count data of text corpora that directly captures each document's multidimensional numerical characteristics on topics. In PDM, each topic is represented as a parameter vector of a multidimensional Poisson distribution, which can be easily normalized to multinomial term probabilities, and each document is represented by its measurements on topics and thereby reduced to a measurement vector on topics. We use gradient descent methods and a sampling algorithm for parameter estimation. We carry out extensive experiments on the topics produced by our models. The results demonstrate that, compared to PLSI and LDA, our approach can extract more coherent topics and is competitive in document clustering when using PDM-based features.
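The abstract does not state PDM's exact likelihood or update rules, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a document's expected count for each term is the sum over topics of (document measurement on the topic) × (topic Poisson rate for the term), and that the document's measurements are refined by plain projected gradient ascent on the Poisson log-likelihood. All function names, rates, counts, and the learning rate are hypothetical. It illustrates two ideas from the abstract: normalizing a topic's Poisson parameter vector to multinomial term probabilities, and reducing a document to a measurement vector on topics.

```python
import math


def normalize_topic(rates):
    """Turn one topic's Poisson rate vector into multinomial term probabilities."""
    total = sum(rates)
    return [r / total for r in rates]


def expected_counts(doc_weights, topics):
    """Expected term counts for one document: sum_k weight_k * rate_kv (assumed form)."""
    n_terms = len(topics[0])
    return [
        sum(w * topic[v] for w, topic in zip(doc_weights, topics))
        for v in range(n_terms)
    ]


def poisson_log_likelihood(counts, doc_weights, topics):
    """Log-likelihood of the observed counts, treating terms as independent Poissons."""
    ll = 0.0
    for x, mu in zip(counts, expected_counts(doc_weights, topics)):
        ll += x * math.log(mu) - mu - math.lgamma(x + 1)
    return ll


def gradient_step(counts, doc_weights, topics, lr=0.01):
    """One projected gradient-ascent step on the document's topic measurements."""
    mu = expected_counts(doc_weights, topics)
    new_weights = []
    for w, topic in zip(doc_weights, topics):
        # d(log-likelihood)/d(weight_k) = sum_v (x_v / mu_v - 1) * rate_kv
        grad = sum((x / m - 1.0) * r for x, m, r in zip(counts, mu, topic))
        # Keep measurements strictly positive so log(mu) stays defined.
        new_weights.append(max(w + lr * grad, 1e-8))
    return new_weights


# Hypothetical toy data: two topics over a three-term vocabulary.
topics = [[4.0, 1.0, 0.5], [0.5, 1.0, 4.0]]
counts = [8, 2, 1]       # term counts of a single document
weights = [1.0, 1.0]     # initial measurements on the two topics

for _ in range(50):
    weights = gradient_step(counts, weights, topics)

print("term probabilities of topic 0:", normalize_topic(topics[0]))
print("document measurement vector:", weights)
```

In this sketch the topic rates are held fixed and only the document's measurements are updated; a full estimator would also have to learn the topic rates, which the abstract attributes to gradient descent methods and a sampling algorithm without further detail.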