Publication | Closed Access
The generalized Dirichlet distribution in enhanced topic detection
Citations: 34
References: 22
Year: 2012
Unknown Venue
Bayesian Statistic, Engineering, Generalized Dirichlet Distribution, Bayesian Inference, Text Mining, Natural Language Processing, Latent Modeling, Information Retrieval, Data Science, Data Mining, Statistics, Bayesian Hierarchical Modeling, Document Clustering, Effective Topic Correlation, Knowledge Discovery, Dirichlet Distribution, Computer Science, Topic Model, Keyword Extraction, Statistical Inference
We present a new, robust, and computationally efficient hierarchical Bayesian model for topic correlation modeling. We model the prior distribution of topics with a Generalized Dirichlet (GD) distribution rather than a Dirichlet distribution as in Latent Dirichlet Allocation (LDA). We call this model GD-LDA. The framework captures correlations between topics, as the Correlated Topic Model (CTM) and the Pachinko Allocation Model (PAM) do, but inference is faster than in CTM and PAM. GD-LDA effectively avoids over-fitting as the number of topics increases. As a tree model, it places the most important topics in the upper part of the tree according to their probability mass, so GD-LDA makes it possible to select the significant topics effectively. To discover topic relationships, we estimate the hyper-parameters with Monte Carlo EM. We report results using Empirical Likelihood (EL) on four public datasets from TREC and NIPS. We then present the performance of GD-LDA in ad hoc information retrieval (IR) based on MAP, [email protected], and Discounted Gain, and we compare fitting times empirically. GD-LDA shows significant improvement over CTM, LDA, and PAM in EL estimation. On all the IR measures, GD-LDA outperforms LDA, the dominant topic model in IR. All these improvements come with only a small increase in fitting time over LDA, in contrast to CTM and PAM.
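To illustrate how a Generalized Dirichlet prior places topics along a tree and lets earlier topics claim probability mass first, the following minimal sketch draws one topic-proportion vector using the standard stick-breaking (Connor-Mosimann) construction of the GD distribution. It is not the authors' implementation: the function name `sample_generalized_dirichlet`, the parameter names `alphas` and `betas`, and the example values are assumptions chosen for illustration, and the sketch covers only sampling from the prior, not GD-LDA inference or the Monte Carlo EM step.

```python
import random


def sample_generalized_dirichlet(alphas, betas, rng=random):
    """Draw one proportion vector from a Generalized Dirichlet distribution.

    Stick-breaking construction: each v_j ~ Beta(alpha_j, beta_j) takes a
    fraction of the probability mass left over by the preceding topics.
    Returns len(alphas) + 1 non-negative proportions that sum to 1.
    (Hypothetical helper for illustration only.)
    """
    if len(alphas) != len(betas):
        raise ValueError("alphas and betas must have the same length")

    theta = []
    remaining = 1.0  # mass not yet assigned to any topic
    for a, b in zip(alphas, betas):
        v = rng.betavariate(a, b)    # fraction of the remaining mass
        theta.append(v * remaining)  # mass assigned to this topic
        remaining *= 1.0 - v         # mass passed further down the chain
    theta.append(remaining)          # last topic receives what is left
    return theta


# Example with assumed hyper-parameters for a 4-topic model.
rng = random.Random(0)
theta = sample_generalized_dirichlet([2.0, 1.5, 1.0], [3.0, 2.0, 1.0], rng)
print(theta, sum(theta))
```

Because each topic has its own pair of Beta parameters, the GD distribution has more free parameters than a Dirichlet over the same number of topics, which allows a richer covariance structure between topic proportions. Choosing `betas[j] = alphas[j + 1] + betas[j + 1]` for every `j` except the last recovers an ordinary Dirichlet distribution as a special case.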