Publication | Closed Access
The research on text clustering based on LDA joint model
Citations: 13
References: 31
Year: 2017
Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, LDA Model, Information Retrieval, Data Science, Data Mining, Text Segmentation, Document Classification, Language Studies, Content Analysis, Document Clustering, Knowledge Discovery, Text Clustering, Computer Science, Vector Space Model, Topic Model, Keyword Extraction, Cluster Algorithm, Linguistics
This paper proposes a clustering algorithm that combines the LDA (Latent Dirichlet Allocation) probabilistic topic model with the VSM (Vector Space Model), adopting a three-tier framework of text, topic, and feature word. Although LDA alone can uncover hidden topic knowledge, its low-dimensional representation struggles to preserve the full text information, which limits its ability to distinguish between texts. The paper therefore performs cluster analysis in terms of both feature words and topics by integrating the two models. A better mix of LDA and VSM improves the clustering effect, while the optimal cluster number K of the K-means algorithm and the optimal topic number T of the LDA model are determined in parallel. To evaluate the algorithm more rigorously, the silhouette coefficient and the Dunn coefficient are used as assessment measures.
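As a rough illustration of the ideas in the abstract, the sketch below blends a word-level (VSM) similarity with a topic-level (LDA) similarity and scores a clustering with the silhouette coefficient and the Dunn index. The abstract does not state how the two models are combined, so the linear blend in `joint_distance` and its weight `lam` are assumptions made for this example, as are the toy vectors; this is not the paper's algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def joint_distance(vsm_a, vsm_b, topic_a, topic_b, lam=0.5):
    """Hypothetical blend: lam weights the VSM view, 1 - lam the topic view."""
    sim = lam * cosine(vsm_a, vsm_b) + (1 - lam) * cosine(topic_a, topic_b)
    return max(0.0, 1.0 - sim)  # clamp tiny negative values from rounding

def silhouette(dist, labels):
    """Mean silhouette coefficient from a pairwise distance matrix (needs >= 2 clusters)."""
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

def dunn(dist, labels):
    """Dunn index: smallest between-cluster distance over largest cluster diameter."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    inter = min(dist[i][j] for i, j in pairs if labels[i] != labels[j])
    intra = max(dist[i][j] for i, j in pairs if labels[i] == labels[j])
    return inter / intra if intra > 0 else math.inf

# Toy data (made up): per-document term weights and topic proportions.
vsm = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.0], [0.0, 0.2, 1.0], [0.0, 0.3, 0.9]]
topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
dist = [[joint_distance(vsm[i], vsm[j], topics[i], topics[j]) for j in range(4)]
        for i in range(4)]

good = [0, 0, 1, 1]  # groups similar documents together
bad = [0, 1, 0, 1]   # splits similar documents apart
print(silhouette(dist, good), dunn(dist, good))
print(silhouette(dist, bad), dunn(dist, bad))
```

Both indices increase as clusters become more compact and better separated, so one plausible way to choose K and T together is to repeat the clustering over a grid of candidate values and keep the pair with the highest scores.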