Publication | Closed Access
A Novel Hybrid Clustering Algorithm for Topic Detection on Chinese Microblogging
Citations: 22 | References: 20 | Year: 2019
Cluster Computing, Engineering, Public Opinion, Hot Topics, Communication, Chinese Microblogging, Journalism, Text Mining, Natural Language Processing, Computational Social Science, Social Media, Data Science, Data Mining, Content Analysis, Social Medium Mining, Document Clustering, Knowledge Discovery, Topic Model, Topic Detection, Arts, Social Medium Data
The hot topics discussed on microblogs mirror public opinion, so topic detection on microblogs is of great significance for detecting and managing public opinion. However, traditional clustering algorithms struggle with large-scale microblogging data that covers many topics and contains heavy noise. We therefore propose a three-layer hybrid algorithm to tackle this problem. In the first layer, we use the K-means algorithm with optimized initial center selection to group the microblog texts efficiently. We then subdivide large clusters and isolate noise texts to obtain purer clusters. In the second layer, we adopt the agglomerative nesting (AGNES) algorithm to merge small clusters that refer to the same topic. We then exclude most of the noise, reducing its impact on the K-means step in the third layer, which corrects erroneous merges made by AGNES. Experiments show that our algorithm outperforms several related traditional algorithms at clustering a real microblogging data set and performs well in topic detection.
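The three-layer flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: texts are taken to be already vectorized as numeric tuples, Euclidean distance stands in for whatever text similarity the paper uses, farthest-first seeding stands in for the paper's optimized initial center selection, and the names `hybrid_cluster`, `k_fine`, `merge_dist`, and `min_size` are hypothetical. The subdivision of large clusters in the first layer is omitted for brevity.

```python
import math

def dist(a, b):
    # Euclidean distance (assumed; the paper's text similarity is not given here)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def farthest_first_centers(points, k):
    # Assumed stand-in for the paper's optimized initial center selection
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    # Put each point in the group of its nearest center
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        groups[nearest].append(p)
    return groups

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        groups = assign(points, centers)
        # An empty group keeps its previous center
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return assign(points, centers)

def agnes_merge(clusters, merge_dist):
    # Bottom-up merging: join the two closest centroids until none are close enough
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        d, i, j = min(
            (dist(mean(clusters[i]), mean(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > merge_dist:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters

def hybrid_cluster(points, k_fine, merge_dist, min_size):
    # Layer 1: over-segment into many small clusters with K-means
    fine = [g for g in kmeans(points, farthest_first_centers(points, k_fine)) if g]
    # Layer 2: merge small clusters that appear to share a topic
    merged = agnes_merge(fine, merge_dist)
    # Treat clusters that stay tiny after merging as noise (assumed rule)
    kept = [c for c in merged if len(c) >= min_size]
    noise = [p for c in merged if len(c) < min_size for p in c]
    # Layer 3: K-means seeded with the merged centroids, on noise-free data
    clean = [p for c in kept for p in c]
    final = [g for g in kmeans(clean, [mean(c) for c in kept]) if g]
    return final, noise
```

Called on two tight groups of points plus one distant outlier, for example `hybrid_cluster(points, k_fine=5, merge_dist=3.0, min_size=2)`, the sketch should return the two groups as clusters and report the outlier as noise. Over-segmenting first and merging afterwards keeps the costly pairwise AGNES step restricted to a small number of centroids rather than every text.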