Publication | Closed Access
BTM: Topic Modeling over Short Texts
Citations: 546
References: 40
Year: 2014
Biterm Topic Model, Short Texts, Engineering, Topic Modeling, Text Mining, Word Embeddings, Natural Language Processing, Social Media, Information Retrieval, Data Science, Computational Linguistics, Language Studies, Content Analysis, Machine Translation, Nlp Task, Knowledge Discovery, Retrieval Augmented Generation, Topic Model, Keyword Extraction, Linguistics
Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, and their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
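To make the notion of a biterm concrete, the following is a minimal sketch of how word co-occurrence pairs might be collected across a corpus of short texts. It is not the authors' implementation: the function names `extract_biterms` and `corpus_biterm_counts`, the whitespace tokenization, and the choice to treat each whole text as a single co-occurrence context are all assumptions made for illustration. It covers only biterm extraction, not topic inference or the online algorithms.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short text.

    Assumption: the whole text is one co-occurrence context, and a pair
    is stored in sorted order so that (a, b) and (b, a) are the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        # Naive tokenization: lowercase and split on whitespace.
        counts.update(extract_biterms(doc.lower().split()))
    return counts


# Toy corpus, invented for illustration only.
docs = [
    "apple iphone release",
    "iphone release date",
    "apple pie recipe",
]
counts = corpus_biterm_counts(docs)
```

Each three-word text contributes only three pairs, which illustrates how sparse document-level co-occurrence is. Pooling the pairs over the corpus, as `counts` does, gives a topic model more co-occurrence evidence to work with.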