Publication | Closed Access
Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix
Citations: 104
References: 23
Year: 2013
Unknown Venue
Non-negative Matrix Factorization, Short Texts, Engineering, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Document Classification, Reliable Topics, Language Studies, Content Analysis, Term Correlation Matrix, Linguistics, Knowledge Discovery, Terminology Extraction, Computer Science, Vector Space Model, Topic Model, Matrix Factorization, Keyword Extraction, Term Correlation
Short texts are now prevalent in web applications such as microblogs and instant messages. The severe sparsity of short texts hinders existing topic models from learning reliable topics. In this paper, we propose a novel way to tackle this problem. The key idea is to learn topics from term correlation data rather than from the high-dimensional, sparse term occurrence information in documents. Such term correlation data is less sparse, becomes more stable as the collection grows, and captures the information needed for topic learning. To obtain reliable topics from term correlation data, we first introduce a novel way to compute term correlation in short texts by representing each term with the terms it co-occurs with. We then formulate topic learning as symmetric non-negative matrix factorization on the term correlation matrix. After learning the topics, we can easily infer the topics of documents. Experimental results on three data sets show that our method provides substantially better performance than the baseline methods.
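The two steps named in the abstract (building a term correlation matrix from co-occurring terms, then factorizing it symmetrically) can be illustrated with a minimal sketch. This is not the paper's exact method: the abstract does not give the correlation measure or the solver, so the sketch assumes cosine similarity between co-occurrence vectors and a generic damped multiplicative update for symmetric NMF. The names `term_correlation`, `symmetric_nmf`, and the toy matrix `X` are hypothetical.

```python
import numpy as np


def term_correlation(X):
    """Term-term correlation from a (n_docs, n_terms) document-term matrix.

    Assumption: each term is represented by its co-occurrence counts with
    other terms, and correlation is the cosine similarity of those vectors.
    """
    X = np.asarray(X, dtype=float)
    C = X.T @ X                      # term-term co-occurrence counts
    np.fill_diagonal(C, 0.0)         # ignore a term's co-occurrence with itself
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0        # keep isolated terms as zero vectors
    R = C / norms
    return R @ R.T                   # symmetric, non-negative


def symmetric_nmf(S, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate S by U @ U.T with U >= 0.

    Assumption: a generic damped multiplicative update, not necessarily
    the optimization procedure used in the paper.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((S.shape[0], k))
    for _ in range(n_iter):
        U *= 0.5 + 0.5 * (S @ U) / (U @ (U.T @ U) + eps)
    return U


# Toy corpus: 4 short documents over 6 terms (rows = documents).
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])
S = term_correlation(X)
U = symmetric_nmf(S, k=2)            # rows = terms, columns = topics
```

Each column of `U` can be read as a topic's weights over terms. The factorization operates on an `n_terms` by `n_terms` matrix, so its size does not grow with the number of documents.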