Concepedia

Publication | Closed Access

Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix

104

Citations

23

References

2013

Year

Abstract

Nowadays, short texts are very prevalent in various web applications, such as microblogs, instant messages. The severe sparsity of short texts hinders existing topic models to learn reliable topics. In this paper, we propose a novel way to tackle this problem. The key idea is to learn topics by exploring term correlation data, rather than the high-dimensional and sparse term occurrence information in documents. Such term correlation data is less sparse and more stable with the increase of the collection size, and can well capture the necessary information for topic learning. To obtain reliable topics from term correlation data, we first introduce a novel way to compute term correlation in short texts by representing each term with its co-occurred terms. Then we formulated the topic learning problem as symmetric non-negative matrix factorization on the term correlation matrix. After learning the topics, we can easily infer the topics of documents. Experimental results on three data sets show that our method provides substantially better performance than the baseline methods.

References

YearCitations

Page 1