Concepedia

Abstract

Word embedding and topic modeling are the two major methodologies used in industry to mine latent semantics from text data. From a pragmatic perspective, each of these two lines of semantic models faces increasing challenges from real-life applications: modern text mining tasks typically require a panoramic view of the latent semantics. Discovering heterogeneous semantics (e.g., heterogeneous types of latent topics) is therefore critical to the performance of these tasks, and a model is needed that meets this demand. Furthermore, with the arrival of the big data era and growing awareness of data privacy, heterogeneous semantics must be mined efficiently and without compromising data privacy. In this work, we develop a novel method called Heterogeneous Latent Topic Discovery (HLTD), which seamlessly integrates topic modeling with word embedding to discover heterogeneous latent topics. By coupling a parameter-server architecture with new private sampling algorithms, HLTD can be trained efficiently while effectively protecting the privacy of the underlying data. We evaluate HLTD with a wide range of qualitative and quantitative metrics used in industry. Extensive experiments demonstrate the superiority of HLTD over state-of-the-art methods.
