Publication | Closed Access
Heterogeneous Latent Topic Discovery for Semantic Text Mining
Citations: 41
References: 32
Year: 2021
Engineering, Topic Modeling, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Latent Modeling, Information Retrieval, Data Science, Data Mining, Word Embedding, Document Clustering, Unstructured Data, Knowledge Discovery, Semantic Text Mining, Computer Science, Distributional Semantics, Latent Semantics, Topic Model, Keyword Extraction
Word embedding and topic modeling are the two major methodologies used in industry to mine latent semantics from text data. From a pragmatic perspective, each of these two lines of semantic models faces increasing challenges from real-life applications. In particular, modern text mining tasks typically require a panoramic view of the latent semantics. Hence, discovering heterogeneous semantics (e.g., heterogeneous types of latent topics) is critical for the performance of these tasks, and a model that meets this demand is needed. Furthermore, with the arrival of the big data era and the increasing awareness of data privacy, it is necessary to study how to mine heterogeneous semantics efficiently without compromising data privacy. In this work, we develop a novel method called Heterogeneous Latent Topic Discovery (HLTD), which seamlessly integrates topic modeling with word embedding to discover heterogeneous latent topics. By coupling a parameter-server architecture with new private sampling algorithms, HLTD can be trained efficiently while effectively protecting the privacy of the underlying data. We evaluate HLTD on a wide range of qualitative and quantitative metrics used in industry. Extensive experiments demonstrate the superiority of HLTD over state-of-the-art methods.