Publication | Closed Access
Text Classification with Topic-based Word Embedding and Convolutional Neural Networks
Citations: 51
References: 24
Year: 2016
Venue: Unknown
Engineering, Machine Learning, Biomedical Literature Indexing, Large Language Model, Text Mining, Word Embeddings, Natural Language Processing, Data Science, Computational Linguistics, Document Classification, Text Classification, Language Studies, Biomedical Text Mining, Content Analysis, Machine Translation, Automatic Classification, Biomedical Literature, Deep Learning, Topic Model, Text Processing, Linguistics
Recently, distributed word embeddings trained by neural language models have become commonly used for text classification with Convolutional Neural Networks (CNNs). In this paper, we propose a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs. Topic-based Skip-gram leverages textual content with topic models, e.g., Latent Dirichlet Allocation (LDA), to capture precise topic-based word relationships and then integrates them into distributed word embedding learning. We then describe two multimodal CNN architectures, which are able to employ different kinds of word embeddings at the same time for text classification. Through extensive experiments conducted on several real-world datasets, we demonstrate that the combination of our Topic-based Skip-gram and multimodal CNN architectures outperforms state-of-the-art methods in biomedical literature indexing, clinical note annotation, and general textual benchmark dataset classification.
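The abstract does not spell out the two multimodal CNN architectures, so the following is only a minimal sketch of one common way a CNN can use several kinds of word embeddings at once: each embedding table gets its own convolution filters, each filter output is max-pooled over time, and the pooled features are concatenated. The function names (`conv_max_pool`, `multichannel_features`), the table sizes, and the random values standing in for generic and topic-based embeddings are all hypothetical and are not taken from the paper.

```python
import random

def conv_max_pool(seq_vectors, filt):
    """Slide one filter over a sequence of word vectors, apply ReLU, max-pool over time."""
    width = len(filt)  # number of consecutive words the filter covers
    best = 0.0         # ReLU floor: negative activations are clipped to zero
    for start in range(len(seq_vectors) - width + 1):
        window = seq_vectors[start:start + width]
        act = sum(w * x for row, vec in zip(filt, window) for w, x in zip(row, vec))
        best = max(best, act)
    return best

def multichannel_features(token_ids, embedding_tables, filters_per_table):
    """Concatenate pooled convolution features computed separately per embedding table."""
    feats = []
    for table, filters in zip(embedding_tables, filters_per_table):
        seq = [table[t] for t in token_ids]  # look up this channel's vectors
        feats.extend(conv_max_pool(seq, f) for f in filters)
    return feats

# Hypothetical setup: two random tables stand in for generic and topic-based embeddings.
rng = random.Random(0)
vocab_size, dim, width, n_filters = 10, 4, 2, 3

def random_table():
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

def random_filters():
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
            for _ in range(n_filters)]

generic_table, topic_table = random_table(), random_table()
features = multichannel_features(
    [1, 4, 2, 7, 3],
    [generic_table, topic_table],
    [random_filters(), random_filters()],
)
print(len(features))  # one pooled feature per filter per embedding table
```

In a full classifier the concatenated feature vector would feed a dense output layer; that layer and all training are omitted here.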