Publication | Open Access
Network-Based Bag-of-Words Model for Text Classification
Citations: 91 | References: 33 | Year: 2020
Engineering, Semantic Web, Hybrid Network, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Document Classification, Text Classification, Language Studies, Automatic Classification, Knowledge Discovery, BoW Model, Intelligent Classification, Vector Space Model, Keyword Extraction, Linguistics, Semantic Similarity
The rapid growth of the internet and other media has produced a tremendous amount of text data, making it both challenging and valuable to find more effective ways for machines to analyze text. Text representation is the first step for a machine to understand text, and the most commonly used representation method is the Bag-of-Words (BoW) model. To form the vector representation of a document, the BoW model matches and counts each word in the document independently, neglecting much of the correlation information among words. In this paper, we propose a network-based bag-of-words model that captures high-level structural and semantic information about words. Because the structural and semantic information of a network reflects the relationships between its nodes, the proposed model can distinguish the relations among words. We apply the proposed model to text classification and compare its performance with that of other text representation methods on four document datasets. The results show that the proposed method achieves the best performance with high efficiency, and that using the eccentricity property of the network as features yields the highest accuracy. We also investigate the influence of different network structures on the proposed method. Experimental results reveal that, for text classification, the dynamic network is more suitable than the static and hybrid networks.
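To make the contrast between plain word counting and network-derived word features concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The abstract does not say how the word network is built, how a word's network property becomes a document feature, or what distinguishes the static, dynamic, and hybrid networks, so every such choice below is hypothetical: one undirected co-occurrence graph per document linking words that appear within a `window` of each other (`window=2` links only adjacent words), and each vocabulary word's feature is its eccentricity (the largest shortest-path distance from that word to any other word it can reach) in place of its raw count. Eccentricity is measured within the word's connected component, and absent words get 0. The helper names (`tokenize`, `bow_vector`, `cooccurrence_graph`, `eccentricity`, `network_bow_vector`) are illustrative only.

```python
from collections import Counter, deque


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also strip punctuation, etc.
    return text.lower().split()


def bow_vector(tokens, vocab):
    # Classic BoW: count each vocabulary word independently, ignoring word order.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def cooccurrence_graph(tokens, window=2):
    # Hypothetical network construction: undirected edge between distinct words
    # whose positions differ by less than `window` (window=2 -> adjacent words).
    graph = {}
    for i, w in enumerate(tokens):
        graph.setdefault(w, set())
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w].add(v)
                graph.setdefault(v, set()).add(w)
    return graph


def eccentricity(graph, source):
    # Breadth-first search gives shortest-path distances in an unweighted graph;
    # the eccentricity is the largest distance to any node reachable from `source`.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def network_bow_vector(tokens, vocab, window=2):
    # Hypothetical network-based BoW: replace each word's count with its
    # eccentricity in the document's word network (0 if the word is absent).
    graph = cooccurrence_graph(tokens, window)
    return [eccentricity(graph, w) if w in graph else 0 for w in vocab]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat"]
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})  # cat, dog, mat, on, sat, the
    for toks in tokenized:
        print(bow_vector(toks, vocab), network_bow_vector(toks, vocab))
    # [1, 0, 1, 1, 1, 2] [2, 0, 3, 2, 3, 2]
    # [0, 1, 0, 0, 1, 1] [0, 1, 0, 0, 2, 2]
```

In the first document, "cat" and "sat" each occur once, so plain BoW cannot tell them apart, whereas their eccentricities (2 and 3) differ because "sat" sits at the periphery of the word network. This is the kind of structural information the abstract says plain counting discards.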