Publication | Closed Access
Index-based Online Text Classification for SMS Spam Filtering
Year: 2010 | Citations: 37 | References: 9
Topics: Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, Spam Filtering, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Ensemble Classifier, SMS Spam Filtering, Language Studies, Automatic Classification, Knowledge Discovery, Intelligent Classification, Text Indexing, Chinese SMS Message, Search Engine Indexing, Index Models, Linguistics
We propose a novel index-based online text classification method, investigate two index models, and compare the performance of various index granularities on English and Chinese SMS messages. Based on the proposed method, six individual classifiers were implemented using different text features of Chinese messages, and these were combined into an ensemble classifier. Experiments on an English corpus show that relevant features among words can increase classification confidence, and that the trigram co-occurrence of words is an appropriate relevant feature. Experiments on a real Chinese corpus show that a classifier using the word-level index model outperforms one using the document-level index model. Indexing by trigram segments also outperforms indexing by exact word segmentation, so Chinese text need not be segmented exactly when indexing with the proposed method. With parallel multi-threaded ensemble learning, the proposed method has constant time complexity, which is critical for large-scale data and online filtering.
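The abstract does not specify the index structure, the scoring rule, or the six feature sets, so the Python sketch below is only one minimal reading of "index-based online classification", built on stated assumptions. It assumes (1) a word-level index mapping each feature to the number of spam and ham messages that contain it; (2) a Laplace-smoothed log-odds score, a naive-Bayes-style rule swapped in for illustration rather than the paper's formula; and (3) a three-member majority-vote ensemble in place of the paper's six classifiers. Overlapping character trigrams stand in for the "trigram segment" (no exact segmentation, so the same function accepts unsegmented Chinese), and word trigrams stand in for the trigram co-occurrence feature. All names (`IndexClassifier`, `Ensemble`, `words`, `word_trigrams`, `char_trigrams`) are hypothetical.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def words(text):
    """Single words (unigrams), split on whitespace."""
    return set(text.lower().split())


def word_trigrams(text):
    """Overlapping runs of three consecutive words (co-occurrence feature)."""
    ws = text.lower().split()
    if len(ws) < 3:
        return {" ".join(ws)} if ws else set()
    return {" ".join(ws[i:i + 3]) for i in range(len(ws) - 2)}


def char_trigrams(text):
    """Overlapping 3-character segments; needs no exact word segmentation,
    so the same function applies to unsegmented Chinese text."""
    s = "".join(text.lower().split())
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


class IndexClassifier:
    """Hypothetical word-level index: feature -> [spam_count, ham_count]."""

    def __init__(self, extract):
        self.extract = extract
        self.index = defaultdict(lambda: [0, 0])
        self.totals = [0, 0]  # number of spam / ham messages indexed

    def learn(self, text, is_spam):
        # Online update: index one labelled message; nothing is retrained.
        label = 0 if is_spam else 1
        self.totals[label] += 1
        for feat in self.extract(text):
            self.index[feat][label] += 1

    def score(self, text):
        # Sum of Laplace-smoothed log-odds over features already indexed.
        # Assumed scoring rule for illustration, not the paper's formula.
        n_spam, n_ham = self.totals
        total = 0.0
        for feat in self.extract(text):
            if feat in self.index:  # membership test does not insert a key
                spam, ham = self.index[feat]
                total += (math.log((spam + 1) / (n_spam + 2))
                          - math.log((ham + 1) / (n_ham + 2)))
        return total


class Ensemble:
    """Majority vote over member classifiers, scored on a thread pool."""

    def __init__(self, members):
        self.members = members
        self.pool = ThreadPoolExecutor(max_workers=len(members))

    def learn(self, text, is_spam):
        for m in self.members:
            m.learn(text, is_spam)

    def is_spam(self, text):
        # CPython's GIL limits CPU-bound speedup; the pool shows the structure.
        votes = list(self.pool.map(lambda m: m.score(text) > 0, self.members))
        return sum(votes) * 2 > len(votes)


clf = Ensemble([IndexClassifier(words),
                IndexClassifier(word_trigrams),
                IndexClassifier(char_trigrams)])
for text, spam in [("win a free prize now", True),
                   ("free prize waiting claim now", True),
                   ("see you at lunch today", False),
                   ("are we still meeting today", False)]:
    clf.learn(text, spam)

print(clf.is_spam("claim your free prize now"))  # True
print(clf.is_spam("see you today"))  # False
```

In this sketch, classifying a message costs one hash lookup per extracted feature, so the work grows with message length but not with the number of indexed messages; that is one plausible reading of the constant-time claim, not a statement of how the paper achieves it. Learning is a single index update per message, which is what makes the filter usable online.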