Publication | Closed Access
A comparison of text‐classification techniques applied to Arabic text
Citations: 46 | References: 15 | Year: 2009
Topics: Engineering; Arabic Morphological Analysis; Arabic Orthography; Media Arabic; Naïve Bayes Algorithms; Text Mining; Natural Language Processing; Classification Method; Arabic Text Simplification; Arabic; Data Mining; English Text; Computational Linguistics; Arabic Text; Document Classification; Language Studies; Arabic Readability; Automatic Classification; Naïve Bayes; Text Processing; Linguistics
Text-classification research has largely focused on English, with few studies on Arabic, whose linguistic characteristics and preprocessing challenges differ markedly from those of English. This study implements and evaluates three automatic text-classification methods (kNN, Rocchio, and Naïve Bayes) on a corpus of 1,445 Arabic documents spanning nine categories. Naïve Bayes performed best, followed by kNN and Rocchio.
Abstract: Many algorithms have been implemented for the problem of text classification. Most of the work in this area has been carried out for English text, and very little research has been carried out on Arabic text. The nature of Arabic text is different from that of English text, and preprocessing of Arabic text is more challenging. This paper presents an implementation of three automatic text-classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories has been automatically classified using the kNN, Rocchio, and Naïve Bayes algorithms. The research results reveal that Naïve Bayes was the best performer, followed by kNN and Rocchio.
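To make the three techniques named in the abstract concrete, here is a minimal Python sketch of each classifier operating on bag-of-words term counts. It is an illustration under stated assumptions, not the authors' implementation: the training data, the English placeholder tokens, the function names (`knn_predict`, `rocchio_predict`, `nb_predict`), the cosine similarity measure, and the add-one smoothing are all hypothetical choices made here. Arabic-specific preprocessing such as normalization or stemming is not shown, and the Rocchio variant is simplified to a nearest-centroid rule.

```python
import math
from collections import Counter, defaultdict

# Toy, invented training data: (tokens, label). NOT taken from the paper's corpus.
TRAIN = [
    (["goal", "match", "team"], "sports"),
    (["team", "player", "match"], "sports"),
    (["market", "bank", "price"], "economy"),
    (["bank", "price", "trade"], "economy"),
]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(tokens, train, k=3):
    """kNN: majority vote among the k most similar training documents."""
    q = Counter(tokens)
    ranked = sorted(train, key=lambda d: cosine(q, Counter(d[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def rocchio_predict(tokens, train):
    """Simplified Rocchio: pick the class whose centroid is most similar."""
    centroids = defaultdict(Counter)
    for doc, label in train:
        centroids[label].update(doc)  # summed counts; cosine ignores scale
    q = Counter(tokens)
    return max(centroids, key=lambda c: cosine(q, centroids[c]))

def nb_predict(tokens, train):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for doc, label in train:
        term_counts[label].update(doc)
    vocab = {t for doc, _ in train for t in doc}
    best, best_score = None, -math.inf
    for c in class_docs:
        total = sum(term_counts[c].values())
        score = math.log(class_docs[c] / len(train))  # log prior
        for t in tokens:
            if t in vocab:  # skip terms never seen in training
                score += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

query = ["match", "team", "goal"]
print(knn_predict(query, TRAIN), rocchio_predict(query, TRAIN), nb_predict(query, TRAIN))
```

On data this small all three methods agree, so the sketch says nothing about their relative accuracy; the ranking reported in the abstract comes from the authors' 1445-document corpus.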