Comparing Arabic NLP tools for Hadith Classification

Abstract

Text classification is the process of assigning documents to a predefined set of categories based on their content. Because Arabic words can take more complex forms than words in many other languages, choosing the indexing unit and removing affixes is challenging. In this paper we compare the performance of different techniques for classifying Al-Hadith Al-Shareef texts that were analyzed with six Arabic tools (Al-Stem Darwish, Al-Stem Alex, Khoja’s stemmer, Quadrigrams, Trigrams and a disambiguation tool based on AraMorph). We also compare three classification techniques implemented in the WEKA toolkit, namely decision trees (DT), Naive Bayes (NB) and Support Vector Machines (SVM). We used TF-IDF to weight the relative frequency of each word in a given document, and cross-validation to evaluate the classifiers. Experimental results show that Khoja’s stemmer outperformed the other tools and that the SVM classifier achieved the highest accuracy, followed by the Naive Bayes classifier and then the decision tree classifier.
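To make the indexing and weighting steps concrete, the following is a minimal sketch in plain Python, not the WEKA-based setup used in the paper. It shows how character trigrams or quadrigrams can serve as indexing units without any affix removal, and how a TF-IDF weight can be computed from them. The function names (`char_ngrams`, `index_terms`, `tf_idf`), the whitespace tokenization, and the specific TF-IDF variant (relative term frequency times the natural log of inverse document frequency) are assumptions made for illustration; the abstract does not state which variant was used.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word.

    Words with n or fewer characters are kept whole (an assumption of
    this sketch, so that short words are not dropped from the index).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text, n):
    """Split text on whitespace and index it by character n-grams."""
    terms = []
    for word in text.split():
        terms.extend(char_ngrams(word, n))
    return terms


def tf_idf(docs_terms):
    """Compute one {term: weight} dict per document.

    tf  = occurrences of the term / number of terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs_terms)
    doc_freq = Counter()
    for terms in docs_terms:
        doc_freq.update(set(terms))  # count each term once per document

    weights = []
    for terms in docs_terms:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy input (hypothetical, not taken from the paper's corpus):
# the same noun with and without the definite article.
docs = ["الكتاب", "كتاب"]
trigram_docs = [index_terms(d, 3) for d in docs]
shared = set(trigram_docs[0]) & set(trigram_docs[1])
vectors = tf_idf(trigram_docs)
```

In this toy input the prefixed and unprefixed forms share two trigrams even though no stemmer was applied, which is the motivation for using n-grams as an alternative to stemming. Terms that occur in every document receive a weight of zero under this variant, so the resulting vectors emphasize terms that distinguish one document from another before they are passed to a classifier.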
