Publication | Closed Access
Comparing Arabic NLP tools for Hadith Classification
Citations: 17
References: 12
Year: 2015
Unknown Venue
Engineering, Speech Corpus, Arabic Morphological Analysis, Arabic Orthography, Corpus Linguistics, Text Mining, Natural Language Processing, Classification Method, Language Documentation, Data Science, Arabic, Pattern Recognition, Data Mining, Computational Linguistics, Document Classification, Language Studies, Arabic Readability, SVM Classifier, Arabic Syntactic Analysis, Naive Bayes Classifier, Automatic Classification, Knowledge Discovery, Intelligent Classification, Computer Science, WEKA Toolkit, Arabic Dialect Morphological Analysis, Hadith Classification, Data Classification, Language Corpus, Classification, Linguistics
Text classification is the process of assigning documents to a predefined set of categories based on their content. Because Arabic words can take more complex forms than words in many other languages, choosing the indexing unit and removing affixes is challenging. In this paper we compare the performance of different techniques for classifying Al-Hadith Al-Shareef, which was analyzed with six Arabic tools (Al-Stem Darwish, Al-Stem Alex, Khoja’s stemmer, Quadrigrams, Trigrams and a disambiguation tool based on AraMorph). We also compare three classification techniques implemented in the WEKA toolkit, namely decision trees (DT), the Naive Bayes (NB) algorithm and the Support Vector Machines (SVM) algorithm. We used TF-IDF to weight each word by its relative frequency in a particular document, and cross-validation to evaluate the results of the classifiers. Experimental results show that Khoja’s stemmer outperformed the other tools and that the SVM classifier achieved the highest accuracy, followed by the Naive Bayes classifier and the decision tree classifier, in that order.
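To make the indexing step concrete, the following is a minimal Python sketch of character n-gram indexing (trigrams for n = 3, quadrigrams for n = 4) followed by TF-IDF weighting. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which TF-IDF variant was used, so the common form tf × log(N / df) is assumed here, and the function names `char_ngrams`, `index_terms` and `tf_idf`, as well as the choice to keep words shorter than n whole, are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word."""
    if len(word) < n:
        # Assumption: words shorter than n are kept whole as a single term.
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text, n):
    """Split text on whitespace and index every word by its n-grams."""
    terms = []
    for word in text.split():
        terms.extend(char_ngrams(word, n))
    return terms


def tf_idf(documents, n=3):
    """Return one {term: weight} dict per document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    tokenized = [index_terms(doc, n) for doc in documents]
    num_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for terms in tokenized:
        doc_freq.update(set(terms))

    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(num_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Calling `tf_idf(texts, n=4)` on a list of strings would give quadrigram-based weights. A term that occurs in every document receives weight 0 under this variant, which is why terms shared by all categories contribute nothing to the classifier.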