Publication | Closed Access
Text categorization using Rocchio algorithm and random forest algorithm
Citations: 30
References: 17
Year: 2017
Unknown Venue
Engineering, Rocchio Algorithm, Corpus Linguistics, Text Mining, Natural Language Processing, Classification Method, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Document Classification, Random Forest Algorithm, Manual Text Categorization, Automatic Classification, Knowledge Discovery, Intelligent Classification, Computer Science, Data Classification, Big Data
Millions of files are uploaded and downloaded every minute, creating data at a scale that makes manual text categorization infeasible. Automatic categorization of documents is therefore needed to make storage and retrieval more efficient. This paper proposes a hybrid text categorization model that combines the Rocchio algorithm and the Random Forest algorithm to perform multi-label text categorization. A stop-word remover and a word stemmer are used to overcome the limitations of the Rocchio algorithm. The Random Forest model takes a minimal set of categories as input to reduce its error rate. Experiments were conducted on standard text categorization datasets. The proposed model is found to categorize documents more efficiently than other text categorization models such as fuzzy relevance clustering, ML-KNN (multi-label KNN), and Naïve Bayes.
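To make the first stage of the hybrid more concrete, the following is a minimal sketch of a Rocchio-style (nearest-centroid) step that removes stop words, stems terms, and shortlists the top-k candidate categories for a document. The abstract does not specify the term weighting, the stemmer, the stop-word list, or how the shortlist is handed to the Random Forest, so everything here is an assumption: the names `STOP_WORDS`, `stem`, `tf_vector`, `train_centroids`, and `shortlist`, the toy corpus, and the crude suffix stripper are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on"}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def tf_vector(text):
    """L2-normalised term-frequency vector, stored as a dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def train_centroids(labelled_docs):
    """Average the vectors of each category's documents into a centroid.

    labelled_docs is a list of (text, set_of_labels); a multi-label
    document contributes to the centroid of every label it carries.
    """
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, labels in labelled_docs:
        vec = tf_vector(text)
        for label in labels:
            sizes[label] += 1
            for term, weight in vec.items():
                sums[label][term] += weight
    return {
        label: {t: w / sizes[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shortlist(text, centroids, k=2):
    """Return the k categories whose centroids are most similar to text."""
    vec = tf_vector(text)
    ranked = sorted(centroids, key=lambda c: cosine(vec, centroids[c]), reverse=True)
    return ranked[:k]


# Toy multi-label corpus, invented purely for illustration.
docs = [
    ("The striker scored two goals in the match", {"sports"}),
    ("The team won the football league", {"sports"}),
    ("Stocks and shares fell on the market", {"finance"}),
    ("The bank raised interest rates", {"finance"}),
    ("The football club sold shares to investors", {"sports", "finance"}),
    ("New processor speeds up the laptop", {"tech"}),
]
centroids = train_centroids(docs)
candidates = shortlist("Investors bought bank shares", centroids, k=2)
```

In a pipeline of the kind the abstract describes, a shortlist such as `candidates` would presumably restrict the label set that the Random Forest stage has to decide among; that second stage is not shown here because the abstract gives no detail on its features or configuration.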