Publication | Closed Access
Support vector machines for spam categorization
Year: 1999 | Citations: 1.5K | References: 14
Engineering, Machine Learning, Text Mining, Natural Language Processing, Spam Filtering, Support Vector Machine, Classification Method, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Management, Support Vector Machines, Automatic Classification, Predictive Analytics, Knowledge Discovery, Intelligent Classification, Computer Science, Binary Features, Classification, Decision Trees, Spam Categorization
The study compares support vector machines (SVMs) with Ripper, Rocchio, and boosted decision trees for classifying e-mail as spam or nonspam. The algorithms were evaluated on two datasets: one restricted to the 1000 best features and one with over 7000 features. SVMs performed best with binary features. Both SVMs and boosted trees achieved acceptable test accuracy and speed, but SVMs required significantly less training time.
We study the use of support vector machines (SVMs) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one in which the number of features was constrained to the 1000 best features and another in which the dimensionality was over 7000. SVMs performed best when using binary features. For both data sets, boosting trees and SVMs had acceptable test performance in terms of accuracy and speed. However, SVMs required significantly less training time.
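The pipeline the abstract describes (binary word-presence features, selection of the best features, then a linear SVM) can be sketched in plain Python. This is a minimal sketch under several assumptions that do not come from the paper: whitespace tokenization, ranking features by mutual information, a bias folded in as a constant feature, and a simple dual coordinate descent solver for the soft-margin hinge-loss SVM (the approach LIBLINEAR uses, not necessarily the solver the authors used). The six toy e-mails and the helper names (`select_features`, `train_linear_svm`, and so on) are hypothetical, and `k=6` stands in for the paper's 1000 features.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing may differ.
    return set(text.lower().split())


def mutual_information(docs, labels, word):
    """MI between the binary feature 'word occurs' and the label (+1 spam, -1 nonspam)."""
    n = len(docs)
    counts = Counter((word in d, y) for d, y in zip(docs, labels))
    mi = 0.0
    for present in (True, False):
        p_x = (counts[(present, 1)] + counts[(present, -1)]) / n
        for y in (1, -1):
            p_y = (counts[(True, y)] + counts[(False, y)]) / n
            p_xy = counts[(present, y)] / n
            if p_xy > 0:
                mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def select_features(docs, labels, k):
    # Hypothetical helper: keep the k highest-MI words (ranking criterion is an assumption).
    # Rounding makes near-equal scores tie, so ties break alphabetically.
    vocab = set().union(*docs)
    scores = {w: mutual_information(docs, labels, w) for w in vocab}
    return sorted(vocab, key=lambda w: (-round(scores[w], 12), w))[:k]


def binary_vector(doc, features):
    # Binary feature: 1.0 if the word occurs at all, regardless of count.
    # The trailing constant 1.0 acts as a (regularized) bias term.
    return [1.0 if w in doc else 0.0 for w in features] + [1.0]


def train_linear_svm(X, y, C=1.0, epochs=100):
    """Soft-margin linear SVM (hinge loss) trained by dual coordinate descent."""
    alpha = [0.0] * len(X)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for i, (x, yi) in enumerate(zip(X, y)):
            q = sum(v * v for v in x)  # > 0 because of the bias feature
            grad = yi * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            new = min(max(alpha[i] - grad / q, 0.0), C)  # clip dual variable to [0, C]
            delta = (new - alpha[i]) * yi
            w = [wj + delta * xj for wj, xj in zip(w, x)]  # keep w = sum(alpha_i y_i x_i)
            alpha[i] = new
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy corpus: +1 = spam, -1 = nonspam.
emails = ["win free money now", "free money offer", "win a free prize",
          "project meeting at noon", "meeting report attached", "project report draft"]
labels = [1, 1, 1, -1, -1, -1]
docs = [tokenize(e) for e in emails]
features = select_features(docs, labels, k=6)
X = [binary_vector(d, features) for d in docs]
w = train_linear_svm(X, labels)
print(sorted(features))  # ['free', 'meeting', 'money', 'project', 'report', 'win']
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Binary features record only whether a word occurs, not how often; per the abstract, that representation gave SVMs their best results. On a real corpus, calling `select_features` with `k=1000` would mirror the constrained data set, while skipping selection and using the full vocabulary would correspond to the higher-dimensional one (over 7000 features in the paper).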