Publication | Closed Access
Spam Filtering
Citations: 12
References: 24
Year: 2014
Unknown Venue
Natural Language Processing, Spam Filtering, Document Clustering, Engineering, Information Retrieval, Machine Learning, Data Science, Pattern Recognition, Data Mining, Automatic Classification, Knowledge Discovery, Unwanted Mail Messages, Intelligent Classification, Computer Science, Incremental Clustering, Unsupervised Machine Learning, Text Mining, Active Learning
This paper introduces a method for filtering unwanted mail messages that combines active learning with incremental clustering. The approach is motivated by the fact that a user cannot provide the correct category for every received message. Email messages are divided into chronological batches (e.g., one per day). The user is asked to give the correct categories (labels) for the messages of the first batch; from then on, the proposed algorithm decides when to ask for a new label, based on a clustering of the messages that is incrementally updated. We test different variants of the algorithm on a number of different datasets and show that it achieves very good results with only 2% of all email messages labelled by the user.
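The batch-wise loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how messages are represented, how clusters are updated, or what triggers a label request. The sketch assumes bag-of-words vectors, cosine similarity to cluster centroids, and a query rule that asks the user only when a message is not close to any existing cluster. The names `IncrementalFilter`, `Cluster`, `oracle`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts for one message (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Cluster:
    """A cluster holding summed term counts and the label it was given."""

    def __init__(self, vector, label):
        self.total = Counter(vector)  # direction of the sum equals the centroid
        self.label = label

    def add(self, vector):
        self.total.update(vector)


class IncrementalFilter:
    """Hypothetical sketch: query the user only for messages far from all clusters."""

    def __init__(self, oracle, threshold=0.3):
        self.oracle = oracle        # callable standing in for the user
        self.threshold = threshold  # assumed similarity cut-off
        self.clusters = []
        self.queries = 0            # number of labels requested so far

    def _nearest(self, vector):
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = cosine(vector, cluster.total)
            if sim > best_sim:
                best, best_sim = cluster, sim
        return best, best_sim

    def process_batch(self, texts, first=False):
        """Label one chronological batch; the first batch is fully user-labelled."""
        predictions = []
        for text in texts:
            vector = vectorize(text)
            cluster, sim = self._nearest(vector)
            close = cluster is not None and sim >= self.threshold
            if first or not close:
                self.queries += 1
                label = self.oracle(text)
                if close and cluster.label == label:
                    cluster.add(vector)
                else:
                    self.clusters.append(Cluster(vector, label))
            else:
                label = cluster.label  # inherit the label of the nearest cluster
                cluster.add(vector)    # incremental update of the clustering
            predictions.append(label)
        return predictions
```

In use, `process_batch(first_batch, first=True)` would be called once for the first batch and `process_batch(batch)` for each later one; comparing `queries` with the total number of messages gives the fraction of messages the user had to label under these assumptions.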