
Abstract

We present a new spam filter that acts as an additional layer in the spam filtering process. The filter is based on what we call a representative vocabulary. Spam e-mails are divided into categories, and each category is represented by a set of tokens that together form a representative text (RT). Tokens are strings of characters: words, sentences, or sometimes meaningless character sequences. The RT is used to compute a resemblance ratio with each incoming e-mail, and this ratio determines whether the e-mail is classified as spam. The filter was implemented and integrated into the Spamihilator software. Experimental results are presented.
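The idea of comparing an incoming e-mail against per-category representative texts can be illustrated with a minimal Python sketch. The abstract does not define how the resemblance ratio is computed, so everything here is an assumption made for illustration: the ratio is taken to be the fraction of the e-mail's tokens that appear in a category's RT, tokens are reduced to lowercase words, and the function names, the example categories, and the `threshold` value of 0.5 are hypothetical rather than the authors' method.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Simplification: the abstract says tokens may also be sentences or
    meaningless strings; only word-like tokens are handled here.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def resemblance_ratio(email_text, rt_tokens):
    """Assumed definition: share of e-mail tokens found in the RT token set."""
    tokens = tokenize(email_text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in rt_tokens)
    return hits / len(tokens)


def classify(email_text, category_rts, threshold=0.5):
    """Return (is_spam, best_category, best_ratio).

    category_rts maps a category name to its set of RT tokens.
    threshold is a hypothetical cut-off, not a value from the paper.
    """
    best_category, best_ratio = None, 0.0
    for category, rt_tokens in category_rts.items():
        ratio = resemblance_ratio(email_text, rt_tokens)
        if ratio > best_ratio:
            best_category, best_ratio = category, ratio
    return best_ratio >= threshold, best_category, best_ratio


# Hypothetical categories and representative vocabularies.
CATEGORY_RTS = {
    "pharma": {"cheap", "pills", "online", "pharmacy"},
    "finance": {"loan", "credit", "approved", "cash"},
}

if __name__ == "__main__":
    print(classify("Cheap pills online", CATEGORY_RTS))
    print(classify("Meeting agenda for Monday", CATEGORY_RTS))
```

In this sketch an e-mail is flagged when its best-matching category reaches the threshold. A real deployment as an additional filtering layer would presumably tune that threshold and build each RT from a corpus of categorized spam.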
