Publication | Closed Access
Design and Evaluation of a Real-Time URL Spam Filtering Service
Citations: 469
References: 39
Year: 2011
Unknown Venue
Spam Filtering, Abuse Detection, Computational Social Science, Real-time System, Social Media, Information Retrieval, Data Science, Web Service Spam, Information Security, Engineering, Phishing, Data Privacy, Computer Science, Arts, Web Analytics, Real-time Protection, Text Mining, Information Filtering System
Scams, phishing, and malware are common threats to web services such as social networks and URL shorteners, yet email-based spam filters generally fall short on these platforms. The study introduces Monarch, a real-time URL spam filter, and evaluates its viability and the challenges posed by the diversity of web service spam. Monarch crawls URLs as they are submitted and determines whether they direct to spam. The study finds that Monarch delivers accurate, real-time protection, but that spam characteristics do not generalize across services: email spam differs qualitatively from Twitter spam, with distinctions that include the abuse of public web hosting and redirector services. The system could also scale to protect a service such as Twitter, which needs to process 15 million URLs/day, for a bit under $800/day.
On the heels of the widespread adoption of web services such as social networks and URL shorteners, scams, phishing, and malware have become regular threats. Despite extensive research, email-based spam filtering techniques generally fall short for protecting other web services. To better address this need, we present Monarch, a real-time system that crawls URLs as they are submitted to web services and determines whether the URLs direct to spam. We evaluate the viability of Monarch and the fundamental challenges that arise due to the diversity of web service spam. We show that Monarch can provide accurate, real-time protection, but that the underlying characteristics of spam do not generalize across web services. In particular, we find that spam targeting email qualitatively differs in significant ways from spam campaigns targeting Twitter. We explore the distinctions between email and Twitter spam, including the abuse of public web hosting and redirector services. Finally, we demonstrate Monarch's scalability, showing our system could protect a service such as Twitter -- which needs to process 15 million URLs/day -- for a bit under $800/day.
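To put the scalability figures in perspective, the two numbers quoted in the abstract (15 million URLs/day and a bit under $800/day) can be turned into an average throughput and a per-URL cost. The sketch below is only back-of-the-envelope arithmetic on those two figures; the function name `per_url_budget` and its dictionary keys are hypothetical, and $800 is treated as an upper bound because the abstract says the cost is slightly below it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_url_budget(urls_per_day: int, dollars_per_day: float) -> dict:
    """Derive average throughput and unit cost from daily totals.

    Hypothetical helper: it assumes load is spread evenly over the day,
    which real traffic is not, so treat the results as averages only.
    """
    dollars_per_url = dollars_per_day / urls_per_day
    return {
        "urls_per_second": urls_per_day / SECONDS_PER_DAY,
        "dollars_per_url": dollars_per_url,
        "dollars_per_million_urls": dollars_per_url * 1_000_000,
    }


# Figures taken from the abstract; 800.0 is an upper bound on daily cost.
budget = per_url_budget(15_000_000, 800.0)

for name, value in budget.items():
    print(f"{name}: {value:.6g}")
```

Under these assumptions the averages come out to roughly 174 URLs per second and at most about $53 per million URLs crawled and classified.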