Publication | Closed Access
Large-Scale Automatic Classification of Phishing Pages
Citations: 304
References: 21
Year: 2010
Unknown Venue
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google’s phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90% of phishing pages several weeks after training concludes.