Publication | Closed Access
SMOTE Implementation on Phishing Data to Enhance Cybersecurity
Citations: 46
References: 32
Year: 2018
Unknown Venue
Cybersecurity, Machine Learning, Engineering, Information Security, Information Forensics, Data Mining Security, Text Mining, Spam Filtering, Data Science, Data Mining, Class Imbalance, Data Management, Threat (Computer), Threat Detection, Predictive Analytics, Knowledge Discovery, Data Privacy, Computer Science, Phishing Dataset, Imbalanced Dataset, Data Security, Cryptography, Smote Implementation, Phishing Scam, Phishing
Phishing is a form of cybersecurity threat in which a criminal tries to gain access to users' personal information by infecting their systems with malware and viruses. Because phishing messages appear to come from legitimate sources, it is very easy to fall for a phishing scam. As each phishing instance contains features that consistently differ from those of another, a system built on a predefined set of rules is ineffective. Data mining techniques can be applied to collected network traffic to build models that predict future attacks. However, since most of the data packets are legitimate, a model trained on this imbalanced dataset tends to be biased towards positive results. In this study, we investigate how prediction accuracy differs between a balanced dataset and an imbalanced one. SMOTE (Synthetic Minority Over-sampling Technique) is applied to balance the dataset. XGBoost, Random Forest and Support Vector Machines are then applied to the phishing dataset. Results show much higher accuracy rates when SMOTE is applied. The largest jump in accuracy was recorded for XGBoost, from 89.87% to 97.17%, showing that SMOTE is an effective tool in phishing data monitoring.
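The balancing step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are all hypothetical, and treating phishing as the minority class is an assumption based on the abstract's statement that most packets are legitimate.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating
    between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (Euclidean), excluding base
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed, not from the study): 8 legitimate samples (label 0)
# and 3 phishing samples (label 1), each with two numeric features.
legit = [[float(i), float(i % 3)] for i in range(8)]
phish = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Oversample the minority class until both classes have the same size.
new_phish = smote(phish, n_synthetic=len(legit) - len(phish), k=2)
balanced_X = legit + phish + new_phish
balanced_y = [0] * len(legit) + [1] * (len(phish) + len(new_phish))
```

In practice, oversampling is normally applied to the training split only, so that evaluation data contains no synthetic samples. A maintained implementation is available as the `SMOTE` class in the imbalanced-learn library; whether the study used it is not stated in the abstract.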