Publication | Closed Access
An Investigation of SMOTE based Methods for Imbalanced Datasets with Data Complexity Analysis
Citations: 91
References: 94
Year: 2022
Engineering, Machine Learning, Class Imbalance Problem, Mining Methods, Optimization-based Data Mining, Classification Method, Data Science, Data Mining, Pattern Recognition, Data Complexities, Class Imbalance, Intelligent Data Analysis, Statistics, Imbalanced Datasets, Data Complexity Analysis, Predictive Analytics, Knowledge Discovery, Computer Science, Binary Class, Data Classification, Data Treatment, Classifier System, Big Data
Many binary-class datasets in real-life applications are affected by the class imbalance problem. Data complexities such as noisy examples, class overlap, and small disjuncts are observed to play a key role in poor classification performance, and they tend to occur in tandem with class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known method for rebalancing the number of examples in imbalanced datasets. However, it cannot effectively tackle data complexities and can even magnify them, and its performance is still not satisfactory. Various SMOTE variants have therefore been proposed to overcome these downsides, either by combining SMOTE with other algorithms or by modifying the SMOTE algorithm itself. This paper comparatively reviews the algorithms applied in SMOTE variants and investigates which data complexities each variant addresses. A series of experiments is conducted on 24 binary-class imbalanced datasets to observe how the data complexity measures change after the SMOTE variants are applied. Evaluation metrics such as G-Mean and F1-Score are also analyzed to investigate differences in classification performance between the variants.
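For readers unfamiliar with the techniques named in the abstract, the following is a minimal sketch of basic SMOTE-style interpolation and of the two evaluation metrics, assuming NumPy only. It is not the paper's implementation or any of the variants it reviews. The function names `smote_sketch` and `g_mean_and_f1`, the default `k=5`, and the toy data are illustrative assumptions. G-Mean is taken here as the geometric mean of sensitivity and specificity, the usual definition in the imbalanced-learning literature.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)            # seed points
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def g_mean_and_f1(y_true, y_pred, positive=1):
    """Return (G-Mean, F1) for a binary problem (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = float(np.sqrt(sensitivity * specificity))
    denom = precision + sensitivity
    f1 = float(2 * precision * sensitivity / denom) if denom else 0.0
    return g_mean, f1


# Toy usage: three minority points, ten synthetic points between them.
X_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sketch(X_minority, n_new=10, k=2)
g_mean, f1 = g_mean_and_f1([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Because every synthetic point is interpolated between two existing minority points, minority examples that are noisy or sit in an overlap region yield synthetic points in the same problematic region. This is one way to see how plain SMOTE can magnify the data complexities the abstract mentions.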