Publication | Closed Access
Exploratory Undersampling for Class-Imbalance Learning
Citations: 2.4K
References: 45
Year: 2008
Engineering, Machine Learning, ROC Curve, Classification Method, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Class-imbalance Learning, Multiple Classifier System, Statistics, Supervised Learning, Predictive Analytics, Knowledge Discovery, Computer Science, Majority Class, Deep Learning, Class-imbalance Problems, Classifier System
Undersampling mitigates class imbalance by training on only a subset of the majority class, but in doing so it ignores many majority class examples. The authors propose two algorithms to address this loss of majority class information. EasyEnsemble samples several subsets from the majority class, trains a learner on each, and combines the learners' outputs. BalanceCascade trains its learners sequentially and, at each step, removes from further consideration the majority class examples that the learners trained so far already classify correctly. In the reported experiments, both methods achieve higher AUC, F-measure, and G-mean than many existing class-imbalance learning methods. When the same number of weak classifiers is used, their training time is approximately that of undersampling, which is significantly faster than other methods.
Undersampling is a popular method for dealing with class-imbalance problems; it uses only a subset of the majority class and is therefore very efficient. Its main deficiency is that many majority class examples are ignored. We propose two algorithms to overcome this deficiency. EasyEnsemble samples several subsets from the majority class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade trains the learners sequentially, where in each step the majority class examples that are correctly classified by the currently trained learners are removed from further consideration. Experimental results show that both methods have higher Area Under the ROC Curve, F-measure, and G-mean values than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as undersampling when the same number of weak classifiers is used, which is significantly faster than other methods.
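The two sampling strategies described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The base learner `train_centroid` is a hypothetical nearest-centroid stand-in chosen to keep the example dependency-free, and the function names, the `n_subsets`/`n_steps` parameters, and the fixed decision threshold of zero are all assumptions. In particular, the removal rule in `balance_cascade` simply drops every majority example the current ensemble scores as negative, which simplifies whatever thresholding the paper actually uses.

```python
import random


def train_centroid(pos, neg):
    """Hypothetical stand-in learner: scores a point by how much closer it
    lies to the minority (pos) centroid than to the majority (neg) centroid."""
    def mean(points):
        n = len(points)
        return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

    cp, cn = mean(pos), mean(neg)

    def score(x):
        dist_pos = sum((a - b) ** 2 for a, b in zip(x, cp))
        dist_neg = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dist_neg - dist_pos  # > 0 means "looks like minority"

    return score


def easy_ensemble(minority, majority, n_subsets=5, seed=0):
    """Sample several balanced majority subsets independently, train one
    learner per subset, and combine the learners by summing their scores."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_subsets):
        subset = rng.sample(majority, len(minority))  # needs len(majority) >= len(minority)
        learners.append(train_centroid(minority, subset))
    return lambda x: sum(f(x) for f in learners) > 0


def balance_cascade(minority, majority, n_steps=5, seed=0):
    """Train learners sequentially; after each step, drop the majority
    examples that the ensemble so far already classifies correctly."""
    rng = random.Random(seed)
    remaining = list(majority)
    learners = []
    for _ in range(n_steps):
        if len(remaining) < len(minority):
            break  # too few hard majority examples left to form a balanced subset
        subset = rng.sample(remaining, len(minority))
        learners.append(train_centroid(minority, subset))
        # Simplified removal rule (assumption): keep only the majority
        # examples the current ensemble still mistakes for minority.
        remaining = [x for x in remaining if sum(f(x) for f in learners) > 0]
    return lambda x: sum(f(x) for f in learners) > 0


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 200 majority points near (0, 0), 20 minority points near (3, 3).
    majority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
    for build in (easy_ensemble, balance_cascade):
        predict = build(minority, majority)
        print(build.__name__, predict((3.0, 3.0)), predict((0.0, 0.0)))
```

Both functions return a predictor that maps a feature tuple to `True` for the minority class. The difference lies only in how the majority subsets are drawn: independently in `easy_ensemble`, and from a shrinking pool of still-misclassified examples in `balance_cascade`.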