Publication | Open Access
A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM
Citations: 145 | References: 33 | Year: 2017
Topics: Engineering, Machine Learning, Extrapolation-SMOTE SVM, Imbalanced Data Learning, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Management, Statistics, Multiple Classifier System, Predictive Analytics, Knowledge Discovery, Novel Ensemble Method, Extrapolation Borderline-SMOTE SVM, Computer Science, Deep Learning, Data Classification, Cost-sensitive Learning, Cost-sensitive Machine Learning, Ensemble Algorithm
Class imbalance is pervasive across domains, leading to suboptimal models when trained directly, and has spurred diverse approaches such as sampling, cost‑sensitive, and hybrid methods, with ensembles often yielding more robust decision boundaries. The authors propose BEBS, an ensemble of SVMs that incorporates borderline information via Extrapolation Borderline‑SMOTE, to improve learning on imbalanced data. BEBS generates synthetic minority samples near the decision boundary using Extrapolation Borderline‑SMOTE and trains an ensemble of SVMs through bagging to correct boundary skew. Experiments on open‑access datasets demonstrate that BEBS outperforms existing methods, marking the first ensemble SVM approach that leverages borderline information for imbalanced learning.
Class imbalance is ubiquitous in real-world data and has attracted interest from many domains. Learning directly from an imbalanced dataset can yield unsatisfactory results: a model that over-focuses on overall accuracy tends to be suboptimal for the minority class. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive learning, and hybrid approaches. However, the samples near the decision boundary carry the most discriminative information and deserve particular attention, and the skew of the boundary can be corrected by constructing synthetic samples there. Guided by this observation and by geometric intuition, we designed a new synthetic minority oversampling technique that incorporates borderline information. Moreover, ensemble models tend to capture more complicated and robust decision boundaries in practice. Taking these factors into consideration, we propose a novel ensemble method, Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), for imbalanced data learning (IDL) problems. Experiments on open-access datasets showed significantly superior performance of our model, and a persuasive, intuitive explanation of the method is given. To the best of our knowledge, this is the first model to combine an ensemble of SVMs with borderline information for this problem.
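The abstract describes two ingredients: oversampling that extrapolates synthetic minority points near the decision boundary (an extrapolation variant of Borderline-SMOTE), and a bagged ensemble of SVMs trained on the rebalanced data. The paper defines the exact algorithm; the following is only a minimal sketch under assumed details, using NumPy and scikit-learn. The neighbour counts, the extrapolation gap, the majority-vote rule, and all function names here are illustrative choices, not the authors' specification.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC
from sklearn.utils import resample


def extrapolation_borderline_smote(X_min, X_maj, k=5, n_new=50, seed=0):
    """Sketch of the oversampling step: find borderline minority points
    (minority samples whose k nearest neighbours include majority samples)
    and synthesize new points by extrapolating slightly *beyond* each
    borderline point along the line from a minority neighbour, pushing
    the learned boundary toward the majority side."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    labels = np.r_[np.zeros(len(X_min)), np.ones(len(X_maj))]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    # borderline = minority points with at least one majority neighbour
    mask = np.array([labels[i[1:]].sum() > 0 for i in idx])
    borderline = X_min[mask] if mask.any() else X_min
    nn_min = NearestNeighbors(n_neighbors=min(k, len(X_min))).fit(X_min)
    _, idx_min = nn_min.kneighbors(borderline)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(borderline))
        j = idx_min[i][rng.integers(idx_min.shape[1])]
        gap = 1.0 + rng.uniform(0, 0.5)  # gap > 1 extrapolates, not interpolates
        synth.append(X_min[j] + gap * (borderline[i] - X_min[j]))
    return np.array(synth)


def bebs_fit_predict(X, y, X_test, n_estimators=10, seed=0):
    """Sketch of the ensemble step: rebalance the training set with the
    oversampler above, then bag SVMs on bootstrap resamples and combine
    their predictions by majority vote (y == 1 is the minority class)."""
    X_min, X_maj = X[y == 1], X[y == 0]
    X_syn = extrapolation_borderline_smote(
        X_min, X_maj, n_new=len(X_maj) - len(X_min), seed=seed)
    Xb = np.vstack([X, X_syn])
    yb = np.r_[y, np.ones(len(X_syn))]
    votes = np.zeros(len(X_test))
    for b in range(n_estimators):
        Xs, ys = resample(Xb, yb, random_state=seed + b)  # bootstrap sample
        votes += SVC(kernel="rbf").fit(Xs, ys).predict(X_test)
    return (votes >= n_estimators / 2).astype(int)
```

On two well-separated Gaussian blobs (say, 200 majority points around the origin and 40 minority points around (3, 3)), this sketch classifies held-out points from each blob correctly; the real evaluation in the paper is on open-access imbalanced benchmarks.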