Borderline over-sampling for imbalanced data classification

TLDR

Standard classifiers perform poorly on the minority class of imbalanced data sets. The study introduces a borderline over‑sampling technique to improve minority‑class accuracy. Synthetic minority points are generated only near the decision boundary, and an SVM is then trained to predict future instances. Where few majority instances surround the minority class, extrapolation expands the minority region; elsewhere, interpolation consolidates the existing boundary. Experiments show that borderline over‑sampling outperforms the other over‑sampling techniques it was compared with.

Abstract

Traditional classification algorithms usually predict the minority class of imbalanced data sets with poor accuracy. This paper proposes a new method for dealing with imbalanced data sets by over-sampling the borderline minority class instances. A Support Vector Machine (SVM) classifier is then trained to predict future instances. Unlike other over-sampling methods, the proposed method focuses only on the minority class instances residing along the decision boundary, because this region is the most crucial for establishing that boundary. Furthermore, the artificial minority instances are generated so that regions of the minority class with fewer majority class instances are expanded by extrapolation; otherwise, the current boundary of the minority class is consolidated by interpolation. Experimental results show that the proposed method achieves better performance than other over-sampling methods.
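
The extrapolation and interpolation rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the borderline minority points are passed in by the caller (for example, they might be taken from the support vectors of a preliminary SVM), the "fewer majority class instances" test is read as "fewer than half of the m nearest neighbours belong to the majority class", and the names `synthesize`, `m`, and `k` are hypothetical.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point, skipping copies of point."""
    others = [p for p in pool if p != point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def synthesize(borderline, minority, majority, m=4, k=2, rng=None):
    """Create one synthetic minority point per borderline point.

    Assumption: a borderline point with fewer than m/2 majority points
    among its m nearest neighbours lies in a sparse region and is
    extrapolated; otherwise it is interpolated.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for x in borderline:
        # Label every other point: 0 for minority, 1 for majority.
        labelled = [(p, 0) for p in minority if p != x]
        labelled += [(p, 1) for p in majority]
        labelled.sort(key=lambda item: math.dist(x, item[0]))
        n_major = sum(label for _, label in labelled[:m])

        # Pick one of the k nearest minority neighbours at random.
        neighbour = rng.choice(nearest(x, minority, k))
        rho = rng.random()  # step size in [0, 1)

        if n_major < m / 2:
            # Extrapolate: step away from the neighbour to expand the region.
            new = tuple(xi + rho * (xi - ni) for xi, ni in zip(x, neighbour))
        else:
            # Interpolate: step towards the neighbour to consolidate the boundary.
            new = tuple(xi + rho * (ni - xi) for xi, ni in zip(x, neighbour))
        synthetic.append(new)
    return synthetic
```

The returned points would be appended to the minority class before the final SVM is trained. The sketch expects points as tuples of floats and at least two distinct minority points, since each borderline point needs a minority neighbour.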
