Publication | Closed Access
Borderline over-sampling for imbalanced data classification
Year: 2011 | Citations: 568 | References: 26
Engineering, Machine Learning, Support Vector Machine, Classification Method, Borderline Over-sampling, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Management, Statistics, Predictive Analytics, Knowledge Discovery, Intelligent Classification, Minority Class Instances, Data Classification, Artificial Minority Instances, Minority Class, Cost-sensitive Learning
Standard classifiers tend to predict the minority class of an imbalanced data set poorly. The study introduces a borderline over-sampling technique to improve minority-class accuracy. The method generates synthetic minority instances only near the decision boundary: it extrapolates to expand minority regions that contain few majority instances and interpolates elsewhere to consolidate the existing minority boundary. An SVM classifier is then trained to predict future instances. The reported experiments show borderline over-sampling performing better than the other over-sampling methods it was compared with.
Traditional classification algorithms usually provide poor accuracy when predicting the minority class of imbalanced data sets. This paper proposes a new method for dealing with imbalanced data sets by over-sampling the borderline minority class instances. A Support Vector Machine (SVM) classifier is then trained to predict future instances. Unlike other over-sampling methods, the proposed method focuses only on the minority class instances residing along the decision boundary, because this region is the most crucial for establishing the decision boundary. Furthermore, the artificial minority instances are generated so that regions of the minority class with fewer majority class instances are expanded by extrapolation; elsewhere, the current boundary of the minority class is consolidated by interpolation. Experimental results show that the proposed method achieves better performance than other over-sampling methods.
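The generation step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation. The abstract does not say how borderline instances are detected, how many neighbours are inspected, or where the switch between extrapolation and interpolation lies, so everything here is an assumption made for illustration: the function names `k_nearest` and `borderline_oversample`, the parameters `m`, `k` and `seed`, passing the borderline instances in as an argument, and using "fewer than half of the `m` nearest neighbours are majority" as the switching rule.

```python
import math
import random


def k_nearest(x, points, k):
    """Return the k points closest to x (Euclidean distance), skipping x itself."""
    others = [p for p in points if p != x]
    return sorted(others, key=lambda p: math.dist(x, p))[:k]


def borderline_oversample(borderline, minority, majority, n_new, m=5, k=3, seed=0):
    """Generate n_new synthetic minority points around the borderline points.

    Points are tuples of floats. `borderline` is assumed to be a subset of
    `minority` chosen beforehand (for example by a trained classifier).
    `minority` must hold at least one point other than each borderline point.
    """
    rng = random.Random(seed)
    majority_set = set(majority)
    everyone = list(minority) + list(majority)
    synthetic = []
    for i in range(n_new):
        x = borderline[i % len(borderline)]  # cycle through borderline points
        # Count majority instances among the m nearest neighbours of x.
        n_major = sum(1 for p in k_nearest(x, everyone, m) if p in majority_set)
        # Pick a random partner among the k nearest minority neighbours.
        partner = rng.choice(k_nearest(x, minority, k))
        rho = rng.random()  # step size in [0, 1)
        if n_major < m / 2:
            # Few majority points nearby (assumed rule): extrapolate,
            # stepping from x away from the partner to expand the region.
            new = tuple(xi + rho * (xi - pi) for xi, pi in zip(x, partner))
        else:
            # Many majority points nearby: interpolate between x and the
            # partner, staying inside the current minority boundary.
            new = tuple(xi + rho * (pi - xi) for xi, pi in zip(x, partner))
        synthetic.append(new)
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    majority = [(3.0, 3.0), (3.5, 3.0), (3.0, 3.5), (4.0, 4.0)]
    for point in borderline_oversample([(1.0, 1.0)], minority, majority, n_new=3, m=3):
        print(point)
```

The synthetic points would then be appended to the minority class before the classifier is trained. Taking the borderline instances as input keeps the sketch free of any particular classifier library, which is a simplification rather than a feature of the method.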