Publication | Closed Access

An Improved Oversampling Method for Imbalanced Data–SMOTE Based on Canopy and K-means

Citations: 17

References: 7

Year: 2019

Abstract

The Synthetic Minority Oversampling Technique (SMOTE) is a preferred method for addressing imbalanced data classification. However, its effectiveness in classifying minority samples still needs to be improved. To retain its strengths while offsetting its shortcomings, we designed an improved algorithm called “C-K-SMOTE”, which combines SMOTE with a hybrid clustering algorithm built from Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy to produce a coarse clustering, then use K-means to refine it into an accurate clustering, and finally apply SMOTE to increase the number of minority samples. The imbalanced benchmark data sets used in the article are selected from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify SMOTE's effectiveness at balancing the imbalanced data sets.
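The three stages named in the abstract (Canopy, then K-means, then SMOTE) can be illustrated with a minimal standard-library sketch. The abstract does not give the algorithm's details, so several choices here are assumptions rather than the authors' method: clustering is applied to the minority samples only, Canopy is used solely to pick the number of clusters and the initial centers (so only a tight threshold, here called `t2`, is needed), SMOTE is run separately inside each cluster, and new samples are split evenly across clusters. All function and parameter names are hypothetical.

```python
import math
import random


def canopy_centers(points, t2, rng):
    """Coarse pass: pick canopy centers; points within t2 of a center cannot seed another."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, iters=20):
    """Refinement pass: Lloyd's algorithm seeded with the canopy centers."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # New center = mean of members; an empty cluster keeps its old center.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def smote(samples, n_new, k, rng):
    """Interpolate between a random sample and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def c_k_smote(minority, n_new, t2, k=5, seed=0):
    """Assumed pipeline: Canopy -> K-means -> per-cluster SMOTE on minority samples."""
    rng = random.Random(seed)
    centers = canopy_centers(minority, t2, rng)
    # SMOTE needs at least two points to interpolate, so singleton clusters are skipped.
    clusters = [c for c in kmeans(minority, centers) if len(c) >= 2]
    if not clusters:
        raise ValueError("need a cluster with at least two minority samples")
    synthetic = []
    for i, cluster in enumerate(clusters):
        # Split n_new as evenly as possible across the usable clusters.
        share = n_new // len(clusters) + (1 if i < n_new % len(clusters) else 0)
        synthetic.extend(smote(cluster, share, k, rng))
    return synthetic


# Toy minority class with two well-separated groups (made-up data).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
new_points = c_k_smote(minority, n_new=6, t2=1.0)
```

Because interpolation happens only between members of the same cluster, each synthetic point in this toy example stays inside one of the two groups instead of landing in the empty space between them, which is one plausible motivation for clustering before oversampling.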
