Investigating the Performance of SMOTE for Class Imbalanced Learning: A Case Study of Credit Scoring Datasets

Abstract

Classification is one of the major problems faced by the data mining community, and it becomes harder when real-world datasets are imbalanced. A dataset is imbalanced when the observations of a rare class are greatly outnumbered by the observations of another class. The class with the greater number of observations is called the majority or negative class, while the class with few observations is referred to as the minority or positive class. The literature offers a number of techniques that address the problem of class imbalance. One of the most important strategies is to resample the dataset, balancing the classes by over-sampling the minority class or under-sampling the majority class. This paper investigates and analyzes the performance of the most widely used oversampling procedure, the Synthetic Minority Oversampling Technique (SMOTE), at different oversampling thresholds, using four classifiers on three credit scoring datasets.
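
To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style synthetic sample generation in plain Python. It is not the implementation evaluated in the paper. The function name `smote_sketch` and its parameters are hypothetical, and it assumes that the "threshold of oversampling" corresponds to the SMOTE oversampling percentage (100 means one synthetic point per minority observation, 200 means two, and so on). Only multiples of 100 are handled, and all features are assumed to be numeric.

```python
import math
import random


def smote_sketch(minority, oversample_pct=100, k=5, seed=0):
    """Create synthetic minority points by interpolating towards neighbours.

    minority: list of numeric feature vectors from the minority class.
    oversample_pct: assumed oversampling level; 200 yields two synthetic
        points per original minority observation.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority observations")
    k = min(k, n - 1)
    per_point = oversample_pct // 100

    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority points by Euclidean distance to x.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours = others[:k]
        for _ in range(per_point):
            nb = minority[rng.choice(neighbours)]
            gap = rng.random()  # position on the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with four observations and two features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, oversample_pct=200, k=3)
print(len(new_points))  # 8 synthetic points: 4 observations x 2 each
```

Each synthetic point lies on the line segment between a minority observation and one of its nearest minority neighbours, which is why raising the oversampling percentage fills in the minority region rather than duplicating existing rows.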
