An experimental comparison of classification algorithms for imbalanced credit scoring data sets

TLDR

Credit scoring data sets are often imbalanced, with far fewer defaulting loans than non-defaulting ones. This study compares traditional and advanced classification techniques (including logistic regression, neural networks, decision trees, gradient boosting, least squares SVMs, and random forests) for predicting loan default on such data. Using five real-world credit scoring data sets, the authors progressively under-sample the minority class of defaulters to increase imbalance, evaluate each classifier by the area under the ROC curve (AUC), and test the significance of AUC differences with Friedman's statistic and Nemenyi post hoc tests. Random forest and gradient boosting classifiers cope comparatively well with pronounced class imbalance, while C4.5 decision trees, quadratic discriminant analysis, and k-nearest neighbours perform significantly worse than the best classifiers when the imbalance is large.

Abstract

In this paper, we set out to compare several techniques that can be used in the analysis of imbalanced credit scoring data sets. In a credit scoring context, imbalanced data sets frequently occur because the number of defaulting loans in a portfolio is usually much lower than the number of loans that do not default. As well as using traditional classification techniques such as logistic regression, neural networks and decision trees, this paper also explores the suitability of gradient boosting, least squares support vector machines and random forests for loan default prediction. Five real-world credit scoring data sets are used to build classifiers and test their performance. In our experiments, we progressively increase class imbalance in each of these data sets by randomly under-sampling the minority class of defaulters, so as to identify to what extent the predictive power of the respective techniques is adversely affected. The performance criterion chosen to measure this effect is the area under the receiver operating characteristic curve (AUC); Friedman's statistic and Nemenyi post hoc tests are used to test for significance of AUC differences between techniques. The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets. We also find that, when faced with a large class imbalance, the C4.5 decision tree algorithm, quadratic discriminant analysis and k-nearest neighbours perform significantly worse than the best performing classifiers.
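
The evaluation protocol described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the label convention (1 = defaulter, 0 = non-defaulter), the `minority_share` parameter and the toy numbers in the usage note are all assumptions made for illustration, and the abstract does not state the imbalance levels actually used. The sketch shows random under-sampling of defaulters, AUC computed as a pairwise ranking probability, and the Friedman chi-square statistic over a table of AUC values (without tie correction and without the Nemenyi post hoc step).

```python
import random


def undersample_minority(X, y, minority_share, seed=0):
    """Randomly drop defaulters (label 1) until they form roughly
    `minority_share` of the sample; all non-defaulters (label 0) are kept.
    `minority_share` is a hypothetical parameter, not taken from the paper."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Solve n_min / (n_min + n_maj) = minority_share for n_min.
    target = round(minority_share * len(majority) / (1.0 - minority_share))
    target = max(1, min(target, len(minority)))
    keep = sorted(rng.sample(minority, target) + majority)
    return [X[i] for i in keep], [y[i] for i in keep]


def auc(scores, labels):
    """AUC as the probability that a randomly chosen defaulter receives a
    higher score than a randomly chosen non-defaulter (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def friedman_statistic(auc_table):
    """Friedman chi-square for a table with one row per data set and one
    column per classifier. Rank 1 goes to the highest AUC in each row;
    tied values share the average rank. No tie correction is applied."""
    n, k = len(auc_table), len(auc_table[0])
    rank_sums = [0.0] * k
    for row in auc_table:
        for j, value in enumerate(row):
            better = sum(1 for other in row if other > value)
            tied = sum(1 for other in row if other == value)
            rank_sums[j] += better + (tied + 1) / 2.0
    mean_ranks = [r / n for r in rank_sums]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0
    )
    return chi2, mean_ranks
```

As a usage illustration with made-up data, `undersample_minority(list(range(1000)), [1] * 300 + [0] * 700, 0.1)` keeps all 700 non-defaulters and 78 randomly chosen defaulters. Repeating this for a series of decreasing shares, fitting each classifier on the reduced sample, and collecting the resulting AUC values per data set would give the kind of table that `friedman_statistic` ranks.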
