Distribution-balanced stratified cross-validation for accuracy estimation

Abstract

Cross-validation is widely used in machine learning research to estimate the accuracy of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the quality of the estimate by providing balanced intraclass distributions when a data set is partitioned into multiple folds. We tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on the three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.

Keywords: cross-validation, machine learning research, true accuracy, classifier
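The abstract does not spell out the partitioning procedure, so the following Python sketch only illustrates what "balanced intraclass distributions" could mean in practice. It rests on an assumed heuristic that may differ from the paper's actual algorithm: within each class, instances are visited along a nearest-neighbour chain and dealt to the folds in round-robin order, so that instances lying close together (for example, in the same intraclass cluster) tend to end up in different folds. The function name `dbscv_like_folds`, its parameters, and the choice of Euclidean distance are all hypothetical.

```python
import math
import random
from collections import defaultdict


def dbscv_like_folds(X, y, k, seed=0):
    """Return a fold index (0..k-1) for every instance.

    Illustrative assumption, not the paper's confirmed procedure:
    per class, walk a nearest-neighbour chain and assign folds round-robin.
    """
    rng = random.Random(seed)

    # Group instance indices by class label (this is the stratification step).
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [None] * len(y)
    next_fold = 0
    for label in sorted(by_class):
        remaining = set(by_class[label])
        # Start the chain at a random instance of this class.
        current = rng.choice(sorted(remaining))
        while True:
            folds[current] = next_fold
            next_fold = (next_fold + 1) % k
            remaining.remove(current)
            if not remaining:
                break
            # Move to the closest unassigned instance of the same class
            # (Euclidean distance; ties broken by index for determinism).
            current = min(
                remaining,
                key=lambda j: (math.dist(X[current], X[j]), j),
            )
    return folds


# Toy data: two classes, each made of two well-separated clusters.
X = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0),
     (5.0, 0.0), (5.1, 0.0), (5.0, 9.0), (5.1, 9.0)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
folds = dbscv_like_folds(X, y, k=2)
```

On this toy data, each two-point cluster is split across the two folds, whereas plain stratification only guarantees that each fold receives the same number of instances per class and could place a whole cluster in a single fold.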
