Publication | Closed Access
Distribution-balanced stratified cross-validation for accuracy estimation
Citations: 233
References: 8
Year: 2000
Engineering, Machine Learning, Accuracy And Precision, Balanced Intraclass Distributions, Verification And Validation, Classification Method, Data Science, Data Mining, Uncertainty Quantification, Pattern Recognition, Class Imbalance, Management, Statistics, Multiple Classifier System, Predictive Analytics, Knowledge Discovery, Computer Science, Data Classification, Statistical Inference, Classification, Accuracy Estimation, Classifier System, Abstract Cross-validation, Big Data
Abstract: Cross-validation has often been applied in machine learning research to estimate the accuracy of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on the three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
Keywords: cross-validation, machine learning research, true accuracy, classifier
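The abstract does not spell out how the folds are built, so the following is only a minimal sketch of one way to obtain balanced intraclass distributions, not the authors' stated procedure. It assumes that, within each class, samples are chained by nearest-neighbor hops (Euclidean distance) and dealt to folds round-robin, so that samples from the same intraclass cluster are spread evenly over the folds. The function name `dbscv_folds` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def dbscv_folds(X, y, k, seed=0):
    """Return a list giving the fold index (0..k-1) of every sample.

    Assumption for illustration: within each class, samples are visited
    by repeatedly hopping to the nearest unassigned sample of that class,
    and each visited sample is dealt to the next fold in round-robin order.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    fold_of = [None] * len(y)
    next_fold = 0  # carried across classes so overall fold sizes stay even
    for label in sorted(by_class):  # assumes labels are sortable
        remaining = set(by_class[label])
        current = rng.choice(sorted(remaining))  # random starting sample
        while True:
            fold_of[current] = next_fold
            next_fold = (next_fold + 1) % k
            remaining.discard(current)
            if not remaining:
                break
            # Hop to the closest unassigned sample of the same class;
            # ties are broken by the smaller index.
            current = min(
                remaining,
                key=lambda j: (math.dist(X[current], X[j]), j),
            )
    return fold_of


# One class with two well-separated clusters of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["a"] * 8
folds = dbscv_folds(X, y, k=2)
```

In this toy case each cluster is exhausted before the chain hops to the other one, so both folds receive two points from each cluster. Plain stratification only guarantees four class-"a" points per fold and could place an entire cluster in a single fold.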