Concepedia

Publication | Closed Access

A study of cross-validation and bootstrap for accuracy estimation and model selection

Citations: 10.7K

References: 11

Year: 1995

Ron Kohavi

Unknown Venue

Abstract

We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment (over half a million runs of C4.5 and a Naive-Bayes algorithm) to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
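To make the two estimators named in the abstract concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not the paper's experimental setup: the toy nearest-mean classifier stands in for C4.5 and Naive-Bayes, the synthetic one-dimensional data stands in for the real-world datasets, and the function names (`stratified_folds`, `cv_accuracy`, `bootstrap_accuracy`) are hypothetical. The bootstrap variant shown scores each replicate on its out-of-bag instances, which may differ from the exact bootstrap estimator evaluated in the paper.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal each class round-robin, continuing where the last class stopped.
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)
    return folds


def train_nearest_mean(xs, ys):
    """Toy inducer (an assumption): predict the class whose mean is closest."""
    sums, counts = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def cv_accuracy(xs, ys, k=10, seed=0):
    """Stratified k-fold cross-validation estimate of accuracy."""
    correct = 0
    for test_idx in stratified_folds(ys, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        model = train_nearest_mean([xs[i] for i in train_idx],
                                   [ys[i] for i in train_idx])
        correct += sum(model(xs[i]) == ys[i] for i in test_idx)
    return correct / len(ys)


def bootstrap_accuracy(xs, ys, b=50, seed=0):
    """Average out-of-bag accuracy over b bootstrap samples."""
    rng = random.Random(seed)
    n = len(ys)
    scores = []
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        in_bag = set(sample)
        out_of_bag = [i for i in range(n) if i not in in_bag]
        if not out_of_bag:
            continue  # nothing left to test on for this replicate
        model = train_nearest_mean([xs[i] for i in sample],
                                   [ys[i] for i in sample])
        hits = sum(model(xs[i]) == ys[i] for i in out_of_bag)
        scores.append(hits / len(out_of_bag))
    return sum(scores) / len(scores)


# Synthetic two-class data (an assumption): class means 0 and 5, unit variance.
rng = random.Random(42)
ys = [0] * 20 + [1] * 20
xs = [rng.gauss(5.0 * y, 1.0) for y in ys]

print("10-fold stratified CV accuracy:", cv_accuracy(xs, ys, k=10))
print("bootstrap (out-of-bag) accuracy:", bootstrap_accuracy(xs, ys, b=50))
```

Changing `k` or `b` in the two calls varies the parameters the abstract describes: the number of folds and the number of bootstrap samples. Replacing `stratified_folds` with a plain shuffled split would give the unstratified variant.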
