Improvements on Cross-Validation: The .632+ Bootstrap Method

TLDR

Cross-validation, the standard way to assess the accuracy of a predictive model, is nearly unbiased but can be highly variable, which matters both when comparing models and when assessing a final selected model. The study examines bootstrap estimates of prediction error, which can be viewed as smoothed versions of cross-validation. Using a nonparametric bootstrap approach, specifically the .632+ rule, the authors estimate prediction error and the variability of that estimate for classification rules ranging from smooth (Fisher's linear discriminant function) to unsmooth (nearest neighbors). In a catalog of 24 simulation experiments, the .632+ rule substantially outperforms cross-validation.

Abstract

A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? This is an important question both for comparing models and for assessing a final selected model. The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased but can be highly variable. Here we discuss bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. We show that a particular bootstrap method, the .632+ rule, substantially outperforms cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we study only classification problems with 0–1 loss in detail. Our simulations include "smooth" prediction rules like Fisher's linear discriminant function and unsmooth ones like nearest neighbors.
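For readers who want a concrete picture of the estimator named in the abstract, the following is a minimal sketch of a .632+ style error estimate for 0–1 loss. It is not taken from the text above: it follows the commonly cited formulation (apparent error, leave-one-out bootstrap error, no-information error rate, relative overfitting rate), and the paper's exact definitions may differ in edge cases. The function names `one_nn_predict` and `err632plus`, the `fit_predict` callback, and the defaults `n_boot=50` and `seed=0` are all illustrative assumptions.

```python
import numpy as np


def one_nn_predict(X_train, y_train, X_test):
    """Illustrative 1-nearest-neighbor rule (an 'unsmooth' classifier)."""
    # Squared Euclidean distance from every test point to every training point.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]


def err632plus(X, y, fit_predict, n_boot=50, seed=0):
    """Sketch of a .632+ style estimate of prediction error under 0-1 loss.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains on the first two arguments and returns labels for the third.
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    # Apparent (resubstitution) error: train and test on the same data.
    pred_all = fit_predict(X, y, X)
    err_app = np.mean(pred_all != y)

    # Leave-one-out bootstrap error: each point is scored only by rules
    # trained on bootstrap samples that do not contain it.
    loss_sum = np.zeros(n)
    count = np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample with replacement
        out = np.setdiff1d(np.arange(n), idx)     # points left out
        if out.size == 0:
            continue
        pred = fit_predict(X[idx], y[idx], X[out])
        loss_sum[out] += (pred != y[out])
        count[out] += 1
    seen = count > 0
    err1 = np.mean(loss_sum[seen] / count[seen])

    # No-information error rate: error expected if labels and predictions
    # were independent, from observed class and prediction frequencies.
    classes = np.unique(y)
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(pred_all == c) for c in classes])
    gamma = np.sum(p * (1.0 - q))

    # Relative overfitting rate, clipped to [0, 1].
    # Simplification (assumption): the clipped err1 is used throughout.
    err1_c = min(err1, gamma)
    if err1_c > err_app and gamma > err_app:
        R = (err1_c - err_app) / (gamma - err_app)
    else:
        R = 0.0

    # Weight moves from .632 (no overfitting) toward 1 (heavy overfitting).
    w = 0.632 / (1.0 - 0.368 * R)
    return (1.0 - w) * err_app + w * err1_c


# Usage on synthetic two-class data (made up for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
est = err632plus(X, y, one_nn_predict)
print(f".632+ style error estimate for 1-NN: {est:.3f}")
```

Nearest neighbors makes a useful test case because its apparent error is zero on distinct training points, so the estimate depends entirely on how the weight `w` balances that optimistic figure against the pessimistic leave-one-out bootstrap error.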
