
Publication | Closed Access

Validation of Regression Models: Methods and Examples

Citations: 190

References: 0

Year: 1977

TLDR

Methods for assessing the validity of regression models include comparing model predictions and coefficients with theory, collecting new data to test predictions, and data splitting or cross-validation, in which part of the data is used to estimate the coefficients and the remainder to measure prediction accuracy. This expository review presents these methods with several illustrative examples. It concludes that data splitting is effective when collecting new data is impractical, and recommends the DUPLEX algorithm for dividing the data into estimation and prediction sets when there is no obvious variable, such as time, on which to base the split.

Abstract

Methods to determine the validity of regression models include comparison of model predictions and coefficients with theory, collection of new data to check model predictions, comparison of results with theoretical model calculations, and data splitting or cross-validation, in which a portion of the data is used to estimate the model coefficients and the remainder of the data is used to measure the prediction accuracy of the model. An expository review of these methods is presented. It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model. The DUPLEX algorithm, developed by R. W. Kennard, is recommended for dividing the data into the estimation set and prediction set when there is no obvious variable such as time to use as a basis to split the data. Several examples are included to illustrate the various methods of model validation.
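
The data-splitting idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the synthetic data, the helper names `fit_line` and `rmse`, and the choice to split on row order (standing in for a time variable) are all assumptions made for illustration. The DUPLEX algorithm is not implemented here. The sketch fits a straight line on an estimation set and then compares its error on that set with its error on a held-out prediction set.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(x, y, intercept, slope):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic data (an assumption for illustration): y = 1 + 2x + noise.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(40)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in x]

# Hypothetical split: row order plays the role of time, so the first
# 30 rows form the estimation set and the last 10 the prediction set.
x_est, y_est = x[:30], y[:30]
x_pred, y_pred = x[30:], y[30:]

# Coefficients are estimated from the estimation set only.
intercept, slope = fit_line(x_est, y_est)

# Prediction accuracy is measured on data the fit never saw.
rmse_est = rmse(x_est, y_est, intercept, slope)
rmse_pred = rmse(x_pred, y_pred, intercept, slope)

print(f"intercept={intercept:.3f} slope={slope:.3f}")
print(f"estimation RMSE={rmse_est:.3f} prediction RMSE={rmse_pred:.3f}")
```

If the prediction-set error is much larger than the estimation-set error, that suggests the model does not generalize well beyond the data used to fit it. In this sketch the two sets come from the same process, so the errors should be of similar size.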