Comparative Studies on Some Metrics for External Validation of QSPR Models

TLDR

QSPR models predict properties of untested chemicals and can help prioritize synthesis and experimental testing, but the reliability of their predictions depends on validation. External validation is usually judged from a single test set using one or more metrics. This study shows that the same model can score differently on Q²(F1), Q²(F2), Q²(F3), CCC, and r²(m) depending on test set composition and size, and it therefore questions the practice of drawing conclusions about predictive quality from a single test set and a particular metric. Among the metrics considered, r²(m) gives statistically significantly different values from the others and is the most stringent, whereas CCC is the most optimistic; the authors suggest that r²(m) may be adopted for regulatory decision support.

Abstract

Quantitative structure-property relationship (QSPR) models, which predict properties of untested chemicals, can be used to prioritize the synthesis and experimental testing of new compounds. Validation plays a crucial role in judging the reliability of the predictions of such models. In the QSPR literature, serious attention is now given to external validation as a check on model reliability, and in most cases predictive quality is judged from the predictions for a single test set, as reflected in one or more external validation metrics. Here, we show that a single QSPR model may exhibit a variable degree of prediction quality, depending on test set composition and test set size, as reflected in variants of external validation metrics such as Q²(F1), Q²(F2), Q²(F3), CCC, and r²(m) (all of which are differently modified forms of predicted variance and can theoretically attain a maximum value of 1). This report therefore questions the appropriateness of the "classic" approach of external validation based on a single test set, and of deriving from it a conclusion about the predictive quality of a model on the basis of a particular validation metric. The present work further demonstrates that, among the external validation metrics considered, r²(m) gives numerical values that differ statistically significantly from those of the others, among which CCC is the most optimistic (least stringent). Furthermore, at a given acceptance threshold for external validation metrics, r²(m) provides the most stringent criterion of external validation (especially with Δr²(m) at its highest tolerated value of 0.2), and it may be adopted in regulatory decision support processes.
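
To make the metrics named above concrete, the following is a minimal Python sketch of how they might be computed for one test set. The abstract does not state the formulas; the definitions used here are the forms commonly given in the QSAR/QSPR literature and should be read as assumptions, as should all function and variable names. In particular, the sketch takes r²(m) = r²(1 - sqrt(|r² - r0²|)), where r0² comes from a regression through the origin, and takes Δr²(m) as the absolute difference between the values obtained with the observed and predicted axes interchanged.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _press(y_obs, y_pred):
    # Sum of squared prediction errors over the test set.
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_obs, y_pred, y_train):
    # Assumed form: test-set deviations are taken from the TRAINING mean.
    m = _mean(y_train)
    return 1 - _press(y_obs, y_pred) / sum((o - m) ** 2 for o in y_obs)


def q2_f2(y_obs, y_pred):
    # Assumed form: test-set deviations are taken from the TEST mean.
    m = _mean(y_obs)
    return 1 - _press(y_obs, y_pred) / sum((o - m) ** 2 for o in y_obs)


def q2_f3(y_obs, y_pred, y_train):
    # Assumed form: per-compound test error over per-compound training variance.
    m = _mean(y_train)
    tss_train = sum((t - m) ** 2 for t in y_train)
    return 1 - (_press(y_obs, y_pred) / len(y_obs)) / (tss_train / len(y_train))


def ccc(y_obs, y_pred):
    # Concordance correlation coefficient (assumed Lin's form).
    mo, mp = _mean(y_obs), _mean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    so = sum((o - mo) ** 2 for o in y_obs)
    sp = sum((p - mp) ** 2 for p in y_pred)
    return 2 * cov / (so + sp + len(y_obs) * (mo - mp) ** 2)


def r2m(y, x):
    # r2: squared correlation of y and x; r02: fit of y = k*x through the origin.
    my, mx = _mean(y), _mean(x)
    sxy = sum((a - my) * (b - mx) for a, b in zip(y, x))
    syy = sum((a - my) ** 2 for a in y)
    sxx = sum((b - mx) ** 2 for b in x)
    r2 = (sxy * sxy) / (sxx * syy)
    k = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
    r02 = 1 - sum((a - k * b) ** 2 for a, b in zip(y, x)) / syy
    return r2 * (1 - math.sqrt(abs(r2 - r02)))


def delta_r2m(y_obs, y_pred):
    # Absolute difference between the two axis assignments (assumed convention).
    return abs(r2m(y_obs, y_pred) - r2m(y_pred, y_obs))


if __name__ == "__main__":
    # Hypothetical data: predictions carry a constant bias of +0.5.
    y_train = [0.0, 5.0]
    y_obs = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.5, 2.5, 3.5, 4.5]
    print("Q2F1 ", q2_f1(y_obs, y_pred, y_train))
    print("Q2F2 ", q2_f2(y_obs, y_pred))
    print("Q2F3 ", q2_f3(y_obs, y_pred, y_train))
    print("CCC  ", ccc(y_obs, y_pred))
    print("r2m  ", r2m(y_obs, y_pred))
    print("dr2m ", delta_r2m(y_obs, y_pred))
```

Re-running these functions on different subsets of a pool of external compounds is one simple way to observe the dependence on test set composition and size that the abstract describes. The sketch does not guard against degenerate inputs such as a test set with zero variance.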
