A modified Hosmer–Lemeshow test for large data sets

Abstract

The Hosmer–Lemeshow test is a widely used method for evaluating the goodness of fit of logistic regression models. However, like other chi-square tests, its power depends strongly on the sample size. Paul, Pennell, and Lemeshow (2013) proposed using a large number of groups for large data sets in order to standardize the power. Simulations, however, show that their method performs poorly for some models, and it cannot be applied when the sample size exceeds 25,000. In the present paper, we propose a modified Hosmer–Lemeshow test based on estimating and standardizing the distribution parameter of the Hosmer–Lemeshow statistic. We provide a mathematical derivation of the critical value and power of our test. Simulations show that our method standardizes the power of the Hosmer–Lemeshow test satisfactorily. It is especially recommended for sufficiently large data sets, for which its power is rather stable. A bank marketing data set is also analyzed for comparison with existing methods.
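For reference, the classical Hosmer–Lemeshow statistic that the modified test starts from can be sketched as follows. This is a minimal, standard-library Python sketch of the usual "deciles of risk" version, not the modified test proposed in this paper. Observations are sorted by fitted probability and split into g groups of nearly equal size. The statistic C = Σ (O_k − E_k)² / (E_k(1 − p̄_k)) is then referred to a chi-square distribution with g − 2 degrees of freedom. The function names, the contiguous-chunk grouping rule, and the choice to skip zero-variance groups are illustrative assumptions. Software packages differ in how they form groups and handle tied probabilities. The demo's "fitted" probabilities come from a deliberately misspecified fixed model, not from an actual maximum-likelihood fit.

```python
import math
import random
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), integer df, standard library only.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2)
    through its closed forms for integer and half-integer shape parameters.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    log_h = math.log(h)
    if df % 2 == 0:
        # Q(n, h) = exp(-h) * sum_{i=0}^{n-1} h^i / i!, computed in log space.
        total = sum(math.exp(-h + i * log_h - math.lgamma(i + 1)) for i in range(df // 2))
        return min(1.0, total)
    # Q(n + 1/2, h) = erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{n} h^(j-1/2) / Gamma(j+1/2)
    tail = sum(
        math.exp(-h + (j - 0.5) * log_h - math.lgamma(j + 0.5))
        for j in range(1, (df - 1) // 2 + 1)
    )
    return min(1.0, math.erfc(math.sqrt(h)) + tail)


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], g: int = 10) -> Tuple[float, int, float]:
    """Classical Hosmer-Lemeshow statistic with g groups of (nearly) equal size.

    Returns (C, degrees of freedom, p-value), with C referred to chi-square(g - 2).
    """
    n = len(y)
    if n != len(p):
        raise ValueError("y and p must have the same length")
    if not 3 <= g <= n:
        raise ValueError("need 3 <= g <= n")
    # Sort observations by fitted probability and cut into g contiguous chunks.
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for k in range(g):
        group = order[k * n // g:(k + 1) * n // g]
        observed = sum(y[i] for i in group)   # O_k: observed events in group k
        expected = sum(p[i] for i in group)   # E_k: expected events in group k
        p_bar = expected / len(group)         # mean fitted probability in group k
        variance = expected * (1.0 - p_bar)   # equals n_k * p_bar * (1 - p_bar)
        if variance > 0.0:                    # skip degenerate groups (assumption)
            stat += (observed - expected) ** 2 / variance
    df = g - 2
    return stat, df, chi2_sf(stat, df)


def _logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


if __name__ == "__main__":
    rng = random.Random(2024)
    for n in (500, 50_000):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Hypothetical truth includes a quadratic term that the "fitted" model omits.
        ys = [1 if rng.random() < _logistic(0.5 * x + 0.2 * x * x) else 0 for x in xs]
        ps = [_logistic(0.5 * x) for x in xs]
        stat, df, pval = hosmer_lemeshow(ys, ps, g=10)
        print(f"n={n:>6}: C={stat:9.2f}, df={df}, p-value={pval:.3g}")
```

Under a fixed misspecification like the one in the demo, C tends to grow roughly in proportion to n. The p-value therefore typically drops sharply from the smaller to the larger sample, even though the degree of lack of fit is the same. This is the sample-size dependence of the power that the abstract describes.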

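References

Paul, P., M. L. Pennell, and S. Lemeshow. 2013. Standardizing the power of the Hosmer–Lemeshow goodness of fit test in large data sets. Statistics in Medicine 32:67–80.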