Confidence intervals for population reliability coefficients: Evaluation of methods, recommendations, and software for composite measures.

TLDR

Composite scores, defined as the sum of component items, are widely used across disciplines, and their reliability is a key psychometric concern that has been studied extensively. Because a point estimate of a reliability coefficient conveys nothing about its uncertainty, the authors advocate reporting a confidence interval alongside it. They ran three large-scale Monte Carlo simulations comparing confidence interval methods for four reliability coefficients (coefficient alpha, coefficient omega, hierarchical omega, and categorical omega) under a variety of conditions. Based on the results, they generally recommend bootstrap confidence intervals for hierarchical omega when items are continuous and for categorical omega when items are categorical.

Abstract

A composite score is the sum of a set of components. For example, a total test score can be defined as the sum of the individual items. The reliability of composite scores is of interest in a wide variety of contexts due to their widespread use and applicability to many disciplines. The psychometric literature has devoted considerable time to discussing how to best estimate the population reliability value. However, all point estimates of a reliability coefficient fail to convey the uncertainty associated with the estimate as it estimates the population value. Correspondingly, a confidence interval is recommended to convey the uncertainty with which the population value of the reliability coefficient has been estimated. However, many confidence interval methods for bracketing the population reliability coefficient exist and it is not clear which method is most appropriate in general or in a variety of specific circumstances. We evaluate these confidence interval methods for 4 reliability coefficients (coefficient alpha, coefficient omega, hierarchical omega, and categorical omega) under a variety of conditions with 3 large-scale Monte Carlo simulation studies. Our findings lead us to generally recommend bootstrap confidence intervals for hierarchical omega for continuous items and categorical omega for categorical items. All of the methods we discuss are implemented in the freely available R language and environment via the MBESS package.
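
To make the idea of a bootstrap confidence interval for a reliability coefficient concrete, here is a minimal sketch in Python. It is not the MBESS implementation and not the procedure evaluated in the simulations. It uses coefficient alpha only because alpha has a closed form, whereas hierarchical and categorical omega require fitting a factor model. The function names (`coefficient_alpha`, `percentile_bootstrap_ci`), the simulated data, and the choice of a simple percentile interval are all assumptions made for illustration.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def coefficient_alpha(rows):
    """Coefficient alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    # Sum of the individual item variances (one variance per column).
    item_var_sum = sum(sample_variance(col) for col in zip(*rows))
    # Variance of the composite score, i.e. the sum of the items.
    composite_var = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / composite_var)


def percentile_bootstrap_ci(rows, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample respondents with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(stat(rng.choices(rows, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    # Approximate the empirical quantiles with order statistics.
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper


# Hypothetical data: 200 respondents, 5 continuous items sharing one factor.
data_rng = random.Random(42)
scores = []
for _ in range(200):
    factor = data_rng.gauss(0, 1)
    scores.append([0.7 * factor + data_rng.gauss(0, 0.7) for _ in range(5)])

alpha_hat = coefficient_alpha(scores)
ci_low, ci_high = percentile_bootstrap_ci(scores, coefficient_alpha)
print(f"alpha = {alpha_hat:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling whole respondents (rows) rather than individual item scores keeps the covariance structure among items intact, which is what a reliability coefficient depends on. The same resampling loop could in principle wrap any other reliability estimator passed as `stat`.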
