Classical and Bayesian interpretation of the Birge test of consistency and its generalized version for correlated results from interlaboratory evaluations

Abstract

A well-known test of the consistency of results from an interlaboratory evaluation is the Birge test, named after the physicist Raymond T. Birge, who developed it. We show that the Birge test may be interpreted as a classical test of the null hypothesis that the variances of the results are less than or equal to their stated values, against the alternative hypothesis that the variances are greater than their stated values. A modern protocol for hypothesis testing is to calculate the classical p-value of the test statistic. The p-value is the maximum probability, under the null hypothesis, of obtaining in conceptual replications a value of the test statistic equal to or larger than its realized (observed) value. The null hypothesis is rejected when the p-value is too small. We show that the classical p-value of the Birge test statistic is equal to the Bayesian posterior probability of the null hypothesis based on suitably chosen non-informative improper prior distributions for the unknown statistical parameters. Thus the Birge test may also be interpreted as a Bayesian test of the null hypothesis. The Birge test was developed for interlaboratory evaluations in which the results are uncorrelated. We present a general test of consistency that applies to both correlated and uncorrelated results, and we show that the classical p-value of the general test statistic is likewise equal to the Bayesian posterior probability of the null hypothesis based on non-informative prior distributions. The general test makes it possible to check the consistency of correlated results from interlaboratory evaluations; the Birge test is a special case of it.
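
The following is a minimal Python sketch of the two statistics, assuming their standard textbook forms, which the abstract does not spell out. For uncorrelated results x_i with stated standard uncertainties u_i, the Birge statistic is assumed to be chi2 = sum_i (x_i - x_w)^2 / u_i^2, where x_w is the mean weighted by 1/u_i^2, and the Birge ratio is R_B = sqrt(chi2 / (n - 1)). For results with a stated covariance matrix V, the general statistic is assumed to be the quadratic form (x - m 1)' V^-1 (x - m 1), where m = (1' V^-1 x) / (1' V^-1 1) is the generalized least-squares mean. In both cases the p-value is taken as the upper tail of a chi-square distribution with n - 1 degrees of freedom, i.e. the distribution when the variances equal their stated values, the boundary of the null hypothesis. The function names (chi2_sf, solve, birge_test, general_test) and the example numbers are hypothetical. Only the classical p-value is computed; its equality with the Bayesian posterior probability is the paper's result and is not reproduced here.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) for a chi-square variable X with an
    integer number k >= 1 of degrees of freedom, i.e. the regularized upper
    incomplete gamma function Q(k/2, x/2), built up with the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if k < 1:
        raise ValueError("degrees of freedom must be >= 1")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        q, a = math.exp(-y), 1.0             # Q(1, y) = exp(-y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y) = erfc(sqrt(y))
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def birge_test(x, u):
    """Birge test (assumed textbook form) for uncorrelated results x with
    stated standard uncertainties u.
    Returns (weighted mean, chi-square statistic, Birge ratio, p-value)."""
    n = len(x)
    w = [1.0 / ui ** 2 for ui in u]
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    chi2 = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    return mean, chi2, math.sqrt(chi2 / (n - 1)), chi2_sf(chi2, n - 1)


def general_test(x, V):
    """Consistency test (assumed GLS form) for results x with stated
    covariance matrix V, correlated or not.
    Returns (GLS mean, chi-square statistic, p-value)."""
    n = len(x)
    # Generalized least-squares mean: (1' V^-1 x) / (1' V^-1 1)
    mean = sum(solve(V, x)) / sum(solve(V, [1.0] * n))
    r = [xi - mean for xi in x]
    # Quadratic form r' V^-1 r, referred to chi-square with n - 1 d.o.f.
    chi2 = sum(ri * zi for ri, zi in zip(r, solve(V, r)))
    return mean, chi2, chi2_sf(chi2, n - 1)


if __name__ == "__main__":
    # Hypothetical illustrative results, not data from the paper.
    x = [10.0, 10.4, 9.7, 10.2]
    u = [0.2, 0.3, 0.25, 0.2]
    print(birge_test(x, u))
    # A diagonal V (no correlation) reproduces the Birge test.
    V_diag = [[u[i] ** 2 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(general_test(x, V_diag))
    # Same results with an assumed covariance of 0.02 between the first two.
    V_corr = [row[:] for row in V_diag]
    V_corr[0][1] = V_corr[1][0] = 0.02
    print(general_test(x, V_corr))
```

Passing a diagonal V with entries u_i^2 to general_test reproduces birge_test, mirroring the abstract's statement that the Birge test is a special case of the general test. The chi-square tail and the linear solve are hand-rolled only to keep the sketch dependency-free; with SciPy and NumPy available, scipy.stats.chi2.sf and numpy.linalg.solve would be the usual choices.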
