Large sample standard errors of kappa and weighted kappa.

TLDR

Kappa and weighted kappa are coefficients of agreement between two raters on a nominal scale; weighted kappa additionally allows disagreements to be weighted by their relative seriousness. The standard-error formulas originally published for both statistics are incorrect, because they were derived from the contradictory assumptions of fixed marginal totals and binomial variation of the cell frequencies. Everitt (1968) derived the exact variances of both statistics when the parameters are zero by assuming a generalized hypergeometric distribution for the cell counts, but found the resulting expressions too complicated for routine use and proposed simpler binomial-based approximations instead. Those approximations are incorrect for essentially the same reason.

Abstract

The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all disagreements may be considered equally serious, and weighted kappa is appropriate when the relative seriousness of the different possible disagreements can be specified. The papers describing these two statistics also present expressions for their standard errors. These expressions are incorrect, having been derived from the contradictory assumptions of fixed marginal totals and binomial variation of cell frequencies. Everitt (1968) derived the exact variances of weighted and unweighted kappa when the parameters are zero by assuming a generalized hypergeometric distribution. He found these expressions to be far too complicated for routine use, and offered, as alternatives, expressions derived by assuming binomial distributions. These alternative expressions are incorrect, essentially for the same reason as above. Assume that N subjects are distributed into k² cells by each of them being assigned to one of k categories by one rater and, independently, to one of the same k categories by a second
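To make the two coefficients concrete, here is a minimal Python sketch of their point estimates for the setup described above: N subjects cross-classified by two raters into a k × k table of counts. It assumes agreement weights w_ij in [0, 1] with w_ii = 1, which is equivalent to Cohen's (1968) disagreement weights v_ij via w_ij = 1 − v_ij / max(v). The function names (`weighted_kappa`, `kappa`, `linear_weights`) and the linear weighting scheme are illustrative assumptions, and the sketch computes only the estimates, not the standard errors discussed in this paper.

```python
def weighted_kappa(table, weights):
    """Weighted kappa for a k x k table of counts (rows: rater 1, columns: rater 2).

    Assumes weights[i][j] is an agreement weight in [0, 1] with weights[i][i] == 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table has no observations")
    # Cell proportions p_ij = n_ij / N.
    p = [[table[i][j] / n for j in range(k)] for i in range(k)]
    # Marginal proportions: p_i. for rater 1 (rows), p_.j for rater 2 (columns).
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed agreement and weighted agreement expected by chance
    # (chance = independent raters with the observed marginals).
    p_o = sum(weights[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def kappa(table):
    """Unweighted kappa: full credit on the diagonal, none for any disagreement."""
    k = len(table)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    return weighted_kappa(table, identity)


def linear_weights(k):
    """Hypothetical example weights 1 - |i - j| / (k - 1); requires k >= 2."""
    return [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


counts = [[1, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]
print(kappa(counts))                               # 7/17, about 0.412
print(weighted_kappa(counts, linear_weights(3)))   # 6/11, about 0.545
```

In the example, giving half credit to the two adjacent-category disagreements raises the coefficient from 7/17 to 6/11. With identity weights, `weighted_kappa` reduces exactly to `kappa`, which is why the sketch implements only the weighted form.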
