Measurement of Observer Agreement

TLDR

Statistical measures such as κ and weighted κ are used in diagnostic imaging to express observer agreement, assess the reliability of imaging methods, and evaluate the reproducibility of disease classifications. The review focuses on the chance-corrected indices κ and weighted κ, and briefly references other, less frequently used agreement measures such as multiple-rater κ. Illustrative examples show how κ calculations are affected by disease prevalence and the number of rating categories. © RSNA, 2003.

Abstract

Statistical measures used in diagnostic imaging to express observer agreement on categorical data are described. The measures are used to characterize the reliability of imaging methods and the reproducibility of disease classifications and, occasionally and with great care, as a surrogate for accuracy. The review concentrates on the chance-corrected indices κ and weighted κ. Examples from the imaging literature illustrate the method of calculation and the effects of both disease prevalence and the number of rating categories. Other measures of agreement that are used less frequently, including multiple-rater κ, are referenced and described briefly. © RSNA, 2003
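
To make the chance correction concrete, the following is a minimal sketch of the standard two-rater κ, computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the raters' marginal totals. It is not taken from the review itself: the function name `cohen_kappa`, the `weights` parameter, the linear and quadratic weighting schemes, and the counts in both tables are assumptions chosen for illustration. The two tables have the same observed agreement (80%) but different disease prevalence, which is enough to change κ.

```python
def cohen_kappa(table, weights=None):
    """Kappa for two raters from a square table of counts.

    table[i][j] is the number of cases rater A put in category i and
    rater B put in category j. `weights` is None (unweighted kappa),
    "linear", or "quadratic" (weighted kappa for ordered categories).
    These option names are illustrative, not from the review.
    """
    k = len(table)
    n = float(sum(sum(row) for row in table))
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        # Agreement weight: 1 on the diagonal, partial credit off it.
        if weights is None:
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)
        if weights == "linear":
            return 1.0 - d
        if weights == "quadratic":
            return 1.0 - d * d
        raise ValueError("weights must be None, 'linear', or 'quadratic'")

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_o = sum(w(i, j) * table[i][j] / n for i, j in cells)          # observed
    p_e = sum(w(i, j) * row_marg[i] * col_marg[j] for i, j in cells)  # chance
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts: both tables show 80 agreements in 100 cases.
balanced = [[40, 10], [10, 40]]   # about 50% of cases rated positive
rare = [[5, 10], [10, 75]]        # about 15% of cases rated positive

print(round(cohen_kappa(balanced), 3))  # about 0.6
print(round(cohen_kappa(rare), 3))      # about 0.216
```

With only two categories the weighted and unweighted values coincide; the weighting matters once there are three or more ordered categories, where near-misses receive partial credit.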
