Learning Calibrated Medical Image Segmentation via Multi-rater Agreement Modeling

TLDR

Medical image analysis often relies on multiple expert annotations, yet common practices such as majority voting or choosing a single preferred rater ignore the agreement and disagreement information contained in these multi‑rater labels. The authors introduce MRNet to explicitly model multi‑rater agreement and disagreement for calibrated medical image segmentation. MRNet uses an expertise‑aware inferring module (EIM) to embed rater expertise as prior knowledge, and it reconstructs multi‑rater gradings from coarse predictions, using the resulting (dis-)agreement cues to improve segmentation. The authors state that, to their knowledge, MRNet is the first method to produce calibrated predictions under different expertise levels. It outperforms state‑of‑the‑art methods on five medical segmentation tasks spanning diverse imaging modalities, and the source code is publicly available.

Abstract

In medical image analysis, it is typical to collect multiple annotations, each from a different clinical expert or rater, in the expectation that possible diagnostic errors can be mitigated. Meanwhile, from the computer vision practitioner's viewpoint, it has been common practice to adopt ground-truth labels obtained either by majority vote or simply from one annotation by a preferred rater. This process, however, tends to overlook the rich information of agreement or disagreement ingrained in the raw multi-rater annotations. To address this issue, we propose MRNet, which explicitly models the multi-rater (dis-)agreement and makes two main contributions. First, an expertise-aware inferring module (EIM) is devised to embed the expertise level of individual raters as prior knowledge, to form high-level semantic features. Second, our approach is capable of reconstructing multi-rater gradings from coarse predictions, with the multi-rater (dis-)agreement cues being further exploited to improve the segmentation performance. To our knowledge, our work is the first to produce calibrated predictions under different expertise levels for medical image segmentation. Extensive experiments are conducted across five medical segmentation tasks of diverse imaging modalities. In these experiments, MRNet achieves superior performance compared with state-of-the-art methods, indicating its effectiveness and applicability to a wide range of medical segmentation tasks. Source code is publicly available.
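To make the contrast between majority voting and agreement-aware labels concrete, the following is a minimal NumPy sketch. It is not the MRNet implementation: the function names, the toy masks, and the use of a normalized expertise vector as per-rater weights are all illustrative assumptions, chosen only to show what information a hard majority vote discards.

```python
import numpy as np

def majority_vote(masks):
    """Hard label from R binary masks of shape (R, H, W).
    Assumption: ties (possible with an even number of raters) count as foreground."""
    masks = np.asarray(masks, dtype=float)
    return (masks.mean(axis=0) >= 0.5).astype(np.uint8)

def expertise_weighted_soft_label(masks, expertise):
    """Soft label in [0, 1]: per-pixel average of the raters' masks,
    weighted by a hypothetical non-negative expertise vector of length R."""
    masks = np.asarray(masks, dtype=float)
    weights = np.asarray(expertise, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return np.tensordot(weights, masks, axes=1)  # contracts the rater axis -> (H, W)

def disagreement_map(masks):
    """Per-pixel Bernoulli variance p * (1 - p) of the raters' votes.
    Zero where all raters agree, largest where they split evenly."""
    p = np.asarray(masks, dtype=float).mean(axis=0)
    return p * (1.0 - p)

# Toy example: three raters annotating a 2x2 image (made-up data).
masks = np.array([
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [1, 0]],
])

hard = majority_vote(masks)
soft_uniform = expertise_weighted_soft_label(masks, [1, 1, 1])
soft_rater0 = expertise_weighted_soft_label(masks, [1, 0, 0])
disagreement = disagreement_map(masks)
```

With uniform weights the soft label is simply the mean of the raters' masks, and putting all weight on one rater reproduces that rater's annotation. The hard majority-vote mask keeps neither the soft values nor the disagreement map, which is the information the abstract argues is lost.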
