Improving Calibration and Out-of-Distribution Detection in Deep Models for Medical Image Segmentation

TLDR

Convolutional neural networks are powerful medical image segmentation models, yet they are hard to train on small datasets, produce overconfident predictions, and fail silently on out-of-distribution (OOD) data. To improve calibration, the authors use multitask learning, training a single model on several datasets that span different organs and imaging modalities. For OOD detection, they introduce a method based on spectral analysis of CNN feature maps. Multitask learning significantly improves confidence calibration, and the spectral-analysis method detects OOD images more accurately than competing methods based on prediction uncertainty and image classification.

Abstract

Convolutional neural networks (CNNs) have proved to be powerful medical image segmentation models. In this study, we address some of the main unresolved issues regarding these models. Specifically, training these models on small medical image datasets is still challenging, and many studies promote techniques such as transfer learning. Moreover, these models are known to produce overconfident predictions and to fail silently when presented with out-of-distribution (OOD) test data. To improve prediction calibration, we advocate multitask learning, i.e., training a single model on several different datasets spanning different organs of interest and different imaging modalities. We show that multitask learning can significantly improve model confidence calibration. For OOD detection, we propose a novel method based on spectral analysis of CNN feature maps. We show that different datasets, representing different imaging modalities and/or different organs of interest, have distinct spectral signatures, which can be used to identify whether or not a test image is similar to the images used for training. We show that our proposed method is more accurate than several competing methods, including methods based on prediction uncertainty and image classification.
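
The abstract does not say how a spectral signature is computed or compared, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the signature is the set of leading singular values of a feature map reshaped to a channels-by-positions matrix, and that a test image is scored by its distance to the nearest training signature. The function names (spectral_signature, ood_score), the parameter k, and the synthetic feature maps are all hypothetical.

```python
import numpy as np


def spectral_signature(feature_map, k=8):
    """Leading k singular values of a (C, H, W) feature map, L2-normalized.

    Assumption: the "spectrum" is taken from the channels-by-positions matrix.
    """
    channels = feature_map.shape[0]
    matrix = feature_map.reshape(channels, -1)
    singular_values = np.linalg.svd(matrix, compute_uv=False)[:k]
    return singular_values / (np.linalg.norm(singular_values) + 1e-12)


def ood_score(test_signature, train_signatures):
    """Euclidean distance from a test signature to its nearest training one.

    Larger values suggest the test image is less like the training data.
    """
    distances = np.linalg.norm(train_signatures - test_signature, axis=1)
    return float(distances.min())


def make_in_distribution_map(rng):
    """Synthetic stand-in for a training-like feature map: rank-1 plus noise."""
    u = rng.normal(size=(16, 1))
    v = rng.normal(size=(1, 64))
    return (10.0 * u @ v + 0.1 * rng.normal(size=(16, 64))).reshape(16, 8, 8)


def make_ood_map(rng):
    """Synthetic stand-in for an unfamiliar feature map: unstructured noise."""
    return rng.normal(size=(16, 8, 8))


rng = np.random.default_rng(0)

# Signatures collected from "training" feature maps.
train_signatures = np.stack(
    [spectral_signature(make_in_distribution_map(rng)) for _ in range(20)]
)

in_dist_score = ood_score(
    spectral_signature(make_in_distribution_map(rng)), train_signatures
)
ood_score_value = ood_score(
    spectral_signature(make_ood_map(rng)), train_signatures
)

print("in-distribution score:", in_dist_score)
print("OOD score:", ood_score_value)
```

In practice the feature maps would come from a chosen layer of the trained segmentation network, and a decision threshold on the score would have to be picked on held-out data; both choices are left open here.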
