AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem?

TLDR

Automatic abdominal organ segmentation appears to be a solved problem on many benchmark datasets, where state-of-the-art methods achieve results comparable to inter-rater variability. However, most existing datasets are limited to single-center, single-phase, single-vendor, or single-disease cases, which raises questions about generalizability. This study conducts a large-scale evaluation of liver, kidney, spleen, and pancreas segmentation to expose gaps in current methods. The authors present AbdomenCT-1K, a dataset of more than 1,000 CT scans from 12 medical centers that includes multi-phase, multi-vendor, and multi-disease cases. They also establish four benchmarks (fully supervised, semi-supervised, weakly supervised, and continual learning), along with a simple, effective baseline method for each. The evaluation reveals that state-of-the-art methods generalize poorly across distinct medical centers, imaging phases, and unseen diseases, highlighting unsolved segmentation challenges.

Abstract

With the unprecedented developments in deep learning, automatic segmentation of the main abdominal organs seems to be a solved problem, as state-of-the-art (SOTA) methods have achieved results comparable to inter-rater variability on many benchmark datasets. However, most existing abdominal datasets contain only single-center, single-phase, single-vendor, or single-disease cases, and it is unclear whether this excellent performance generalizes to diverse datasets. This paper presents a large and diverse abdominal CT organ segmentation dataset, termed AbdomenCT-1K, with more than 1000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. Furthermore, we conduct a large-scale study of liver, kidney, spleen, and pancreas segmentation and reveal the unsolved segmentation problems of SOTA methods, such as their limited generalization ability across distinct medical centers, phases, and unseen diseases. To help address these unsolved problems, we further build four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics. Accordingly, we develop a simple and effective method for each benchmark, which can be used as an out-of-the-box method and a strong baseline. We believe the AbdomenCT-1K dataset will promote future in-depth research towards clinically applicable abdominal organ segmentation methods.
