Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

TLDR

SAM is a foundation model for image segmentation, trained on over 1 billion masks from 11 million licensed images, that supports zero‑shot segmentation with prompts such as points, boxes, and masks. This makes it attractive for medical image analysis, especially digital pathology, where training data are scarce. The study evaluates SAM’s zero‑shot segmentation on three representative whole‑slide imaging (WSI) tasks: tumor, non‑tumor tissue, and cell nuclei segmentation. Zero‑shot SAM achieves remarkable accuracy for large connected objects but fails to consistently segment dense instance objects, even with 20 prompts per image. The authors identify limitations in image resolution, multi‑scale handling, prompt selection, and model fine‑tuning, and suggest that few‑shot fine‑tuning may improve dense‑object performance.

Abstract

The Segment Anything Model (SAM) was released as a foundation model for image segmentation. This promptable segmentation model was trained on over 1 billion masks from 11M licensed and privacy-respecting images. It supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks), which makes SAM attractive for medical image analysis, especially digital pathology, where training data are scarce. In this study, we evaluate the zero-shot segmentation performance of SAM on representative whole slide imaging (WSI) segmentation tasks: (1) tumor segmentation, (2) non-tumor tissue segmentation, and (3) cell nuclei segmentation. Core Results: The results suggest that zero-shot SAM achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfactory performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) per image. We also summarize the limitations identified for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model achieve better performance in dense object segmentation.
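
To make the prompt budget concrete, the following is a minimal sketch of how point and box prompts could be derived from an instance label map and capped at 20 per image. It is an illustration only: the abstract does not describe how prompts were selected, so the function name `prompts_from_label_map`, the centroid-based click, and the largest-first ranking are all assumptions. Points are given as (x, y) and boxes as (x_min, y_min, x_max, y_max), which is the layout the public SAM predictor is understood to accept.

```python
from collections import defaultdict


def prompts_from_label_map(label_map, max_prompts=20):
    """Derive one point prompt and one box prompt per labelled instance.

    label_map: 2D list of ints, 0 = background, k > 0 = instance id.
    Returns a list of dicts with keys 'label', 'point' (x, y) and
    'box' (x_min, y_min, x_max, y_max), at most max_prompts long.
    Hypothetical helper for illustration, not the authors' procedure.
    """
    # Collect the (x, y) coordinates of every foreground pixel per instance.
    pixels = defaultdict(list)
    for y, row in enumerate(label_map):
        for x, value in enumerate(row):
            if value > 0:
                pixels[value].append((x, y))

    # Assumption: when an image holds more instances than the budget,
    # keep the largest ones (ties broken by instance id).
    ranked = sorted(pixels.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    ranked = ranked[:max_prompts]

    prompts = []
    for label, coords in ranked:
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        cx = sum(xs) / len(xs)
        cy = sum(ys) / len(ys)
        # Snap the click to the foreground pixel nearest the centroid,
        # so it lands inside the object even for concave shapes.
        point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        box = (min(xs), min(ys), max(xs), max(ys))
        prompts.append({"label": label, "point": point, "box": box})
    return prompts


if __name__ == "__main__":
    toy = [
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 2],
        [0, 0, 0, 0, 2],
    ]
    for prompt in prompts_from_label_map(toy):
        print(prompt)
```

With a dense nuclei patch, the number of instances can far exceed 20, so under this scheme most objects would receive no prompt at all. This is one way to read why a fixed per-image budget suits large connected regions better than dense instance segmentation.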