A Graph-Transformer for Whole Slide Image Classification

TLDR

Deep learning is powerful for whole slide image (WSI) analysis, yet patch-based methods introduce label noise and ignore important WSI-level information. The authors propose GTP, a Graph-Transformer that fuses a graph representation of a WSI with a vision transformer to predict disease grade. Using 4,818 WSIs from CPTAC, NLST, and TCGA, they first trained a contrastive feature extractor on NLST, then represented patch features as graph nodes, and finally applied the transformer framework; they also added GraphCAM for saliency mapping. Trained on CPTAC data, GTP achieved 91.2 ± 2.5% mean accuracy on three-label classification (normal versus LUAD versus LSCC) in five-fold cross-validation and 82.3 ± 1.0% on external TCGA data, demonstrating interpretable and effective WSI-level classification.

Abstract

Deep learning is a powerful tool for whole slide image (WSI) analysis. Typically, in supervised deep learning, a WSI is divided into small patches, a model is trained on those patches, and the patch-level outcomes are aggregated to estimate disease grade. However, patch-based methods introduce label noise during training by assuming that each patch is independent and carries the same label as the WSI, and they neglect overall WSI-level information that is significant in disease grading. Here we present GTP, a Graph-Transformer (GT) that fuses a graph-based representation of a WSI with a vision transformer for processing pathology images, to predict disease grade. We selected 4,818 WSIs from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), the National Lung Screening Trial (NLST), and The Cancer Genome Atlas (TCGA), and used GTP to distinguish lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LSCC) from adjacent non-cancerous tissue (normal). First, using NLST data, we developed a contrastive learning framework to generate a feature extractor. This allowed us to compute feature vectors of individual WSI patches, which were used to represent the nodes of the graph; we then constructed the GTP framework on top of this graph. Our model trained on the CPTAC data achieved consistently high performance on three-label classification (normal versus LUAD versus LSCC): mean accuracy = 91.2 ± 2.5% based on five-fold cross-validation, and mean accuracy = 82.3 ± 1.0% on external test data (TCGA). We also introduced a graph-based saliency mapping technique, called GraphCAM, that can identify regions that are highly associated with the class label. Our findings demonstrate GTP as an interpretable and effective deep learning framework for WSI-level classification.
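
To make the graph step concrete, the following is a minimal sketch of how patch feature vectors might be arranged as graph nodes. The abstract does not say how edges are defined, so this sketch assumes that patches lie on a regular grid and that two patches are connected when they are spatial neighbors (8-connectivity). The function name `build_patch_graph`, the grid coordinates, and the random feature vectors are all hypothetical and stand in for the output of the contrastive feature extractor; none of this is taken from the authors' implementation.

```python
import numpy as np


def build_patch_graph(coords, features):
    """Return (node_features, adjacency) for one slide.

    coords   : sequence of (row, col) grid positions, one per patch (assumed layout)
    features : array of shape (N, D), one feature vector per patch
    Edges are an ASSUMPTION: patches are linked if they touch on the grid,
    including diagonally (8-connectivity).
    """
    coords = [tuple(c) for c in coords]
    features = np.asarray(features, dtype=float)
    if features.shape[0] != len(coords):
        raise ValueError("need exactly one feature vector per patch")

    # Map each grid position to its node index for O(1) neighbor lookup.
    index = {c: i for i, c in enumerate(coords)}
    n = len(coords)
    adj = np.zeros((n, n), dtype=np.uint8)

    for (r, c), i in index.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue  # skip the patch itself
                j = index.get((r + dr, c + dc))
                if j is not None:
                    adj[i, j] = 1
    return features, adj


# Hypothetical slide: three touching tissue patches and one isolated patch.
coords = [(0, 0), (0, 1), (1, 1), (3, 3)]
rng = np.random.default_rng(0)
feats = rng.normal(size=(len(coords), 8))  # placeholder for extractor output

nodes, adj = build_patch_graph(coords, feats)
print(adj)
```

Because each node keeps its position relative to its neighbors, a downstream model can use slide-level spatial context instead of treating every patch as an independent sample, which is the limitation of patch-based training that the abstract highlights.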
