Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images

Abstract

Background Machine-assisted recognition of colorectal cancer (CRC) has mainly relied on supervised deep learning, which is bottlenecked by its need for massive amounts of labeled data. We hypothesize that semi-supervised deep learning, which combines a small amount of labeled data with abundant unlabeled data, can provide a powerful alternative strategy.
Method We proposed a semi-supervised model based on the mean teacher architecture that provides pathological predictions at both the patch level and the patient level. We demonstrated the general utility of the model using 13,111 CRC whole slide images from 8,803 subjects gathered from 13 independent centers. We compared the proposed method with the prevailing supervised learning approach and with six pathologists. Two extended evaluations, on 15,000 lung images and 294,912 lymph node images, were also performed to test whether the utility of semi-supervised learning generalizes to other cancers.
Results With a small amount of labeled training patches (∼3,150 labeled and ∼40,950 unlabeled, or ∼6,300 labeled and ∼37,800 unlabeled), semi-supervised learning (SSL) performed significantly better than supervised learning (SL, which used only the labeled data) (area under the curve, AUC: 0.90 ± 0.06 vs. 0.84 ± 0.07, P value = 0.02, or AUC: 0.98 ± 0.01 vs. 0.92 ± 0.04, P value = 0.0004). Moreover, we found no significant difference between SL using a massive set of ∼44,100 labeled patches and SSL (∼6,300 labeled, ∼37,800 unlabeled) at patch-level diagnoses (AUC: 0.98 ± 0.01 vs. 0.987 ± 0.01, P value = 0.134) or at patient-level diagnoses (average AUC: 0.974 vs. 0.980, P value = 0.117). SSL performed close to human pathologists in diagnosis (average AUC: 0.972 vs. 0.969). The extended evaluations on lung and lymph node images also confirmed that, when a small amount of labeled data was used, SSL outperformed SL and achieved performance similar to that of SL with massive labeling.
Conclusions In this multi-center study, we showed that SSL can achieve excellent performance. Because SSL dramatically reduces the need for, and the cost of, pathological image annotation, it has great potential for building pathological artificial intelligence (AI) platforms in practice.
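
The mean teacher architecture named in the Method can be illustrated with a minimal sketch. This is a pure-Python illustration of the general technique, not the authors' implementation: the function names, the EMA decay `alpha`, the consistency weight `lam`, and the use of mean squared error as the consistency term are all assumptions made for illustration. The student is trained on a supervised loss for labeled patches plus a consistency loss that pulls its predictions toward the teacher's on all patches, and the teacher's weights are an exponential moving average (EMA) of the student's.

```python
import math


def softmax(logits):
    """Convert raw scores to class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher, student, alpha=0.99):
    """Teacher weights follow an exponential moving average of student weights.

    `alpha` is a hypothetical decay value chosen for illustration.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher class probabilities."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum((a - b) ** 2 for a, b in zip(p_s, p_t)) / len(p_s)


def cross_entropy(logits, label):
    """Supervised loss for one labeled patch."""
    return -math.log(softmax(logits)[label])


def total_loss(batch, lam=1.0):
    """Combine supervised and consistency terms over one mini-batch.

    Each item is (student_logits, teacher_logits, label), where label is
    None for an unlabeled patch. `lam` (assumed) weights the consistency term.
    """
    sup, n_sup, cons = 0.0, 0, 0.0
    for s_logits, t_logits, label in batch:
        # The consistency term uses every patch, labeled or not.
        cons += consistency_loss(s_logits, t_logits)
        if label is not None:
            # The supervised term uses labeled patches only.
            sup += cross_entropy(s_logits, label)
            n_sup += 1
    sup = sup / n_sup if n_sup else 0.0
    return sup + lam * cons / len(batch)


if __name__ == "__main__":
    batch = [
        ([2.0, -1.0], [1.5, -0.5], 0),     # labeled patch (class 0)
        ([0.3, 0.1], [-0.2, 0.4], None),   # unlabeled patch
    ]
    print("loss:", total_loss(batch))
    print("teacher:", ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9))
```

In a real training loop the student would be updated by gradient descent on `total_loss`, and `ema_update` would be applied to the teacher after each step; the unlabeled patches contribute only through the consistency term, which is how they add training signal without annotation.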
