Federated learning and differential privacy for medical image analysis

TLDR

Large‑scale datasets have driven the AI revolution, but medical imaging lacks such data because of privacy concerns and the scarcity of publicly available multi‑centric datasets. The study aims to demonstrate a feasible path forward by applying a differentially private federated learning framework to histopathology image analysis. Using the TCGA dataset, the authors simulated a distributed environment, varying the data distribution (IID versus non‑IID), the number of healthcare providers, and the individual dataset sizes, and evaluated the effect of different source domains through external validation. They found that private, distributed training achieves performance similar to conventional training while providing strong privacy guarantees, indicating that differentially private federated learning is a viable and reliable framework for medical image analysis.

Abstract

The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework to the analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions, along with the number of healthcare providers (i.e., hospitals and clinics) and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
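
To make the idea of private, distributed training concrete, the following is a minimal sketch of one aggregation round that clips each provider's model update and adds Gaussian noise to the average. It is an illustration under stated assumptions, not the framework used in the study: the abstract does not say where clipping and noise are applied, and the names `clip_update`, `dp_federated_round`, `clip_norm`, and `noise_multiplier` are hypothetical. Model weights are represented as plain lists of floats, and no privacy accounting is performed.

```python
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_round(global_weights, client_updates, clip_norm,
                       noise_multiplier, rng):
    """One illustrative round: clip, average, then add Gaussian noise.

    global_weights: current shared model weights (list of floats)
    client_updates: one weight delta per provider (hospital or clinic)
    clip_norm: assumed bound on each provider's contribution
    noise_multiplier: assumed ratio of noise std to clip_norm
    rng: a random.Random instance, passed in for reproducibility
    """
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    n = len(clipped)
    # Noise on the averaged update; the std shrinks as providers increase.
    sigma = noise_multiplier * clip_norm / n
    new_weights = []
    for i, w in enumerate(global_weights):
        mean_i = sum(u[i] for u in clipped) / n
        new_weights.append(w + mean_i + rng.gauss(0.0, sigma))
    return new_weights
```

In this sketch, calling `dp_federated_round` with `noise_multiplier=0.0` reduces to plain federated averaging of clipped updates. A larger multiplier trades accuracy for privacy; turning it into a formal privacy guarantee would require a separate accounting step that is not shown here.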
