RadImageNet: An Open Radiologic Deep Learning Research Dataset for Effective Transfer Learning

TLDR

The study demonstrates that pretraining on millions of radiologic images outperforms ImageNet pretraining for downstream medical tasks via transfer learning. The authors retrospectively extracted labeled radiologic images from 2005–2020 outpatient studies, trained RadImageNet models from random initialization, and compared them to ImageNet models using AUC and Dice metrics across eight classification and two segmentation tasks. RadImageNet contains 1.35 million annotated images from 131 872 patients across CT, MRI, and US, and its pretrained models achieved significant AUC gains over ImageNet on both small datasets (e.g., 9.4% for thyroid nodules) and larger datasets (e.g., 6.1% for COVID‑19), with improved lesion localization and interpretability. The article is published under a CC BY 4.0 license.

Abstract

The purpose of this study was to demonstrate the value of pretraining with millions of radiologic images compared with ImageNet photographic images on downstream medical applications when using transfer learning. This retrospective study included patients who underwent a radiologic study between 2005 and 2020 at an outpatient imaging facility. Key images and associated labels from the studies were retrospectively extracted from the original study interpretation. These images were used for RadImageNet model training with random weight initialization. The RadImageNet models were compared with ImageNet models using the area under the receiver operating characteristic curve (AUC) for eight classification tasks and using Dice scores for two segmentation problems. The RadImageNet database consists of 1.35 million annotated medical images in 131 872 patients who underwent CT, MRI, and US for musculoskeletal, neurologic, oncologic, gastrointestinal, endocrine, abdominal, and pulmonary pathologic conditions. For transfer learning tasks on small datasets, namely thyroid nodules (US), breast masses (US), anterior cruciate ligament injuries (MRI), and meniscal tears (MRI), the RadImageNet models demonstrated a significant advantage (P < .001) over ImageNet models (AUC improvements of 9.4%, 4.0%, 4.8%, and 4.5%, respectively). For larger datasets, namely pneumonia (chest radiography), COVID-19 (CT), SARS-CoV-2 (CT), and intracranial hemorrhage (CT), the RadImageNet models also showed improved AUC (P < .001), by 1.9%, 6.1%, 1.7%, and 0.9%, respectively. Additionally, lesion localizations of the RadImageNet models were improved by 64.6% and 16.4% on thyroid and breast US datasets, respectively. RadImageNet pretrained models demonstrated better interpretability compared with ImageNet models, especially for smaller radiologic datasets.

Keywords: CT, MR Imaging, US, Head/Neck, Thorax, Brain/Brain Stem, Evidence-based Medicine, Computer Applications-General (Informatics)

Supplemental material is available for this article.
Published under a CC BY 4.0 license. See also the commentary by Cadrin-Chênevert in this issue.
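
The abstract compares models by AUC for classification and by Dice score for segmentation. As a rough illustration of what those two metrics measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names `dice_score` and `auc_score` and the toy inputs are assumptions made for this example, and a real study would typically use an established library implementation.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks (0/1 values)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention chosen for this sketch: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def auc_score(labels, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a toy example but would be replaced by a rank-based computation on datasets of the size described above.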
