COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare

TLDR

Accurate, rapid COVID‑19 testing is critical, yet testing capacity is limited and manual chest X‑ray review is slow and error‑prone. The study proposes a Vision Transformer‑based deep learning pipeline to detect COVID‑19 from chest X‑ray images. The authors assembled a 30 K‑image chest X‑ray data set from three open‑source data sets, fine‑tuned six widely used CNNs as baselines, and trained a Vision Transformer to classify COVID‑19. The Vision Transformer achieved 98 % accuracy (AUC 99 %) on binary detection and 92 % accuracy (AUC 98 %) on multi‑class classification, outperforming all baseline CNNs on every metric, and its Grad‑CAM visualizations aid radiologist interpretation.

Abstract

In the recent pandemic, accurate and rapid testing of patients remained a critical task in the diagnosis and control of COVID-19 disease spread in the healthcare industry. Because of the sudden increase in cases, most countries have faced a scarcity of tests and a low rate of testing. Chest X-rays have been shown in the literature to be a potential source of testing for COVID-19 patients, but manually checking X-ray reports is time-consuming and error-prone. Considering these limitations and the advancements in data science, we propose a Vision Transformer-based deep learning pipeline for COVID-19 detection from chest X-ray imaging. Due to the lack of large data sets, we collected data from three open-source data sets of chest X-ray images and aggregated them to form a 30 K image data set, which is, to our knowledge, the largest publicly available collection of chest X-ray images in this domain. Our proposed transformer model effectively differentiates COVID-19 from normal chest X-rays with an accuracy of 98% and an AUC score of 99% in the binary classification task. It distinguishes COVID-19, normal, and pneumonia patients' X-rays with an accuracy of 92% and an AUC score of 98% in the multi-class classification task. For evaluation on our data set, we fine-tuned some of the models widely used in the literature, namely EfficientNetB0, InceptionV3, ResNet50, MobileNetV3, Xception, and DenseNet-121, as baselines. Our proposed transformer model outperformed them on all metrics. In addition, we create a Grad-CAM-based visualization that makes our approach interpretable by radiologists and that can be used to monitor the progression of the disease in the affected lungs, assisting healthcare.
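
The abstract reports both accuracy and AUC for the binary task. As a minimal sketch of how these two metrics differ, the Python below computes them on a toy example. The function names, the 0.5 decision threshold, and the labels and scores are all illustrative assumptions; none of them come from the paper or its data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via its pairwise-ranking definition:
    the probability that a random positive scores higher than a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = COVID-19, 0 = normal) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two reported values can differ.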
