S1 Machine Learning for Classification of Indeterminate Biliary Strictures During Cholangioscopy

Abstract

Introduction: Indeterminate biliary strictures remain a diagnostic challenge despite advancements in radiologic, endoscopic, and laboratory testing. More than 25% of patients presumed to have malignant strictures during cholangioscopy show benign pathology after major surgical intervention. Interpretation of the visual findings during cholangioscopy remains challenging even for experienced endoscopists. We therefore aimed to develop a software tool that classifies indeterminate biliary strictures as benign or malignant using both cholangioscopy images and clinical data.
Methods: Our dataset included cholangioscopy images and clinical data from a retrospective cohort of patients undergoing cholangioscopy for evaluation of indeterminate biliary strictures. We annotated images for abnormal features suggestive of malignancy, including papillary mass, dilated and tortuous vessels, and ulceration. We trained a convolutional neural network (CNN) based on ResNet-18 to detect the presence of abnormal image features and tested it on patients from independent centers (external validation). We used multiple outputation to treat the patient as the unit of analysis and estimated accuracy, sensitivity, specificity, positive and negative predictive values (PPV and NPV), and area under the receiver operating characteristic curve (AUC).
Results: A total of 1,371,605 cholangioscopy images were obtained from 528 patients at 25 centers (13 North America, 7 Asia, 2 Europe, 2 Australia, 1 South America). Our training set included data from 254 patients at 14 centers, and the test set included data from 95 patients at 8 other independent centers. Table 1 shows the proportion of patients with abnormal cholangioscopy image features according to their final diagnosis. For detection of abnormal image features, the CNN showed a sensitivity of 0.81 (95% confidence interval: 0.72 to 0.91), specificity of 0.91 (0.86 to 0.97), PPV of 0.93 (0.88 to 0.98), NPV of 0.77 (0.66 to 0.88), and AUC of 0.86 (0.80 to 0.92).
Conclusion: Using data from a large cohort of patients across the world, we trained and externally validated a CNN that can detect key cholangioscopy image features suggestive of malignancy and thus support intra-procedural decision-making. Our next step is to enhance the CNN with clinical data and evaluate it for diagnosing and predicting malignancy in indeterminate biliary strictures. This can improve clinical outcomes through accurate diagnosis of disease and prevention of unwarranted surgical intervention.
Table 1. Proportion of patients with cholangioscopy images showing abnormal features (suggestive of malignancy) according to their final diagnosis.
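
The patient-level analysis named in Methods can be illustrated with a minimal sketch of multiple outputation. Because each patient contributes many correlated images, one image per patient is drawn at random, the metrics are computed on that one-per-patient sample, and the estimates are averaged over many draws. This is not the study's code. The data layout (a dictionary mapping a patient identifier to a list of `(reference_label, cnn_prediction)` pairs), the function names, and the number of resamples are all assumptions made for illustration, and confidence intervals and AUC are omitted.

```python
import random
from statistics import mean


def confusion_metrics(pairs):
    """Compute metrics from (reference_label, prediction) boolean pairs."""
    tp = fp = tn = fn = 0
    for truth, pred in pairs:
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when the denominator is zero; reported as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def multiple_outputation(images_by_patient, n_resamples=1000, seed=0):
    """Average metrics over resamples that keep one random image per patient.

    images_by_patient: hypothetical layout, {patient_id: [(truth, pred), ...]}.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_resamples):
        # One image per patient, so every patient carries equal weight.
        sample = [rng.choice(images) for images in images_by_patient.values()]
        draws.append(confusion_metrics(sample))

    averaged = {}
    for key in draws[0]:
        # Skip resamples in which the metric was undefined.
        values = [d[key] for d in draws if d[key] is not None]
        averaged[key] = mean(values) if values else None
    return averaged


if __name__ == "__main__":
    # Toy data only; these values are not from the study.
    toy = {
        "patient_a": [(True, True), (True, True), (True, False)],
        "patient_b": [(False, False), (False, False)],
        "patient_c": [(True, True)],
        "patient_d": [(False, True), (False, False), (False, False)],
    }
    print(multiple_outputation(toy, n_resamples=200))
```

Sampling one image per patient keeps patients with long procedures, and therefore many frames, from dominating the estimates. A full analysis would also need a variance estimate that accounts for the resampling, which this sketch does not attempt.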