Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria

TLDR

ACMG/AMP guidelines treat computational predictors as supporting evidence (PP3/BP4), but the score intervals set by tool developers and the requirement for consensus among multiple predictors lack quantitative support. Building on a previously described probabilistic framework that quantifies evidence strengths within the ACMG/AMP recommendations, the study extends that framework to computational predictors and establishes a standard that translates tool scores into PP3 and BP4 evidence strengths. The approach estimates local positive predictive values, can calibrate any computational tool, and is used here to derive score thresholds for each evidence strength for thirteen missense variant tools on independent data sets. Most tools achieved supporting evidence for both pathogenic and benign classification, several reached moderate and strong levels, and one reached very strong evidence for benign classification on some variants, leading to recommendations for evidence-based revisions of the PP3/BP4 criteria.
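The probabilistic framework can be illustrated with a minimal Python sketch that turns each evidence strength into a posterior-probability cut-off. It assumes the exponential odds-of-pathogenicity scaling of the Bayesian adaptation of ACMG/AMP (very strong = 350, each lower strength the square root of the one above) and an illustrative prior of 0.1; these values and all function names are assumptions for illustration, not the study's published thresholds.

```python
# Minimal sketch: posterior-probability cut-offs for each ACMG/AMP evidence
# strength. O_VERY_STRONG, the 1/8..1 exponents and the prior are assumptions.

O_VERY_STRONG = 350.0  # assumed odds of pathogenicity for "very strong"

# Exponent applied to O_VERY_STRONG for each strength (each level is the
# square root of the level above it).
EXPONENTS = {
    "supporting": 1 / 8,
    "moderate": 1 / 4,
    "strong": 1 / 2,
    "very_strong": 1.0,
}


def odds_of_pathogenicity(strength: str) -> float:
    """Odds of pathogenicity (likelihood ratio) for one line of evidence."""
    return O_VERY_STRONG ** EXPONENTS[strength]


def posterior(odds: float, prior: float) -> float:
    """Bayes' rule in odds form, converted back to a probability."""
    return odds * prior / ((odds - 1.0) * prior + 1.0)


def posterior_thresholds(prior: float) -> dict:
    """Minimum local posterior for PP3 and maximum for BP4, per strength.

    Benign evidence uses the reciprocal odds 1/OP, so BP4 at a given
    strength requires a posterior at or below posterior(1/OP, prior).
    """
    table = {}
    for strength in EXPONENTS:
        op = odds_of_pathogenicity(strength)
        table[strength] = {
            "PP3_min": posterior(op, prior),
            "BP4_max": posterior(1.0 / op, prior),
        }
    return table


# Usage with an illustrative prior of 0.1 (assumption).
for level, cut in posterior_thresholds(prior=0.1).items():
    print(f"{level:12s} PP3 >= {cut['PP3_min']:.3f}  BP4 <= {cut['BP4_max']:.4f}")
```

Because the odds for benign evidence are the reciprocals of those for pathogenic evidence, the PP3 cut-offs rise above the prior and the BP4 cut-offs fall below it as the strength increases.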

Abstract

Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify that computational predictors provide "supporting"-level evidence for pathogenicity or benignity under criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (supporting, moderate, strong, very strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying moderate evidence, and several reached strong evidence levels. One tool reached very strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and for future assessment of computational methods for clinical interpretation.
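The local positive predictive value idea can be sketched in Python: in a sliding window around each score, estimate the posterior probability of pathogenicity after re-weighting the labeled variants to an assumed prior, then take the most extreme contiguous score range whose local posterior clears a given cut-off. This is a simplified illustration, not the study's exact procedure. The window half-width, the class re-weighting, the score grid, the absence of confidence bounds, the 350**(1/8) odds convention, the assumption that higher scores mean "more pathogenic", and all function names are assumptions, and the scores are synthetic.

```python
import numpy as np


def local_posterior(scores, labels, grid, prior, half_width=0.05):
    """Prior-adjusted local posterior of pathogenicity at each grid score.

    scores: tool scores of labeled variants (higher = more pathogenic, assumed)
    labels: 1 = pathogenic, 0 = benign
    prior: assumed prior probability of pathogenicity in the target population
    half_width: hypothetical half-width of the sliding score window
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_path = labels.sum()
    n_ben = labels.size - n_path
    post = np.full(len(grid), np.nan)
    for i, s in enumerate(grid):
        in_win = np.abs(scores - s) <= half_width
        k_path = labels[in_win].sum()
        k_ben = in_win.sum() - k_path
        # Per-class window fractions approximate class-conditional densities,
        # so the data set's own pathogenic:benign ratio drops out and the
        # chosen prior is applied instead.
        f_p, f_b = k_path / n_path, k_ben / n_ben
        denom = prior * f_p + (1 - prior) * f_b
        if denom > 0:  # leave NaN where the window is empty
            post[i] = prior * f_p / denom
    return post


def pp3_threshold(grid, post, min_posterior):
    """Lowest grid score such that it and every higher score meet min_posterior."""
    threshold = None
    for s, p in zip(grid[::-1], post[::-1]):  # scan from the highest score down
        if np.isnan(p) or p < min_posterior:
            break
        threshold = s
    return threshold


def bp4_threshold(grid, post, max_posterior):
    """Highest grid score such that it and every lower score stay <= max_posterior."""
    threshold = None
    for s, p in zip(grid, post):  # scan from the lowest score up
        if np.isnan(p) or p > max_posterior:
            break
        threshold = s
    return threshold


# Usage on synthetic scores (illustration only, not data from the study).
rng = np.random.default_rng(0)
n = 2000
scores = np.concatenate([rng.beta(5, 2, n), rng.beta(2, 5, n)])
labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
grid = np.linspace(0.0, 1.0, 101)
prior = 0.1  # illustrative assumption

# Supporting-level posterior cut-offs, assuming odds of 350**(1/8).
op = 350 ** (1 / 8)
pp3_min = op * prior / ((op - 1) * prior + 1)
bp4_max = (1 / op) * prior / ((1 / op - 1) * prior + 1)

post = local_posterior(scores, labels, grid, prior)
print("PP3 (supporting) score >=", pp3_threshold(grid, post, pp3_min))
print("BP4 (supporting) score <=", bp4_threshold(grid, post, bp4_max))
```

Requiring every grid score beyond a threshold to meet the cut-off keeps each interval contiguous; a real calibration would likely also need to account for sampling noise, for example with bootstrap confidence bounds, before trusting a threshold.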
