Electronic medical record phenotyping using the anchor and learn framework

TLDR

Electronic medical records contain extensive patient information that can be used to generate phenotypes for real-time clinical decision support. The authors developed a phenotype library using the anchor-and-learn framework, combining structured and unstructured EMR data, and evaluated eight of its phenotypes against prospectively collected gold-standard labels. The library comprises 42 publicly available definitions. The eight evaluated phenotypes achieved AUCs of 0.83 to 0.93 using information available at triage and 0.90 to 0.97 using information available at disposition, and performed comparably to statistically learned phenotypes developed with 5,000 manually labeled patients, while remaining interpretable and fast to build.

Abstract

Background: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient’s electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention.

Materials and Methods: We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated on retrospective EMR data from emergency department patients against a set of prospectively gathered gold standard labels.

Results: We built a phenotype library with 42 publicly available phenotype definitions. Using information available at triage, the phenotype classifiers achieved an area under the ROC curve (AUC) of 0.89 for infection, 0.88 for cancer, 0.85 for immunosuppressed, 0.93 for septic shock, 0.87 for nursing home, 0.83 for anticoagulated, 0.89 for cardiac etiology, and 0.90 for pneumonia. Using information available at the time of disposition from the emergency department, the AUC values were 0.91 for infection, 0.95 for cancer, 0.90 for immunosuppressed, 0.97 for septic shock, 0.91 for nursing home, 0.94 for anticoagulated, 0.92 for cardiac etiology, and 0.97 for pneumonia.

Discussion: The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients.

Conclusion: Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.
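
The AUC values reported above compare classifier scores with gold standard labels. As a minimal sketch of how such a figure can be computed, the following uses the pairwise ranking identity: AUC is the fraction of (positive, negative) patient pairs in which the positive patient receives the higher score. The function name `auc` and the example labels and scores are hypothetical and for illustration only; this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 gold standard labels (hypothetical input format)
    scores: iterable of classifier scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Count pairs where the positive outranks the negative; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 2 positive and 3 negative patients.
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
example_auc = auc(example_labels, example_scores)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a small illustration; a sort-based rank computation would be the usual choice at scale.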
