Publication | Open Access

Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system

Citations: 379 | References: 24 | Year: 2006

TLDR

Free text in electronic medical records is a rich source of information. We developed a Health Information Text Extraction (HITEx) tool to extract key findings for an airway disease study, and compared its principal diagnosis, comorbidity, and smoking status outputs for 150 discharge summaries against an expert-generated gold standard. When cases labeled “Insufficient Data” by the gold standard were excluded, the accuracy of HITEx was 82% for principal diagnosis, 87% for comorbidity, and 90% for smoking status extraction. We consider these results promising given the complexity of the discharge summaries and the extraction tasks.

Abstract

The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease. The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard. The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded. We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.
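
The evaluation described above (accuracy against a gold standard, with "Insufficient Data" cases excluded) can be illustrated with a minimal sketch. This is not the HITEx implementation or the authors' evaluation code. The function name, the smoking status category names other than "Insufficient Data", and the sample labels are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Label name taken from the abstract; all other labels below are hypothetical.
INSUFFICIENT = "Insufficient Data"


def accuracy_excluding_insufficient(
    predicted: Sequence[str],
    gold: Sequence[str],
    excluded_label: str = INSUFFICIENT,
) -> float:
    """Fraction of cases where the system label matches the gold label,
    after dropping cases whose gold label is `excluded_label`."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Keep only the cases the gold standard could actually judge.
    scored = [(p, g) for p, g in zip(predicted, gold) if g != excluded_label]
    if not scored:
        raise ValueError("no cases left after exclusion")

    correct = sum(1 for p, g in scored if p == g)
    return correct / len(scored)


# Made-up labels for illustration only; these are not data from the study.
gold = ["Current Smoker", "Never Smoked", "Insufficient Data",
        "Past Smoker", "Never Smoked"]
predicted = ["Current Smoker", "Never Smoked", "Never Smoked",
             "Never Smoked", "Never Smoked"]

score = accuracy_excluding_insufficient(predicted, gold)
print(f"accuracy = {score:.2f}")  # 3 of the 4 scored cases match
```

Note that excluding cases changes the denominator: here the third case is dropped because its gold label is "Insufficient Data", so accuracy is computed over four cases rather than five.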
