Publication | Open Access
Extracting Smoking Status from Electronic Health Records Using NLP and Deep Learning.
Year: 2020
Citations: 15
References: 18
Keywords: Engineering, Machine Learning, United States, Language Processing, Text Mining, Word Embeddings, Natural Language Processing, Tobacco Control, Data Science, Digital Health, Public Health, Clinical Language, Tobacco Use, Machine Learning Model, NLP Task, Electronic Health Record, Medical Language Processing, Deep Learning, Clinical Progress Notes, Epidemiology, Health Data, Health Informatics
Half a million people die every year from smoking-related issues across the United States. It is essential to identify individuals who are tobacco-dependent in order to implement preventive measures. In this study, we investigate the effectiveness of deep learning models for extracting the smoking status of patients from clinical progress notes. We built a Natural Language Processing (NLP) pipeline that cleans the progress notes before they are processed by three deep neural networks: a CNN, a unidirectional LSTM, and a bidirectional LSTM. Each of these models was trained with either a pre-trained or a post-trained word embedding layer. Three traditional machine learning models were also employed as baselines for comparison with the neural networks. Each model produced both binary and multi-class label classifications. Our results showed that the CNN model with a pre-trained embedding layer performed best for both binary and multi-class label classification.
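
The abstract does not specify the cleaning steps, so the following is only a minimal sketch of what the preprocessing stage of such a pipeline might look like before notes reach an embedding layer. The function names (`clean_note`, `build_vocab`, `encode`), the letters-only cleaning rule, the padding scheme, and the sample notes are all illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter


def clean_note(text):
    """Lowercase a note, drop digits and punctuation, and split into tokens.

    Keeping letters only is an assumption; the source does not list its rules.
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return text.split()


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer index for an embedding layer."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Index 0 is reserved for padding, 1 for out-of-vocabulary tokens.
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to indices, truncating or right-padding to max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical notes, invented for illustration only.
notes = ["Pt. smokes 1 pack/day.", "Denies tobacco use; never smoked."]
tokens = [clean_note(n) for n in notes]
vocab = build_vocab(tokens)
encoded = [encode(t, vocab, max_len=6) for t in tokens]
```

Fixed-length integer sequences like `encoded` are the usual input format for CNN and LSTM text classifiers, which is why the sketch pads every note to the same length.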