Abstract
The Epic electronic health record (EHR) is a commonly used EHR in the United States. This EHR contains large semi-structured "flowsheet" fields. Flowsheet fields lack a well-defined data dictionary and are unique to each site. We evaluated a simple free-text-like method for extracting these data. As a use case, we applied this method to predicting mortality during emergency department (ED) triage. We retrieved demographic and clinical data for ED visits from the Epic EHR (1/2014–12/2018). Data included structured fields, semi-structured flowsheet records, and free-text notes. The study outcome was in-hospital death within 48 h. Most of the data were coded using a free-text-like Bag-of-Words (BoW) approach. Two machine-learning models were trained: gradient boosting and logistic regression. Term frequency-inverse document frequency (tf-idf) weighting was used in the logistic regression model (LR-tf-idf). An ensemble of LR-tf-idf and gradient boosting was also evaluated. Models were trained on the years 2014–2017 and tested on the year 2018. Among 412,859 visits, the 48-h mortality rate was 0.2%. LR-tf-idf achieved an AUC of 0.98 (95% CI: 0.98–0.99). Gradient boosting achieved an AUC of 0.97 (95% CI: 0.96–0.99). An ensemble of both achieved an AUC of 0.99 (95% CI: 0.98–0.99). In conclusion, a free-text-like approach can be useful for extracting knowledge from large amounts of complex semi-structured EHR data.
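The abstract does not specify how flowsheet rows were turned into "words". The following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' code. Each flowsheet row is assumed to be a (field name, value) pair. The `name=value` token scheme, the helper names (`flowsheet_to_tokens`, `fit_idf`, `tfidf`, `ensemble`, `auc`), and the equal-weight probability average used as the ensemble combiner are all hypothetical illustrations. The smoothed idf formula mirrors the default of scikit-learn's `TfidfVectorizer`, and the AUC is computed as the Mann-Whitney probability that a positive case outranks a negative one.

```python
import math
import re
from collections import Counter


def flowsheet_to_tokens(rows):
    """Turn (field_name, value) flowsheet rows into word-like tokens.

    Hypothetical scheme: names and values are lower-cased, runs of other
    characters become "_" (decimal points are kept in values), and each
    pair becomes one token, e.g. ("Pain Score", "7") -> "pain_score=7".
    """
    tokens = []
    for name, value in rows:
        key = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
        val = re.sub(r"[^a-z0-9.]+", "_", str(value).lower()).strip("_")
        tokens.append(f"{key}={val}")
    return tokens


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token at most once per visit
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(doc, idf):
    """Raw term counts times idf, L2-normalised; unseen tokens are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def ensemble(p_lr, p_gb, weight=0.5):
    """Weighted average of two models' predicted risks (assumed combiner)."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_lr, p_gb)]


def auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg), ties counted as 1/2.

    O(n_pos * n_neg) pairwise loop, kept simple for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage on two toy triage visits (invented data, for illustration only).
visits = [
    [("Pain Score", "7"), ("Arrival Mode", "Ambulance")],
    [("Pain Score", "0"), ("Arrival Mode", "Walk-in")],
]
docs = [flowsheet_to_tokens(v) for v in visits]
idf = fit_idf(docs)
print(docs[0])  # ['pain_score=7', 'arrival_mode=ambulance']
print(tfidf(docs[0], idf))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a full pipeline, one could feed the sparse tf-idf vectors to a logistic-regression model (for example scikit-learn's `LogisticRegression`) and the token counts to a gradient-boosting model, then combine their predicted risks. With roughly 400,000 visits, the quadratic `auc` loop would be replaced by a sorting-based implementation.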
| Original language | English |
|---|---|
| Article number | 40 |
| Journal | Big Data and Cognitive Computing |
| Volume | 5 |
| Issue number | 3 |
| DOIs | |
| State | Published - Sep 2021 |
Keywords
- Electronic health records
- Gradient boosting
- Machine learning
Title: A simple free-text-like method for extracting semi-structured data from electronic health records: Exemplified in prediction of in-hospital mortality