TY - JOUR
T1 - Predicting hypertension onset from longitudinal electronic health records with deep learning
AU - Datta, Suparno
AU - Morassi Sasso, Ariane
AU - Kiwit, Nina
AU - Bose, Subhronil
AU - Nadkarni, Girish
AU - Miotto, Riccardo
AU - Böttinger, Erwin P.
N1 - Publisher Copyright: © 2022 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Objective: Hypertension has long been recognized as one of the most important predisposing factors for cardiovascular diseases and mortality. In recent years, machine learning methods have shown potential in diagnostic and predictive approaches in chronic diseases. Electronic health records (EHRs) have emerged as a reliable source of longitudinal data. The aim of this study is to predict the onset of hypertension using modern deep learning (DL) architectures, specifically long short-term memory (LSTM) networks, and longitudinal EHRs. Materials and Methods: We compare this approach to the best performing models reported from previous works, particularly XGBoost, applied to aggregated features. Our work is based on data from 233 895 adult patients from a large health system in the United States. We divided our population into 2 distinct longitudinal datasets based on the diagnosis date. To ensure generalization to unseen data, we trained our models on the first dataset (dataset A, "train and validation") using cross-validation, and then applied the models to a second dataset (dataset B, "test") to assess their performance. We also experimented with 2 different time windows before the onset of hypertension and evaluated the impact on model performance. Results: With the LSTM network, we were able to achieve an area under the receiver operating characteristic curve value of 0.98 in the "train and validation" dataset A and 0.94 in the "test" dataset B for a prediction time window of 1 year. Lipid disorders, type 2 diabetes, and renal disorders are found to be associated with incident hypertension. Conclusion: These findings show that DL models based on temporal EHR data can improve the identification of patients at high risk of hypertension and corresponding driving factors. In the long term, this work may support identifying individuals who are at high risk for developing hypertension and facilitate earlier intervention to prevent the future development of hypertension.
KW - deep learning
KW - electronic health records
KW - hypertension
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85144920141&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooac097
DO - 10.1093/jamiaopen/ooac097
M3 - Article
AN - SCOPUS:85144920141
SN - 2574-2531
VL - 5
JO - JAMIA Open
JF - JAMIA Open
IS - 4
M1 - ooac097
ER -