TY - JOUR
T1 - Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City
T2 - Model development and validation
AU - Vaid, Akhil
AU - Somani, Sulaiman
AU - Russak, Adam J.
AU - de Freitas, Jessica K.
AU - Chaudhry, Fayzan F.
AU - Paranjpe, Ishan
AU - Johnson, Kipp W.
AU - Lee, Samuel J.
AU - Miotto, Riccardo
AU - Richter, Felix
AU - Zhao, Shan
AU - Beckmann, Noam D.
AU - Naik, Nidhi
AU - Kia, Arash
AU - Timsina, Prem
AU - Lala, Anuradha
AU - Paranjpe, Manish
AU - Golden, Eddye
AU - Danieletto, Matteo
AU - Singh, Manbir
AU - Meyer, Dara
AU - O'Reilly, Paul F.
AU - Huckins, Laura
AU - Kovatch, Patricia
AU - Finkelstein, Joseph
AU - Freeman, Robert M.
AU - Argulian, Edgar
AU - Kasarskis, Andrew
AU - Percha, Bethany
AU - Aberg, Judith A.
AU - Bagiella, Emilia
AU - Horowitz, Carol R.
AU - Murphy, Barbara
AU - Nestler, Eric J.
AU - Schadt, Eric E.
AU - Cho, Judy H.
AU - Cordon-Cardo, Carlos
AU - Fuster, Valentin
AU - Charney, Dennis S.
AU - Reich, David L.
AU - Bottinger, Erwin P.
AU - Levin, Matthew A.
AU - Narula, Jagat
AU - Fayad, Zahi A.
AU - Just, Allan C.
AU - Charney, Alexander W.
AU - Nadkarni, Girish N.
AU - Glicksberg, Benjamin S.
N1 - Funding Information:
SS is a cofounder and equity owner of Monogram Orthopedics. KWJ received fees from and holds equity in Tempus Labs. JAA received research grants and personal fees from Gilead, Merck, Janssen, and Viiv; personal fees from Medicure and Theratechnologies; and research grants from Atea, Pfizer and Regeneron, all outside of the submitted work. ES is the founding CEO and equity owner of Sema4.
Funding Information:
This work was supported by U54 TR001433-05, National Center for Advancing Translational Sciences, National Institutes of Health. This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai, notably Sharon Nirenberg. We thank Marcus Badgeley for his assistance with final editing. We would like to dedicate this effort to Mount Sinai Health System care providers for their hard work and sacrifice.
Publisher Copyright:
©Akhil Vaid, Sulaiman Somani, Adam J Russak, Jessica K De Freitas, Fayzan F Chaudhry, Ishan Paranjpe, Kipp W Johnson, Samuel J Lee, Riccardo Miotto, Felix Richter, Shan Zhao, Noam D Beckmann, Nidhi Naik, Arash Kia, Prem Timsina, Anuradha Lala, Manish Paranjpe, Eddye Golden, Matteo Danieletto, Manbir Singh, Dara Meyer, Paul F O'Reilly, Laura Huckins, Patricia Kovatch, Joseph Finkelstein, Robert M.
PY - 2020/11
Y1 - 2020/11
AB - Background: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. Objective: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. Methods: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19–positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. Results: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. 
In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. Conclusions: We trained machine learning models to predict mortality and critical events in patients with COVID-19 at different time horizons and validated them both externally and prospectively. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
KW - COVID-19
KW - Clinical informatics
KW - Cohort
KW - EHR
KW - Electronic health record
KW - Hospital
KW - Machine learning
KW - Mortality
KW - Performance
KW - Prediction
KW - TRIPOD
UR - http://www.scopus.com/inward/record.url?scp=85095862518&partnerID=8YFLogxK
U2 - 10.2196/24018
DO - 10.2196/24018
M3 - Article
C2 - 33027032
AN - SCOPUS:85095862518
SN - 1439-4456
VL - 22
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
IS - 11
M1 - e24018
ER -