TY - JOUR
T1 - A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19
AU - Oh, Wonsuk
AU - Jayaraman, Pushkala
AU - Tandon, Pranai
AU - Chaddha, Udit S.
AU - Kovatch, Patricia
AU - Charney, Alexander W.
AU - Glicksberg, Benjamin S.
AU - Nadkarni, Girish N.
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2024/2
Y1 - 2024/2
AB - Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Many ongoing studies are developing advanced computational subphenotyping methods for cross-sectional data, whereas the potential of time-series data remains underexplored. Here, we propose a Multivariate Levenshtein Distance (MLD) that can account for correlation among multiple discrete features in time-series data. Our algorithm has two distinct components: the MLD itself and an optimal threshold score that enhances sensitivity in discriminating between pairs of instances. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data on biomarkers and treatment administrations in 1039 critically ill patients with COVID-19, and compared its effectiveness with that of standard methods. In conclusion, the MLD metric is a novel method for quantifying the distance between instances described by multiple discrete features over time, and it demonstrates superior clustering performance compared with competing time-series distance metrics.
KW - Covid-19
KW - Electronic health records
KW - Time-series distance metrics
UR - http://www.scopus.com/inward/record.url?scp=85181762151&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2023.102750
DO - 10.1016/j.artmed.2023.102750
M3 - Article
AN - SCOPUS:85181762151
SN - 0933-3657
VL - 148
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
M1 - 102750
ER -