TY - GEN
T1 - Leveraging hierarchy in medical codes for predictive modeling
AU - Singh, Anima
AU - Nadkarni, Girish
AU - Guttag, John
AU - Bottinger, Erwin
N1 - Publisher Copyright: Copyright © 2014 ACM.
PY - 2014/9/20
Y1 - 2014/9/20
AB - ICD-9 codes are among the most important patient information recorded in electronic health records. They have been shown to be useful for predictive modeling of different adverse outcomes in patients, including diabetes and heart failure. An important characteristic of ICD-9 codes is the hierarchical relationships among different codes. Nevertheless, the most common feature representation used to incorporate ICD-9 codes in predictive models disregards the structural relationships. In this paper, we explore different methods to leverage the hierarchical structure in ICD-9 codes with the goal of improving performance of predictive models. We compare methods that leverage hierarchy by 1) incorporating the information during feature construction, 2) using a learning algorithm that addresses the structure in the ICD-9 codes when building a model, or 3) doing both. We propose and evaluate a novel feature engineering approach to leverage hierarchy, while simultaneously reducing feature dimensionality. Our experiments indicate that significant improvement in predictive performance can be achieved by properly exploiting ICD-9 hierarchy. Using two clinical tasks: predicting chronic kidney disease progression (Task-CKD), and predicting incident heart failure (Task-HF), we show that methods that use hierarchy outperform the conventional approach in F-score (0.44 vs 0.36 for Task-HF and 0.40 vs 0.37 for Task-CKD) and relative risk (4.6 vs 3.3 for Task-HF and 5.9 vs 3.8 for Task-CKD).
KW - Feature hierarchy
KW - ICD-9 codes
KW - Predictive modeling
UR - http://www.scopus.com/inward/record.url?scp=84920733698&partnerID=8YFLogxK
U2 - 10.1145/2649387.2649407
DO - 10.1145/2649387.2649407
M3 - Conference contribution
AN - SCOPUS:84920733698
T3 - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 96
EP - 103
BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Y2 - 20 September 2014 through 23 September 2014
ER -