Supervised pretraining through contrastive categorical positive samplings to improve COVID-19 mortality prediction

Tingyi Wanyan, Mingquan Lin, Eyal Klang, Kartikeya M. Menon, Faris F. Gulamali, Ariful Azad, Yiye Zhang, Ying Ding, Zhangyang Wang, Fei Wang, Benjamin Glicksberg, Yifan Peng

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Clinical EHR data are naturally heterogeneous and contain many sub-phenotypes. This diversity makes outcome prediction with machine learning models challenging because it leads to high intra-class variance. To address this issue, we propose a supervised pre-training model with a unique embedded k-nearest-neighbor positive sampling strategy. We demonstrate theoretically that this framework improves performance, and we show that it yields highly competitive experimental results in predicting patient mortality from real-world COVID-19 EHR data covering over 7,000 patients admitted to a large, urban health system. Our method achieves an AUROC of 0.872, outperforming alternative pre-training models and traditional machine learning methods. Additionally, our method performs much better when the training set is small (345 training instances).
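The abstract does not spell out how the embedded k-nearest-neighbor positives are chosen or which loss uses them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that, for each anchor, the positives are its k nearest same-label samples (Euclidean distance between L2-normalized embeddings) and that these positives feed a supervised contrastive (SupCon-style) loss in which every other sample appears in the denominator. The function names `knn_positives` and `knn_supcon_loss`, and the defaults `k=1` and `temperature=0.1`, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> List[float]:
    # Scale to unit length so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def knn_positives(embeddings: Sequence[Vector], labels: Sequence[int],
                  k: int) -> List[List[int]]:
    """Hypothetical helper: for each anchor, return the indices of its k
    nearest same-label samples (Euclidean distance in embedding space)."""
    positives = []
    for i, (zi, yi) in enumerate(zip(embeddings, labels)):
        same_class = [j for j, yj in enumerate(labels) if j != i and yj == yi]
        same_class.sort(key=lambda j: math.dist(zi, embeddings[j]))
        positives.append(same_class[:k])
    return positives


def knn_supcon_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                    k: int = 1, temperature: float = 0.1) -> float:
    """Hypothetical SupCon-style loss where only the k nearest same-label
    samples act as positives; all non-anchor samples form the denominator."""
    z = [l2_normalize(v) for v in embeddings]
    positives = knn_positives(z, labels, k)
    total, anchors = 0.0, 0
    for i, pos in enumerate(positives):
        if not pos:
            continue  # no same-label sample exists for this anchor
        logits = {j: dot(z[i], z[j]) / temperature
                  for j in range(len(z)) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak)
                                        for s in logits.values()))
        # Average negative log-probability of the selected positives.
        total += -sum(logits[p] - log_denom for p in pos) / len(pos)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings forming two clusters (made-up data).
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
    print(knn_positives(emb, [0, 0, 1, 1, 0], k=1))
    print(knn_supcon_loss(emb, [0, 0, 1, 1, 0]))  # labels follow clusters
    print(knn_supcon_loss(emb, [0, 1, 0, 1, 0]))  # labels cut across clusters
```

Restricting positives to the nearest same-label neighbors means each sample is pulled only toward nearby members of its own class, rather than toward every sample sharing its label. Under the assumptions above, this is one way a contrastive objective can tolerate several sub-phenotypes within a single outcome class.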

Original language: English
Title of host publication: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2022
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450393867
State: Published - 7 Aug 2022
Event: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2022 - Chicago, United States
Duration: 7 Aug 2022 to 8 Aug 2022

Publication series

Name: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2022

Conference

Conference: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2022
Country/Territory: United States
City: Chicago
Period: 7/08/22 to 8/08/22

Keywords

  • Intra-class variance
  • Mortality prediction
  • Pre-Training
  • Self-supervised learning
  • Sub-phenotype
  • Supervised contrastive learning
