Disease progression subtype discovery from longitudinal EMR data with a majority of missing values and unknown initial time points

Ilkka Huopaniemi, Girish Nadkarni, Rajiv Nadukuru, Vaneet Lotay, Steve Ellis, Omri Gottesman, Erwin P. Bottinger

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Electronic medical records (EMR) contain longitudinal laboratory data that carry valuable phenotypic information on disease progression across a large collection of patients. These data can potentially be used in medical research or patient care; finding disease progression subtypes is a particularly important application. There are, however, two significant difficulties in using these data for statistical analysis: (a) a large proportion of the data is missing, and (b) patients are in very different stages of disease progression, and the time series have no well-defined start points. We present a Bayesian machine learning model that overcomes these difficulties. The method can use highly incomplete time-series measurements of varying lengths, aligns similar trajectories that are in different phases, and is capable of finding consistent disease progression subtypes. We demonstrate the method by finding chronic kidney disease progression subtypes.

Original language: English
Pages (from-to): 709-718
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2014
State: Published - 2014
