Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease

Lili Chan, Girish N. Nadkarni, Fergus Fleming, James R. McCullough, Patricia Connolly, Gohar Mosoyan, Fadi El Salem, Michael W. Kattan, Joseph A. Vassalotti, Barbara Murphy, Michael J. Donovan, Steven G. Coca, Scott M. Damrauer

Research output: Contribution to journal › Article › peer-review



Aim: Predicting progression in diabetic kidney disease (DKD) is critical to improving outcomes. We sought to develop and validate a machine-learned prognostic risk score (KidneyIntelX™) that combines electronic health record (EHR) data and biomarkers.

Methods: This is an observational cohort study of patients with prevalent DKD and banked plasma from two EHR-linked biobanks. A random forest model was trained, and its performance (AUC, positive and negative predictive values [PPV/NPV], and net reclassification index [NRI]) was compared with that of a clinical model and of Kidney Disease: Improving Global Outcomes (KDIGO) categories for predicting a composite outcome of eGFR decline of ≥5 ml/min per year, ≥40% sustained decline, or kidney failure within 5 years.

Results: In 1146 patients, the median age was 63 years, 51% were female, the baseline eGFR was 54 ml/min per 1.73 m², the urine albumin to creatinine ratio (uACR) was 6.9 mg/mmol, follow-up was 4.3 years, and 21% had the composite endpoint. On cross-validation in the derivation cohort (n = 686), KidneyIntelX had an AUC of 0.77 (95% CI 0.74, 0.79). In the validation cohort (n = 460), the AUC was 0.77 (95% CI 0.76, 0.79). By comparison, the AUC for the clinical model was 0.62 (95% CI 0.61, 0.63) in derivation and 0.61 (95% CI 0.60, 0.63) in validation. Using derivation cut-offs, KidneyIntelX stratified 46%, 37% and 17% of the validation cohort into low-, intermediate- and high-risk groups for the composite kidney endpoint, respectively. The PPV for progressive decline in kidney function in the high-risk group was 61% for KidneyIntelX vs 40% for the highest-risk stratum by KDIGO categorisation (p < 0.001). Only 10% of those scored as low risk by KidneyIntelX experienced progression (i.e., an NPV of 90%). The event NRI (NRIevent) for the high-risk group was 41% (p < 0.05).

Conclusions: KidneyIntelX improved prediction of kidney outcomes over KDIGO and clinical models in individuals with early stages of DKD.
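The composite endpoint and the PPV, NPV and event NRI figures reported in the abstract rest on simple definitions that can be made concrete. The Python sketch below is illustrative only and is not the authors' KidneyIntelX pipeline. The abstract does not say how the eGFR slope was estimated, what counts as a "sustained" 40% decline, or what the derivation cut-offs were, so the ordinary least-squares slope, the two-consecutive-measurements rule and any cut-off values are assumptions, and every function name is hypothetical. The event NRI uses the standard categorical formula (events reclassified upward minus events reclassified downward, divided by the number of events), which may differ from the exact computation used in the paper.

```python
from typing import Optional, Sequence

HORIZON_YEARS = 5.0       # outcome window stated in the abstract
SLOPE_THRESHOLD = -5.0    # eGFR decline of >= 5 ml/min per year
SUSTAINED_FRACTION = 0.6  # a >= 40% decline means eGFR <= 60% of baseline
TIER_ORDER = {"low": 0, "intermediate": 1, "high": 2}


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of eGFR against time (ml/min per 1.73 m^2 per year)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx if sxx > 0 else 0.0


def composite_event(times: Sequence[float], values: Sequence[float],
                    kidney_failure_year: Optional[float] = None,
                    horizon: float = HORIZON_YEARS) -> bool:
    """True if any component of the composite endpoint occurs within `horizon` years.

    times[0] must be 0 (the baseline visit); values are the eGFR readings.
    """
    # Component 3: kidney failure inside the outcome window.
    if kidney_failure_year is not None and kidney_failure_year <= horizon:
        return True
    window = [(t, v) for t, v in zip(times, values) if t <= horizon]
    ts = [t for t, _ in window]
    vs = [v for _, v in window]
    # Component 1: slope of -5 ml/min per year or steeper (OLS fit is an assumption).
    if len(window) >= 2 and ols_slope(ts, vs) <= SLOPE_THRESHOLD:
        return True
    # Component 2: >= 40% decline from baseline, counted as "sustained" only when two
    # consecutive follow-up values both meet it (hypothetical operationalisation).
    below = [v <= SUSTAINED_FRACTION * vs[0] for v in vs[1:]]
    return any(a and b for a, b in zip(below, below[1:]))


def assign_tier(score: float, low_cut: float, high_cut: float) -> str:
    """Map a risk score to a tier using (hypothetical) derivation cut-offs."""
    if score < low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "intermediate"


def ppv_high(tiers: Sequence[str], events: Sequence[bool]) -> float:
    """Share of high-risk patients who experienced the event."""
    group = [e for t, e in zip(tiers, events) if t == "high"]
    return sum(group) / len(group)


def npv_low(tiers: Sequence[str], events: Sequence[bool]) -> float:
    """Share of low-risk patients who did NOT experience the event."""
    group = [e for t, e in zip(tiers, events) if t == "low"]
    return sum(not e for e in group) / len(group)


def event_nri(old_tiers: Sequence[str], new_tiers: Sequence[str],
              events: Sequence[bool]) -> float:
    """Categorical event NRI: (events moved up - events moved down) / number of events."""
    up = down = n_events = 0
    for old, new, event in zip(old_tiers, new_tiers, events):
        if not event:
            continue  # the event NRI looks only at patients who had the outcome
        n_events += 1
        shift = TIER_ORDER[new] - TIER_ORDER[old]
        if shift > 0:
            up += 1
        elif shift < 0:
            down += 1
    return (up - down) / n_events
```

For instance, with eGFR values of 60, 54, 49 and 43 at years 0 to 3, `ols_slope` returns -5.6 ml/min per year, so `composite_event` flags the patient; a single transient dip below 60% of baseline does not count under the assumed sustained-decline rule.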

Original language: English
Pages (from-to): 1504-1515
Number of pages: 12
Issue number: 7
State: Published - Jul 2021


Keywords

  • Biomarkers
  • Diabetic kidney disease
  • Electronic data
  • Machine learning
  • Prediction


