Bi-level multi-source learning for heterogeneous block-wise missing data

Alzheimer's Disease Neuroimaging Initiative

Research output: Contribution to journal, Review article, peer-review

65 Scopus citations


Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, and other measurements are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. The collected data also often has block-wise missing entries; that is, entire data sources are unavailable for some subjects. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET, and only some have proteomic data. Here we address how to effectively integrate information from multiple heterogeneous data sources when the data is block-wise missing. We present a unified "bi-level" learning model for complete multi-source data and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data, offers superior performance, and generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models on all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches.
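The abstract does not state the objective function, so the sketch below only illustrates what selecting at two levels can look like. It uses a swapped-in, well-known penalty, the sparse group lasso, with one group per data source: its proximal step first soft-thresholds individual coefficients (feature-level pruning) and then shrinks each source's whole coefficient block, dropping the source if the block's norm is small (source-level selection). This is an assumption for illustration, not the authors' bi-level model, and it covers complete data only. The function names, the source labels, and the parameters `lam_feature`, `lam_source` and `step` are hypothetical.

```python
import math
from typing import Dict, List


def soft_threshold(x: float, tau: float) -> float:
    """Feature-level shrinkage: proximal operator of tau * |x|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def bilevel_prox(
    weights: Dict[str, List[float]],
    lam_feature: float,
    lam_source: float,
    step: float = 1.0,
) -> Dict[str, List[float]]:
    """Proximal step of a sparse-group-lasso penalty (hypothetical stand-in).

    Penalty: lam_feature * ||w||_1 + lam_source * sum_s ||w_s||_2,
    where w_s holds the coefficients of data source s.
    """
    out: Dict[str, List[float]] = {}
    for source, w in weights.items():
        # Feature level: shrink each coefficient, zeroing weak features.
        u = [soft_threshold(x, step * lam_feature) for x in w]
        # Source level: shrink the whole block; drop the source if its norm is small.
        norm = math.sqrt(sum(x * x for x in u))
        if norm <= step * lam_source:
            out[source] = [0.0] * len(u)
        else:
            scale = 1.0 - step * lam_source / norm
            out[source] = [scale * x for x in u]
    return out


if __name__ == "__main__":
    # Hypothetical coefficients for three sources.
    w = {"MRI": [3.0, 0.2, -2.0], "CSF": [0.4, -0.3], "PET": [0.0, 5.0]}
    print(bilevel_prox(w, lam_feature=0.5, lam_source=1.0))
```

With these hypothetical values, the weak MRI feature `0.2` is pruned while the MRI source is kept, and the entire CSF block is set to zero, so that source is deselected. Inside a proximal-gradient loop, this step would follow each gradient step on the loss.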

Original language: English
Pages (from-to): 192-206
Number of pages: 15
Issue number: P1
State: Published, 5 Nov 2014
Externally published: Yes


  • Alzheimer's disease
  • Block-wise missing data
  • Multi-modal fusion
  • Multi-source
  • Optimization


