A comparative analysis of ensemble classifiers: Case studies in genomics

Sean Whalen, Gaurav Pandey

Research output: Contribution to journal › Conference article › peer-review

55 Scopus citations

Abstract

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.
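As a rough illustration of two of the method families named in the abstract, the sketch below contrasts simple aggregation (averaging base-classifier scores) with a greedy forward ensemble selection. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy validation scores, the accuracy metric, the 0.5 threshold, and the stopping rule are all hypothetical choices made for illustration.

```python
from statistics import mean


def average(preds):
    """Simple aggregation: per-example mean of the base classifiers' scores."""
    return [mean(column) for column in zip(*preds)]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the binary label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def greedy_selection(preds, labels, rounds=5):
    """Greedy forward selection with replacement (illustrative assumption).

    Each round adds the base classifier whose inclusion gives the best
    accuracy of the averaged ensemble, and stops when nothing improves.
    """
    chosen, best_score = [], -1.0
    for _ in range(rounds):
        candidates = [
            (accuracy(average([preds[i] for i in chosen + [j]]), labels), j)
            for j in range(len(preds))
        ]
        # Ties are broken in favor of the lowest classifier index.
        score, j = max(candidates, key=lambda c: (c[0], -c[1]))
        if score <= best_score:
            break
        chosen.append(j)
        best_score = score
    return chosen, best_score


# Hypothetical validation-set scores from three base classifiers.
labels = [1, 0, 1, 0, 1, 0]
preds = [
    [0.9, 0.2, 0.4, 0.1, 0.8, 0.6],  # classifier 0
    [0.6, 0.7, 0.9, 0.3, 0.7, 0.2],  # classifier 1
    [0.3, 0.4, 0.6, 0.6, 0.2, 0.3],  # classifier 2
]

selected, score = greedy_selection(preds, labels)
print("selected classifiers:", selected, "validation accuracy:", score)
```

On this toy data the selection keeps only two of the three classifiers: a weaker member is retained because it corrects the errors of the stronger one, which is the balance between diversity and performance that the abstract refers to.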

Original language: English
Article number: 6729565
Pages (from-to): 807-816
Number of pages: 10
Journal: Proceedings - IEEE International Conference on Data Mining, ICDM
DOIs
State: Published - 2013
Event: 13th IEEE International Conference on Data Mining, ICDM 2013 - Dallas, TX, United States
Duration: 7 Dec 2013 - 10 Dec 2013

Keywords

  • Bioinformatics
  • Ensemble methods
  • Ensemble selection
  • Genomics
  • Stacking
  • Supervised learning
