Abstract
The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection, all applied to heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across four genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods strike a balance between ensemble diversity and performance.
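Two of the strategies named above can be contrasted in a short sketch. Simple aggregation averages the scores of the chosen models. Ensemble selection instead grows a subset greedily on held-out data, allowing a model to be picked more than once so that it gains weight. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the choice of ROC AUC as the selection metric, the fixed number of greedy steps, and the toy scores are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_aggregate(preds: Dict[str, List[float]], names: Sequence[str]) -> List[float]:
    """Simple aggregation: unweighted mean of the named models' scores.

    A model listed twice contributes twice, which is how ensemble
    selection (below) turns repeated picks into weights.
    """
    n = len(next(iter(preds.values())))
    return [sum(preds[m][i] for m in names) / len(names) for i in range(n)]


def ensemble_selection(
    preds: Dict[str, List[float]], labels: Sequence[int], steps: int = 10
) -> Tuple[Counter, float]:
    """Greedy forward selection with replacement, scored on held-out labels.

    Hypothetical helper: returns the best-scoring prefix of the greedy
    sequence as a Counter of model -> number of times selected.
    """
    chosen: List[str] = []
    best_auc, best_chosen = float("-inf"), []
    for _ in range(steps):
        # Try adding each candidate (repeats allowed) and keep the best addition.
        candidate = max(
            preds, key=lambda m: auc(labels, mean_aggregate(preds, chosen + [m]))
        )
        chosen.append(candidate)
        score = auc(labels, mean_aggregate(preds, chosen))
        # Remember the best ensemble seen so far; later additions may not help.
        if score > best_auc:
            best_auc, best_chosen = score, list(chosen)
    return Counter(best_chosen), best_auc


if __name__ == "__main__":
    # Toy held-out labels and scores from three hypothetical base classifiers.
    labels = [0, 0, 1, 1]
    preds = {
        "A": [0.1, 0.4, 0.35, 0.8],
        "B": [0.2, 0.1, 0.6, 0.3],
        "C": [0.9, 0.8, 0.1, 0.2],
    }
    print("mean of all models, AUC:", auc(labels, mean_aggregate(preds, list(preds))))
    weights, score = ensemble_selection(preds, labels, steps=5)
    print("selected:", dict(weights), "AUC:", score)
```

In this toy case, averaging all three models lets the poorly ranked model C drag the ensemble down. Selection keeps only the models that improve held-out performance. Because picks are made with replacement and only the best prefix is kept, the resulting weights balance adding more (diverse) models against the validation score. That is the trade-off the abstract attributes to ensemble selection.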
Original language | English |
---|---|
Article number | 6729565 |
Pages (from-to) | 807-816 |
Number of pages | 10 |
Journal | Proceedings - IEEE International Conference on Data Mining, ICDM |
DOIs | |
State | Published - 2013 |
Event | 13th IEEE International Conference on Data Mining, ICDM 2013 - Dallas, TX, United States |
Duration | 7 Dec 2013 → 10 Dec 2013 |
Keywords
- Bioinformatics
- Ensemble methods
- Ensemble selection
- Genomics
- Stacking
- Supervised learning