Abstract
Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor is unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess whether a variety of heterogeneous ensemble methods can improve PFP across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
| Original language | English |
|---|---|
| Article number | 1577 |
| Journal | F1000Research |
| Volume | 7 |
| State | Published - 2018 |
Keywords
- Protein function prediction
- Heterogeneous ensembles
- Machine learning
- High-performance computing
- Performance evaluation
Fingerprint
Dive into the research topics of 'Large-scale protein function prediction using heterogeneous ensembles'. Together they form a unique fingerprint.