Optimal linear ensemble of binary classifiers

Mehmet Eren Ahsen, Robert Vogel, Gustavo Stolovitzky

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet such models face two challenges: poor generalization and limited labeled data.

Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm. As an ensemble learning method, MOCA addresses the problem of generalization, and it can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights without labels; these weights are optimal under the assumption that classifier predictions are independent conditioned on the class. When labels are available, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering and Methods (DREAM) challenges. We also propose an application of sMOCA to transfer learning, in which we take pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data.
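The abstract does not state the MOCA weight formula, so the following is only an illustrative sketch of the general idea behind a supervised linear ensemble, not the authors' implementation. It assumes each base classifier outputs a real-valued score, and it uses Fisher-discriminant-style weights with a diagonal covariance (class-mean difference divided by pooled within-class variance), a standard choice when classifier outputs are treated as independent given the class. The function names `supervised_linear_weights` and `ensemble_score` and the toy data are hypothetical.

```python
import numpy as np


def supervised_linear_weights(scores, labels):
    """Estimate one weight per base classifier from labeled data.

    scores: array of shape (n_samples, n_classifiers)
    labels: array of shape (n_samples,) with values 0 or 1

    Assumption (not taken from the paper): weight_i is the difference of
    class-conditional means divided by the pooled within-class variance,
    i.e. a Fisher discriminant with a diagonal covariance matrix.
    """
    scores = np.asarray(scores, dtype=float)
    is_pos = np.asarray(labels).astype(bool)
    pos, neg = scores[is_pos], scores[~is_pos]

    # Difference between the mean score on positives and on negatives.
    delta = pos.mean(axis=0) - neg.mean(axis=0)

    # Pooled within-class variance of each classifier's scores.
    n_pos, n_neg = len(pos), len(neg)
    pooled_var = (
        (n_pos - 1) * pos.var(axis=0, ddof=1)
        + (n_neg - 1) * neg.var(axis=0, ddof=1)
    ) / (n_pos + n_neg - 2)

    return delta / pooled_var


def ensemble_score(scores, weights):
    """Linear ensemble: weighted sum of the base classifier scores."""
    return np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)


# Toy example: classifier 0 separates the classes well, classifier 1 barely.
toy_scores = np.array([[0.9, 0.4], [0.8, 0.6], [0.2, 0.5], [0.1, 0.3]])
toy_labels = np.array([1, 1, 0, 0])
toy_weights = supervised_linear_weights(toy_scores, toy_labels)
toy_ensemble = ensemble_score(toy_scores, toy_weights)
```

In this toy setting the more discriminative classifier receives the larger weight. The sketch needs at least two samples per class and would need a guard against zero within-class variance before use on real data.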

Original language: English
Article number: vbae093
Journal: Bioinformatics Advances
Volume: 4
Issue number: 1
State: Published - 2024
Externally published: Yes
