Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons

Alvaro Mateos, Joaquín Dopazo, Ronald Jansen, Yuhai Tu, Mark Gerstein, Gustavo Stolovitzky

Research output: Contribution to journal › Article › peer-review

101 Scopus citations


Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for ∼100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ∼10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnection among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited to machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with notably low rates of false positives and negatives and that contains genes biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.
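
The iterative procedure described in the abstract (merging false positives into the original class until the gene set stabilises) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `expand_class` and `centroid_learner`, the `radius` parameter, and the toy profiles are hypothetical, and the stand-in classifier is a simple distance-to-centroid rule, not the multilayer perceptron used in the paper. The sketch also predicts on the training data directly, whereas the paper's protocol may rely on held-out evaluation.

```python
from typing import Callable, Sequence, Set

Vector = Sequence[float]
# A learner takes all expression profiles plus the indices of current class
# members and returns the indices it predicts to belong to the class.
Learner = Callable[[Sequence[Vector], Set[int]], Set[int]]


def centroid_learner(profiles: Sequence[Vector], members: Set[int],
                     radius: float = 1.0) -> Set[int]:
    """Stand-in classifier (hypothetical, NOT the paper's neural network).

    Predicts as a member any gene whose profile lies within `radius`
    (Euclidean distance) of the centroid of the current members.
    """
    dim = len(profiles[0])
    centroid = [sum(profiles[i][d] for i in members) / len(members)
                for d in range(dim)]
    predicted = set()
    for i, profile in enumerate(profiles):
        dist = sum((a - b) ** 2 for a, b in zip(profile, centroid)) ** 0.5
        if dist <= radius:
            predicted.add(i)
    return predicted


def expand_class(profiles: Sequence[Vector], initial_members: Set[int],
                 learner: Learner, max_iter: int = 10) -> Set[int]:
    """Repeatedly add false positives to the class until none remain."""
    members = set(initial_members)
    for _ in range(max_iter):
        predicted = learner(profiles, members)
        false_positives = predicted - members
        if not false_positives:
            break  # converged: the class no longer attracts new genes
        members |= false_positives
    return members


# Toy expression profiles (made-up numbers, two conditions per gene).
toy_profiles = [(0.0, 0.0), (0.2, 0.0), (0.9, 0.0), (1.6, 0.0), (5.0, 5.0)]
expanded = expand_class(toy_profiles, {0, 1}, centroid_learner)
print(sorted(expanded))
```

Passing the classifier in as a callable keeps the loop independent of the model, so a neural network wrapper could be substituted for `centroid_learner` without changing `expand_class`.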

Original language: English
Pages (from-to): 1703-1715
Number of pages: 13
Journal: Genome Research
Issue number: 11
State: Published - 1 Nov 2002
Externally published: Yes

