Analysis of Gene Expression Microarrays for Phenotype Classification

Andrea Califano, Gustavo Stolovitzky, Yuhai Tu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Scopus citations

Abstract

Several microarray technologies that monitor the level of expression of a large number of genes have recently emerged. Given DNA-microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify "patterns" of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes. In this paper, we propose a solution to this problem based on a supervised learning algorithm, which differs substantially from previous schemes. It couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm called SPLASH. The latter efficiently and deterministically discovers all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability that a pattern occurs by chance in the control set. Finally, a greedy set covering algorithm is used to select an optimal subset of statistically significant patterns, which form the basis for a standard likelihood ratio classification scheme. We analyze data from 60 human cancer cell lines using this method, and compare our results with those of other supervised learning schemes. Different phenotypes are studied. These include cancer morphologies (such as melanoma), molecular targets (such as mutations in the p53 gene), and therapeutic targets related to the sensitivity to anticancer compounds. We also analyze a synthetic data set that shows that this technique is especially well suited for the analysis of sub-phenotype mixtures. For complex phenotypes, such as p53, our method produces an encouragingly low rate of false positives and false negatives and seems to outperform the others. Similarly low rates are reported when predicting the efficacy of experimental anticancer compounds. This counts among the first reported studies where drug efficacy has been successfully predicted from large-scale expression data analysis.
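The greedy set covering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_set_cover`, the pattern names, and the toy sample sets are all hypothetical, and the sketch ignores statistical significance and the similarity metric entirely. It only shows the generic greedy heuristic of repeatedly choosing the pattern that matches the most still-uncovered samples.

```python
def greedy_set_cover(universe, pattern_support):
    """Pick patterns until every coverable sample is covered.

    universe: iterable of sample ids (hypothetical phenotype-set samples).
    pattern_support: dict mapping a pattern name to the set of samples
        that the pattern matches (made-up data in this sketch).
    Returns (chosen pattern names in selection order, samples left uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the pattern that covers the most still-uncovered samples.
        best = max(
            pattern_support,
            key=lambda p: len(pattern_support[p] & uncovered),
            default=None,
        )
        if best is None or not (pattern_support[best] & uncovered):
            break  # the remaining samples are matched by no pattern
        chosen.append(best)
        uncovered -= pattern_support[best]
    return chosen, uncovered


# Toy, made-up data: five samples and four candidate patterns.
samples = range(5)
patterns = {
    "p1": {0, 1, 2},
    "p2": {2, 3},
    "p3": {3, 4},
    "p4": {1},
}
selected, missed = greedy_set_cover(samples, patterns)
print(selected, missed)
```

On this toy input the heuristic selects two patterns that together match all five samples. In general a greedy cover is an approximation and is not guaranteed to be the smallest possible cover.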

Original language: English
Title of host publication: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000
Publisher: AAAI Press
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 1577351150, 9781577351153
State: Published - 2000
Externally published: Yes
Event: 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000 - San Diego, United States
Duration: 19 Aug 2000 to 23 Aug 2000

Publication series

Name: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000

Conference

Conference: 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000
Country/Territory: United States
City: San Diego
Period: 19/08/00 to 23/08/00

Keywords

  • Clustering
  • Gene Expression Microarrays
  • Gene Expression Patterns
  • Gene Expression Analysis
  • Phenotype Classification
  • Tissue Classification
