TY - JOUR
T1 - An integrative approach for causal gene identification and gene regulatory pathway inference
AU - Tu, Zhidong
AU - Wang, Li
AU - Arbeitman, Michelle N.
AU - Chen, Ting
AU - Sun, Fengzhu
N1 - Funding Information:
We thank Drs. Rachel B. Brem and Leonid Kruglyak for kindly providing us with the yeast genotype data. This work was inspired by collaboration with Huiying Yang from the Medical Genetics Institute at Cedars-Sinai. We also thank Xianghong Jasmine Zhou for her constructive suggestions. We are very grateful to the researchers who performed all the experiments to generate the data used in this study. We apologize to those whose work is not cited due to the page limit. This research was supported by NIH/NSF joint mathematical biology initiative DMS-0241102 and by NIH P50 HG 002790.
PY - 2006/7/15
Y1 - 2006/7/15
AB - Motivation: Gene expression variation can often be linked to certain chromosomal regions and is tightly associated with phenotypic variation such as disease conditions. Inferring the causal genes for the expression variation is of great importance but rather challenging, as the linked region generally contains multiple genes. Even when a single candidate gene is proposed, the underlying biological mechanism by which the regulation is enforced remains unknown. Novel approaches are needed to both infer the causal genes and generate hypotheses on the underlying regulatory mechanisms. Results: We propose a new approach that aims to achieve the above objectives by integrating genotype information, gene expression, protein-protein interaction, protein phosphorylation, and transcription factor (TF)-DNA binding information. A network-based stochastic algorithm is designed to infer the causal genes and identify the underlying regulatory pathways. We first quantitatively verified our method by a test using data generated by yeast knock-out experiments. Over 40% of inferred causal genes are correct, which is significantly better than the 10% achieved by random guessing. We then applied our method to a recent genome-wide expression variation study in yeast. We show that our method can correctly identify the causal genes and effectively output experimentally verified pathways. New potential gene regulatory pathways are generated and presented as a global network.
UR - http://www.scopus.com/inward/record.url?scp=33747880011&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btl234
DO - 10.1093/bioinformatics/btl234
M3 - Article
C2 - 16873511
AN - SCOPUS:33747880011
SN - 1367-4803
VL - 22
SP - e489-e496
JO - Bioinformatics
JF - Bioinformatics
IS - 14
ER -