TY - JOUR
T1 - Generating a robust statistical causal structure over 13 cardiovascular disease risk factors using genomics data
AU - Yazdani, Azam
AU - Yazdani, Akram
AU - Samiei, Ahmad
AU - Boerwinkle, Eric
N1 - Publisher Copyright: © 2016 The Authors.
PY - 2016/4/1
Y1 - 2016/4/1
N2 - Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs), where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalence. Additional exogenous information, such as expert knowledge or genotype data, can help establish directionality among the endogenous variables. In this study, we use principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths and helps identify predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and as a good target for intervention to lower disease risk.
KW - Cardiovascular disease risk factors
KW - Causal network
KW - Conditional independency
KW - Data integration
KW - Granularity DAG
KW - Partial correlation
UR - http://www.scopus.com/inward/record.url?scp=84962866775&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2016.01.012
DO - 10.1016/j.jbi.2016.01.012
M3 - Article
C2 - 26827624
AN - SCOPUS:84962866775
SN - 1532-0464
VL - 60
SP - 114
EP - 119
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -