TY - JOUR

T1 - Generating a robust statistical causal structure over 13 cardiovascular disease risk factors using genomics data

AU - Yazdani, Azam

AU - Yazdani, Akram

AU - Samiei, Ahmad

AU - Boerwinkle, Eric

N1 - Publisher Copyright: © 2016 The Authors.

PY - 2016/4/1

Y1 - 2016/4/1

N2 - Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs), where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalence. Additional exogenous information, such as expert knowledge or genotype data, can help establish directionality among the endogenous variables. In this study, we use principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths and helps determine predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and as a good target for intervention to lower disease risk.

AB - Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs), where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalence. Additional exogenous information, such as expert knowledge or genotype data, can help establish directionality among the endogenous variables. In this study, we use principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths and helps determine predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and as a good target for intervention to lower disease risk.

KW - Cardiovascular disease risk factors

KW - Causal network

KW - Conditional independency

KW - Data integration

KW - Granularity DAG

KW - Partial correlation

UR - http://www.scopus.com/inward/record.url?scp=84962866775&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2016.01.012

DO - 10.1016/j.jbi.2016.01.012

M3 - Article

C2 - 26827624

AN - SCOPUS:84962866775

SN - 1532-0464

VL - 60

SP - 114

EP - 119

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

ER -