TY - JOUR
T1 - A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts
AU - Wang, Minghui
AU - Wang, Lin
AU - Jiang, Ning
AU - Jia, Tianye
AU - Luo, Zewei
N1 - Funding Information:
We thank Dr. Thomas Gasser of Neurodegenerative Diseases and German Center for Neurodegenerative Diseases (Germany) and Dr. Andrew B Singleton at National Institute on Aging (NIH, USA) for allowing us to re-analyze the Parkinson’s disease datasets. We thank two anonymous reviewers for their comments and suggestions which have been useful for improving presentation of the paper. This study was supported by research grants from the Leverhulme Trust (UK) and The National Basic Research Program of China (2012CB316505). ZL is also supported by China’s National Natural Science Foundation.
PY - 2013/2/8
Y1 - 2013/2/8
N2 - Background: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case-control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results: We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson's disease (PD) case-control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions: We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS.
KW - Case and control samples
KW - Genome-wide association study
KW - Linkage disequilibrium
KW - Multiple cohorts
KW - Parkinson's disease
KW - Robust statistical approach
UR - http://www.scopus.com/inward/record.url?scp=84873454244&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-14-88
DO - 10.1186/1471-2164-14-88
M3 - Article
C2 - 23394771
AN - SCOPUS:84873454244
SN - 1471-2164
VL - 14
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 88
ER -