TY - JOUR
T1 - Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores
AU - Schizophrenia Working Group of the Psychiatric Genomics Consortium
AU - Psychosis Endophenotypes International Consortium
AU - Wellcome Trust Case Control Consortium
AU - Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study
AU - Hereditary Breast and Ovarian Cancer Research Group Netherlands (HEBON)
AU - Vilhjálmsson, Bjarni J.
AU - Yang, Jian
AU - Finucane, Hilary K.
AU - Gusev, Alexander
AU - Lindström, Sara
AU - Ripke, Stephan
AU - Genovese, Giulio
AU - Loh, Po Ru
AU - Bhatia, Gaurav
AU - Do, Ron
AU - Hayeck, Tristan
AU - Won, Hong Hee
AU - Neale, Benjamin M.
AU - Corvin, Aiden
AU - Walters, James T.R.
AU - Farh, Kai How
AU - Holmans, Peter A.
AU - Lee, Phil
AU - Bulik-Sullivan, Brendan
AU - Collier, David A.
AU - Huang, Hailiang
AU - Pers, Tune H.
AU - Agartz, Ingrid
AU - Agerbo, Esben
AU - Albus, Margot
AU - Alexander, Madeline
AU - Amin, Farooq
AU - Bacanu, Silviu A.
AU - Begemann, Martin
AU - Belliveau, Richard A.
AU - Bene, Judit
AU - Bergen, Sarah E.
AU - Bevilacqua, Elizabeth
AU - Bigdeli, Tim B.
AU - Black, Donald W.
AU - Bruggeman, Richard
AU - Cai, Guiqing
AU - Cohen, David
AU - Davis, Kenneth L.
AU - Drapeau, Elodie
AU - Friedman, Joseph I.
AU - Haroutunian, Vahram
AU - Purcell, Shaun M.
AU - Reichenberg, Abraham
AU - Roussos, Panos
AU - Ruderfer, Douglas M.
AU - Silverman, Jeremy M.
AU - Buxbaum, Joseph D.
AU - Kenny, Eimear E.
AU - Belbin, Gillian
N1 - Funding Information:
We thank Shamil Sunyaev, Brendan Bulik-Sullivan, Liming Liang, Naomi Wray, Daniel Sørensen, and Esben Agerbo for useful discussions. We would also like to thank Toni Clarke for useful comments on the software. This research was supported by NIH grants R01 GM105857, R03 CA173785, and U19 CA148065-01. B.J.V. was supported by Danish Council for Independent Research grant DFF-1325-0014. H.K.F. was supported by the Fannie and John Hertz Foundation. This study made use of data generated by the Wellcome Trust Case Control Consortium (WTCCC) and the Wellcome Trust Sanger Institute. A full list of the investigators who contributed to the generation of the WTCCC data is available at www.wtccc.org.uk. Funding for the WTCCC project was provided by the Wellcome Trust under award 076113.
Publisher Copyright:
© 2015 The American Society of Human Genetics. All rights reserved.
PY - 2015/1/1
Y1 - 2015/1/1
AB - Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
UR - http://www.scopus.com/inward/record.url?scp=84952665106&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2015.09.001
DO - 10.1016/j.ajhg.2015.09.001
M3 - Article
C2 - 26430803
AN - SCOPUS:84952665106
SN - 0002-9297
VL - 97
SP - 576
EP - 592
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 4
ER -