TY - JOUR
T1 - Use of weighted reference panels based on empirical estimates of ancestry for capturing untyped variation
AU - Egyud, Matthew R.L.
AU - Gajdos, Zofia K.Z.
AU - Butler, Johannah L.
AU - Tischfield, Sam
AU - Le Marchand, Loic
AU - Kolonel, Laurence N.
AU - Haiman, Christopher A.
AU - Henderson, Brian E.
AU - Hirschhorn, Joel N.
N1 - Funding Information:
Acknowledgments: We thank P. de Bakker of the Broad Institute for support in analyzing the generated data, and members of the Hirschhorn lab for helpful discussions. Finally, we extend our deepest thanks to the participants of the International HapMap Project, Multiethnic Cohort, and Human Genome Diversity Panel, without whom none of this work would have been possible. This work was supported by grant R01DK075787 to JNH and a Strategic Program for Asthma Research award to JNH from the American Asthma Foundation.
PY - 2009
Y1 - 2009
N2 - Many association methods use a subset of genotyped single nucleotide polymorphisms (SNPs) to capture or infer genotypes at other, untyped SNPs. We and others previously showed that tag SNPs selected to capture common variation using data from The International HapMap Consortium (Nature 437:1299-1320, 2005; Nature 449:851-861, 2007) could also capture variation in populations of ancestry similar to that of the HapMap reference populations (de Bakker et al. in Nat Genet 38:1298-1303, 2006; González-Neira et al. in Genome Res 16:323-330, 2006; Montpetit et al. in PLoS Genet 2:282-290, 2006; Mueller et al. in Am J Hum Genet 76:387-398, 2005). To capture variation in admixed populations or in populations less similar to the HapMap panels, a "cosmopolitan approach," in which all HapMap samples are used together as a single reference panel, was proposed. Here we refine this suggestion and show that a "weighted reference panel," constructed from empirical estimates of the target population's ancestry relative to the available reference panels, is more efficient than the cosmopolitan approach. Across the five populations of the Multiethnic Cohort, weighted reference panels capture, on average, only slightly fewer common variants (minor allele frequency > 5%) than the cosmopolitan approach (mean r² = 0.977 vs. 0.989; 94.5% vs. 96.8% of variation captured at r² > 0.8), but require approximately 25% fewer tag SNPs per panel (538 vs. 718 on average). These results extend a recent study in two Indian populations (Pemberton et al. in Ann Hum Genet 72:535-546, 2008). Weighted reference panels are potentially useful for selecting tag SNPs in diverse populations, and perhaps also for designing reference panels for imputation of untyped genotypes in genome-wide association studies of admixed populations.
UR - http://www.scopus.com/inward/record.url?scp=63249104734&partnerID=8YFLogxK
U2 - 10.1007/s00439-009-0627-8
DO - 10.1007/s00439-009-0627-8
M3 - Article
C2 - 19184111
AN - SCOPUS:63249104734
SN - 0340-6717
VL - 125
SP - 295
EP - 303
JO - Human Genetics
JF - Human Genetics
IS - 3
ER -