Imputation-based assessment of next generation rare exome variant arrays

Alicia R. Martin, Gerard Tse, Carlos D. Bustamante, Eimear E. Kenny

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations


A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts: if genetic variants that influence disease risk follow the same trend, then we expect to see population-specific disease associations that require large sample sizes for detection. To address this challenge, and due to the still prohibitive cost of sequencing large cohorts, researchers have developed a new generation of low-cost genotyping arrays that assay rare variation previously identified from large exome sequencing studies. Genotyping approaches rely not only on directly observing variants, but also on phasing and imputation methods that use publicly available reference panels to infer unobserved variants in a study cohort. Rare variant exome arrays are intentionally enriched for variants likely to be disease causing, and here we assess the ability of the first commercially available rare exome variant array (the Illumina Infinium HumanExome BeadChip) to also tag other potentially damaging variants not molecularly assayed. Using full sequence data from chromosome 22 from the phase I 1000 Genomes Project, we evaluate three methods for imputation (BEAGLE, MaCH-Admix, and SHAPEIT2/IMPUTE2) with the rare exome variant array under varied study panel sizes, reference panel sizes, and LD structures via population differences. We find that imputation is more accurate across both the genome and exome for common variant arrays than for the next generation array at all allele frequencies, including rare alleles. We also find that imputation is least accurate in African populations, and that accuracy is substantially improved for rare variants when the same population is included in the reference panel.
Depending on the goals of GWAS researchers, our results will aid budget decisions by helping determine whether money is best spent sequencing the genomes of smaller sample sizes, genotyping larger sample sizes with rare and/or common variant arrays and imputing SNPs, or some combination of the two.
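
The abstract does not state which accuracy metric was used. As an illustration only, the sketch below shows one common way to score imputation against held-out sequence data: the squared Pearson correlation (r²) between true genotypes (coded 0/1/2) and imputed allele dosages, averaged within minor allele frequency bins. The function names and the bin edges are hypothetical and are not taken from the paper.

```python
from statistics import mean


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages.

    Returns None when either vector has zero variance (r^2 is undefined there).
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_true = mean(true_genotypes)
    mean_imputed = mean(imputed_dosages)
    cov = sum((t - mean_true) * (d - mean_imputed)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imputed = sum((d - mean_imputed) ** 2 for d in imputed_dosages)
    if var_true == 0 or var_imputed == 0:
        return None
    return (cov * cov) / (var_true * var_imputed)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid genotypes coded as alternate-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def mean_r2_by_maf_bin(variants, bin_edges=(0.0, 0.005, 0.05, 0.5)):
    """Average r^2 per MAF bin.

    `variants` is an iterable of (true_genotypes, imputed_dosages) pairs.
    The default bin edges are illustrative assumptions, not values from the paper.
    Bins are half-open on the left: lo < MAF <= hi.
    """
    bins = {(lo, hi): [] for lo, hi in zip(bin_edges, bin_edges[1:])}
    for true_genotypes, imputed_dosages in variants:
        r2 = dosage_r2(true_genotypes, imputed_dosages)
        if r2 is None:
            continue  # skip variants where r^2 is undefined
        maf = minor_allele_frequency(true_genotypes)
        for (lo, hi), values in bins.items():
            if lo < maf <= hi:
                values.append(r2)
                break
    return {edges: (mean(values) if values else None)
            for edges, values in bins.items()}
```

Running the same binning on the output of each imputation method, array, and reference panel would give per-frequency accuracy curves that can be compared directly, which is the kind of comparison the abstract describes.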

Original language: English
Pages (from-to): 262-272
Number of pages: 11
Journal: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
State: Published - 2014
Event: 19th Pacific Symposium on Biocomputing, PSB 2014 - Kohala Coast, United States
Duration: 3 Jan 2014 to 7 Jan 2014

