TY - JOUR
T1 - Profiling and Leveraging Relatedness in a Precision Medicine Cohort of 92,455 Exomes
AU - Staples, Jeffrey
AU - Maxwell, Evan K.
AU - Gosalia, Nehal
AU - Gonzaga-Jauregui, Claudia
AU - Snyder, Christopher
AU - Hawes, Alicia
AU - Penn, John
AU - Ulloa, Ricardo
AU - Bai, Xiaodong
AU - Lopez, Alexander E.
AU - Van Hout, Cristopher V.
AU - O'Dushlaine, Colm
AU - Teslovich, Tanya M.
AU - McCarthy, Shane E.
AU - Balasubramanian, Suganthi
AU - Kirchner, H. Lester
AU - Leader, Joseph B.
AU - Murray, Michael F.
AU - Ledbetter, David H.
AU - Shuldiner, Alan R.
AU - Yancopoulos, George D.
AU - Dewey, Frederick E.
AU - Carey, David J.
AU - Overton, John D.
AU - Baras, Aris
AU - Habegger, Lukas
AU - Reid, Jeffrey G.
N1 - Publisher Copyright:
© 2018 American Society of Human Genetics
PY - 2018/5/3
Y1 - 2018/5/3
AB - Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
KW - compound heterozygous mutation phasing
KW - cryptic relatedness
KW - de novo mutations
KW - exome sequencing
KW - familial hypercholesterolemia
KW - family structure
KW - healthcare population-based genetic study
KW - identity by descent
KW - pedigree reconstruction
KW - precision medicine
KW - relationship inference
UR - http://www.scopus.com/inward/record.url?scp=85046144236&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2018.03.012
DO - 10.1016/j.ajhg.2018.03.012
M3 - Article
C2 - 29727688
AN - SCOPUS:85046144236
SN - 0002-9297
VL - 102
SP - 874
EP - 889
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 5
ER -