TY - JOUR
T1 - A deep catalogue of protein-coding variation in 983,578 individuals
AU - Senior Partnerships and Business Operations
AU - Business Operations and Administrative Coordinators
AU - RGC-ME Cohort Partners
AU - Accelerated Cures
AU - African Descent and Glaucoma Evaluation Study (ADAGES) III
AU - Age-related macular degeneration in the Amish
AU - Albert Einstein College of Medicine
AU - Amish Connectome Project
AU - Amish Research Clinic
AU - The Australia and New Zealand MS Genetics Consortium
AU - Center for Non-Communicable Diseases (CNCD)
AU - Cincinnati Children’s Hospital
AU - Columbia University
AU - Dallas Heart Study
AU - Diabetic Retinopathy Clinical Research (DRCR) Retina Network
AU - Duke University
AU - Flinders University of South Australia
AU - Indiana Biobank
AU - Indiana University School of Medicine
AU - Kaiser Permanente
AU - Mayo Clinic
AU - Mexico City Prospective Study (MCPS)
AU - MyCode-DiscovEHR Geisinger Health System Biobank
AU - National Institute of Mental Health
AU - Northwestern University
AU - Penn Medicine BioBank
AU - Primary Open-Angle African American Glaucoma Genetics (POAAG) study
AU - Regeneron–Mt. Sinai BioMe Biobank
AU - UAB GWAS in African Americans with rheumatoid arthritis
AU - UAB Whole exome sequencing of systemic lupus erythematosus patients
AU - University of California, Los Angeles
AU - University of Colorado School of Medicine
AU - University of Michigan Medical School
AU - University of Ottawa
AU - University of Pennsylvania
AU - University of Pittsburgh
AU - University of Texas Health Science Center at Houston
AU - Vanderbilt University Medical Center
AU - Regeneron Genetics Center
AU - RGC Management and Leadership Team
AU - Sequencing and Lab Operations
AU - Clinical Informatics
AU - Genome Informatics and Data Engineering
AU - Analytical Genetics and Data Science
AU - Therapeutic Area Genetics
AU - Research Program Management and Strategic Initiatives
AU - Sun, Kathie Y.
AU - Bai, Xiaodong
AU - Chen, Siying
AU - Bao, Suying
AU - Zhang, Chuanyi
AU - Kapoor, Manav
AU - Backman, Joshua
AU - Joseph, Tyler
AU - Maxwell, Evan
AU - Mitra, George
AU - Gorovits, Alexander
AU - Mansfield, Adam
AU - Boutkov, Boris
AU - Gokhale, Sujit
AU - Habegger, Lukas
AU - Marcketta, Anthony
AU - Locke, Adam E.
AU - Ganel, Liron
AU - Hawes, Alicia
AU - Kessler, Michael D.
AU - Sharma, Deepika
AU - Staples, Jeffrey
AU - Bovijn, Jonas
AU - Gelfman, Sahar
AU - Di Gioia, Alessandro
AU - Rajagopal, Veera M.
AU - Lopez, Alexander
AU - Varela, Jennifer Rico
AU - Alegre-Díaz, Jesús
AU - Berumen, Jaime
AU - Tapia-Conyer, Roberto
AU - Kuri-Morales, Pablo
AU - Torres, Jason
AU - Emberson, Jonathan
AU - Collins, Rory
AU - Abecasis, Gonçalo
AU - Coppola, Giovanni
AU - Deubler, Andrew
AU - Economides, Aris
AU - Ferrando, Adolfo
AU - Lotta, Luca A.
AU - Shuldiner, Alan
AU - Siminovitch, Katherine
AU - Beechert, Christina
AU - Brian, Erin D.
AU - Cremona, Laura M.
AU - Du, Hang
AU - Forsythe, Caitlin
AU - Gu, Zhenhua
AU - Bottinger, Erwin
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/7/18
Y1 - 2024/7/18
N2 - Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
AB - Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
UR - http://www.scopus.com/inward/record.url?scp=85199223480&partnerID=8YFLogxK
U2 - 10.1038/s41586-024-07556-0
DO - 10.1038/s41586-024-07556-0
M3 - Article
C2 - 38768635
AN - SCOPUS:85199223480
SN - 0028-0836
VL - 631
SP - 583
EP - 592
JO - Nature
JF - Nature
IS - 8021
ER -