TY - JOUR
T1 - Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation
AU - Martin-Trujillo, Alejandro
AU - Garg, Paras
AU - Patel, Nihir
AU - Jadhav, Bharati
AU - Sharp, Andrew J.
N1 - Funding Information:
First and foremost, we thank all individuals who donated bio-specimens and participated in this work for their willingness to contribute to scientific research. We thank all the investigators and consortiums who facilitated access to the data sets deposited in the dbGaP repository (http://www.ncbi.nlm.nih.gov/gap). The Pediatric Cardiac Genomics Consortium (PCGC) program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health (NHLBI, NIH), U.S. Department of Health and Human Services through grants UM1HL128711, UM1HL098162, UM1HL098147, UM1HL098123, UM1HL128761, and U01HL131003. This manuscript was not prepared in collaboration with investigators of the PCGC, has not been reviewed and/or approved by the PCGC, and does not necessarily reflect the opinions of the PCGC investigators or the NHLBI. In addition to data sets generated from PCGC, data used in the preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) database (https://www.ppmi-info.org/access-data-specimens/data). For up-to-date information on the study, visit https://www.ppmi-info.org/about-ppmi. PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, a full list of which can be found at https://www.ppmi-info.org/about-ppmi/who-we-are/study-sponsors/. This work was supported in part by NIH grant R01NS105781 to A.J.S. and postdoctoral and early-career fellowships to A.M.T. from the American Heart Association (18POST34080396) and the NHLBI Biodata Catalyst (5120339), respectively. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD018522. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2023 Martin-Trujillo et al.
PY - 2023
Y1 - 2023
N2 - Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
AB - Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
UR - http://www.scopus.com/inward/record.url?scp=85149999132&partnerID=8YFLogxK
U2 - 10.1101/gr.277057.122
DO - 10.1101/gr.277057.122
M3 - Article
C2 - 36577521
AN - SCOPUS:85149999132
SN - 1088-9051
VL - 33
SP - 184
EP - 196
JO - Genome Research
JF - Genome Research
IS - 2
ER -