TY - JOUR
T1 - NIMBus
T2 - a negative binomial regression based Integrative Method for mutation Burden Analysis
AU - Zhang, Jing
AU - Liu, Jason
AU - McGillivray, Patrick
AU - Yi, Caroline
AU - Lochovsky, Lucas
AU - Lee, Donghoon
AU - Gerstein, Mark
N1 - Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Background: Identifying frequently mutated regions is a key approach to discover DNA elements influencing cancer progression.However, it is challenging to identify these burdened regions due to mutation rate heterogeneity across the genome and across different individuals. Moreover, it is known that this heterogeneity partially stems from genomic confounding factors, such as replication timing and chromatin organization. The increasing availability of cancer whole genome sequences and functional genomics data from the Encyclopedia of DNA Elements (ENCODE) may help address these issues. Results: We developed a negative binomial regression-based Integrative Method for mutation Burden analysiS (NIMBus). Our approach addresses the over-dispersion of mutation count statistics by (1) using a Gamma–Poisson mixture model to capture the mutation-rate heterogeneity across different individuals and (2) estimating regional background mutation rates by regressing the varying local mutation counts against genomic features extracted from ENCODE. We applied NIMBus to whole-genome cancer sequences from the PanCancer Analysis of Whole Genomes project (PCAWG) and other cohorts. It successfully identified well-known coding and noncoding drivers, such as TP53 and the TERT promoter. To further characterize the burdening of non-coding regions, we used NIMBus to screen transcription factor binding sites in promoter regions that intersect DNase I hypersensitive sites (DHSs). This analysis identified mutational hotspots that potentially disrupt gene regulatory networks in cancer. We also compare this method to other mutation burden analysis methods. Conclusion: NIMBus is a powerful tool to identify mutational hotspots. The NIMBus software and results are available as an online resource at github.gersteinlab.org/nimbus.
AB - Background: Identifying frequently mutated regions is a key approach to discover DNA elements influencing cancer progression.However, it is challenging to identify these burdened regions due to mutation rate heterogeneity across the genome and across different individuals. Moreover, it is known that this heterogeneity partially stems from genomic confounding factors, such as replication timing and chromatin organization. The increasing availability of cancer whole genome sequences and functional genomics data from the Encyclopedia of DNA Elements (ENCODE) may help address these issues. Results: We developed a negative binomial regression-based Integrative Method for mutation Burden analysiS (NIMBus). Our approach addresses the over-dispersion of mutation count statistics by (1) using a Gamma–Poisson mixture model to capture the mutation-rate heterogeneity across different individuals and (2) estimating regional background mutation rates by regressing the varying local mutation counts against genomic features extracted from ENCODE. We applied NIMBus to whole-genome cancer sequences from the PanCancer Analysis of Whole Genomes project (PCAWG) and other cohorts. It successfully identified well-known coding and noncoding drivers, such as TP53 and the TERT promoter. To further characterize the burdening of non-coding regions, we used NIMBus to screen transcription factor binding sites in promoter regions that intersect DNase I hypersensitive sites (DHSs). This analysis identified mutational hotspots that potentially disrupt gene regulatory networks in cancer. We also compare this method to other mutation burden analysis methods. Conclusion: NIMBus is a powerful tool to identify mutational hotspots. The NIMBus software and results are available as an online resource at github.gersteinlab.org/nimbus.
KW - Mutation count overdispersion
KW - Mutation rate estimation
KW - Mutation rate heterogeneity
KW - Somatic mutation burden
UR - https://www.scopus.com/pages/publications/85093359866
U2 - 10.1186/s12859-020-03758-1
DO - 10.1186/s12859-020-03758-1
M3 - Article
C2 - 33092526
AN - SCOPUS:85093359866
SN - 1471-2105
VL - 21
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 474
ER -