TY - JOUR
T1 - Control-independent mosaic single nucleotide variant detection with DeepMosaic
AU - NIMH Brain Somatic Mosaicism Network
AU - Yang, Xiaoxu
AU - Xu, Xin
AU - Breuss, Martin W.
AU - Antaki, Danny
AU - Ball, Laurel L.
AU - Chung, Changuk
AU - Shen, Jiawei
AU - Li, Chen
AU - George, Renee D.
AU - Wang, Yifan
AU - Bae, Taejeong
AU - Cheng, Yuhe
AU - Abyzov, Alexej
AU - Wei, Liping
AU - Alexandrov, Ludmil B.
AU - Sebat, Jonathan L.
AU - Averbuj, Dan
AU - Roy, Subhojit
AU - Courchesne, Eric
AU - Huang, August Y.
AU - D’Gama, Alissa
AU - Dias, Caroline
AU - Walsh, Christopher A.
AU - Ganz, Javier
AU - Lodato, Michael
AU - Miller, Michael
AU - Li, Pengpeng
AU - Rodin, Rachel
AU - Hill, Robert
AU - Bizzotto, Sara
AU - Khoshkhoo, Sattar
AU - Zhou, Zinan
AU - Lee, Alice
AU - Barton, Alison
AU - Galor, Alon
AU - Chu, Chong
AU - Bohrson, Craig
AU - Gulhan, Doga
AU - Maury, Eduardo
AU - Lim, Elaine
AU - Lim, Euncheon
AU - Melloni, Giorgio
AU - Cortes, Isidro
AU - Lee, Jake
AU - Luquette, Joe
AU - Yang, Lixing
AU - Sherman, Maxwell
AU - Coulter, Michael
AU - Chess, Andrew J.
AU - Akbarian, Schahram
N1 - Funding Information:
We thank Y. Dou for helping to set up the MosaicForecast pipeline. We thank M. K. Gilson for help with computational resources. We thank P. J. Park, G. W. Cottrell, J. V. Moran, M. Gymrek, P. J. Reed, A. Y. Huang, S.-J. Cheng and Y. Chen for their valuable comments, help and suggestions. This work was supported by the National Institute of Mental Health (NIMH) (grant nos. U01MH108898 and R01MH124890 to J.G.G.), Rady Children’s Institute for Genomic Medicine and the Howard Hughes Medical Institute. We thank the San Diego Supercomputer Center (grant no. TG-IBN190021 to X.Y. and J.G.G.) for computational help. This publication includes data generated at the UC San Diego IGM Genomics Center using an Illumina NovaSeq 6000 platform that was purchased with funding from a National Institutes of Health SIG grant (no. S10OD026929 to X.Y. and J.G.G.).
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2023
Y1 - 2023
N2 - Mosaic variants (MVs) reflect mutagenic processes during embryonic development and environmental exposure, accumulate with aging and underlie diseases such as cancer and autism. The detection of noncancer MVs has been computationally challenging due to the sparse representation of nonclonally expanded MVs. Here we present DeepMosaic, combining an image-based visualization module for single nucleotide MVs and a convolutional neural network-based classification module for control-independent MV detection. DeepMosaic was trained on 180,000 simulated or experimentally assessed MVs, and was benchmarked on 619,740 simulated MVs and 530 independent biologically tested MVs from 16 genomes and 181 exomes. DeepMosaic achieved higher accuracy compared with existing methods on biological data, with a sensitivity of 0.78, specificity of 0.83 and positive predictive value of 0.96 on noncancer whole-genome sequencing data, as well as doubling the validation rate over previous best-practice methods on noncancer whole-exome sequencing data (0.43 versus 0.18). DeepMosaic represents an accurate MV classifier for noncancer samples that can be implemented as an alternative or complement to existing methods.
UR - http://www.scopus.com/inward/record.url?scp=85145373232&partnerID=8YFLogxK
U2 - 10.1038/s41587-022-01559-w
DO - 10.1038/s41587-022-01559-w
M3 - Article
AN - SCOPUS:85145373232
SN - 1087-0156
JO - Nature Biotechnology
JF - Nature Biotechnology
ER -