TY - JOUR
T1 - Identification of discriminative gene-level and protein-level features associated with pathogenic gain-of-function and loss-of-function variants
AU - Sevim Bayrak, Cigdem
AU - Stein, David
AU - Jain, Aayushee
AU - Chaudhary, Kumardeep
AU - Nadkarni, Girish N.
AU - Van Vleck, Tielman T.
AU - Puel, Anne
AU - Boisson-Dupuis, Stephanie
AU - Okada, Satoshi
AU - Stenson, Peter D.
AU - Cooper, David N.
AU - Schlessinger, Avner
AU - Itan, Yuval
N1 - Publisher Copyright: © 2021 American Society of Human Genetics
PY - 2021/12/2
Y1 - 2021/12/2
AB - Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.
KW - database
KW - feature importance
KW - functional consequence
KW - gain-of-function
KW - genetic variants
KW - loss-of-function
KW - machine learning
KW - natural language processing
KW - online server
UR - http://www.scopus.com/inward/record.url?scp=85119910465&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2021.10.007
DO - 10.1016/j.ajhg.2021.10.007
M3 - Article
C2 - 34762822
AN - SCOPUS:85119910465
SN - 0002-9297
VL - 108
SP - 2301
EP - 2318
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 12
ER -