TY - JOUR
T1 - Predicting the functional impact of protein mutations
T2 - Application to cancer genomics
AU - Reva, Boris
AU - Antipin, Yevgeniy
AU - Sander, Chris
N1 - Funding Information: Funding for open access charge: National Institutes of Health (grant R01 CA132744-02).
PY - 2011/9
Y1 - 2011/9
N2 - As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using a combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19200), assumed to be strongly functional, from common polymorphisms (∼35600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations ('drivers'). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types, ranked by likely functional impact, and an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve a switch of function, rather than simply loss or gain of function.
AB - As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using a combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19200), assumed to be strongly functional, from common polymorphisms (∼35600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations ('drivers'). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types, ranked by likely functional impact, and an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve a switch of function, rather than simply loss or gain of function.
UR - http://www.scopus.com/inward/record.url?scp=80053189298&partnerID=8YFLogxK
U2 - 10.1093/nar/gkr407
DO - 10.1093/nar/gkr407
M3 - Article
C2 - 21727090
AN - SCOPUS:80053189298
SN - 0305-1048
VL - 39
SP - e118
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 17
ER -