TY - JOUR
T1 - Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions
AU - Wang, Bo
AU - Yan, Chengfei
AU - Lou, Shaoke
AU - Emani, Prashant
AU - Li, Bian
AU - Xu, Min
AU - Kong, Xiangmeng
AU - Meyerson, William
AU - Yang, Yucheng T.
AU - Lee, Donghoon
AU - Gerstein, Mark
N1 - Publisher Copyright: © 2019 Elsevier Ltd
PY - 2019/9/3
Y1 - 2019/9/3
N2 - A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data. Genetic variation may affect drug efficacy by altering its binding affinity to the protein target. GenoDock, developed by Wang et al., is a statistical model to predict the impacts of SNVs on protein-drug interactions by combining genomic, structural and physicochemical features.
KW - drug resistance
KW - machine learning
KW - nsSNV
KW - protein-drug interactions
UR - http://www.scopus.com/inward/record.url?scp=85071367902&partnerID=8YFLogxK
U2 - 10.1016/j.str.2019.06.001
DO - 10.1016/j.str.2019.06.001
M3 - Article
C2 - 31279629
AN - SCOPUS:85071367902
SN - 0969-2126
VL - 27
SP - 1469-1481.e3
JO - Structure
JF - Structure
IS - 9
ER -