TY - JOUR
T1 - Predictive models of aqueous solubility of organic compounds built on a large dataset of high integrity
AU - Sun, Hongmao
AU - Shah, Pranav
AU - Nguyen, Kimloan
AU - Yu, Kyeong Ri
AU - Kerns, Ed
AU - Kabir, Md
AU - Wang, Yuhong
AU - Xu, Xin
N1 - Publisher Copyright: © 2019
PY - 2019/7/15
Y1 - 2019/7/15
AB - Aqueous solubility is one of the most important properties in drug discovery, as it has a profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy. Both kinetic and thermodynamic solubilities are determined during different stages of drug discovery and development. Since kinetic solubility is more relevant in preclinical drug discovery research, especially during the structure optimization process, we have developed predictive models for kinetic solubility with in-house data generated from 11,780 compounds collected from over 200 NCATS intramural research projects. This represents one of the largest kinetic solubility datasets of high quality and integrity. Based on customized atom type descriptors, support vector classification (SVC) models were trained on 80% of the whole dataset and exhibited high predictive performance in estimating the solubility of the remaining 20% of compounds, which formed the test set. The values of the area under the receiver operating characteristic curve (AUC-ROC) for the compounds in the test sets reached 0.93 and 0.91 when the threshold for insoluble compounds was set to 10 and 50 μg/mL, respectively. The predictive models of aqueous solubility can be used to identify insoluble compounds in the drug discovery pipeline, provide design ideas for improving solubility by analyzing the atom types associated with poor solubility, and prioritize compound libraries to be purchased or synthesized.
KW - Atom typing descriptors
KW - In silico ADME model
KW - Kinetic solubility
KW - Prediction
KW - Support vector classification (SVC)
UR - http://www.scopus.com/inward/record.url?scp=85066623926&partnerID=8YFLogxK
U2 - 10.1016/j.bmc.2019.05.037
DO - 10.1016/j.bmc.2019.05.037
M3 - Article
C2 - 31176566
AN - SCOPUS:85066623926
SN - 0968-0896
VL - 27
SP - 3110
EP - 3114
JO - Bioorganic and Medicinal Chemistry
JF - Bioorganic and Medicinal Chemistry
IS - 14
ER -