ADis-QSAR: a machine learning model based on biological activity differences of compounds

