G protein-coupled receptors (GPCRs) are the targets of more than 40% of currently approved drugs. Although neural networks can effectively improve the accuracy of bioactivity prediction, their results are unsatisfactory on the limited datasets available for orphan GPCRs (oGPCRs). To bridge this gap, we propose Multi-source Transfer Learning with Graph Neural Network (MSTL-GNN). First, we identify three suitable data sources for transfer learning: oGPCRs, experimentally validated GPCRs, and non-validated GPCRs similar to the former. Second, ligands in SMILES format are converted into molecular graphs, which serve as input to a Graph Neural Network (GNN) and to ensemble learning to improve prediction accuracy. Finally, our experiments show that MSTL-GNN markedly improves the prediction of GPCR ligand activity values compared with previous studies. On average, relative to the state-of-the-art work, MSTL-GNN improved the two evaluation metrics we adopted, R² and root-mean-square error (RMSE), by up to 67.13% and 17.22%, respectively. The effectiveness of MSTL-GNN for GPCR drug discovery with limited data also paves the way for other application scenarios with similarly scarce data.
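The reported gains are percentages on two metrics that move in opposite directions: a better model has a higher R² but a lower RMSE. The following is a minimal sketch of how these metrics and a relative improvement over a baseline are conventionally computed. It assumes the percentages are relative changes with respect to the baseline's value; the paper's exact protocol for averaging across receptor datasets is not stated here, and the function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and of equal length.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted activity values."""
    _check(y_true, y_pred)
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # R^2 is undefined when all observed values are identical.
        raise ValueError("R^2 is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline: float, new: float, higher_is_better: bool) -> float:
    """Percentage improvement of `new` over `baseline`.

    For R^2 (higher is better) an improvement is an increase; for RMSE
    (lower is better) an improvement is a decrease, so the sign is flipped
    and a positive result always means "better than the baseline".
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (new - baseline) if higher_is_better else (baseline - new)
    return 100.0 * change / abs(baseline)
```

For example, with hypothetical baseline and new R² values of 0.40 and 0.60, `relative_improvement(0.40, 0.60, higher_is_better=True)` returns approximately 50.0; with hypothetical RMSE values of 1.00 and 0.80, `relative_improvement(1.00, 0.80, higher_is_better=False)` returns approximately 20.0, reflecting a 20% reduction in error.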
Citation: Shizhen Huang, ShaoDong Zheng, Ruiqi Chen. Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 2588-2608. doi: 10.3934/mbe.2023121