Protein function prediction is vital for annotating uncharacterized proteins. At present, deep neural network (DNN) based protein function prediction is mainly applied to small-scale protein datasets or Gene Ontology datasets, and usually explores the relationship between a single protein feature and function labels. Practical methods for large-scale, multi-feature protein function prediction still need in-depth study. This paper proposes IGP-DNN, a DNN-based protein function prediction approach. The method uses a protein function module extraction algorithm based on the Grasshopper Optimization Algorithm (GOA) and Intuitionistic Fuzzy c-Means clustering (IFCM) to extract protein module features, applies Kernel Principal Component Analysis (KPCA) to reduce the dimensionality of the protein attribute information, and integrates the module features with the attribute features. The integrated data are then fed into a DNN with multiple hidden layers, which classifies the proteins and predicts their functions. In the experiments, IGP-DNN reaches an F-measure of 0.4436 on the DIP dataset, indicating improved prediction performance.
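Since the result is reported as an F-measure, a small worked example may help show what that number summarizes. The sketch below assumes the standard definition (the harmonic mean of precision and recall) computed per protein over sets of function labels and then averaged; the abstract does not state the exact averaging scheme used for IGP-DNN, so that part is an assumption. The function names and the toy labels are hypothetical and used only for illustration.

```python
def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for two sets of function labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # No overlap (or an empty set): precision or recall is zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(predictions, annotations):
    """Average the per-protein F-measure over all annotated proteins.

    Assumption: simple per-protein averaging; the paper may aggregate differently.
    """
    scores = [
        f_measure(predictions.get(protein, ()), labels)
        for protein, labels in annotations.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two proteins with made-up function labels.
annotations = {"P1": {"f1", "f2"}, "P2": {"f3"}}
predictions = {"P1": {"f1", "f4"}, "P2": {"f3"}}

score = mean_f_measure(predictions, annotations)
print(f"mean F-measure: {score:.4f}")
```

In this toy case P1 has one correct label out of two predicted and two true labels (precision and recall both 0.5), while P2 is predicted exactly, so the average reflects one partially correct and one fully correct prediction.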
Citation: Wenjun Xu, Zihao Zhao, Hongwei Zhang, Minglei Hu, Ning Yang, Hui Wang, Chao Wang, Jun Jiao, Lichuan Gu. Deep neural learning based protein function prediction[J]. Mathematical Biosciences and Engineering, 2022, 19(3): 2471-2488. doi: 10.3934/mbe.2022114