Identification of hormone binding proteins based on machine learning methods

Jiu-Xin Tan; Shi-Hao Li; Zi-Mei Zhang; Cui-Xia Chen; Wei Chen; Hua Tang; Hao Lin

doi:10.3934/mbe.2019123

Mathematical Biosciences and Engineering

2019, Volume 16, Issue 4: 2466-2480.


Research article


1. Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
2. National Research Institute for Family Planning, Beijing 100081, China
3. National Center of Human Genetic Resources, Beijing 100081, China
4. Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
5. Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China

Received: 21 December 2018; Accepted: 04 March 2019; Published: 22 March 2019

Abstract

Soluble carrier hormone binding proteins (HBPs) play an important role in the growth of humans and other animals, and they interact selectively and non-covalently with hormones. Accurate identification of HBPs is therefore an important prerequisite for understanding their biological functions and molecular mechanisms. Because experimental identification of HBPs remains labor-intensive and costly, computational methods that identify HBPs accurately and efficiently are needed. In this paper, a machine learning-based method was proposed to identify HBPs, in which each sample was encoded by the optimal subset of tripeptide composition features selected with the binomial distribution method. In 5-fold cross-validation, the proposed method achieved an overall accuracy of 97.15%. For the convenience of the scientific community, a user-friendly webserver called HBPred2.0 was built; it is freely accessible at http://lin-group.cn/server/HBPred2.0/.
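The abstract does not spell out the encoding or the selection step. The following is therefore only a minimal Python sketch under stated assumptions, not HBPred2.0's code. It assumes that tripeptide composition (TPC) means the frequencies of the 20^3 = 8000 overlapping three-residue windows over the 20 standard amino acids. It also assumes that the binomial distribution method scores tripeptide i in class j by a confidence level CL_ij = 1 - P_ij, where P_ij = sum over m from n_ij to N_i of C(N_i, m) q_j^m (1 - q_j)^(N_i - m). Here n_ij is the number of occurrences of tripeptide i in class j, N_i is its total count over all classes, and q_j is the fraction of all tripeptide occurrences that fall in class j. Tripeptides are then ranked by their largest CL over the classes. All function names are hypothetical.

```python
from itertools import product
from math import exp, lgamma, log

# The 20 standard amino acids; windows containing any other letter are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_counts(seq):
    """Count every overlapping tripeptide (sliding window of width 3)."""
    counts = [0] * len(TRIPEPTIDES)
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:
            counts[j] += 1
    return counts


def tripeptide_composition(seq):
    """8000-dimensional TPC vector: counts divided by the number of valid windows."""
    counts = tripeptide_counts(seq.upper())
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def binom_upper_tail(n_obs, n_total, q):
    """P(X >= n_obs) for X ~ Binomial(n_total, q), summed in log space."""
    if n_obs <= 0:
        return 1.0
    tail = 0.0
    for m in range(n_obs, n_total + 1):
        log_pmf = (lgamma(n_total + 1) - lgamma(m + 1) - lgamma(n_total - m + 1)
                   + m * log(q) + (n_total - m) * log(1 - q))
        tail += exp(log_pmf)
    return min(tail, 1.0)


def rank_tripeptides(sequences_by_class):
    """Rank tripeptides by max over classes of CL = 1 - P (assumed formulation).

    sequences_by_class: one list of sequences per class; every class must
    contain at least one valid tripeptide so that 0 < q_j < 1.
    """
    # n[j][i]: occurrences of tripeptide i in all sequences of class j.
    n = []
    for seqs in sequences_by_class:
        totals = [0] * len(TRIPEPTIDES)
        for s in seqs:
            for i, c in enumerate(tripeptide_counts(s.upper())):
                totals[i] += c
        n.append(totals)
    # q[j]: share of all tripeptide occurrences that fall in class j.
    class_sizes = [sum(row) for row in n]
    q = [size / sum(class_sizes) for size in class_sizes]
    scores = []
    for i in range(len(TRIPEPTIDES)):
        n_i = sum(row[i] for row in n)
        scores.append(max(1.0 - binom_upper_tail(row[i], n_i, qj)
                          for row, qj in zip(n, q)))
    # Highest confidence level first.
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return order, scores
```

An "optimal" subset could then be chosen by incremental feature selection: evaluate the top-1, top-2, ... ranked tripeptides with the classifier and keep the size with the best cross-validated accuracy. That search procedure is an assumption here, since the abstract only states that the composition used was optimal. Summing the tail in log space avoids overflow in the binomial coefficients for frequent tripeptides.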
Keywords:
- hormone binding protein
- tripeptide composition
- binomial distribution method
- feature selection
- support vector machine
- webserver
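The abstract reports an overall accuracy from 5-fold cross-validation, and the keywords name a support vector machine as the classifier. The following is a minimal, dependency-free sketch of that evaluation protocol under assumptions. It uses stratified folds. As a stand-in classifier, it uses a linear SVM trained with Pegasos-style subgradient descent on the hinge loss. The abstract does not state the kernel, solver, or parameters, and a real run would more likely use an established SVM library with tuned parameters. Labels are assumed to be +1 (HBP) and -1 (non-HBP), and all names are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Stand-in linear SVM (not the paper's configuration).

    Pegasos-style subgradient descent on the L2-regularised hinge loss.
    Labels must be +1/-1. A constant 1.0 feature is appended so the last
    weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function, with the bias feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def cross_validated_accuracy(X, y, k=5, **svm_kwargs):
    """Overall accuracy: correct predictions pooled over all k test folds."""
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train_idx],
                             [y[i] for i in train_idx], **svm_kwargs)
        correct += sum(predict(w, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

In this setting, each row of X would be a TPC vector restricted to the selected tripeptides. Overall accuracy is computed here as the number of correct predictions pooled across the five test folds, which is one common definition. Stratification keeps the HBP/non-HBP ratio roughly equal in every fold.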
Citation: Jiu-Xin Tan, Shi-Hao Li, Zi-Mei Zhang, Cui-Xia Chen, Wei Chen, Hua Tang, Hao Lin. Identification of hormone binding proteins based on machine learning methods[J]. Mathematical Biosciences and Engineering, 2019, 16(4): 2466-2480. doi: 10.3934/mbe.2019123

© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)