Research article Special Issues

A viral protein identifying framework based on temporal convolutional network

  • Received: 17 December 2018 Accepted: 31 January 2019 Published: 27 February 2019
  • The interaction between viral proteins and small-molecule compounds is the basis of drug design. Identifying viral proteins from their amino acid sequences is therefore a fundamental challenge in the field of biopharmaceuticals. Traditional prediction methods suffer from the data imbalance problem and require long computation times. To address this, this paper proposes a deep learning framework for viral protein identification. In the framework, we employ a Temporal Convolutional Network (TCN) instead of a Recurrent Neural Network (RNN) for feature extraction to improve computational efficiency. We also customize a cost-sensitive loss function for the TCN and introduce the misclassification cost of training samples into the weight update of a Gradient Boosting Decision Tree (GBDT) to address the data imbalance problem. Experimental results show that our framework not only outperforms traditional data imbalance methods but also greatly reduces computation time, with a slight improvement in performance.

    Citation: Hanyu Zhao, Chao Che, Bo Jin, Xiaopeng Wei. A viral protein identifying framework based on temporal convolutional network[J]. Mathematical Biosciences and Engineering, 2019, 16(3): 1709-1717. doi: 10.3934/mbe.2019081
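The abstract does not spell out the authors' cost-sensitive loss or exactly how misclassification costs enter the GBDT update. As a hedged illustration only, the Python sketch below assumes a class-weighted binary log-loss (one common cost-sensitive formulation, not necessarily the paper's) with inverse-class-frequency costs. It derives the per-sample gradient and Hessian with respect to the raw score, which is the quantity a gradient-boosting tree fits at each round. All function names are hypothetical, and label 1 is assumed to denote the minority (viral protein) class.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inverse_frequency_costs(y_true: Sequence[int]) -> Tuple[float, float]:
    # Assumption: cost of missing a positive = n_neg / n_pos, negatives cost 1.
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    return n_neg / n_pos, 1.0


def cost_sensitive_logloss(y_true: Sequence[int], margins: Sequence[float],
                           cost_pos: float, cost_neg: float) -> float:
    # Mean class-weighted binary cross-entropy computed from raw scores (logits).
    eps = 1e-12
    total = 0.0
    for y, m in zip(y_true, margins):
        p = sigmoid(m)
        if y == 1:
            total += -cost_pos * math.log(max(p, eps))
        else:
            total += -cost_neg * math.log(max(1.0 - p, eps))
    return total / len(y_true)


def cost_sensitive_grad_hess(y_true: Sequence[int], margins: Sequence[float],
                             cost_pos: float, cost_neg: float
                             ) -> Tuple[List[float], List[float]]:
    # First and second derivatives of the weighted log-loss w.r.t. each margin:
    #   grad = w * (p - y),  hess = w * p * (1 - p),  w = cost of the sample's class.
    grads: List[float] = []
    hess: List[float] = []
    for y, m in zip(y_true, margins):
        p = sigmoid(m)
        w = cost_pos if y == 1 else cost_neg
        grads.append(w * (p - y))
        hess.append(w * p * (1.0 - p))
    return grads, hess


if __name__ == "__main__":
    labels = [1, 0, 0, 0]          # toy data: one positive, three negatives
    scores = [0.0, 0.0, 0.0, 0.0]  # initial raw scores of the booster
    c_pos, c_neg = inverse_frequency_costs(labels)
    g, h = cost_sensitive_grad_hess(labels, scores, c_pos, c_neg)
    print(g)  # [-1.5, 0.5, 0.5, 0.5]
    print(h)  # [0.75, 0.25, 0.25, 0.25]
```

Because the cost multiplies both the gradient and the Hessian, the single positive sample in this toy example carries three times the weight of each negative in the split gains and leaf values. This is the same effect as per-sample weights in a standard GBDT. The returned lists could be converted to arrays for a boosting library that accepts a custom (gradient, Hessian) objective.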

  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
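Corresponding author: Chen Bin, bchen63@163.com
  • 1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142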
Figures and Tables: 1 figure, 5 tables