
On the existence of Hopf bifurcations in the sequential and distributive double phosphorylation cycle

  • Protein phosphorylation cycles are important mechanisms of post-translational protein modification and, as such, an integral part of intracellular signaling and control. We consider the sequential phosphorylation and dephosphorylation of a protein at two binding sites. While it is known that proteins where phosphorylation is processive and dephosphorylation is distributive admit oscillations (for some values of the rate constants and total concentrations), it is not known whether this is the case if both phosphorylation and dephosphorylation are distributive. We study simplified mass-action models of sequential and distributive phosphorylation and show that for each of them there do not exist rate constants and total concentrations where a Hopf bifurcation occurs. To arrive at this result we use convex parameters to parametrize the steady state and the Hurwitz matrices.

    Citation: Carsten Conradi, Elisenda Feliu, Maya Mincheva. On the existence of Hopf bifurcations in the sequential and distributive double phosphorylation cycle[J]. Mathematical Biosciences and Engineering, 2020, 17(1): 494-513. doi: 10.3934/mbe.2020027



    Protein post-translational modification is an important chemical process that plays a key role in regulating cell functions [1] and also changes the physical and chemical properties of proteins. More than 400 post-translational modifications, including methylation [2], acetylation [3], phosphorylation [4], and S-nitrosylation (SNO) [5], have been discovered so far. SNO is a reversible post-translational modification of proteins, and a large number of studies have shown that it plays an important role in multiple biological processes such as redox signal transduction [6], cell signal transduction [7], cell senescence [8], and transcription [9]. SNO is also related to many human diseases such as cancer [10], Alzheimer's disease [11], and chronic renal failure [12]. Therefore, a well-grounded understanding of SNO is of great significance for the study of basic biological processes [9,13] and the development of drugs [14]. In recent years, many SNO sites have been identified through molecular signals [15,16], but experimental identification of SNO sites still faces challenges: it is time-consuming, labor-intensive, and of limited accuracy. With the continuous development of computer technology, a large number of computational models have been used to predict the specific sites of SNO modification.

    Many post-translational modifications of proteins have been detected with a variety of computational models. Qiu et al. identified phosphorylated [17] and acetylated [18] proteins using GO annotations. GPS-SNO [19], SNOSite [20], iSNO-PseAAC [21], PreSNO [5] and RecSNO [22] have been applied to the prediction of SNO sites. The GPS-SNO, SNOSite and iSNO-PseAAC models use relatively small data sets; in addition, many negative samples in these data sets have since been experimentally verified as positive samples. The data sets used by PreSNO and RecSNO are relatively large and recent, but there is still room to improve model performance.

    On the basis of previous research, this work established two models, one for predicting SNO proteins and one for SNO sites. For predicting SNO proteins, a bag-of-words model is proposed on the basis of the KNN scoring matrix obtained from the proteins' GO annotation information [18] and the PseAAC [23,24] of the amino acid sequence. Fusing multiple features reflects the information of the protein sequence more comprehensively and improves the prediction results. Since the problem involves imbalanced data sets, a combination of an oversampling technique and random deletion is applied to balance the training set. For predicting SNO sites, two feature extraction methods, TPC [25] and CKSAAP [26], are used to extract features from protein sequence fragments. To eliminate redundancy and noise in the original feature space, elastic nets [27] are used to reduce the dimensionality of the feature space after the fusion of the original features. Random Forest served as the classifier and was evaluated with 5-fold cross-validation. The specific flow chart is shown in Figure 1.

    Figure 1.  The framework of RF-SNOPS.

    To obtain a scientifically sound prediction result, a strict benchmark data set is essential. UniProtKB has been accepted by most bioinformatics researchers. Here, the negative samples are extracted from UniProtKB and the positive samples are extracted from the data set of Xie et al. [28], which is a high-quality data set based on extensive literature research. A protein sequence can be expressed as:

    $P = R_1 R_2 R_3 \cdots R_i \cdots R_L$ (1)

    where $R_i$ represents the $i$-th amino acid residue, and $L$ represents the length of the protein sequence.

    In order to identify SNO proteins, we constructed a benchmark data set similar to that of Hasan et al. [5], which consists of 3113 SNO proteins. Each positive sample, i.e., each SNO protein, has at least one SNO site. For negative samples, we randomly selected 18,047 proteins without any SNO site from UniProtKB. To make the results more rigorous, CD-HIT was used to remove redundant sequences from the 3113 positive and 18,047 negative samples at a 30% sequence-identity threshold. Finally, 2192 positive samples and 7809 negative samples were collected in the proposed benchmark data set.

    The benchmark data set for predicting SNO sites is the same as that of Hasan et al. [5]; it consists of 3383 positive samples and 3365 negative samples. A potential SNO(C)-site-containing peptide sample can be generally expressed by

    $P_\xi = R_{-\xi} R_{-(\xi-1)} \cdots R_{-2} R_{-1} \, C \, R_{+1} R_{+2} \cdots R_{+(\xi-1)} R_{+\xi}$ (2)

    where the subscript $\xi$ is an integer, $R_{-\xi}$ represents the $\xi$-th upstream amino acid residue from the center, $R_{+\xi}$ the $\xi$-th downstream residue, and so forth. If the number of residues to the left or right of the central C is less than $\xi$, the pseudo amino acid "X" is used to pad the sequence. The $(2\xi+1)$-tuple peptide sample $P_\xi$ can be further classified into the following two categories:

    $\overline{P}_\xi \in \begin{cases} \overline{P}_\xi^{+}, & \text{if its center is a SNO site} \\ \overline{P}_\xi^{-}, & \text{otherwise} \end{cases}$ (3)

    where $\overline{P}_\xi^{+}$ denotes a true SNO segment with C at its center, $\overline{P}_\xi^{-}$ a corresponding false SNO segment, and the symbol $\in$ means "a member of" in set theory.
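The window construction of Eq (2) can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the function name `extract_peptide` and the toy sequence are assumptions.

```python
# Hypothetical sketch of Eq (2): extract a (2*xi+1)-residue window centered
# on a candidate cysteine C, padding with the pseudo amino acid "X" when the
# window runs past either end of the sequence.
def extract_peptide(sequence, center, xi):
    assert sequence[center] == "C", "window must be centered on a cysteine"
    left = sequence[max(0, center - xi):center]
    right = sequence[center + 1:center + 1 + xi]
    # pad each side to exactly xi residues with "X"
    left = "X" * (xi - len(left)) + left
    right = right + "X" * (xi - len(right))
    return left + "C" + right

print(extract_peptide("MKCAL", center=2, xi=3))  # "XMKCALX"
```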

    GO-KNN [18] features are extracted on the basis of the GO annotations of proteins. In this work, we need to find the GO terms of all protein sequences and calculate the distances between proteins. Take protein P1 as an example: for any other protein, say P2, their GO term sets $P1_{GO} = \{GO^1_1, GO^1_2, \ldots, GO^1_M\}$ and $P2_{GO} = \{GO^2_1, GO^2_2, \ldots, GO^2_N\}$ are obtained. If a protein has no GO term, we replace it with the GO terms of a homologous protein. The distance between two proteins can be calculated with Eq (4):

    $\mathrm{Distance}(P1, P2) = 1 - \dfrac{|P1_{GO} \cap P2_{GO}|}{|P1_{GO} \cup P2_{GO}|}$ (4)

    where $GO^1_i$ and $GO^2_i$ represent the $i$-th GO term of P1 and P2, respectively, $M$ and $N$ are the numbers of GO terms, $\cup$ and $\cap$ are the union and intersection of set theory, and $|\cdot|$ denotes the number of elements in a set. The GO-KNN features are then extracted in the following steps: 1) sort the calculated distances in ascending order; 2) select the first $k$ nearest neighbors of the test protein; 3) calculate the percentage of positive samples among the $k$ neighbors. In this study, $k$ was set to 2, 4, 8, 16, 62, 64, 128, 256, 512 and 1024. In this way, a 10-dimensional feature vector $(x_1, x_2, \ldots, x_{10})$ represents the protein P1.
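The distance of Eq (4) and the three scoring steps above can be sketched as follows. The GO term sets, labels and function names here are illustrative toy data, not the paper's material.

```python
# Minimal sketch of Eq (4) (Jaccard distance on GO term sets) and of the
# GO-KNN score: the fraction of SNO-positive proteins among the k nearest
# training proteins.
def go_distance(go1, go2):
    # 1 - |intersection| / |union|
    return 1.0 - len(go1 & go2) / len(go1 | go2)

def knn_score(query_go, train, k):
    # train: list of (go_set, label) pairs, label 1 for SNO-positive proteins
    ranked = sorted(train, key=lambda t: go_distance(query_go, t[0]))
    return sum(label for _, label in ranked[:k]) / k

query = {"GO:0005515", "GO:0046872"}
train = [({"GO:0005515"}, 1),
         ({"GO:0046872", "GO:0005515"}, 1),
         ({"GO:0003677"}, 0)]
print([knn_score(query, train, k) for k in (1, 2)])  # [1.0, 1.0]
```

Repeating `knn_score` for each of the ten values of $k$ yields the 10-dimensional GO-KNN vector.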

    A bag-of-words model [29] based on the physical and chemical properties of proteins has previously been used to identify GPCR-drug interactions. The main steps are as follows: 1) Encode the protein sequence with its physical and chemical properties. Scientists have measured various physical and chemical properties of the 20 common amino acids [30]; after careful experimental comparison, hydrophilicity was selected as the indicator for the proposed model. 2) Design wordbooks for the protein. With window sizes of 1, 2 and 3 and a step size of 1, the coded sequence is divided into segments of different lengths: segments of length 1 form wordbook WB1, segments of length 2 form WB2, and segments of length 3 form WB3. A fourth wordbook again uses a window of size 2 with step size 1, but with the two positions separated by one amino acid; the resulting length-2 fragments form wordbook WB4. 3) Cluster the wordbooks. The words in WB1 are divided into 20 sub-groups according to amino acid type, while the words in WB2, WB3 and WB4 are clustered with the K-means algorithm into 16, 62 and 16 clusters, respectively. 4) Calculate the ratio of the number of words in each cluster to the total number of words in the wordbook with Eq (5).

    $x^{WB_j}_i = \dfrac{X^{WB_j}_i}{N}, \quad i = 1, \ldots, K, \quad j = 1, 2, 3, 4$ (5)

    where $K$ is the number of clusters in wordbook $WB_j$, $X^{WB_j}_i$ is the number of words in the $i$-th cluster of $WB_j$, and $N$ is the total number of words in $WB_j$. A 114-D feature vector is thus formed for a given protein sequence, i.e., $(x^{WB_1}_1, \ldots, x^{WB_1}_{20}, x^{WB_2}_1, \ldots, x^{WB_2}_{16}, x^{WB_3}_1, \ldots, x^{WB_3}_{62}, x^{WB_4}_1, \ldots, x^{WB_4}_{16})$.
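The window segmentation of step 2 can be sketched as below. Only the wordbook construction is shown; the hydrophilicity encoding, K-means clustering and the normalization of Eq (5) are omitted, and the function name and toy string are assumptions.

```python
# Illustrative sketch of the four wordbooks: contiguous windows of size 1, 2
# and 3 (step size 1) give WB1-WB3; a gapped window of two residues separated
# by one position gives WB4.
def wordbooks(encoded):
    wb1 = [encoded[i:i + 1] for i in range(len(encoded))]
    wb2 = [encoded[i:i + 2] for i in range(len(encoded) - 1)]
    wb3 = [encoded[i:i + 3] for i in range(len(encoded) - 2)]
    wb4 = [encoded[i] + encoded[i + 2] for i in range(len(encoded) - 2)]  # one-residue gap
    return wb1, wb2, wb3, wb4

wb1, wb2, wb3, wb4 = wordbooks("ABCD")
print(wb2, wb4)  # ['AB', 'BC', 'CD'] ['AC', 'BD']
```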

    PseAAC [23,24] is a very popular feature in bioinformatics. In this work, six physical and chemical properties are used: hydrophobicity, hydrophilicity, side-chain mass, pK1, pK2 and pI. We first use Eq (6) to standardize the original physical and chemical property values of the amino acids:

    $W_a(i) = \dfrac{W^0_a(i) - \frac{1}{20}\sum_{i=1}^{20} W^0_a(i)}{\sqrt{\frac{1}{20}\sum_{i=1}^{20}\left[W^0_a(i) - \frac{1}{20}\sum_{i=1}^{20} W^0_a(i)\right]^2}}$ (6)

    where $a \in \{1, 2, \ldots, 6\}$ and $i \in \{1, 2, \ldots, 20\}$, and $W^0_a(i)$ represents the value of the $a$-th original physical or chemical property of the $i$-th amino acid. The transformed property values are then used in Eq (7):

    $\Theta(R_i, R_j) = \dfrac{1}{6} \sum_{a=1}^{6} \left[ W_a(R_j) - W_a(R_i) \right]^2$ (7)

    where $W_1(R_j)$ represents the hydrophobicity value of $R_j$ and, by analogy, $W_6(R_i)$ represents the pI value of $R_i$. The correlation factor of each layer can then be obtained using Eq (8):

    $\theta_\lambda = \dfrac{1}{L - \lambda} \sum_{i=1}^{L-\lambda} \Theta(R_i, R_{i+\lambda}), \quad \lambda < L$ (8)

    where $\theta_\lambda$ represents the correlation factor of the $\lambda$-th layer of the protein sequence. Finally, the protein sequence is converted into a feature vector by Eq (9):

    $x_i = \begin{cases} \dfrac{f_i}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{\lambda} \theta_j}, & 1 \le i \le 20 \\[2ex] \dfrac{\omega\,\theta_{i-20}}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{\lambda} \theta_j}, & 20 + 1 \le i \le 20 + \lambda \end{cases}$ (9)

    where $f_i$ represents the frequency of the $i$-th amino acid, $\omega$ is set to 0.5, and $\lambda$ to 5. In this way, a 25-dimensional feature vector is formed.
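Equations (6)–(9) can be worked through end to end on a toy example. This sketch shrinks the problem to a 3-letter alphabet with two property scales (the paper uses 20 amino acids and six properties); all property values and names are illustrative assumptions.

```python
# Compact sketch of PseAAC, Eqs (6)-(9), on a toy alphabet.
from math import sqrt

ALPHABET = "ACD"
RAW = {"A": [1.8, 6.0], "C": [2.5, 5.1], "D": [-3.5, 2.8]}  # toy property values

def standardize(raw):
    # Eq (6): zero-mean, unit-variance normalization of each property scale
    n, props = len(raw), len(next(iter(raw.values())))
    out = {aa: [0.0] * props for aa in raw}
    for a in range(props):
        vals = [raw[aa][a] for aa in raw]
        mean = sum(vals) / n
        std = sqrt(sum((v - mean) ** 2 for v in vals) / n)
        for aa in raw:
            out[aa][a] = (raw[aa][a] - mean) / std
    return out

W = standardize(RAW)

def theta_pair(ri, rj):
    # Eq (7): mean squared property difference between two residues
    return sum((W[rj][a] - W[ri][a]) ** 2 for a in range(len(W[ri]))) / len(W[ri])

def pseaac(seq, lam=2, omega=0.5):
    # Eq (8): sequence-order correlation factors for layers 1..lam
    thetas = [sum(theta_pair(seq[i], seq[i + l]) for i in range(len(seq) - l))
              / (len(seq) - l) for l in range(1, lam + 1)]
    freqs = [seq.count(aa) / len(seq) for aa in ALPHABET]
    denom = sum(freqs) + omega * sum(thetas)
    # Eq (9): first |alphabet| entries are composition, last lam are correlations
    return [f / denom for f in freqs] + [omega * t / denom for t in thetas]

vec = pseaac("ACDAC")
print(len(vec))  # |alphabet| + lam = 5
```

With the real 20-letter alphabet and $\lambda = 5$ this yields the 25-dimensional vector described above; note that by construction the entries of the vector sum to 1.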

    In order to reduce the adverse effect of imbalanced data on model performance, many methods for handling imbalanced data have been proposed, such as the Synthetic Minority Oversampling Technique [31] (SMOTE) and the Random Under Sampler [32] (RUS). SMOTE, proposed by Chawla et al., has been used to predict protein sites [27] and to improve the prognostic assessment of lung cancer [33]. RUS is a very simple and popular under-sampling method; it has been used in pediatric pneumonia detection [34] and to improve convolutional neural network performance [35]. In this study, we combined the two methods: SMOTE is used to oversample the positive samples and RUS to under-sample the negative samples, so that the numbers of processed positive and negative samples are equal. The specific process is shown in Figure 2.

    Figure 2.  Balance database processing.
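The balancing idea can be sketched in pure Python. This is only an illustration of the mechanism, not the paper's implementation (which would typically use a library such as imbalanced-learn); the function names, the interpolation toward a single nearest neighbor, and the toy data are assumptions.

```python
# Minimal sketch of the SR balancer: SMOTE-style synthesis of minority
# samples (interpolating between a minority point and its nearest minority
# neighbor) plus random under-sampling of the majority class.
import random

def smote_like(minority, n_new, rng):
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # nearest other minority sample by squared Euclidean distance
        nn = min((m for m in minority if m is not x),
                 key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

def balance(pos, neg, rng):
    target = (len(pos) + len(neg)) // 2
    pos = pos + smote_like(pos, target - len(pos), rng)  # oversample positives
    neg = rng.sample(neg, target)                        # under-sample negatives
    return pos, neg

rng = random.Random(0)
pos = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2)]
neg = [(float(i), -1.0) for i in range(9)]
bal_pos, bal_neg = balance(pos, neg, rng)
print(len(bal_pos), len(bal_neg))  # 6 6
```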

    CKSAAP [25] has been widely used in protein site prediction [26] since it effectively captures internal patterns of a given protein sequence. A protein fragment is composed of the 20 common amino acids and one pseudo amino acid, which yields 441 residue pairs (AA, AC, ..., XX) for each $l$, where $l$ is the gap between the two residues of a pair. The following formula is used to calculate the features of a fragment:

    $\left( \dfrac{N_{AA}}{N_T}, \dfrac{N_{AC}}{N_T}, \dfrac{N_{AD}}{N_T}, \ldots, \dfrac{N_{XX}}{N_T} \right)_{441}$ (10)

    where $N_{AA}, N_{AC}, \ldots$ denote the number of times the corresponding amino acid pair appears in the fragment, $L$ is the length of the protein fragment, and $N_T = L - l - 1$. In this study, $l$ takes the values 0, 1, 2, 3 and 4, with corresponding $N_T$ of 40, 39, 38, 37 and 36, respectively. A 2205-D feature vector is thus formed.
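Eq (10) for a single spacing $l$ can be sketched as below. The alphabet is truncated for brevity (the real feature uses the 21 symbols, i.e. 441 pairs), and the function name and toy fragment are assumptions.

```python
# Illustrative sketch of Eq (10): count each gapped residue pair (the two
# residues separated by l positions) and divide by N_T = L - l - 1.
from itertools import product

def cksaap(fragment, l, alphabet="ACDX"):
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = {p: 0 for p in pairs}
    nt = len(fragment) - l - 1
    for i in range(nt):
        counts[fragment[i] + fragment[i + l + 1]] += 1
    return [counts[p] / nt for p in pairs]

vec = cksaap("ACDAX", l=1)
print(len(vec))  # 16 pairs for this 4-letter toy alphabet; frequencies sum to ~1
```

Concatenating the vectors for $l = 0, \ldots, 4$ over the full alphabet gives the 2205-D feature.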

    Based on the structural properties of proteins, researchers have proposed the tripeptide composition (TPC). It has been used to predict protein subcellular localization [36] and to identify Plasmodium mitochondrial proteins [37]. TPC calculates the frequency of every three consecutive amino acids, so that a protein fragment can be represented by a 9261-dimensional vector:

    $P_i = \dfrac{N_i}{\sum_{i=1}^{9261} N_i}$ (11)

    where $N_i$ represents the number of occurrences of the $i$-th of the 9261 tripeptides.
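Eq (11) is a direct 3-mer frequency count and can be sketched as below, again with a truncated alphabet (the paper's 21 symbols give $21^3 = 9261$ tripeptides); the function name and toy fragment are assumptions.

```python
# Sketch of Eq (11): tripeptide composition as the frequency of each
# consecutive 3-mer over all possible tripeptides of the alphabet.
from itertools import product

def tpc(fragment, alphabet="ACD"):
    trip = ["".join(t) for t in product(alphabet, repeat=3)]
    counts = {t: 0 for t in trip}
    for i in range(len(fragment) - 2):
        counts[fragment[i:i + 3]] += 1
    total = sum(counts.values())
    return [counts[t] / total for t in trip]

vec = tpc("ACDAC")
print(len(vec))  # 27 tripeptides for this 3-letter toy alphabet
```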

    The elastic net proposed by Zou and Hastie [38] is an effective feature selection method. By introducing both the $L_1$ and $L_2$ norms into a simple linear regression model, the elastic net performs continuous shrinkage and automatic variable selection at the same time, and can select groups of correlated variables. Elastic nets have been widely used in protein site prediction [27,39] with good results.
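The criterion being minimized combines a squared-error term with the two penalties. The sketch below only evaluates this objective for a given coefficient vector; a real application would use a library solver, and the data, mixing parameter `rho` and function name are assumptions.

```python
# Hedged sketch of the elastic-net objective:
#   (1/2n) * sum (y - X beta)^2 + alpha * (rho * ||beta||_1 + (1-rho)/2 * ||beta||_2^2)
# Coefficients driven to exactly zero by the L1 part are the discarded features.
def elastic_net_loss(X, y, beta, alpha, rho):
    n = len(y)
    residuals = [yi - sum(b * xij for b, xij in zip(beta, xi))
                 for xi, yi in zip(X, y)]
    mse = sum(r * r for r in residuals) / (2 * n)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return mse + alpha * (rho * l1 + (1 - rho) / 2 * l2)

X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
print(elastic_net_loss(X, y, [1.0, 2.0], alpha=0.1, rho=0.5))
```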

    In this study, four indicators were used to evaluate the performance of the models: accuracy (ACC), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC) [40], defined by Eq (12):

    $\begin{cases} Sn = \dfrac{TP}{TP + FN} \\[1ex] Sp = \dfrac{TN}{TN + FP} \\[1ex] ACC = \dfrac{TP + TN}{TP + FP + TN + FN} \\[1ex] MCC = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{cases}$ (12)

    In predicting SNO proteins, TP is the number of proteins that are predicted to have SNO sites and actually have them, and TN is the number of proteins predicted to have no SNO sites that actually have none. FP is the number of proteins without SNO sites that are predicted to have them, and FN is the number of proteins with SNO sites that are predicted to have none. In addition, the area under the ROC curve (AUC) is also used to evaluate the model.

    In predicting SNO sites, TP indicates the number of actual SNO sites predicted to be SNO sites, and TN indicates the number of non-SNO sites predicted to be not SNO sites. FP is the number of non-SNO sites predicted to be SNO sites, and FN is the number of actual SNO sites predicted to be non-SNO sites.
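Eq (12) translates directly into code; the confusion-matrix counts below are illustrative, not results from the paper.

```python
# Direct transcription of Eq (12) from TP/TN/FP/FN counts.
from math import sqrt

def metrics(tp, tn, fp, fn):
    sn = tp / (tp + fn)                      # sensitivity
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sn, sp, acc, mcc

sn, sp, acc, mcc = metrics(tp=40, tn=30, fp=10, fn=20)
print(round(sn, 2), round(sp, 2), round(acc, 2), round(mcc, 2))  # 0.67 0.75 0.7 0.41
```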

    Random Forest [41] is an algorithm that integrates multiple trees through the idea of ensemble learning; its basic unit is the decision tree. As a highly flexible machine learning algorithm, Random Forest (RF) has been widely used in data analysis [42], bioinformatics [43] and technological development [44].

    Naive Bayes (NB) [45] is a simple and effective classifier, which is widely used in software defect prediction [46], medical diagnosis [47] and biological information [48]. NB is based on the Bayes theorem and the assumption of the conditional independence of features, which greatly reduces the complexity of the classification algorithm.

    K Nearest Neighbor (KNN) [49] is one of the supervised machine learning algorithms, which is widely used in face recognition [50], disease research [51] and engineering applications [52]. Its main idea is to judge the category of the predicted value based on the category of the k points closest to the predicted value.

    XGBoost [53,54] is an improved boosting algorithm based on GBDT [55]. It is an ensemble method that combines many basic models into a strong model. Because of its good predictive performance and high training efficiency, XGBoost has been widely used in the field of data analysis.

    In this research, three feature extraction methods, GO-KNN, BOW and PseAAC, were used to encode the protein sequences, giving 10-D, 114-D and 25-D feature vectors, respectively. These three kinds of features were fused into a 149-D feature vector (ALL). The prediction results obtained with the different feature sets under 5-fold cross-validation are shown in Table 1.

    Table 1.  Prediction results of different features.
    Feature Acc (%) Sn (%) Sp (%) MCC AUC
    GO-KNN 82.72 48.36 92.37 0.4533 0.8521
    BOW 78.83 13.04 97.30 0.1969 0.7359
    PseAAC 79.38 19.43 96.22 0.2503 0.7616
    ALL 83.77 49.49 93.40 0.4840 0.8593


    As Table 1 shows, the different features yield different prediction results. Among the three methods, GO-KNN has the highest ACC, Sn, MCC and AUC, at 82.72%, 48.36%, 0.4533 and 0.8521, respectively. BOW has the lowest ACC, Sn, MCC and AUC, at 78.83%, 13.04%, 0.1969 and 0.7359, respectively, but the highest Sp at 97.30%. After combining the three features, ACC, Sn, Sp, MCC and AUC are 83.77%, 49.49%, 93.40%, 0.4840 and 0.8593, respectively; ACC, Sn, MCC and AUC are all higher than those produced by GO-KNN alone. The results show that multi-feature fusion improves several indicators. To better analyze the influence of the different features on SNO protein prediction, the results for the three features and their fusion are shown in Figure 3.

    Figure 3.  Comparison of prediction results on different features.

    As Figure 3 shows, the three features and their fusion affect the five evaluation indicators to different extents: performance is weaker on Sn and MCC, and better on ACC, Sp and AUC. Comparing the four feature encodings, the fusion feature ALL improves ACC, Sn, MCC and AUC. Multi-feature fusion reflects sequence information more comprehensively and thereby improves prediction ability, so it is adopted for predicting SNO proteins.

    Here, the combination of SMOTE and RUS is denoted the SR balancer. We input the data sets before and after balancing into the model and obtained the ACC, Sn, Sp, MCC and AUC under 5-fold cross-validation, as shown in Table 2.

    Table 2.  Comparison of prediction results before and after the SR balancer.
    Acc (%) Sn (%) Sp (%) MCC AUC
    Imbalance 83.77 49.49 93.40 0.4840 0.8593
    Balance 81.84 70.82 84.93 0.5178 0.8635


    As Table 2 shows, after balancing, Sn and Sp are much closer to each other, and Sn, MCC and AUC all improve. Balancing the data set is therefore clearly worthwhile.

    Classifiers play an important role in model prediction. This work used the four classifiers described above to identify SNO proteins. The ACC, Sn, Sp, MCC and AUC of each classifier under 5-fold cross-validation are shown in Table 3, which shows that Random Forest achieves the best ACC, Sp, MCC and AUC. To better compare the classifiers, their prediction results are shown in Figure 4.

    Table 3.  The prediction results of different classifiers.
    Algorithms Acc (%) Sn (%) Sp (%) MCC AUC
    RF 81.84 70.82 84.93 0.5178 0.8635
    NB 63.81 78.37 59.73 0.3154 0.7710
    KNN 71.97 83.44 68.75 0.4366 0.8360
    XGBoost 80.73 70.07 83.72 0.4953 0.8553

    Figure 4.  The ROC curves of different classification methods.

    The area under the ROC curve can evaluate the predictive performance of the model. It can be seen from Figure 4 that when the random forest is used as a classifier, the area under the ROC curve is the largest. Therefore, random forest is the best choice for the proposed model.

    In this study, two kinds of features, CKSAAP and TPC, were used, yielding 2205-dimensional and 9261-dimensional feature vectors on the basis of the above algorithms. To better reflect the information in the protein fragments, these features were fused into an 11,466-dimensional feature vector. The prediction results obtained with the different feature sets under 5-fold cross-validation are shown in Table 4.

    Table 4.  The prediction results of different feature extraction methods.
    Feature Acc (%) Sn (%) Sp (%) MCC AUC
    CKSAAP 73.97 83.67 64.27 0.4891 0.8036
    TPC 71.38 66.07 76.74 0.4305 0.8069
    ALL 75.36 86.39 64.31 0.5201 0.8196


    As Table 4 shows, the ACC, Sn and MCC of CKSAAP are higher than those of TPC, while TPC performs better on Sp and AUC. After feature fusion, ACC, Sn, MCC and AUC are all higher than with either single feature. Feature fusion is therefore beneficial for this task.

    Multi-information fusion extracts protein sequence information more comprehensively, but it also introduces redundancy and noise. Dimensionality reduction not only retains the important features but also improves the computational efficiency of the model. In this paper, the elastic net was used to reduce the dimensionality of the fused feature set, yielding a 704-dimensional feature subset. The prediction results of Random Forest under 5-fold cross-validation are shown in Table 5.

    Table 5.  Results before and after feature selection.
    Acc (%) Sn (%) Sp (%) MCC AUC
    All 75.36 86.39 64.31 0.5201 0.8196
    Elastic net 76.02 85.68 66.33 0.5304 0.8260


    After dimensionality reduction with the elastic net, all evaluation indicators except Sn improve. In addition, because the feature dimension is greatly reduced, the efficiency of the model is also significantly improved.

    Four classifiers, Random Forest, Naive Bayes, K-Nearest Neighbor and XGBoost, were tested for predicting SNO sites; the results under 5-fold cross-validation are shown in Table 6. Table 6 shows that Naive Bayes and K-Nearest Neighbor perform relatively poorly, and that Random Forest is best on every indicator except Sp. To evaluate the classifiers more comprehensively, their ROC curves are shown in Figure 5.

    Table 6.  The prediction results of different classifiers.
    Algorithms Acc (%) Sn (%) Sp (%) MCC AUC
    RF 76.02 85.68 66.33 0.5304 0.8260
    NB 69.74 79.46 59.98 0.4022 0.7605
    KNN 63.63 46.39 81.00 0.2923 0.7246
    XGBoost 72.88 74.37 71.40 0.4580 0.8015

    Figure 5.  The ROC curves of different classification methods.

    From Figure 5, we can clearly see that the area under the ROC curve of the random forest is the largest. Therefore, random forest has been selected as the classifier of the proposed model.

    To further evaluate the performance of the model, we compared it with the PreSNO and RecSNO models. The prediction results of the three methods on the same data set are shown in Table 7. From Table 7, we can see that the ACC, Sn and MCC of our model are the highest, and its Sp and AUC are also competitive. Therefore, the performance of this model is better than that of PreSNO and RecSNO.

    Table 7.  Comparison of the RF-SNOPS with other methods.
    Method Acc (%) Sn (%) Sp (%) MCC AUC
    PreSNO 70.00 54.00 86.00 0.42 0.84
    RecSNO 72.00 79.00 66.00 0.45 0.79
    RF-SNOPS 76.02 85.68 66.33 0.5304 0.8260


    In order to identify SNO proteins, we used GO-KNN, BOW and PseAAC to extract sequence information: GO-KNN extracts nearest-neighbor information from protein GO annotations, while BOW and PseAAC extract sequence information from physical and chemical properties. In addition, we used the SR balancer to process the imbalanced data set and reduce its negative impact on the model, and Random Forest was used for prediction. For predicting SNO sites, CKSAAP and TPC were used to extract protein fragment information, and elastic nets were used to reduce the dimensionality of the fused features, improving computational efficiency and eliminating the redundancy and noise introduced by feature fusion. These procedures require only computational models, without any physical or chemical experiments, which saves experimental cost and improves efficiency. We hope that this work will be helpful for solving biological problems with computational methods.

    This work was supported by grants from the National Natural Science Foundation of China (Nos. 31760315, 62162032, 61761023) and the Natural Science Foundation of Jiangxi Province, China (No. 20202BAB202007).

    The authors have declared that no competing interest exists.



    [1] C. Conradi and A. Shiu, Dynamics of post-translational modification systems: recent progress and future directions, Biophys. J., 114 (2018), 507-515.
    [2] T. Suwanmajo and J. Krishnan, Exploring the intrinsic behaviour of multisite phosphorylation systems as part of signalling pathways, J. R. Soc. Interface, 15 (2018), 20180109.
    [3] J. Gunawardena, Multisite protein phosphorylation makes a good threshold but can be a poor switch, Proc. Natl. Acad. Sci. U.S.A., 102 (2005), 14617-14622.
    [4] C. Salazar and T. Hofer, Multisite protein phosphorylation-from molecular mechanisms to kinetic models, FEBS J., 276 (2009), 3177-3198.
    [5] M. Thomson and J. Gunawardena, Unlimited multistability in multisite phosphorylation systems, Nature, 460 (2009), 274-277.
    [6] C. Conradi, D. Flockerzi and J. Raisch, Multistationarity in the activation of an MAPK: parametrizing the relevant region in parameter space, Math. Biosci., 211 (2008), 105-131.
    [7] C. Conradi and M. Mincheva, Catalytic constants enable the emergence of bistability in dual phosphorylation, J. R. Soc. Interface, 11.
    [8] J. Hell and A. D. Rendall, A proof of bistability for the dual futile cycle, Nonlinear Anal. Real World Appl., 24 (2015), 175-189.
    [9] D. Flockerzi, K. Holstein and C. Conradi, N-site Phosphorylation Systems with 2N-1 Steady States, Bull. Math. Biol., 76 (2014), 1892-1916.
    [10] L. Wang and E. D. Sontag, On the number of steady states in a multiple futile cycle, J. Math. Biol., 57 (2008), 29-52.
    [11] C. Conradi and A. Shiu, A global convergence result for processive multisite phosphorylation systems, Bull. Math. Biol., 77 (2015), 126-155.
    [12] C. Conradi, M. Mincheva and A. Shiu, Emergence of oscillations in a mixed-mechanism phosphorylation system, Bull. Math. Biol., 81 (2019), 1829-1852.
    [13] T. Suwanmajo and J. Krishnan, Mixed mechanisms of multi-site phosphorylation, J. R. Soc. Interface, 12 (2015), 20141405.
    [14] N. Obatake, A. Shiu, X. Tang and A. Torres, Oscillations and bistability in a model of ERK regulation, J. Math. Biol., (2019), https://doi.org/10.1007/s00285-019-01402-y.
    [15] Y. A. Kuznetsov, Elements of Applied Bifurcation Theory, Springer-Verlag, 1995.
    [16] H. Errami, M. Eiswirth, D. Grigoriev, et al., Detection of Hopf bifurcations in chemical reaction networks using convex coordinates, J. Comput. Phys., 291 (2015), 279-302.
    [17] M. El Kahoui and A. Weber, Deciding Hopf bifurcations by quantifier elimination in a software component architecture, J. Symbolic Comput., 30 (2000), 161-179.
    [18] W. M. Liu, Criterion of Hopf bifurcations without using eigenvalues, J. Math. Anal. Appl., 182 (1994), 250-256.
    [19] X. Yang, Generalized form of Hurwitz-Routh criterion and Hopf bifurcation of higher order, Appl. Math. Lett., 15 (2002), 615-621.
    [20] B. L. Clarke, Stability of Complex Reaction Networks, 1-215, John Wiley & Sons, Ltd, 2007. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/9780470142622.ch1.
    [21] B. L. Clarke, Stoichiometric network analysis, Cell Biochem. Biophys., 12 (1988), 237-253.
    [22] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
    [23] B. D. Aguda and B. L. Clarke, Bistability in chemical reaction networks: Theory and application to the peroxidase-oxidase reaction, J. Chem. Phys., 87 (1987), 3461-3470.
    [24] K. Gatermann, M. Eiswirth and A. Sensse, Toric ideals and graph theory to analyze Hopf bifurcations in mass action systems, J. Symbol. Comput., 40 (2005), 1361-1382.
    [25] C. Conradi and D. Flockerzi, Switching in mass action networks based on linear inequalities, SIAM J. Appl. Dyn. Syst., 11 (2012), 110-134.
    [26] E. Feliu, C. Lax, S. Walcher, et al., Quasi-steady state and singular perturbation reduction for reaction networks with non-interacting species, arXiv, 1908.11270.
    [27] A. Cornish-Bowden, Fundamentals of Enzyme Kinetics, 3rd edition, Portland Press, London, 2004.
    [28] A. Goeke, S. Walcher and E. Zerz, Determining "small parameters" for quasi-steady state, J. Differ. Equations, 259 (2015), 1149-1180.
    [29] A. Goeke, S. Walcher and E. Zerz, Classical quasi-steady state reduction-a mathematical characterization, Phys. D, 345 (2017), 11-26.
    [30] H.-R. Tung, Precluding oscillations in Michaelis-Menten approximations of dual-site phosphorylation systems, Math. Biosci., 306 (2018), 56-59.
    [31] M. Banaji, Inheritance of oscillation in chemical reaction networks, Appl. Math. Comp., 325 (2018), 191-209.
    [32] C. Conradi, E. Feliu, M. Mincheva, et al., Identifying parameter regions for multistationarity, PLoS Comput. Biol., 13 (2017), e1005751.
    [33] A. Sadeghimanesh and E. Feliu, The multistationarity structure of networks with intermediates and a binomial core network, Bull. Math. Biol., 81 (2019), 2428-2462.
    [34] A. von Kamp, S. Thiele, O. Hädicke, et al., Use of CellNetAnalyzer in biotechnology and metabolic engineering, J. Biotechnol., 261 (2017), 221-228.
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
