Research article

Pseudo-Stieltjes calculus: α-pseudo-differentiability, the pseudo-Stieltjes integrability and applications

  • Received: 27 August 2024 Revised: 03 November 2024 Accepted: 14 November 2024 Published: 27 November 2024
  • In this paper, the concepts of α-pseudo-differentiability and pseudo-Stieltjes integrability are proposed, and the corresponding transformation theorems and Newton–Leibniz formula are established. The obtained results provide a framework for analyzing nonlinear differential equations.

    Citation: Caiqin Wang, Hongbin Xie, Zengtai Gong. Pseudo-Stieltjes calculus: α-pseudo-differentiability, the pseudo-Stieltjes integrability and applications[J]. Electronic Research Archive, 2024, 32(11): 6467-6480. doi: 10.3934/era.2024302




Many biological processes are carried out by DBPs, such as specific nucleotide sequence recognition, transcription and DNA replication. Therefore, the identification of DBPs has become an important subject in biology. DBPs can be identified by various experimental techniques, such as ChIP-chip [1,2] and filter binding assays [3]. However, with the development of high-throughput sequencing technology, protein sequence databases have grown at an unprecedented rate, and the number of proteins with unknown structure and function keeps rising. A rapid and accurate method for identifying and characterizing DBPs from their protein sequences is therefore highly desirable. Computational prediction methods have been widely applied to various biological problems [4,5,6,7,8,9,10,11].

The existing prediction methods are broadly divided into two groups. The first group comprises model-based prediction methods. These methods borrow prior information across sequences to predict DBPs, including amino acid composition [12,13], evolutionary information [14,15] and physicochemical properties [16]. For example, Rahman et al. [17] presented a predictor named DPP-PseAAC. They used Chou's PseAAC [18] to extract features from the amino acid composition and a Random Forest (RF) model to reduce the dimension of the feature vector. Then, they applied a Support Vector Machine (SVM) [19] with a linear kernel to train the prediction model. Similarly, StackPDB takes three steps to predict DBPs: feature extraction, feature selection and model construction. StackPDB extracts protein sequence features from amino acid composition and evolutionary information. Evolutionary information can be represented by the position-specific scoring matrix (PSSM), which is generated by the PSI-BLAST [20] program. In the StackPDB method, PsePSSM, PSSM-TPC, EDT and RPT are used to extract features from the PSSM. Extreme gradient boosting-recursive feature elimination is then used to select the best features. Finally, the selected feature subset is fed into a stacked ensemble classifier composed of XGBoost, SVM and LightGBM. From previous studies [21,22,23,24,25], we can see that a protein sequence can be described by different representations, such as amino acid composition and PSSM. Because fusion methods can exploit information from all representations to effectively improve model performance, several fusion techniques have been applied to the identification of DBPs.

Examples include CKA-MKL [26], HSIC-MKL [27], HKAM-MKL [28] and MLapSVM-LBS [29]. CKA-MKL, HSIC-MKL and HKAM-MKL are Multiple Kernel Learning (MKL) methods, a popular early-fusion technique. MKL aims to learn optimal kernel weights: the optimal kernel is a linear combination of multiple base kernels with the corresponding weights. CKA-MKL maximizes the cosine similarity between the optimal kernel and the ideal kernel; in addition, it introduces a Laplacian term on the weights into the objective function to avoid extreme solutions. However, CKA-MKL only considers global kernel alignment and ignores the difference information between local samples. HKAM-MKL therefore maximizes both the local and the global kernel alignment scores and is consequently superior to CKA-MKL in predicting DBPs; both CKA-MKL and HKAM-MKL use SVM as the classifier. HSIC-MKL maximizes the dependence, measured by the Hilbert-Schmidt Independence Criterion, between the training samples and their labels in a Reproducing Kernel Hilbert Space (RKHS); the optimal kernel is then input into a hypergraph-based Laplacian SVM, an extension of SVM. Different from the above MKL methods, MLapSVM-LBS fuses multiple information sources during the training process. MLapSVM-LBS uses multiple local-behavior-similarity graphs as the regularization term. Because the objective function of MLapSVM-LBS is non-convex, an alternating algorithm is employed. The advantage of MLapSVM-LBS is that the multiple information sources are fused during the training phase while allowing some freedom to model the views differently.

There are also several methods for predicting DBPs that are based on structural information. Using structural alignment and a statistical potential, Gao et al. [4] proposed DBD-Hunter. DBD-Threader was subsequently proposed by Gao et al. [30] for the prediction of DBPs; it uses a template library consisting of DNA-protein complex structures, while its classification relies only on the sequence of the target protein. Structure-based predictors can be used only when the structure of a candidate protein is known; therefore, predictors that rely solely on structural information about proteins are limited in their application.

The second group comprises deep learning-based prediction methods, which are designed to capture hidden representations of protein sequences. For example, Du et al. [31] reported a deep learning-based method called MsDBP, which relies only on the primary sequence, without hand-crafted feature selection. Lu et al. [32] proposed a predictor that combines parallel long short-term memory (LSTM) and convolutional neural networks (CNN); in their work, the inputs of the LSTM and CNN are the sequence and the PSSM, respectively. The spatial structure of a protein contains richer information than the protein sequence alone, so Lu et al. [33] further constructed a graph convolutional network based on the contact map generated by Pconsc4 [34]. Yan et al. [35] employed transfer learning to construct datasets and built a deep neural network with attention mechanisms to detect DBPs. Because they typically require large amounts of training data, most deep learning-based methods [33,36,37] are not well suited to small datasets.

Inspired by a series of recent publications [26,27,38,39,40,41,42,43,44,45,46], we propose a predictor for detecting DBPs, called LapLKA-RKM, which involves the following three steps: 1) represent each protein sequence with a set of feature vectors, including Global Encoding (GE), the Multi-scale Continuous and Discontinuous descriptor (MCD), Normalized Moreau-Broto Auto Correlation (NMBAC), PSSM-based Discrete Wavelet Transform (PSSM-DWT), PSSM-based Average Blocks (PSSM-AB) and PSSM-Pse; 2) fuse these features with LapLKA (this process can be viewed as feature selection); 3) use an RKM to make the final prediction. A brief architecture of LapLKA-RKM is shown in Figure 1. We conducted LOOCV on PDB1075 and independent testing on PDB2272. The prediction accuracies indicate that our method is an effective tool for DBP detection.

    Figure 1.  A workflow of the LapLKA-RKM.

The contributions of our method include: 1) we propose an MKL algorithm, called LapLKA, which outperforms other MKL methods in handling multiple kernels; 2) we extend the RKM to a multiple-kernel setting by weighting shared hidden features.

Three protein datasets of different sizes were adopted in our study to test the ability of LapLKA-RKM to predict DBPs. These datasets were collected from the PDB, UniProt and Swiss-Prot databases and are named PDB1075 [12], PDB14189 [31] and PDB2272 [31].

    The dataset construction rules are as follows:

$N = N^{+} + N^{-}$ (1)

where $N$ is the total number of samples, $N^{+}$ is the number of DBP samples and $N^{-}$ is the number of non-DBP samples. We present a brief summary of the three datasets in Table 1. Sequences with pairwise sequence similarity greater than 25%, 25% and 40% were removed from PDB1075, PDB2272 and PDB14189, respectively.

    Table 1.  A summary of three datasets used in this study.
    Datasets N+ N- N
    PDB1075 525 550 1075
    PDB14189 7129 7060 14,189
    PDB2272 1153 1119 2272


Leave-one-out cross-validation (LOOCV) and independent testing are conducted to assess the ability of the predictor. We conduct LOOCV and 10-fold cross-validation (10-CV) on PDB1075, because PDB1075 is small and its running time is acceptable. To assess the generalization ability of the models and their ability to handle large datasets, we use PDB14189 as the training set and PDB2272 as the test set.

A total of six sequence-based features are extracted from each protein, including GE [47], MCD [48], NMBAC [49], PSSM-DWT [50], PSSM-AB [51] and PSSM-Pse [13,52,53,54]. GE and MCD extract feature vectors from the amino acid composition of the sequence. NMBAC describes six physicochemical properties of amino acids, namely polarizability, polarity, solvent-accessible surface area, hydrophobicity, net charge index of side chains and volume of side chains. PSSM-AB, PSSM-DWT and PSSM-Pse consider the protein's evolutionary information, which can be represented by the position-specific scoring matrix (PSSM) generated by PSI-BLAST [20]. The optimal parameters of NMBAC and PSSM-Pse were taken from a previous study [26]. These features are described in detail in the related literature.
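To make the autocorrelation-style descriptors concrete, the sketch below computes a normalized Moreau-Broto autocorrelation for a single physicochemical property. It is a minimal illustration under assumptions: the placeholder property values, the lag range and the normalization may differ from the exact NMBAC parameterization used in [49].

```python
import numpy as np

# Hypothetical placeholder values for one physicochemical property of the 20 amino acids
# (e.g., hydrophobicity); a real implementation would use a standardized published index.
PROPERTY = dict(zip("ACDEFGHIKLMNPQRSTVWY", np.linspace(-1.0, 1.0, 20)))

def nmbac(sequence: str, max_lag: int = 30) -> np.ndarray:
    """Normalized Moreau-Broto autocorrelation of one property over lags 1..max_lag."""
    p = np.array([PROPERTY[aa] for aa in sequence if aa in PROPERTY])
    L = len(p)
    feats = []
    for d in range(1, max_lag + 1):
        if L > d:
            feats.append(float(np.dot(p[:L - d], p[d:])) / (L - d))  # average product at lag d
        else:
            feats.append(0.0)
    return np.array(feats)

# Example: nmbac("MKTAYIAKQR", max_lag=5) returns a 5-dimensional descriptor;
# repeating this over several properties yields an NMBAC-style feature vector.
```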

RKM is a kind of kernel method [55,56,57,58]. It maps data points from the input space to a feature space, where the mapping is determined implicitly by a kernel function. Therefore, we need to construct kernel matrices as input to the RKM. Common kernel functions include the linear, polynomial and radial basis function (RBF) kernels. Like other methods [27,38,59,60,61], we employ the RBF kernel to construct kernels; its formula is defined as:

$K_{ij} = K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad i, j = 1, 2, \ldots, N$ (2)

where $x_i$ and $x_j$ are sample points and $\gamma$ is the kernel bandwidth. A predefined kernel set $\mathcal{K}$ is then obtained:

$\mathcal{K} = \{ K_{\mathrm{GE}}, K_{\mathrm{MCD}}, K_{\mathrm{NMBAC}}, K_{\mathrm{PSSM\text{-}AB}}, K_{\mathrm{PSSM\text{-}DWT}}, K_{\mathrm{PSSM\text{-}Pse}} \}$ (3)
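A minimal sketch of how the kernel set of Eq (3) can be built from the six feature matrices with the RBF kernel of Eq (2). The feature-extraction functions and bandwidths are assumptions; the authors' own implementation is in Matlab.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X: np.ndarray, gamma: float) -> np.ndarray:
    """RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2), Eq (2)."""
    return np.exp(-gamma * cdist(X, X, metric="sqeuclidean"))

def build_kernel_set(feature_mats: dict, gammas: dict) -> dict:
    """One kernel per feature type (GE, MCD, NMBAC, PSSM-AB, PSSM-DWT, PSSM-Pse), Eq (3)."""
    return {name: rbf_kernel(X, gammas[name]) for name, X in feature_mats.items()}

# Example with two assumed (N x d) feature matrices:
# kernels = build_kernel_set({"GE": X_ge, "PSSM-Pse": X_pse}, {"GE": 1.0, "PSSM-Pse": 1.0})
```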

Laplacian Local Kernel Alignment (LapLKA) is a kind of supervised Multiple Kernel Learning (MKL). An appropriate kernel matrix is crucial to the success of any kernel method [62], but choosing one is difficult in biological applications: a protein sequence can be described by many different kernel matrices. To address this limitation, MKL was proposed [39]. MKL aims to combine a set of predefined kernels with linear weights so that the resulting optimal kernel accurately represents the protein sequences. Let $P$ be the number of predefined kernels and $\mathcal{K} = \{K_1, \ldots, K_P\}$ the kernel set. The optimal kernel $K^{*}$ is a linear combination of the kernels in the set:

$K^{*} = \sum_{p=1}^{P} \beta_p K_p$ (4)

where $\beta_p$ is the kernel mixture weight. Usually, an L1-norm constraint is imposed on the structure of $\boldsymbol{\beta}$:

$\|\boldsymbol{\beta}\|_1 = \sum_{p=1}^{P} |\beta_p| = 1$ (5)

The main goal of the LapLKA algorithm is to determine the values of $\boldsymbol{\beta}$. LapLKA's learning strategy has two parts: local kernel alignment and the inner relationship among the global kernels. In previous studies [63,64,65], the kernel alignment score is calculated in either a global or a local manner only. The global manner maximizes the alignment score between the whole optimal kernel and the ideal kernel, but may ignore differences between similar samples. In contrast, the local manner only considers sub-kernels constructed from sets of similar samples, so information about the whole sample set may be missed. For this reason, we propose LapLKA, which integrates local kernel alignment with global kernel alignment.

First, we define the kernel alignment function as follows:

$A(P, Q) = \dfrac{\langle P, Q \rangle_F}{\|P\|_F \, \|Q\|_F}$ (6)

where $P$ and $Q$ are positive definite matrices, and $\langle \cdot, \cdot \rangle_F$ and $\|\cdot\|_F$ are the Frobenius inner product and the Frobenius norm, respectively. The kernel alignment value is the cosine similarity between two kernels.

For the local manner, we maximize the alignment score between each local kernel and the corresponding local ideal kernel. A local kernel is constructed from a sample and its neighbors: for each sample we select the indices of its $k$ nearest neighbors, using the Euclidean distance in the input space as the measure of sample similarity. The set of neighbors of $x_i$ is denoted $N_k(x_i)$. The local kernel of the $p$-th kernel around $x_i$ can be represented as:

$K_p^{(i)} = \left[ K_p(u, v) \right]_{k \times k}, \quad x_u, x_v \in N_k(x_i)$ (7)

We maximize the average of all local kernel alignment scores. Thus, the objective function of the local manner can be written as follows:

$\arg\max_{\boldsymbol{\beta}} \; \frac{1}{N} \sum_{i=1}^{N} A\!\left( \sum_{p=1}^{P} \beta_p K_p^{(i)}, \; K_Y^{(i)} \right)$ (8)

where $K_Y = Y Y^T$ is the ideal kernel and $K_Y^{(i)}$ is computed from the labels of the corresponding neighbor samples.
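The sketch below illustrates the kernel alignment of Eq (6) and the averaged local alignment of Eq (8) for a fixed weight vector. Neighborhoods follow the Euclidean nearest-neighbor rule described above; this is an illustrative reading of the formulas, not the authors' code.

```python
import numpy as np

def alignment(P: np.ndarray, Q: np.ndarray) -> float:
    """Kernel alignment A(P, Q) = <P, Q>_F / (||P||_F ||Q||_F), Eq (6)."""
    return float(np.sum(P * Q) / (np.linalg.norm(P) * np.linalg.norm(Q)))

def local_alignment_score(kernels, X, y, beta, k=15):
    """Average alignment between the local combined kernel and the local ideal kernel, Eq (8)."""
    N = X.shape[0]
    K_comb = sum(b * K for b, K in zip(beta, kernels))   # weighted combination, Eq (4)
    K_ideal = np.outer(y, y)                             # ideal kernel K_Y = y y^T
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    score = 0.0
    for i in range(N):
        idx = np.argsort(dists[i])[:k]                   # k nearest neighbors N_k(x_i)
        score += alignment(K_comb[np.ix_(idx, idx)], K_ideal[np.ix_(idx, idx)])
    return score / N
```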

The global kernel alignment information is introduced into Eq (8) through a Laplacian regularization term:

$\sum_{i,j}^{P} (\beta_i - \beta_j)^2 W_{ij} = \sum_{i,j}^{P} (\beta_i^2 + \beta_j^2 - 2\beta_i \beta_j) W_{ij} = \sum_{i}^{P} \beta_i^2 D_{ii} + \sum_{j}^{P} \beta_j^2 D_{jj} - 2\sum_{i,j}^{P} \beta_i \beta_j W_{ij} = 2\,\boldsymbol{\beta}^T L \boldsymbol{\beta}$ (9)

where $W \in \mathbb{R}^{P \times P}$ is the global kernel alignment matrix with $W_{ij} = A(K_i, K_j)$, $D$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$, and $L = D - W$ is the corresponding graph Laplacian. Equations (8) and (9) are integrated as follows:

$\arg\max_{\boldsymbol{\beta}} \; \frac{1}{N} \sum_{i=1}^{N} A\!\left( \sum_{p=1}^{P} \beta_p K_p^{(i)}, \; K_Y^{(i)} \right) - 2\lambda \boldsymbol{\beta}^T L \boldsymbol{\beta} \quad \text{s.t.} \; \sum_{p=1}^{P} \beta_p = 1$ (10)

To optimize Eq (10), we introduce an auxiliary variable $\tau_i$, defined as:

$\tau_i = \dfrac{\boldsymbol{\beta}^T M^{(i)} \boldsymbol{\beta}}{\boldsymbol{\beta}^T M \boldsymbol{\beta}}$ (11)

where $M_{ij} = \mathrm{tr}(K_i^T K_j)$ and $M_{ij}^{(l)} = \mathrm{tr}\!\left(K_i^{(l)T} K_j^{(l)}\right)$. Therefore, Eq (10) can be rewritten as:

$\arg\max_{\boldsymbol{\beta}} \; \dfrac{\boldsymbol{\beta}^T Q \boldsymbol{\beta}}{\boldsymbol{\beta}^T M \boldsymbol{\beta}} - 2\lambda \boldsymbol{\beta}^T L \boldsymbol{\beta} \quad \text{s.t.} \; \sum_{p=1}^{P} \beta_p = 1$ (12)

    From [66,67,68,69,70,71,72], Equation (12) is equivalent to the following Quadratic Programming problem:

$\arg\max_{\boldsymbol{\beta}} \; \boldsymbol{\beta}^T M \boldsymbol{\beta} - \boldsymbol{\beta}^T \left( 2Q + 4\lambda L \right) \boldsymbol{\beta} \quad \text{s.t.} \; \sum_{p=1}^{P} \beta_p = 1$ (13)

We employ the CVX package [73] to optimize Eq (13).
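The authors optimize Eq (13) with CVX in Matlab. Purely for illustration, the sketch below solves a simplified convex surrogate in Python with CVXPY: the local alignment objective is linearized into an assumed per-kernel score vector a (for instance, a_p could be the average local alignment of K_p with the ideal kernel from the previous sketch), while the Laplacian term of Eq (9) couples globally similar kernels. The exact matrices Q and M of Eqs (12) and (13) are not reproduced here.

```python
import numpy as np
import cvxpy as cp

def solve_weights(kernels, a, lam=2.0):
    """Kernel weights beta on the simplex: maximize a^T beta - lam * beta^T L beta."""
    P = len(kernels)
    # Global kernel alignment matrix W (Eq (6) applied to whole kernels) and Laplacian L = D - W.
    W = np.array([[np.sum(Ki * Kj) / (np.linalg.norm(Ki) * np.linalg.norm(Kj))
                   for Kj in kernels] for Ki in kernels])
    L = np.diag(W.sum(axis=1)) - W
    # Factor L = B^T B (L is positive semi-definite), so beta^T L beta = ||B beta||^2 is DCP-valid.
    eigvals, eigvecs = np.linalg.eigh(L)
    B = np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    beta = cp.Variable(P)
    objective = cp.Maximize(a @ beta - lam * cp.sum_squares(B @ beta))
    problem = cp.Problem(objective, [cp.sum(beta) == 1, beta >= 0])
    problem.solve()
    return beta.value
```

The nonnegativity constraint together with the sum-to-one constraint is one way to realize the L1 constraint of Eq (5); it is an assumption of this sketch.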

The Restricted Kernel Machine (RKM) classification model is a kind of kernel method [8]. It was proposed by Suykens [56]. The objective function of the RKM is closely related to that of the Least Squares Support Vector Machine (LS-SVM) [74]. SVM is also a kernel method, and most methods [15,23,26,28,75] select SVM as the classifier. However, we choose the RKM as the classifier because it is easily extended to a deep framework, called Deep RKM [56], which can produce good results; we use the RKM formulation throughout the rest of this paper.

Let $\{(x_i, y_i)\}_{i=1}^{N}$ denote the training data, where $x_i \in \mathbb{R}^d$ is the $i$-th input pattern and $y_i \in \{-1, 1\}$ is the corresponding label. It is well known that the objective function of LS-SVM is:

$\arg\min_{w, b} \; \frac{\eta}{2} w^T w + \sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \; e_i = 1 - \left( \varphi(x_i)^T w + b \right) y_i$ (14)

We formulate a lower bound on the objective in Eq (14), which yields the objective function of RKM classification:

$\arg\min_{w, b} \; \frac{\eta}{2} w^T w + \sum_{i=1}^{N} \left( 1 - \left( \varphi(x_i)^T w + b \right) y_i \right) h_i - \frac{\mu}{2} \sum_{i=1}^{N} h_i^2$ (15)

where $b$ is a bias term, $\eta$ and $\mu$ are hyperparameters and $h_i$ is a hidden feature. The feature map $\varphi(\cdot)$ maps $x$ from the input space into a reproducing kernel Hilbert space. The hidden features are obtained through the inner pairing $e^T h$, where $e$ is the classification error.

    The stationary points of the objective function Eq (15) in the primal formulation are characterized by:

$\begin{cases} 1 = \left( \varphi(x_i)^T w + b \right) y_i + \mu h_i, & i = 1, \ldots, N \\ w = \frac{1}{\eta} \sum_{i=1}^{N} \varphi(x_i) y_i h_i \\ \sum_{i=1}^{N} y_i h_i = 0 \end{cases}$ (16)

    By eliminating the weights w, the linear formulation is obtained:

$\begin{bmatrix} \frac{1}{\eta} K + \mu I_N & 1_N \\ 1_N^T & 0 \end{bmatrix} \begin{bmatrix} y \odot h \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}$ (17)

where $I_N$ and $1_N$ are the identity matrix and the all-ones column vector, respectively, and $\odot$ is the element-wise product.

    In this paper, we mainly focus on the RKM-based MKL formulations. The final linear system of RKM-based MKL is given by:

$\begin{bmatrix} \frac{1}{\eta} \sum_{p=1}^{P} \beta_p K_p + \mu I_N & 1_N \\ 1_N^T & 0 \end{bmatrix} \begin{bmatrix} y \odot h \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}$ (18)

The linear system in Eq (18) is solved on the training data. The variables $h$ and the bias term $b$ are then used to construct the classifier. For a test point $x_t$, the final decision function is:

$f(x_t) = \mathrm{sign}\!\left( \frac{1}{\eta} \sum_{p=1}^{P} \beta_p \sum_{i=1}^{N} y_i h_i K_p(x_i, x_t) + b \right)$ (19)
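As an illustration of Eqs (18) and (19), the following numpy sketch trains the multiple-kernel RKM and makes predictions. It assumes the training kernels and the test-versus-train kernels are precomputed, and is an illustrative reading of the formulas rather than the authors' Matlab implementation.

```python
import numpy as np

def rkm_train(kernels, beta, y, eta=2.0, mu=0.125):
    """Solve the linear system of Eq (18) for (y ⊙ h, b) given precomputed training kernels."""
    N = len(y)
    K = sum(b * Kp for b, Kp in zip(beta, kernels))      # combined kernel, Eq (4)
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = K / eta + mu * np.eye(N)
    A[:N, N] = 1.0                                        # 1_N column
    A[N, :N] = 1.0                                        # 1_N^T row
    sol = np.linalg.solve(A, np.concatenate([y, [0.0]]))
    return sol[:N], sol[N]                                # yh = y ⊙ h, bias b

def rkm_predict(test_kernels, beta, yh, b, eta=2.0):
    """Decision function of Eq (19); test_kernels[p] has shape (n_test, N_train)."""
    scores = sum(bp * Kt for bp, Kt in zip(beta, test_kernels)) @ yh / eta + b
    return np.sign(scores)
```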

Because the identification of DBPs is a binary classification problem, the following metrics are employed to measure the performance of the predictor:

$\mathrm{ACC} = \dfrac{TP + TN}{TP + TN + FP + FN} \times 100\%$ (20)
$\mathrm{SP} = \dfrac{TN}{TN + FP} \times 100\%$ (21)
$\mathrm{SN} = \dfrac{TP}{TP + FN} \times 100\%$ (22)
$\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ (23)

Here, TP is the number of DBPs correctly predicted as DBPs; FN is the number of DBPs incorrectly predicted as non-DBPs; FP is the number of non-DBPs incorrectly predicted as DBPs; and TN is the number of non-DBPs correctly predicted as non-DBPs. In addition, the ROC curve [76,77] and PR curve are also used to evaluate classification performance.
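For completeness, a small helper that computes the metrics of Eqs (20)-(23) from binary predictions; the label encoding {-1, +1} (DBP = +1) is an assumption matching the RKM formulation above.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """ACC, SN, SP and MCC of Eqs (20)-(23) for labels in {-1, +1} (DBP = +1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                                   # sensitivity on DBPs
    sp = tn / (tn + fp)                                   # specificity on non-DBPs
    mcc = (tp * tn - fp * fn) / np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return acc, sn, sp, mcc
```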

We tune the parameters for the best performance by 5-fold cross-validation (5-CV) and grid search on PDB1075. First, we find the optimal kernel bandwidth for the six types of kernels. The optimal bandwidth of each kernel is obtained from its single-kernel RKM, searched over the range $2^{-5}$ to $2^{5}$ in multiplicative steps of 2. The results are shown in Table 2. Then, we select the parameters $\lambda$, $\eta$ and $\mu$ from $2^{-5}$ to $2^{5}$ in multiplicative steps of 2, and $k$ from 10 to 50 with step 5. $\lambda$ and $k$ are parameters of LapLKA: $\lambda$ weighs the relationship between the local manner and the global manner, and $k$ is the number of neighbors per sample. $\eta$ and $\mu$ are the regularization parameters in the RKM objective function.

    Table 2.  The optimal parameters for single kernel RKM.
    Parameters NMBAC GE MCD PSSM-AB PSSM-DWT PSSM-Pse
    γ 2^1 2^0 2^0 2^0 2^0 2^0
    η 2^0 2^0 2^0 2^1 2^-1 2^-1
    μ 2^-1 2^-1 2^-3 2^0 2^-4 2^-1


To demonstrate the parameter sensitivity of LapLKA, we study how the performance varies with $\lambda$ and $k$ while the RKM parameters are fixed. Figure 2 shows the ACC variation with $\lambda$ and $k$ on PDB1075. We can see that our method is not sensitive to $\lambda$ and $k$, especially $k$. Similarly, we study the parameter sensitivity of the RKM with the LapLKA parameters fixed; the ACC variation with $\eta$ and $\mu$ is shown in Figure 3. We can observe that $\eta$ and $\mu$ are both sensitive parameters: the ACC score is lowest when $\eta = 2^{5}$ and $\mu = 2^{5}$, and the 5-CV performance increases as $\eta$ and $\mu$ decrease. The sensitivity of models to hyperparameters remains an open problem. Finally, we set $k$, $\lambda$, $\mu$ and $\eta$ to 15, 2, 0.125 and 2, respectively.

    Figure 2.  Effect of k and λ on ACC with fixed μ=0.125 and η=2 via 5-CV on PDB1075.
    Figure 3.  Effect of μ and η on ACC with fixed k=15 and λ=2 via 5-CV on PDB1075.

In our method, there are four hyperparameters: $k$, $\lambda$, $\eta$ and $\mu$. Here, $k$ is the neighborhood size in the local multi-kernel, $\lambda$ weighs the relationship between the global kernel and the local kernel, and $\eta$ and $\mu$ are the positive real regularization constants of the RKM.
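The grid search with 5-CV described above might be organized as in the sketch below. The helper `fit_and_score` is an assumed placeholder that trains LapLKA-RKM on one split with the given hyperparameters and returns its ACC; it is not part of the original work.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

def grid_search(kernels, X, y, fit_and_score):
    """5-CV grid search over (k, lambda, eta, mu) as described in the text."""
    grid = itertools.product(
        range(10, 51, 5),                                 # k: 10..50, step 5
        [2.0 ** e for e in range(-5, 6)],                 # lambda: 2^-5..2^5
        [2.0 ** e for e in range(-5, 6)],                 # eta
        [2.0 ** e for e in range(-5, 6)],                 # mu
    )
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    best, best_acc = None, -np.inf
    for k, lam, eta, mu in grid:
        accs = [fit_and_score(kernels, X, y, tr, te, k, lam, eta, mu) for tr, te in cv.split(X)]
        if np.mean(accs) > best_acc:
            best, best_acc = (k, lam, eta, mu), float(np.mean(accs))
    return best, best_acc
```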

    To analyze the performance of these kernels, we evaluate different kernels in two experiments, as shown in Tables 3 and 4 and Figure 4.

    Table 3.  Compared with single kernel on PDB1075 (LOOCV).
    Kernel type ACC (%) SN (%) SP (%) MCC AUC
    NMBAC 73.49 76.57 70.55 0.4716 0.818
    GE 70.98 71.81 70.18 0.4198 0.786
    MCD 68.19 76.76 60.00 0.3723 0.762
    PSSM-AB 76.37 80.76 72.18 0.5307 0.840
    PSSM-DWT 74.51 75.05 74.00 0.4903 0.828
    PSSM-Pse 77.95 82.67 73.45 0.5628 0.864
    Weighted with LapLKA 85.77 89.90 81.82 0.7185 0.926

    Table 4.  Compared with single kernel on PDB2272 (independent test).
    Kernel type ACC (%) SN (%) SP (%) MCC AUC
    NMBAC 68.31 72.77 63.72 0.3665 0.7512
    GE 65.36 69.04 61.57 0.3070 0.7225
    MCD 71.79 79.36 63.99 0.4391 0.7853
    PSSM-AB 77.33 88.99 65.33 0.5601 0.8656
    PSSM-DWT 71.74 92.11 50.76 0.4723 0.8377
    PSSM-Pse 75.53 91.33 59.25 0.5354 0.8608
    Weighted with LapLKA 79.53 96.62 61.93 0.6264 0.9303

    Figure 4.  The ROC and PR curves of different kernels (LOOCV).

The results of LOOCV on PDB1075 are listed in Table 3 and Figure 4. Because LapLKA is a linear combination of six types of kernels, it performs much better than any single kernel. In addition, the average ACC, SN, SP, MCC and AUC of the kernels using PSSM information (PSSM-AB, PSSM-DWT, PSSM-Pse) are 76.28%, 79.49%, 73.21%, 0.5279 and 0.8439, respectively. The kernels using AAC information (GE, MCD) perform worst, with average scores of ACC: 69.58%, SN: 74.28%, SP: 65.09%, MCC: 0.3960 and AUC: 0.7743. We can observe that models using PSSM information outperform those using other information. Thus, PSSM is an excellent feature extraction method that captures evolutionary relationships with other sequences.

The results of the independent test on PDB2272 are listed in Table 4, which shows the same trend as Table 3: LapLKA achieves the best performance, and models using PSSM information again outperform the others. In addition, PSSM-AB achieves the highest SP (65.33%) and the second-highest ACC (77.33%), MCC (0.5601) and AUC (0.8656). The advantage of LapLKA is also reflected on PDB2272: the improvements over the best single kernel are 2.2% in ACC (over PSSM-AB), 4.51% in SN (over PSSM-DWT), 0.0663 in MCC and 0.0647 in AUC.

The running time of the RKM with different kernels is also evaluated; the results are presented in Table 5. The RKM with multiple kernels is implemented in Matlab and runs on an Intel i7-10750H CPU with 16 GB RAM. As we can see, our method is the most time-consuming. This can be explained by the time complexity of the RKM with a single kernel versus the RKM with LapLKA-MKL. For the single-kernel RKM, the training time is dominated by computing the kernel matrix ($O(N^2 d)$) and solving the linear system ($O(N^3)$). The RKM with LapLKA-MKL involves three steps: computing the kernel matrices, MKL, and solving a linear system, with time complexities of $O(P N^2 \bar{d})$, $O(N^3)$ and $O(N^3)$, respectively, where $\bar{d}$ is the average feature dimension.

    Table 5.  The running time of different kernels on PDB2272 (independent test).
    Kernel type Sec
    NMBAC 38.73
    GE 38.91
    MCD 42.04
    PSSM-AB 38.76
    PSSM-DWT 44.62
    PSSM-Pse 38.68
    Weighted with LapLKA 162.32


Compared with single kernels, LapLKA achieves an obvious advantage. To further demonstrate LapLKA's fusion capability, we compare it with BSV, FC, Comm and MV. Other MKL algorithms are also evaluated, including CKA, HSIC and FKL. In addition, we compare our method with other well-known classifiers, which are fed the concatenation of the multiple features for a fair comparison. Details of the baseline methods are as follows:

● Best Single Kernel with RKM (BSV-RKM): The result of applying the RKM to the single kernel with the best performance.

    ● Feature Concatenation with RKM (FC-RKM): Multiple features are concatenated and RKM is used to do classification.

● Feature Concatenation with eXtreme Gradient Boosting (FC-XGBoost): Multiple features are concatenated and XGBoost is used for classification. The XGBoost [78] algorithm is an ensemble learning model that produces a strong model by ensembling decision trees.

● Feature Concatenation with Random Forest (FC-RF): Multiple features are concatenated and RF is used for classification. RF [79] is a classification algorithm combining an ensemble of tree-structured classifiers.

    ● Feature Concatenation with K Nearest Neighbors (FC-KNN): Multiple features are concatenated and KNN [80] is used to do classification. KNN is an algorithm for classification, which assigns a class label to a new data point based on the k nearest neighbors in the feature space.

● Committee RKM (Comm-RKM): Each kernel is input to a separate RKM classifier, and the average of the multiple RKM outputs is taken as the final prediction.

● Multi-View RKM classification [55] (MV-RKM): MV-RKM extends RKM classification by assuming hidden nodes that are shared over all the different features. The linear system of MV-RKM is:

$\begin{bmatrix} \frac{1}{\eta} \sum_{p=1}^{P} K_p + \mu I_N & P_N \\ 1_N^T & 0 \end{bmatrix} \begin{bmatrix} y \odot h \\ b \end{bmatrix} = \begin{bmatrix} P y \\ 0 \end{bmatrix}$ (24)

where $P_N$ is a column vector whose entries all equal $P$. From Eq (24), MV-RKM can be seen as RKM-based MKL with uniform (mean) kernel weights.

● Centered Kernel Alignment [26] with RKM (CKA-RKM): CKA is a kind of MKL algorithm. CKA estimates the optimal kernel weights by maximizing the cosine similarity between the optimal kernel and the ideal kernel. Different from LapLKA, CKA only considers the global manner.

● Hilbert-Schmidt Independence Criterion [81] with RKM (HSIC-RKM): HSIC is a kind of MKL algorithm. HSIC optimizes the kernel weights by maximizing the dependence between the optimal kernel and the ideal kernel. Its advantages are simple calculation and fast convergence.

● Fast Kernel Learning [82] with RKM (FKL-RKM): FKL is also a kind of MKL algorithm. FKL finds the fusion weights by minimizing the Euclidean distance between the optimal kernel and the ideal kernel. Since the objective function of FKL is a quadratic program, it solves for the kernel weights quickly and effectively.

The hyperparameters of these fusion methods are determined by 5-CV and grid search on PDB1075.

Table 6 and Figure 5 show the results of all baseline methods and LapLKA on PDB1075 by LOOCV, and Table 8 shows the comparison of the baseline methods on PDB2272 by independent test. We can see that: 1) LapLKA has the best performance both in LOOCV on PDB1075 and in the independent test on the larger dataset, indicating that LapLKA can obtain the best optimal kernel for classification by effectively combining multiple kernels; 2) the MKL methods (MV, CKA, HSIC and FKL) perform better than the typical fusion methods (BSV, FC and Comm) on PDB1075 by LOOCV, but they are slightly inferior to the typical fusion methods on PDB2272 by independent test.

    Table 6.  Performance compared with other baseline methods on PDB1075 (LOOCV).
    Baseline method ACC (%) SN (%) SP (%) MCC AUC
    BSV-RKM 77.95 82.67 73.45 0.5628 0.864
    FC-RKM 81.77 85.71 78.00 0.6382 0.898
    FC-XGBoost 73.52 76.93 70.34 0.4730 0.807
    FC-RF 75.18 65.58 84.39 0.5091 0.837
    FC-KNN 75.68 79.03 72.34 0.5140 0.829
    Comm-RKM 81.86 87.05 76.91 0.6418 0.899
    MV-RKM 83.35 88.76 78.18 0.6720 0.922
    CKA-RKM 82.51 84.19 80.91 0.6509 0.910
    HSIC-RKM 83.72 88.00 79.64 0.6777 0.916
    FKL-RKM 84.09 89.14 79.27 0.6863 0.921
    LapLKA-RKM 85.77 89.90 81.82 0.7185 0.926

    Figure 5.  The ROC and PR curves of different baseline methods.
    Table 7.  P-values of the statistical tests comparing LapLKA-RKM with the baseline methods on PDB1075 (10-CV).
    Baseline method P value
    BSV-RKM 1.09E-6
    FC-RKM 2.88E-2
    FC-XGBoost 1.95E-5
    FC-RF 1.15E-3
    FC-KNN 5.81E-11
    Comm-RKM 4.85E-4
    MV-RKM 1.75E-1
    CKA-RKM 4.08E-2
    HSIC-RKM 1.57E-1
    FKL-RKM 1.50E-1


A good prediction method should have good generalization capability. In light of this, we report the uncertainties of our method and the baseline methods by 10-CV on PDB1075. The results are shown in Figure 6. According to the boxplot, our method produces similar results across different cross-validation splits and achieves the highest mean ACC. Furthermore, we report statistical tests of the differences under 10-CV on PDB1075. Table 7 demonstrates that our method has a statistically significant improvement over the other baseline methods (P-value < 0.05, by t-test, in terms of ACC, for BSV-RKM, FC-RKM, FC-XGBoost, FC-RF, FC-KNN, Comm-RKM and CKA-RKM).

    Figure 6.  ACC of different baseline methods on PDB1075 (10-CV).

In addition, the weight of each kernel (under MV, CKA, HSIC, FKL and LapLKA) on PDB14189 is shown in Figure 7. In the HSIC and LapLKA approaches, the weight of PSSM-Pse is the largest and that of NMBAC is close to 0. Additionally, the weights of kernels using AAC are usually lower than those of kernels using PSSM; for example, in LapLKA the sum of the weights of the PSSM kernels is 0.598, while that of the AAC kernels is 0.281. The single-kernel analysis above demonstrates that models using PSSM information perform better; therefore, we can conclude that LapLKA assigns low weights to noisy kernels.

    Table 8.  Performance compared with other baseline methods on PDB2272 (independent test).
    Baseline method ACC (%) SN (%) SP (%) MCC AUC
    BSV-RKM 77.33 88.99 65.33 0.5601 0.8656
    FC-RKM 74.25 92.63 55.32 0.5183 0.8871
    FC-XGBoost 73.62 55.92 91.35 0.5067 0.8243
    FC-RF 76.34 60.24 92.46 0.5561 0.8340
    FC-KNN 74.56 78.45 70.91 0.4968 0.8224
    Comm-RKM 77.11 90.63 63.18 0.5609 0.8647
    MV-RKM 75.18 93.50 56.30 0.5381 0.8855
    CKA-RKM 75.09 92.80 56.84 0.5336 0.8717
    HSIC-RKM 75.79 93.15 57.91 0.5472 0.8774
    FKL-RKM 75.48 92.97 57.46 0.5412 0.8836
    LapLKA-RKM 79.53 96.62 61.93 0.6264 0.9303

    Figure 7.  The weights of kernels obtained by different MKL on the PDB14189.

Here, we compare our approach with other existing methods on PDB1075 by LOOCV and on PDB2272 by independent test, as shown in Tables 9 and 10, respectively. Our method achieves a high ACC of 85.77% on PDB1075 (LOOCV) and 79.5% on PDB2272 (independent test). On PDB1075, our method obtains improvements of 1.12%, 2.26% and 0.03 in ACC, SN and MCC, respectively, over the second-best method, MV-H-RKM. MV-H-RKM enforces structural consistency between the input features and the hidden nodes through a hypergraph regularization term, and therefore also achieves good performance. However, MV-H-RKM couples the multiple features only through the shared hidden vector, as MV-RKM does, which means it cannot filter noisy features. HKAM-MKM also achieves good performance, with ACC of 84.28% and MCC of 0.69. Similar to our method, HKAM-MKM considers both local and global kernel alignment in a hybrid kernel alignment model; different from our method, its optimal kernel is input to an SVM.

    Table 9.  Performance comparison with other existing methods on PDB1075 (LOOCV).
    Method ACC (%) SN (%) SP (%) MCC
    iDNA-Prot [83] 75.40 83.81 64.73 0.50
    iDNA-Prot|dis [84] 77.30 79.40 75.27 0.54
    PseDNA-Pro [12] 76.55 79.61 73.63 0.53
    iDNAPro-PseAAC [13] 76.55 75.62 77.45 0.53
    Local-DPP [85] 79.10 84.80 73.60 0.59
    MKSVM-HKA [86] 81.30 82.29 80.36 0.63
    FKRR-MVSF [87] 83.26 85.17 80.91 0.67
    CKA with SVM [26] 84.19 85.91 82.55 0.68
    MK-FSVM-SVDD [88] 82.23 81.90 82.55 0.65
    UMAP-DBP [89] 82.97 82.83 83.72 0.67
    HKAM-MKM [28] 84.28 80.00 88.76 0.69
    MV-H-RKM [38] 84.65 87.24 93.64 0.69
    LapLKA-RKM 85.77 89.90 81.82 0.72

    Table 10.  Performance comparison with other existing methods on PDB2272 (independent test).
    Method ACC (%) SN (%) SP (%) MCC
    DPP-PseAAC [17] 58.1 56.6 59.6 0.163
    PseDNA-Pro [12] 61.8 75.3 48.1 0.243
    MsDBP [31] 64.3 70.7 63.2 0.340
    MKL-HSIC with H-LapSVM [27] 69.4 72.1 56.1 0.401
    MLapSVM-LBS [29] 71.2 71.6 70.8 0.424
    DBP-CNN [37] 67.9 69.0 66.8 0.358
    Deep Transfer Learning [35] 74.2 - - -
    PDBP-Fusion [36] 77.8 73.3 66.9 0.567
    GCN-method [33] 78.5 70.7 64.2 0.400
    HKAM-MKM [28] 78.4 91.5 62.4 0.596
    LapLKA-RKM 79.5 96.6 61.9 0.626


    In this paper, we developed an approach called LapLKA-RKM, a machine learning based predictor for DBPs. Our method contains three steps: feature extraction, feature fusion and classifier construction. We apply six different feature extraction methods (MCD, GE, NMBAC, PSSM-AB, PSSM-DWT and PSSM-Pse) to represent the protein sequences. Then, we utilize LapLKA-MKL to combine multiple predefined kernels. Finally, we employ RKM as a predictive classifier.

Compared with other baseline methods and existing DBP predictors, our method achieves the best accuracy on different datasets under both LOOCV and independent testing. In the LOOCV on PDB1075, LapLKA-RKM achieves the highest ACC (85.77%), SN (89.90%), MCC (0.72) and AUC (0.9258), with an SP of 81.82%. Furthermore, our method was tested on PDB2272 via the independent test and also achieves strong performance, with ACC of 79.5%, SN of 96.6%, MCC of 0.626 and AUC of 0.9303. These results demonstrate that our method is an accurate tool for the identification of DBPs. We also built an online platform for our model, and we hope the simple-to-use web interface will lead to wide adoption of our method.

    This work is supported in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX22_3276), National Natural Science Foundation of China (NSFC 62172076, 62073231, U22A2038), Zhejiang Provincial Natural Science Foundation of China (Grant No. LY23F020003), the Municipal Government of Quzhou (Grant No. 2022D006), the Municipal Government of Suzhou (Grant No. 2022SS03, SZFCXK202144).

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The dataset and source code are available at https://figshare.com/articles/dataset/LapRKM-RKM_zip/22578496.



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
