
The prediction of drug-target protein interaction (DTI) is a crucial task in the development of new drugs in modern medicine. Accurately identifying DTI through computer simulations can significantly reduce development time and costs. In recent years, many sequence-based DTI prediction methods have been proposed, and introducing attention mechanisms has improved their forecasting performance. However, these methods have some shortcomings. For example, inappropriate dataset partitioning during data preprocessing can lead to overly optimistic prediction results. Additionally, only single non-covalent intermolecular interactions are considered in the DTI simulation, ignoring the complex interactions between their internal atoms and amino acids. In this paper, we propose a network model called Mutual-DTI that predicts DTI based on the interaction properties of sequences and a Transformer model. We use multi-head attention to extract the long-distance interdependent features of the sequence and introduce a module to extract the sequence's mutual interaction features in mining complex reaction processes of atoms and amino acids. We evaluate the experiments on two benchmark datasets, and the results show that Mutual-DTI outperforms the latest baseline significantly. In addition, we conduct ablation experiments on a label-inversion dataset that is split more rigorously. The results show that there is a significant improvement in the evaluation metrics after introducing the extracted sequence interaction feature module. This suggests that Mutual-DTI may contribute to modern medical drug development research. The experimental results show the effectiveness of our approach. The code for Mutual-DTI can be downloaded from https://github.com/a610lab/Mutual-DTI.
Citation: Jiahui Wen, Haitao Gan, Zhi Yang, Ran Zhou, Jing Zhao, Zhiwei Ye. Mutual-DTI: A mutual interaction feature-based neural network for drug-target protein interaction prediction[J]. Mathematical Biosciences and Engineering, 2023, 20(6): 10610-10625. doi: 10.3934/mbe.2023469
Predicting and identifying DTI is of great significance in medicine and biology [1,2,3]. Measuring the affinity between drugs and proteins in a wet lab is expensive and time-consuming because of the complexity of the molecules [4]. Virtual screening (VS) through computation can significantly reduce costs. Structure-based VS and ligand-based VS, the classical virtual screening methods, have achieved great success [5,6]. Structure-based methods use the three-dimensional (3D) conformations of proteins and drugs to study bioactivity. Ligand-based methods rely on the assumption that similar molecules interact with similar proteins [7]. However, the application of these methods is limited. For example, ligand-based VS methods perform poorly when the molecule has few known binding proteins, and structure-based VS methods cannot be applied when the 3D structure of a protein is unknown. Since the accurate reconstruction of protein structures is still under development, 3D-free DTI prediction methods have attracted increasing attention [8]. Machine learning approaches consider the chemical space, the genomic space and their interactions in a single framework and formulate DTI prediction as a classification problem, following either a feature-based or a similarity-based approach. Similarity-based approaches rely on the assumption that drugs with similar structures should have similar effects, whereas feature-based approaches construct a feature vector that combines descriptors of the drug and the protein as the model input. Bleakley et al. [9] proposed a supervised inference method for predicting unknown drug-target interactions that uses support vector machines as local classifiers. Since then, a variety of machine learning-based algorithms have been proposed that consider both compound and protein information in a unified model [10,11,12,13,14,15,16,17,18,19].
In recent years, deep learning has developed rapidly in drug discovery. Compared with traditional machine learning, end-to-end models eliminate the need to define and compute descriptors before modeling and provide different strategies and representations for proteins and drugs. Initially, manually specified descriptors were used to represent drugs and proteins, and a fully connected neural network (FCN) was designed to make predictions [20]. Since descriptors are designed from a single perspective and cannot be changed during training, descriptor-based approaches cannot extract task-relevant features. Therefore, many end-to-end models have been proposed. Lee et al. [21] proposed a model called DeepConv-DTI to predict DTI, which uses convolutional layers to extract local residue features of generalized proteins. Tsubaki et al. [22] used different models to represent drugs and proteins, treating the structure of a drug as a graph: a graph neural network (GNN) was used to learn features from drug sequences, and a convolutional neural network (CNN) was used to learn features from protein sequences. To consider deeper features between molecules, Li et al. [23] proposed a multi-objective neural network (MONN), which introduced a gated recurrent unit (GRU) module to predict affinity and can accurately determine the interaction and affinity between molecules. Zamora-Resendiz et al. [24] defined a new spatial graph convolutional network (GCN) architecture that employs graph reduction to reduce the number of training parameters and facilitate the abstraction of intermediate representations. Ryu et al. [25] combined GCN with an attention mechanism to enable the GCN to identify atoms in different environments, which extracts better structural features related to target molecular properties such as solubility, polarity, synthetic accessibility and photovoltaic efficiency than the vanilla GCN. Ru et al. [26] combined the ideas of adjacency and learning to rank: correlations between proteins and drugs were established using adjacency, and the binding affinity of drugs and proteins was predicted by feeding the features into a learning-to-rank classifier. The Transformer is a model that uses self-attention to improve the speed of model training and has achieved great success in natural language processing. Wang et al. [27] first extracted drug carriers with a GNN and then represented protein features with a Transformer and a CNN, obtaining remote interaction information in proteins through a one-sided attention mechanism. Chen et al. [28] obtained interaction features with a Transformer decoder and proposed a more rigorous method for dataset partitioning. Ren et al. [29] presented a deep learning framework based on multimodal representation and meta-path semantic analysis, in which drugs and proteins are represented as multimodal data and the relationships between them are captured by meta-path semantic analysis. However, most of these methods only consider a single non-covalent interaction between drugs and proteins, whereas in fact drugs and proteins interact in more than one way.
Inspired by the Transformer decoder, which is able to extract long-range interdependent features [28], this paper proposed a dual-pathway model for DTI prediction based on mutual reaction features, called Mutual-DTI. The Transformer decoder was modified to treat drugs and proteins as two distinct sequences. Additionally, a module was added to extract mutual features that enable learning of the complex interactions between atoms and amino acids. Figure 1 shows an overview of the entire network. The dual-pathway approach has also been applied in other fields. For example, dual attention matching (DAM) [30] was proposed to learn global features from local features with self-attention, but it ignored the mutual influence between the local features of the two modalities.
In this paper, we captured the spatial and other feature information of drugs with a GNN and represented the protein features with a gated convolutional network. The drug features and protein features were then input as two sequences into the Transformer decoder, which included the mutual feature module. Different from DAM, the mutual feature module simultaneously considered the local features of both drug molecules and proteins, which effectively extracted the interaction features between two sequences. Finally, the drug-protein feature vector was input into the fully connected layer for prediction. We expected the Mutual-DTI model to exhibit better performance and generality with the addition of the mutual feature module. To validate this, we evaluated it on two benchmark datasets and conducted ablation experiments on a more tightly delineated dataset [28]. The results showed that Mutual-DTI exhibited better performance. We further visualized the attention scores obtained by Mutual-DTI learning, and the results showed that the mutual feature module of Mutual-DTI helped to reduce the search space of binding points.
A GNN extracts node features in a graph through aggregation operations. We represent drugs using a graph structure where nodes represent atoms such as carbon and hydrogen, and edges represent chemical bonds such as single and double bonds. We use the RDKit Python library* to convert a simplified molecular input line entry system (SMILES) string into a two-dimensional drug molecular graph.
*Website: http://www.rdkit.org/
We define a drug graph by $G=\{V,E\}$, where $V$ is the set of atomic nodes and $E$ is the set of chemical bond edges. Considering the small number of atom and chemical bond types, we perform a local breadth-first search for the nodes in the graph, where the search depth $r$ is equal to the number of hops from a particular node [22,31]. For example, we start searching from node $v_i$, traverse the subgraph of range $r$ and record the information of all neighboring nodes and edges of node $v_i$ in the subgraph. We define the subgraph for node $v_i$ in the depth $r$ range as follows:
$$G^{\mathrm{sub}}_{r_i}=\left(V^{\mathrm{sub}}_{r_i},E^{\mathrm{sub}}_{r_i}\right) \tag{2.1}$$
$$E^{\mathrm{sub}}_{r_i}=\{e_{mn}\mid m,n\in N(i,r)\} \tag{2.2}$$
where $N(i,r)$ is the set of nodes adjacent to $v_i$ in the subgraph $G^{\mathrm{sub}}_{r_i}$, including $v_i$, and $e_{mn}$ is the edge connecting nodes $v_m$ and $v_n$.
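The local breadth-first search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the adjacency-dictionary representation, the function name `radius_subgraph` and the toy three-atom chain are assumptions made for the example.

```python
from collections import deque


def radius_subgraph(adjacency, start, r):
    """Return (nodes, edges) of the subgraph within r hops of `start`.

    `adjacency` maps each atom index to the set of atoms bonded to it.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == r:
            continue  # do not expand beyond the search depth r
        for neighbour in adjacency[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    nodes = set(depth)
    # Keep every bond whose two endpoints both lie in N(i, r), as in Eq (2.2).
    edges = {
        (m, n)
        for m in nodes
        for n in adjacency[m]
        if n in nodes and m < n
    }
    return nodes, edges


# Toy heavy-atom chain C-C-O with atom indices 0, 1, 2 (illustrative only).
chain = {0: {1}, 1: {0, 2}, 2: {1}}
nodes_r1, edges_r1 = radius_subgraph(chain, start=0, r=1)
```

With `r=1` and `start=0`, the subgraph contains atoms 0 and 1 and the single bond between them; starting from the middle atom with the same depth covers the whole chain.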
According to the subgraph $G^{\mathrm{sub}}_{r_i}$, we can extract the corresponding chemical features, such as atomic type, atomicity, aromaticity, etc. The details are shown in Table 1.
Table 1. Chemical features extracted for each atom.
Feature | Representation |
atom type | C, N, O, S, F, P, Cl, Br, B, H (one-hot) |
degree of atom | 0, 1, 2, 3, 4, 5 (one-hot) |
number of hydrogens attached | 0, 1, 2, 3, 4 (one-hot) |
implicit valence electrons | 0, 1, 2, 3, 4, 5 (one-hot) |
aromaticity | 0 or 1 |
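To make the encoding in Table 1 concrete, the sketch below builds the corresponding feature vector for a single atom in plain Python. The function and constant names are hypothetical, the atom properties are passed in by hand instead of being read from a molecule object, and the example values are illustrative; it only shows how the one-hot blocks could be laid out.

```python
ATOM_TYPES = ["C", "N", "O", "S", "F", "P", "Cl", "Br", "B", "H"]
DEGREES = [0, 1, 2, 3, 4, 5]
NUM_HYDROGENS = [0, 1, 2, 3, 4]
IMPLICIT_VALENCES = [0, 1, 2, 3, 4, 5]


def one_hot(value, choices):
    """Return a one-hot list marking the position of `value` in `choices`."""
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {choices}")
    return [1 if value == choice else 0 for choice in choices]


def atom_features(symbol, degree, num_hydrogens, implicit_valence, is_aromatic):
    """Concatenate the five feature blocks of Table 1 into one vector."""
    return (
        one_hot(symbol, ATOM_TYPES)
        + one_hot(degree, DEGREES)
        + one_hot(num_hydrogens, NUM_HYDROGENS)
        + one_hot(implicit_valence, IMPLICIT_VALENCES)
        + [int(is_aromatic)]
    )


# Illustrative values for a terminal, non-aromatic carbon with three hydrogens.
example_vector = atom_features("C", 1, 3, 3, False)
```

The resulting vector has 10 + 6 + 5 + 6 + 1 = 28 entries. In the model itself, the extracted features are mapped to randomly initialized embeddings before entering the GNN.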
We use randomly initialized embeddings of the extracted chemical features as the initial input to the GNN. We denote the embedding of node $v_i$ at the $n$-th layer as $f_i^{(n)}\in\mathbb{R}^d$. In the GNN, we update $f_i^{(n)}$ according to the following equation:
$$f_i^{(n)}=\sigma\Big(f_i^{(n-1)}+\sum_{j\in N(i,r)}h_{ij}^{(n-1)}\Big) \tag{2.3}$$
where $\sigma$ is the sigmoid function, $\sigma(x)=1/(1+e^{-x})$, and $h_{ij}^{(n-1)}$ is the hidden vector between nodes $v_i$ and $v_j$. This hidden vector is computed by the following neural network:
$$h_{ij}^{(n)}=\varepsilon\big(\omega f_i^{(n)}+b\big) \tag{2.4}$$
where $\varepsilon$ is the nonlinear activation function ReLU, $\varepsilon(x)=\max(0,x)$, $\omega\in\mathbb{R}^{d\times d}$ is the weight matrix and $b\in\mathbb{R}^d$ is the bias vector. After the GNN layers, we obtain the feature vectors $c_1,c_2,c_3,\cdots,c_l$ of a drug sequence, where $l$ is the number of atoms in the sequence.
A protein sequence is built from 20 kinds of amino acids. If we treat a protein sequence as a sentence, only 20 kinds of words make up the sentence. To increase the diversity of features, based on the n-gram language model, we define the words in a protein sequence as n-gram amino acids [22]. A given amino acid sequence is split into overlapping n-gram amino acid sequences. For example, with n set to 3, the protein sequence MVVMNSL⋯TSQATP is split into MVV, VVM, VMN, MNS, ⋯, TSQ, SQA, QAT, ATP, so that the number of possible words composing the sentence expands to $20^3=8000$.
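The overlapping split can be written directly as a sliding window. The sketch below is a minimal illustration with hypothetical names (`split_ngrams`, `build_vocabulary`); the on-the-fly word-to-index dictionary is one common way to prepare embedding lookups and is an assumption, not a detail taken from the paper.

```python
def split_ngrams(sequence, n=3):
    """Split an amino acid string into overlapping n-grams."""
    if len(sequence) < n:
        return []
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocabulary(words):
    """Assign a consecutive integer index to each distinct word."""
    vocabulary = {}
    for word in words:
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary)
    return vocabulary


words = split_ngrams("MVVMNSL")          # first seven residues of the example
vocabulary = build_vocabulary(words)
indices = [vocabulary[word] for word in words]
max_vocabulary_size = 20 ** 3            # upper bound on distinct 3-grams
```

A sequence of length L yields L − 2 words for n = 3, so the seven-residue prefix above produces five 3-grams.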
To keep the vocabulary at a reasonable size, we set $n$ to 3. Given a protein sequence $S=a_1a_2a_3\cdots a_L$, where $L$ is the length of the protein sequence and $a_i$ is the $i$-th amino acid, we split it into:
$$[a_1a_2a_3],[a_2a_3a_4],\cdots,[a_{L-2}a_{L-1}a_L]$$
We use $a_{i:i+2}\in\mathbb{R}^d$ to denote the $d$-dimensional embedding of the word $[a_ia_{i+1}a_{i+2}]$. We initialize the $d$-dimensional embeddings of the protein sequence processed in this way and then input them into a gated convolutional network with Conv1D and gated linear units [32]. We compute the hidden layer according to Eq (2.5):
$$L_i(X)=(X\times\omega_1+s)\otimes\sigma(X\times\omega_2+t) \tag{2.5}$$
where $L_i$ is the $i$-th layer of the gated convolutional network, $X\in\mathbb{R}^{n\times d_1}$ is the input to the $i$-th layer, $\omega_1\in\mathbb{R}^{d_1\times d_2}$, $s\in\mathbb{R}^{d_2}$, $\omega_2\in\mathbb{R}^{d_1\times d_2}$ and $t\in\mathbb{R}^{d_2}$ are the learnable parameters, $n$ is the length of the sequence, $d_1$ and $d_2$ are the dimensions of the input and hidden features, respectively, $\sigma$ is the sigmoid function and $\otimes$ is the element-wise product. The output of the gated convolutional network is the final representation of the protein sequence.
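Eq (2.5) can be traced numerically with a small pure-Python sketch. It assumes that ⊗ denotes an element-wise product, as in the gated linear units of [32], and it implements the equation as written (a position-wise linear map) rather than a full Conv1D layer; the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_linear_unit(x, w1, s, w2, t):
    """Compute (x w1 + s) * sigmoid(x w2 + t) element-wise, as in Eq (2.5)."""
    linear = matmul(x, w1)
    gate = matmul(x, w2)
    return [
        [(l + s_k) * sigmoid(g + t_k) for l, s_k, g, t_k in zip(l_row, s, g_row, t)]
        for l_row, g_row in zip(linear, gate)
    ]


# One sequence position with d1 = d2 = 2; zero gate weights give a gate of 0.5.
x = [[1.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
output = gated_linear_unit(x, identity, [0.0, 0.0], zeros, [0.0, 0.0])
```

Because the gate lies in (0, 1), it scales each linear feature, letting the layer decide how much of every feature passes to the next layer.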
We extracted feature vectors of drug and protein sequences using the drug and protein modules and input them into the Transformer decoder. The decoder learned mutual features, producing drug and protein sequences with interaction features as output. Since the order of the feature vectors has no effect on the DTI modeling, we removed the positional embedding in the Transformer. The key component of the decoder is the multi-head self-attention layer, which consists of several scaled dot-product attention layers for extracting the interaction information between the encoder and the decoder. The self-attention layer accepts three inputs, i.e., key $K$, value $V$ and query $Q$, and computes the attention in the following manner:
$$\mathrm{Self\_attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{2.6}$$
where $d_k$ is the dimension of the key vectors, whose square root serves as the scaling factor. Considering the complex reaction processes involving non-covalent chemical bonds within drugs and proteins, we added a module to extract interaction features using a multi-head self-attention layer in the Transformer decoder. The decoder takes the drug and protein sequences as inputs, enabling the extraction of drug- and protein-dominated interaction features simultaneously. The module further extracts complex interaction feature vectors between atoms and amino acids within the sequence, as depicted in Figure 1.
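The attention computation in Eq (2.6) can be illustrated for a single head with a short pure-Python sketch. The function names and the toy matrices are assumptions made for the example; it is not the authors' decoder, which additionally uses multiple heads and learned projections.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    largest = max(row)
    exps = [math.exp(value - largest) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    scores = [
        [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        for q_i in q
    ]
    weights = [softmax(row) for row in scores]
    output = [
        [sum(w * v_j[c] for w, v_j in zip(w_row, v)) for c in range(len(v[0]))]
        for w_row in weights
    ]
    return output, weights


# One query attending over two identical keys gives uniform weights.
q = [[1.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[0.0, 2.0], [4.0, 6.0]]
attended, attention_weights = scaled_dot_product_attention(q, k, v)
```

If the queries are taken from the drug sequence and the keys and values from the protein sequence, each atom attends over the amino acids, and swapping the roles gives the protein-dominated direction. Reading the mutual feature module this way is an interpretation of the description above, not a confirmed detail of the implementation.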
After the interaction features are extracted by the decoder, we obtain the interaction feature matrices $D\in\mathbb{R}^{b\times n_1\times d}$ and $P\in\mathbb{R}^{b\times n_2\times d}$ for the drugs and proteins, where $b$ is the batch size, $n_1$ and $n_2$ are the numbers of words and $d$ is the feature dimension. We average each feature matrix over its sequence dimension:
$$D_a=\mathrm{mean}(D,1) \tag{2.7}$$
$$P_a=\mathrm{mean}(P,1) \tag{2.8}$$
where $\mathrm{mean}(\mathrm{input},\mathrm{dim})$ is a mean operation that returns the mean of the input along the given dimension dim. The resulting feature vectors $D_a$ and $P_a$ are concatenated and fed to the classification block.
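The pooling and concatenation step can be checked on nested lists that stand in for the b × n × d tensors. The helper names below are hypothetical and the numbers are toy values chosen so that the averages are easy to verify by hand.

```python
def mean_over_sequence(batch):
    """Average a b x n x d nested list over its sequence dimension (dim 1)."""
    pooled = []
    for sequence in batch:
        length = len(sequence)
        pooled.append([sum(column) / length for column in zip(*sequence)])
    return pooled


def pool_and_concatenate(drug_features, protein_features):
    """Mean-pool both inputs and concatenate them along the feature axis."""
    d_a = mean_over_sequence(drug_features)
    p_a = mean_over_sequence(protein_features)
    return [d + p for d, p in zip(d_a, p_a)]


# Batch of one: a drug with two atoms and a protein with three words, d = 2.
D = [[[1.0, 2.0], [3.0, 4.0]]]
P = [[[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]]
joint = pool_and_concatenate(D, P)
```

Averaging removes the dependence on the sequence lengths n1 and n2, so drugs and proteins of any size produce a joint vector of fixed length 2d. In PyTorch the same two steps correspond to `tensor.mean(dim=1)` followed by `torch.cat`.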
The classification module consists of a multilayer fully connected neural network with ReLU activation, whose final layer outputs the interaction probability $\hat{y}$. As this is a binary classification task, we use the binary cross-entropy loss to train Mutual-DTI:
$$\mathrm{Loss}=-\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right] \tag{2.9}$$
where $y$ is the ground-truth label.
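As a quick numerical check of Eq (2.9), the loss for a single sample can be computed as below. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero; it is not part of the equation.

```python
import math


def binary_cross_entropy(y, y_hat, eps=1e-7):
    """Binary cross-entropy for one label y in {0, 1} and probability y_hat."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep the logarithms finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def mean_loss(labels, probabilities):
    """Average the per-sample loss over a batch."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)


uninformative = binary_cross_entropy(1, 0.5)   # equals ln 2
confident_right = binary_cross_entropy(1, 0.9)
confident_wrong = binary_cross_entropy(1, 0.1)
```

A prediction of 0.5 costs ln 2 ≈ 0.693 regardless of the label, while confident mistakes are penalized much more heavily than confident correct predictions are rewarded.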
Mutual-DTI was implemented with PyTorch 1.10.0. The original Transformer model had 6 layers and 512 hidden dimensions. We reduced the number of layers from 6 to 3 and the hidden dimension from 512 to 10, set the dimensions of the protein, atom, hidden and interaction representations to 10, and reduced the number of attention heads to 2, as this configuration achieves excellent generalization capabilities. During training, we used the Adam optimizer [33] with the learning rate set to 0.005 and the batch size set to 128. All settings and hyperparameters of Mutual-DTI are shown in Table 2.
Table 2. Settings and hyperparameters of Mutual-DTI.
Name | Value |
Dimension of atom representation | 10 |
Dimension of protein representation | 10 |
Number of decoder layers | 3 |
Number of hidden layers | 10 |
Number of attention heads | 2 |
Learning rate | 5e-3 |
Weight decay | 1e-6 |
Dropout | 0.1 |
Batch size | 128 |
The human dataset and the C. elegans dataset were created by Liu et al. [34]. Both datasets consist of compound-protein pairs and include positive samples as well as highly credible negative samples. The human dataset comprises 3369 positive interactions between 1052 unique compounds and 852 unique proteins, while the C. elegans dataset comprises 4000 positive interactions between 1434 unique compounds and 2504 unique proteins, as shown in Table 3. We randomly divided each dataset into a training set, a validation set and a test set in the ratio of 8:1:1. In addition, we used AUC, precision and recall as evaluation metrics for Mutual-DTI and compared it with several traditional machine learning methods on both the human and C. elegans datasets. The traditional machine learning methods used for comparison are k-NN, RF, L2-logistic (L2) and SVM; their results are taken from the original paper [34]. The main results are shown in Figure 2. Mutual-DTI outperforms the machine learning methods on both benchmark datasets.
Table 3. Statistics of the datasets.
Dataset | Drugs | Proteins | Interactions | Positive | Negative |
Human | 1052 | 852 | 6728 | 3369 | 3359 |
C. elegans | 1434 | 2504 | 7786 | 4000 | 3786 |
GPCR | 5359 | 356 | 15343 | 7989 | 7354 |
Davis | 68 | 379 | 25772 | 7320 | 18452 |
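The random 8:1:1 division of a dataset into training, validation and test sets can be sketched as follows. The function name, the integer rounding rule and the default seed are assumptions made for the example; the original code may handle the remainder differently.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle `samples` and split them into train, validation and test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# 6728 placeholder samples, matching the size of the human dataset in Table 3.
samples = list(range(6728))
train_set, val_set, test_set = split_dataset(samples)
```

Passing a different `seed` for each repetition reproduces the idea of repeating the experiment with different random splits.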
In other experiments, we compare the proposed method with recent deep learning methods used for DTI prediction. The methods are as follows: GNN-CPI [22], GNN-PT [27], TransformerCPI [28]. The main hyperparameters are set as follows:
GNN-CPI: vector dimensionality of vertices, edges and n-grams = 10, number of GNN layers = 3, window size = 11, number of CNN layers = 2, number of output layers = 3.
GNN-PT: number of GNN layers = 3, number of output layers = 1, attention heads = 2.
TransformerCPI: dimension of atom = 34, dimension of protein = 100, dimension of hidden = 64, number of hidden layers = 3, attention heads = 8.
All parameters were set as in the original papers. We used the same preprocessing for the initial drug and protein data as in the previous experiments and repeated each experiment three times with different random seeds. For each repetition, we randomly split the dataset and saved the model parameters that achieved the best validation AUC for evaluation on the test set. The main results on the human dataset and the C. elegans dataset are shown in Tables 4 and 5. On the human dataset, the average AUC, precision and recall of Mutual-DTI are 0.984, 0.962 and 0.943, respectively, which outperform the other methods. On the C. elegans dataset, the average AUC, precision and recall of Mutual-DTI are 0.987, 0.948 and 0.949, respectively, which mostly outperform the other models. The results suggest that Mutual-DTI can effectively learn informative features for predicting interactions from both one-dimensional protein sequences and two-dimensional molecular graphs, demonstrating its generalizability across different datasets.
Table 4. Results on the human dataset.
Methods | AUC | Precision | Recall |
GNN-CPI | 0.917 ± 0.072 | 0.783 ± 0.061 | 0.889 ± 0.096 |
GNN-PT | 0.978 ± 0.006 | 0.939 ± 0.010 | 0.934 ± 0.006 |
TransformerCPI | 0.972 ± 0.005 | 0.938 ± 0.018 | 0.932 ± 0.001 |
Mutual-DTI | 0.984 ± 0.001 | 0.962 ± 0.019 | 0.943 ± 0.016 |
Table 5. Results on the C. elegans dataset.
Methods | AUC | Precision | Recall |
GNN-CPI | 0.899 ± 0.104 | 0.850 ± 0.132 | 0.778 ± 0.192 |
GNN-PT | 0.984 ± 0.007 | 0.940 ± 0.024 | 0.933 ± 0.014 |
TransformerCPI | 0.984 ± 0.004 | 0.943 ± 0.025 | 0.951 ± 0.016 |
Mutual-DTI | 0.987 ± 0.004 | 0.948 ± 0.018 | 0.949 ± 0.013 |
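The three metrics reported in the tables above can be computed from labels and predicted scores as in the following sketch. The AUC is obtained from pairwise comparisons of positive and negative scores, which is equivalent to the area under the ROC curve; the 0.5 decision threshold used for precision and recall is an assumption, since the threshold is not stated in the text.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, predictions)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; library implementations use a sort-based computation instead.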
To evaluate the importance of the mutual feature module, we compare two sub-models: no-mutual-DTI, which has no mutual feature module, and the full network with the mutual feature module. To make the evaluation more reliable, we used a more strictly divided GPCR dataset [28], whose statistics are shown in Table 3. The key to constructing this dataset is that each drug in the training set appears in only one class of samples (positive or negative DTI pairs), while in the test set it appears only in the opposite class. This forces the model to use protein information to learn interaction patterns and make opposite predictions for the selected drugs, which is more realistic.
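The label reversal idea can be sketched as follows. This is a hypothetical construction written only to illustrate the constraint described above (each drug contributes one class to training and the opposite class to testing); the actual procedure used to build the GPCR dataset in [28] may select drugs and classes differently.

```python
import random


def label_reversal_split(pairs, seed=0):
    """Split (drug, protein, label) triples so that labels reverse per drug.

    For every drug, one class is randomly assigned to the training set and
    all samples of the opposite class go to the test set.
    """
    rng = random.Random(seed)
    by_drug = {}
    for drug, protein, label in pairs:
        by_drug.setdefault(drug, []).append((drug, protein, label))
    train, test = [], []
    for drug in sorted(by_drug):
        train_label = rng.choice([0, 1])
        for sample in by_drug[drug]:
            if sample[2] == train_label:
                train.append(sample)
            else:
                test.append(sample)
    return train, test


pairs = [
    ("d1", "p1", 1), ("d1", "p2", 0),
    ("d2", "p1", 0), ("d2", "p3", 1),
    ("d3", "p2", 1),
]
train_pairs, test_pairs = label_reversal_split(pairs)
```

Under such a split, a model that memorizes "drug d1 is usually positive" is wrong on every test pair for that drug, so good test performance has to come from the protein side and from the drug-protein interaction.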
Table 6 shows the prediction performance of the two models on the GPCR dataset. Comparing the models with and without the mutual feature module shows that the interaction features improve AUC and recall, while precision remains at a similar level. This suggests the need to establish correlations between drug and protein information in the DTI prediction extrapolation process. We also conducted experiments on the Davis dataset, which contains 7320 positive and 18,452 negative interactions. The Davis dataset we used was created by Zhao et al. [35]. As shown in Table 7, the model with the interaction module also showed superior performance on this unbalanced dataset.
Table 6. Ablation results on the GPCR dataset.
Methods | AUC | Precision | Recall |
no-mutual-DTI | 0.810 ± 0.023 | 0.704 ± 0.014 | 0.768 ± 0.030 |
Mutual-DTI | 0.820 ± 0.014 | 0.699 ± 0.010 | 0.796 ± 0.046 |
Table 7. Ablation results on the Davis dataset.
Methods | AUC | Precision | Recall |
no-mutual-DTI | 0.886 ± 0.005 | 0.728 ± 0.023 | 0.654 ± 0.005 |
Mutual-DTI | 0.900 ± 0.002 | 0.767 ± 0.013 | 0.680 ± 0.027 |
In this section, we employed a three-dimensional surface plot to analyze the impact of two model hyperparameters, the atomic and protein dimensions, on DTI prediction performance, since these are among the most important hyperparameters. We sampled values for the atomic and protein dimensions from 10 to 40 in steps of 10. The atomic dimension/protein dimension values were therefore 10/10, 10/20, 10/30, ..., 20/10, 20/20, ..., 40/30, 40/40, for a total of 16 different settings, and experiments were conducted multiple times with different random seeds. Other settings were the same as in the previous experiments. As shown in Figure 3, the x-axis represents the atomic dimension, the y-axis represents the protein dimension and the z-axis represents the AUC obtained on the test set. The results show that the surfaces are very smooth under the different dimension settings and that the model exhibits good robustness.
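The 16 settings of this sensitivity analysis form a simple grid, which can be enumerated as below. The `evaluate` callable, the placeholder `dummy_evaluate` and the three seeds are hypothetical stand-ins for training and testing a model; the number of repetitions is not stated in the text.

```python
from itertools import product

DIMENSIONS = [10, 20, 30, 40]


def run_grid(evaluate, seeds=(0, 1, 2)):
    """Average the test AUC over seeds for every (atom_dim, protein_dim) pair."""
    results = {}
    for atom_dim, protein_dim in product(DIMENSIONS, repeat=2):
        aucs = [evaluate(atom_dim, protein_dim, seed) for seed in seeds]
        results[(atom_dim, protein_dim)] = sum(aucs) / len(aucs)
    return results


def dummy_evaluate(atom_dim, protein_dim, seed):
    """Placeholder that returns a constant instead of training a model."""
    return 0.5


surface = run_grid(dummy_evaluate)
```

The keys of `surface` give the x and y coordinates and the values give the z coordinate of the kind of surface plot described above.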
To demonstrate that the mutual feature module not only enhances the performance of the model but also provides deeper interpretation, we conducted a case study. First, we computed the Frobenius norm of the protein feature matrix obtained by the Transformer decoder. Next, we used the softmax function to derive the attention weights of the protein sequence, which were then mapped onto the 3D structure of the complex to visualize the regions that matter most for the drug-protein reaction. The attention weights of the crystal structure of gw0385-bound HIV protease D545701 (PDB: 2FDD) are shown in Figure 4. The complex has a total of 12 binding sites. We marked the regions with attention scores above 0.75 in red; 4 of the 12 binding sites received high attention scores, namely ASP-25, ALA-28, PRO-81 and ALA-82. The results show that Mutual-DTI helps to narrow down the search space of binding sites.
Our robustness experiments show that the model's AUC varies only slightly when atoms and proteins are encoded and embedded in different dimensions. We infer that the GNN module, which learns the drug molecule characteristics, and the gated convolutional module, which learns the protein features, can effectively extract feature information in the Mutual-DTI model. This indicates that constructing subgraphs of drug molecules via a local breadth-first search and constructing protein words via the n-gram method is reasonable. In the ablation experiment, removing the module that extracts mutual reaction features decreased the model's prediction performance. We speculate that, without this module, the model only learns the individual feature information of each sequence, whereas DTI is a dynamic process. The mutual feature module treats the drug and protein features as the main body and lets them dynamically focus on each other's key parts in the learning layer, thus directly capturing the interaction features of the two given sequences. By learning the interaction features, the model obtains a deeper understanding of the DTI process and can more easily capture the crucial parts that may contribute to the reaction when dealing with unknown drug and protein data, leading to superior prediction performance.
From the perspective of model complexity, Mutual-DTI has higher complexity than networks that only use a GNN (e.g., GNN-CPI). This is because we use a more complex self-attention mechanism, which allows us to capture long-range dependencies between tokens in the sequence. Compared to TransformerCPI, which is also based on the Transformer, Mutual-DTI has lower complexity. Although Mutual-DTI uses two parallel multi-head attention layers, the number of attention heads is reduced from 8 to 2 and the hidden dimension is smaller, which notably reduces the number of parameters in Mutual-DTI. These designs help Mutual-DTI fit the training data better while avoiding overfitting due to excessive complexity.
In this paper, we present a Transformer-based network model for predicting DTI and introduce a module for extracting sequence interaction features to model complex reaction processes between atoms and amino acids. To validate the effectiveness of Mutual-DTI, we compare it with the latest baselines on two benchmark datasets. The results show that Mutual-DTI outperforms the baselines. We also evaluate Mutual-DTI on the label-inversion dataset and observe an improvement with the introduction of the mutual feature module. Finally, we map the attention weights obtained by the mutual feature module to the protein sequences, which helps us better interpret the model and determine the reliability of the predictions.
Although Mutual-DTI shows effective performance in predicting DTI, there is still room for improvement. The experimental results show a significant decrease in performance on the strictly limited label-inversion dataset compared to the human dataset and the C. elegans dataset. This suggests that the feature extraction of sequences is very limited, and adding a 3D representation of molecules or proteins may help extract more information.
The work was supported by Open Funding Project of the State Key Laboratory of Biocatalysis and Enzyme Engineering under grant No. SKLBEE2021020 and SKLBEE2020020, the High-level Talents Fund of Hubei University of Technology under grant No. GCRC2020016, Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province under grant No. 2020E10010-02 and Natural Science Foundation of Hubei Province under grant No. 2021CFB282.
All authors declare no conflicts of interest in this paper.
[1] |
C. Chen, H. Shi, Y. Han, Z. Jiang, X. Cui, B. Yu, DNN-DTIs: Improved drug-target interactions prediction using XGBoost feature selection and deep neural network, Comput. Biol. Med., 136 (2021), 104676. https://doi.org/10.1101/2020.08.11.247437 doi: 10.1101/2020.08.11.247437
![]() |
[2] |
X. Ru, X. Ye, T. Sakurai, Q. Zou, C. Lin, Current status and future prospects of drug–target interaction prediction, Briefings Funct. Genomics, 20 (2021), 312–322. https://doi.org/10.1093/bfgp/elab031 doi: 10.1093/bfgp/elab031
![]() |
[3] |
G. Huang, F. Yan, D. Tan, A review of computational methods for predicting drug targets, Curr. Protein Pept. Sci., 19 (2018), 562–572. https://doi.org/10.1016/j.jad.2018.12.111 doi: 10.1016/j.jad.2018.12.111
![]() |
[4] |
J. Vamathevan, D. Clark, P. Czodrowski, I. Dunham, E. Ferran, G. Lee, et al., Applications of machine learning in drug discovery and development, Nat. Rev. Drug Discovery, 18 (2019), 463–477. https://doi.org/10.1038/s41573-019-0024-5 doi: 10.1038/s41573-019-0024-5
![]() |
[5] |
E. Maia, L. C. Assis, T. Oliveira, A.M. de Silva, A. G. Taranto, Structure-based virtual screening: from classical to artificial intelligence, Front. Chem., 8 (2020), 343. https://doi.org/10.3389/fchem.2020.00343 doi: 10.3389/fchem.2020.00343
![]() |
[6] |
H. Mubarak, S. Naomie, A. D. Mohammed, S. Faisal, A. Ali, Adapting document similarity measures for ligand-based virtual screening, Molecules, 21 (2016), 476. https://doi.org/10.3390/molecules21040476 doi: 10.3390/molecules21040476
![]() |
[7] |
R. Ferdousi, R. Safdari, Y. Omidi, Computational prediction of drug-drug interactions based on drugs functional similarities, J. Biomed. Inf., 70 (2017), 54. https://doi.org/10.1016/j.jbi.2017.04.021 doi: 10.1016/j.jbi.2017.04.021
![]() |
[8] |
M. Bredel, E. Jacoby, Chemogenomics: An emerging strategy for rapid target and drug discovery, Nat. Rev. Genet., 5 (2004), 262–275. https://doi.org/10.1038/nrg1317 doi: 10.1038/nrg1317
![]() |
[9] |
K. Bleakley, Y. Yamanishi, Supervised prediction of drug–target interactions using bipartite local models, Bioinformatics, 25 (2009), 2397–2403. https://doi.org/10.1093/bioinformatics/btp433 doi: 10.1093/bioinformatics/btp433
![]() |
[10] |
F. Cheng, Y. Zhou, J. Li, W. Li, G. Liu, Y. Tang, Prediction of chemical–protein interactions: multitarget-QSAR versus computational chemogenomic methods, Mol. BioSyst., 8 (2012), 2373–2384. https://doi.org/10.1039/C2MB25110H doi: 10.1039/C2MB25110H
![]() |
[11] |
M. Gönen, Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization, Bioinformatics, 28 (2012), 2304–2310. https://doi.org/10.1093/bioinformatics/bts360 doi: 10.1093/bioinformatics/bts360
![]() |
[12] |
S. Liu, J. An, J. Zhao, S. Zhao, H. Lv, S. Wang, et al., Drug-Target interaction prediction based on multisource information weighted fusion, Contrast Media Mol. Imaging, 2021 (2021). https://doi.org/10.1155/2021/6044256 doi: 10.1155/2021/6044256
![]() |
[13] |
J. Li, X. Yang, Y. Guan, Z. Pan, Prediction of drug–target interaction using dual-network integrated logistic matrix factorization and knowledge graph embedding, Molecules, 27 (2022), 5131. https://doi.org/10.3390/molecules27165131 doi: 10.3390/molecules27165131
![]() |
[14] |
Y. Ding, J. Tang, F. Guo, Q. Zou, Identification of drug–target interactions via multiple kernel-based triple collaborative matrix factorization, Brief. Bioinf., 23 (2022). https://doi.org/10.1093/bib/bbab582 doi: 10.1093/bib/bbab582
![]() |
[15] |
L. Jacob, J. P. Vert, Protein-ligand interaction prediction: An improved chemogenomics approach, Bioinformatics, 24 (2008), 2149–2156. https://doi.org/10.1093/bioinformatics/btn409 doi: 10.1093/bioinformatics/btn409
![]() |
[16] |
T. Van Laarhoven, S. B Nabuurs, E. Marchiori, Gaussian interaction profile kernels for predicting drug–target interaction, Bioinformatics, 27 (2011), 3036–3043. https://doi.org/10.1093/bioinformatics/btr500 doi: 10.1093/bioinformatics/btr500
![]() |
[17] |
F. Wang, D. Liu, H. Wang, C. Luo, M. Zheng, H. Liu, et al., Computational screening for active compounds targeting protein sequences: Methodology and experimental validation, J. Chem. Inf. Model., 51 (2011), 2821–2828. https://doi.org/10.1021/ci200264h doi: 10.1021/ci200264h
![]() |
[18] |
Y. Wang, J. Zeng, Predicting drug-target interactions using restricted Boltzmann machines, Bioinformatics, 29 (2013), i126–i134. https://doi.org/10.1093/bioinformatics/btt234 doi: 10.1093/bioinformatics/btt234
![]() |
[19] |
Y. Yamanishi, M. Araki, A. Gutteridge, W. Honda, M. Kanehisa, Prediction of drug–target interaction networks from the integration of chemical and genomic spaces, Bioinformatics, 24 (2008), i232–i240. https://doi.org/10.1093/bioinformatics/btn162 doi: 10.1093/bioinformatics/btn162
![]() |
[20] |
K. Tian, M. Shao, Y. Wang, J. Guan, S. Zhou, Boosting compound-protein interaction prediction by deep learning, Methods, 110 (2016), 64–72. https://doi.org/10.1016/j.ymeth.2016.06.024 doi: 10.1016/j.ymeth.2016.06.024
![]() |
[21] |
I. Lee, J. Keum, H. Nam, DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences, PLoS Comput. Biol., 15 (2019), e1007129. https://doi.org/10.1371/journal.pcbi.1007129 doi: 10.1371/journal.pcbi.1007129
![]() |
[22] |
M. Tsubaki, K. Tomii, J. Sese, Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences, Bioinformatics, 35 (2019), 309–318. https://doi.org/10.1093/bioinformatics/bty535 doi: 10.1093/bioinformatics/bty535
![]() |
[23] |
S. Li, F. Wan, H. Shu, T. Jiang, D. Zhao, J. Zeng, MONN: A multi-objective neural network for predicting compound-protein interactions and affinities, Cell Syst., 10 (2020), 308–322. https://doi.org/10.1016/j.cels.2020.03.002 doi: 10.1016/j.cels.2020.03.002
![]() |
[24] |
R. Zamora-Resendiz, S. Crivelli, Structural learning of proteins using graph convolutional neural networks, BioRxiv, (2019), 610444. https://doi.org/10.1101/610444 doi: 10.1101/610444
![]() |
[25] | S. Ryu, J. Lim, S. H. Hong, W. Y. Kim, Deeply learning molecular structure-property relationships using attention-and gate-augmented graph convolutional network, preprint, arXiv: 1805.10988. https://doi.org/10.48550/arXiv.1805.10988 |
[26] |
X. Ru, X. Ye, T. Sakurai, Q. Zou, NerLTR-DTA: Drug–target binding affinity prediction based on neighbor relationship and learning to rank, Bioinformatics, 38 (2022), 1964–1971. https://doi.org/10.1093/bioinformatics/btac048 doi: 10.1093/bioinformatics/btac048
![]() |
[27] | J. Wang, X. Li, H. Zhang, GNN-PT: Enhanced prediction of compound-protein interactions by integrating protein transformer, preprint, arXiv: 2009.00805. https://doi.org/10.48550/arXiv.2009.00805 |
[28] |
L. Chen, X. Tan, D. Wang, F. Zhong, X. Liu, T. Yang, et al., TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments, Bioinformatics, 36 (2020), 4406–4414. https://doi.org/10.1093/bioinformatics/btaa524 doi: 10.1093/bioinformatics/btaa524
![]() |
[29] |
Z. H. Ren, Z. H. You, Q. Zou, C. Q. Yu, Y. F. Ma, Y. J. Guan, et al., DeepMPF: Deep learning framework for predicting drug–target interactions based on multi-modal representation with meta-path semantic analysis, J. Transl. Med., 21 (2023), 1–18. https://doi.org/10.1186/s12967-023-03876-3 doi: 10.1186/s12967-023-03876-3
![]() |
[30] | Y. Wu, L. Zhu, Y. Yan, Y. Yang, Dual attention matching for audio-visual event localization, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), 6292–6300. |
[31] | F. Costa, K. De Grave, Fast neighborhood subgraph pairwise distance kernel, in Proceedings of the 26th International Conference on Machine Learning, (2010), 255–262. |
[32] | Y. N. Dauphin, A. Fan, M. Auli, D. Grangier, Language modeling with gated convolutional networks, in International Conference on Machine Learning, (2017), 933–941. |
[33] | D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980. https://doi.org/10.48550/arXiv.1412.6980 |
[34] |
H. Liu, J. Sun, J. Guan, J. Zheng, S. Zhou, Improving compound–protein interaction prediction by building up highly credible negative samples, Bioinformatics, 31 (2015), i221–i229. https://doi.org/10.1093/bioinformatics/btv256 doi: 10.1093/bioinformatics/btv256
![]() |
[35] |
Q. Zhao, H. Zhao, K. Zheng, J. Wang, HyperAttentionDTI: Improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism, Bioinformatics, 38 (2022), 655–662. https://doi.org/10.1093/bioinformatics/btab715 doi: 10.1093/bioinformatics/btab715
![]() |
Feature | Representation |
atom type | C, N, O, S, F, P, Cl, Br, B, H (one-hot) |
degree of atom | 0, 1, 2, 3, 4, 5 (one-hot) |
number of hydrogens attached | 0, 1, 2, 3, 4 (one-hot) |
implicit valence electrons | 0, 1, 2, 3, 4, 5 (one-hot) |
aromaticity | 0 or 1 |
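The atom features in the table above can be assembled into a single vector, as in the minimal sketch below. It takes the categories exactly as listed, which gives 10 + 6 + 5 + 6 + 1 = 28 entries, and it rejects values outside those lists. Both the helper names and the error handling are assumptions for illustration: real featurizers often route unknown values to an extra "other" slot, and the Mutual-DTI code may do so as well.

```python
ATOM_TYPES = ["C", "N", "O", "S", "F", "P", "Cl", "Br", "B", "H"]
DEGREES = [0, 1, 2, 3, 4, 5]
NUM_HYDROGENS = [0, 1, 2, 3, 4]
IMPLICIT_VALENCES = [0, 1, 2, 3, 4, 5]


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of value in choices."""
    if value not in choices:
        # Assumption: unknown values are an error rather than an "other" bucket.
        raise ValueError(f"{value!r} is not one of {choices}")
    return [1 if value == choice else 0 for choice in choices]


def atom_features(symbol, degree, num_hydrogens, implicit_valence, is_aromatic):
    """Concatenate the five per-atom features into one 28-entry vector."""
    return (
        one_hot(symbol, ATOM_TYPES)
        + one_hot(degree, DEGREES)
        + one_hot(num_hydrogens, NUM_HYDROGENS)
        + one_hot(implicit_valence, IMPLICIT_VALENCES)
        + [1 if is_aromatic else 0]
    )


if __name__ == "__main__":
    # An aromatic carbon with three neighbours and no attached hydrogens.
    vector = atom_features("C", 3, 0, 0, True)
    print(len(vector), vector)
```

In practice the inputs would come from a cheminformatics toolkit that parses the molecule; here they are passed in directly to keep the sketch self-contained.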
Name | Value |
Dimension of atom representation | 10 |
Dimension of protein representation | 10 |
Number of decoder layers | 3 |
Number of hidden layers | 10 |
Number of attention heads | 2 |
Learning rate | 5e-3 |
Weight decay | 1e-6 |
Dropout | 0.1 |
Batch size | 128 |
Dataset | Drugs | Proteins | Interactions | Positive | Negative |
Human | 1052 | 852 | 6728 | 3369 | 3359 |
C. elegans | 1434 | 2504 | 7786 | 4000 | 3786 |
GPCR | 5359 | 356 | 15343 | 7989 | 7354 |
Davis | 68 | 379 | 25772 | 7320 | 18452 |
Methods | AUC | Precision | Recall |
GNN-CPI | 0.917 ± 0.072 | 0.783 ± 0.061 | 0.889 ± 0.096 |
GNN-PT | 0.978 ± 0.006 | 0.939 ± 0.010 | 0.934 ± 0.006 |
TransformerCPI | 0.972 ± 0.005 | 0.938 ± 0.018 | 0.932 ± 0.001 |
Mutual-DTI | 0.984 ± 0.001 | 0.962 ± 0.019 | 0.943 ± 0.016 |
Methods | AUC | Precision | Recall |
GNN-CPI | 0.899 ± 0.104 | 0.850 ± 0.132 | 0.778 ± 0.192 |
GNN-PT | 0.984 ± 0.007 | 0.940 ± 0.024 | 0.933 ± 0.014 |
TransformerCPI | 0.984 ± 0.004 | 0.943 ± 0.025 | 0.951 ± 0.016 |
Mutual-DTI | 0.987 ± 0.004 | 0.948 ± 0.018 | 0.949 ± 0.013 |
Methods | AUC | Precision | Recall |
no-mutual-DTI | 0.810 ± 0.023 | 0.704 ± 0.014 | 0.768 ± 0.030 |
Mutual-DTI | 0.820 ± 0.014 | 0.699 ± 0.010 | 0.796 ± 0.046 |
Methods | AUC | Precision | Recall |
no-mutual-DTI | 0.886 ± 0.005 | 0.728 ± 0.023 | 0.654 ± 0.005 |
Mutual-DTI | 0.900 ± 0.002 | 0.767 ± 0.013 | 0.680 ± 0.027 |
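For readers who want to reproduce the kind of metrics reported in the tables above, the sketch below computes AUC, precision, and recall for binary interaction labels. It is a generic illustration, not the authors' evaluation script: the 0.5 decision threshold, the function names, and the toy labels and scores are all assumptions.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall after thresholding scores (threshold is assumed)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, for illustration only.
    y_true = [1, 1, 0, 0, 1]
    y_score = [0.9, 0.6, 0.7, 0.2, 0.4]
    print(auc(y_true, y_score), precision_recall(y_true, y_score))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a sketch but slow on full datasets, where a rank-based library routine is the usual choice.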