
Predicting and identifying drug-target interactions (DTI) is of great significance in medicine and biology [1,2,3]. Measuring the affinity between drugs and proteins in the wet lab is the most direct method; however, such experiments are often expensive and time-consuming due to the complexity of molecules [4]. Virtual screening (VS) through computation can significantly reduce costs. Structure-based VS and ligand-based VS, the classical virtual screening methods, have achieved great success [5,6]. Structure-based methods use three-dimensional (3D) conformations of proteins and drugs to study bioactivity, while ligand-based methods rest on the assumption that similar molecules will interact with similar proteins [7]. However, the applicability of these methods is limited: ligand-based VS performs poorly when a molecule has few known binding proteins, and structure-based VS cannot be executed when the 3D structure of a protein is unknown. Since accurate reconstruction of protein structures remains an open problem, 3D-free DTI prediction methods have attracted increasing attention [8]. Machine learning approaches consider the chemical space, the genomic space and their interactions in a single framework and formulate DTI prediction as a classification problem, following either a feature-based or a similarity-based approach. Similarity-based approaches rely on the assumption that drugs with similar structures should have similar effects, whereas feature-based approaches construct a feature vector combining descriptors of the drug and the protein as model input. Bleakley et al. [9] proposed a new supervised inference method to predict unknown drug-target interactions, using support vector machines as local classifiers. Since then, a variety of machine learning-based algorithms have been proposed that consider both compound and protein information in a unified model [10,11,12,13,14,15,16,17,18,19].
In recent years, deep learning has developed rapidly in drug discovery. Compared with traditional machine learning, end-to-end models eliminate the need to define and compute descriptors before modeling and provide different strategies and representations for proteins and drugs. Initially, manually specified descriptors were used to represent drugs and proteins, and a fully connected neural network (FCN) was designed to make predictions [20]. Since descriptors are designed from a single perspective and cannot be changed during training, descriptor-based approaches cannot extract task-relevant features. Therefore, many end-to-end models have been proposed. Lee et al. [21] proposed a model called DeepConv-DTI to predict DTI, which uses convolutional layers to extract local residue features of generalized proteins. Tsubaki et al. [22] used different models to represent drugs and proteins, treating the structure of a drug as a graph: a graph neural network (GNN) was used to learn features from drug structures, and a convolutional neural network (CNN) was used to learn from protein sequences. To consider deeper features between molecules, Li et al. [23] proposed a multi-objective neural network (MONN), which introduced a gated recurrent unit (GRU) module and can accurately determine both the interactions and the affinity between molecules. Zamora-Resendiz et al. [24] defined a new spatial graph convolutional network (GCN) architecture that employs graph reduction to reduce the number of training parameters and facilitate the abstraction of intermediate representations. Ryu et al. [25] combined a GCN with an attention mechanism so that the GCN can identify atoms in different environments, extracting structural features more relevant to a target molecular property, such as solubility, polarity, synthetic accessibility and photovoltaic efficiency, than a vanilla GCN. Ru et al. [26] combined the ideas of adjacency and learning to rank, establishing correlations between proteins and drugs through adjacency and predicting drug-protein binding affinity by feeding the features into a learning-to-rank classifier. The Transformer, a model that uses self-attention to speed up training, has achieved great success in natural language processing. Wang et al. [27] first extracted drug features with a GNN, then represented protein features with a Transformer and a CNN, obtaining long-range interaction information in proteins through a one-sided attention mechanism. Chen et al. [28] obtained interaction features with a Transformer decoder and proposed a more rigorous method for dataset partitioning. Ren et al. [29] presented a deep learning framework based on multimodal representations and meta-path semantic analysis, in which drugs and proteins are represented as multimodal data and the relationships between them are captured by meta-path semantic analysis. However, most of these methods consider only a single non-covalent interaction between drugs and proteins; in reality, there is far more than one kind of interaction between them.
Inspired by the Transformer decoder, which can extract long-range interdependent features [28], this paper proposes a dual-pathway model for DTI prediction based on mutual reaction features, called Mutual-DTI. The Transformer decoder is modified to treat drugs and proteins as two distinct sequences, and a module is added to extract mutual features, enabling the model to learn the complex interactions between atoms and amino acids. Figure 1 shows an overview of the entire network. The dual-pathway approach has also been applied in other fields; for example, dual attention matching (DAM) [30] was proposed to learn global features from local features with self-attention, but it ignores the mutual influence between the local features of two modalities.
In this paper, we capture the spatial and other feature information of drugs with a GNN and represent protein features with a gated convolutional network. The drug and protein features are then input as two sequences into the Transformer decoder, which includes the mutual feature module. Unlike DAM, the mutual feature module considers the local features of drug molecules and proteins simultaneously, which effectively extracts the interaction features between the two sequences. Finally, the drug-protein feature vector is input into a fully connected layer for prediction. We expected Mutual-DTI to exhibit better performance and generality with the addition of the mutual feature module. To validate this, we evaluated it on two benchmark datasets and conducted ablation experiments on a more strictly partitioned dataset [28]. The results show that Mutual-DTI performs better. We further visualized the attention scores learned by Mutual-DTI, and the results show that the mutual feature module helps reduce the search space of binding sites.
A GNN extracts node features in a graph through aggregation operations. We represent a drug as a graph whose nodes are atoms, such as carbon and hydrogen, and whose edges are chemical bonds, such as single and double bonds. We use the RDKit Python library* to convert a simplified molecular input line entry system (SMILES) string into a two-dimensional molecular graph.
*Website: http://www.rdkit.org/
We define a drug graph as $G = \{V, E\}$, where $V$ is the set of atomic nodes and $E$ is the set of chemical-bond edges. Since there are few atom and bond types, we perform a local breadth-first search from each node in the graph, where the search depth $r$ equals the number of hops from that node [22,31]. For example, starting from node $v_i$, we traverse the subgraph within range $r$ and record all neighboring nodes and edges of $v_i$ in the subgraph. We define the subgraph of node $v_i$ within depth $r$ as follows:
$$ G_{sub}^{r_i} = \left( V_{sub}^{r_i},\; E_{sub}^{r_i} \right) \quad (2.1) $$

$$ E_{sub}^{r_i} = \left\{ e_{mn} \mid m, n \in N(i, r) \right\} \quad (2.2) $$
where $N(i, r)$ is the set of nodes adjacent to $v_i$ in the subgraph $G_{sub}^{r_i}$, including $v_i$ itself, and $e_{mn}$ is the edge connecting nodes $v_m$ and $v_n$.
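For concreteness, the following minimal Python sketch (our own illustration with an adjacency-list graph; the function names are hypothetical) shows how the $r$-hop neighborhood $N(i, r)$ and the induced edge set of Eq (2.2) can be collected with a breadth-first search:

```python
from collections import deque

def bfs_neighborhood(adj, i, r):
    """Collect N(i, r): all nodes within r hops of node i (including i),
    via breadth-first search over an adjacency-list graph."""
    visited = {i}
    queue = deque([(i, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == r:
            continue
        for nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append((nbr, depth + 1))
    return visited

def subgraph_edges(adj, i, r):
    """E_sub: edges e_mn with both endpoints in N(i, r), as in Eq (2.2)."""
    nodes = bfs_neighborhood(adj, i, r)
    return {(m, n) for m in nodes for n in adj[m] if n in nodes}

# Example: a 4-node path graph 0-1-2-3, subgraph of node 0 at depth r = 2
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_neighborhood(adj, 0, 2))  # {0, 1, 2}
print(subgraph_edges(adj, 0, 2))    # {(0, 1), (1, 0), (1, 2), (2, 1)}
```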
From the subgraph $G_{sub}^{r_i}$, we can extract the corresponding chemical features, such as atom type, atom degree and aromaticity. The details are shown in Table 1.
Table 1. Chemical features extracted for each atom.

| Feature | Representation |
| --- | --- |
| atom type | C, N, O, S, F, P, Cl, Br, B, H (one-hot) |
| degree of atom | 0, 1, 2, 3, 4, 5 (one-hot) |
| number of hydrogens attached | 0, 1, 2, 3, 4 (one-hot) |
| implicit valence electrons | 0, 1, 2, 3, 4, 5 (one-hot) |
| aromaticity | 0 or 1 |
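As an illustration of Table 1, these descriptors can be read directly off RDKit atom objects; the sketch below is one plausible encoding under our assumptions, not the authors' exact code:

```python
from rdkit import Chem

ATOM_TYPES = ['C', 'N', 'O', 'S', 'F', 'P', 'Cl', 'Br', 'B', 'H']

def one_hot(value, choices):
    return [int(value == c) for c in choices]

def atom_features(atom):
    """Concatenate the Table 1 descriptors for one RDKit atom."""
    return (one_hot(atom.GetSymbol(), ATOM_TYPES)             # atom type
            + one_hot(atom.GetDegree(), [0, 1, 2, 3, 4, 5])   # degree of atom
            + one_hot(atom.GetTotalNumHs(), [0, 1, 2, 3, 4])  # attached hydrogens
            + one_hot(atom.GetImplicitValence(), [0, 1, 2, 3, 4, 5])
            + [int(atom.GetIsAromatic())])                    # aromaticity

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin, as an example
features = [atom_features(a) for a in mol.GetAtoms()]
edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
```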
We use randomly initialized embeddings of the extracted chemical features as the initial input to the GNN. We denote the embedding of node $v_i$ at the $n$-th layer as $f_i^{(n)} \in \mathbb{R}^d$. In the GNN, we update $f_i^{(n)}$ according to the following equation:
$$ f_i^{(n)} = \sigma\Big( f_i^{(n-1)} + \sum_{j \in N(i, r)} h_{ij}^{(n-1)} \Big) \quad (2.3) $$
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ and $h_{ij}^{(n-1)}$ is the hidden vector between nodes $v_i$ and $v_j$. This hidden vector is computed by the following neural network:
$$ h_{ij}^{(n)} = \varepsilon\big( \omega f_i^{(n)} + b \big) \quad (2.4) $$
where $\varepsilon$ is the ReLU activation $\varepsilon(x) = \max(0, x)$, $\omega \in \mathbb{R}^{d \times d}$ is a weight matrix and $b \in \mathbb{R}^d$ is a bias vector. After the GNN layers, we obtain the feature vectors $c_1, c_2, c_3, \cdots, c_l$ of a drug sequence, where $l$ is the number of atoms in the sequence.
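A minimal PyTorch sketch of the update in Eqs (2.3) and (2.4) follows. We interpret the neighborhood sum through a dense 0/1 matrix of $N(i, r)$, and the module and variable names are our own assumptions:

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One update of Eqs (2.3)-(2.4): hidden vectors from a shared linear
    map with ReLU, summed over the r-hop neighborhood, then a sigmoid."""
    def __init__(self, d):
        super().__init__()
        self.linear = nn.Linear(d, d)  # weight ω and bias b of Eq (2.4)

    def forward(self, f, adj):
        # f: (num_nodes, d) node embeddings
        # adj: (num_nodes, num_nodes) 0/1 matrix of the neighborhood N(i, r)
        h = torch.relu(self.linear(f))     # h_ij, Eq (2.4)
        return torch.sigmoid(f + adj @ h)  # Eq (2.3)

layer = SimpleGNNLayer(d=10)
f = torch.randn(5, 10)   # 5 atoms with 10-dimensional embeddings
adj = torch.eye(5)       # placeholder neighborhood matrix
out = layer(f, adj)      # shape (5, 10)
```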
A protein sequence consists of 20 kinds of amino acids. If we treat a protein sequence as a sentence, only 20 kinds of words can make up the sentence. To increase the diversity of features, we follow the n-gram language model and define the words of a protein sequence as n-gram amino acids [22]. Given an amino acid sequence, we split it into overlapping n-gram amino acid subsequences. For example, with $n = 3$, the protein sequence MVVMNSL⋯TSQATP is split into MVV, VVM, VMN, MNS, ⋯, TSQ, SQA, QAT, ATP, so that the vocabulary of words composing the sentence is expanded to $20^3$.
To keep the vocabulary at a reasonable size, we set $n$ to 3. Given a protein sequence $S = a_1 a_2 a_3 \cdots a_L$, where $L$ is the length of the sequence and $a_i$ is the $i$-th amino acid, we split it into:
$$ [a_1 a_2 a_3], [a_2 a_3 a_4], \cdots, [a_{L-2} a_{L-1} a_L] $$
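The splitting itself is a one-liner; the following sketch (our own illustration) also shows how the resulting words can be mapped to integer indices for the embedding lookup:

```python
def split_ngrams(seq, n=3):
    """Split an amino-acid sequence into overlapping n-gram 'words'."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

print(split_ngrams('MVVMNSL'))  # ['MVV', 'VVM', 'VMN', 'MNS', 'NSL']

# Map each word to an integer index for the embedding lookup; the
# vocabulary size is bounded by 20**3 for n = 3.
vocab = {}
ids = [vocab.setdefault(w, len(vocab)) for w in split_ngrams('MVVMNSL')]
```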
We use $a_{i:i+2} \in \mathbb{R}^d$ to denote the $d$-dimensional embedding of the word $[a_i a_{i+1} a_{i+2}]$. We initialize the $d$-dimensional embeddings of the protein sequence processed in this way and then input them into a gated convolutional network with Conv1D and gated linear units [32]. We compute the hidden layers according to Eq (2.5):
$$ L_i(X) = (X \omega_1 + s) \otimes \sigma(X \omega_2 + t) \quad (2.5) $$
where $L_i$ is the $i$-th layer of the gated convolutional network, $X \in \mathbb{R}^{n \times d_1}$ is the input to that layer, $\omega_1 \in \mathbb{R}^{d_1 \times d_2}$, $s \in \mathbb{R}^{d_2}$, $\omega_2 \in \mathbb{R}^{d_1 \times d_2}$ and $t \in \mathbb{R}^{d_2}$ are learned parameters, $n$ is the length of the sequence, $d_1$ and $d_2$ are the dimensions of the input and hidden features, respectively, $\sigma$ is the sigmoid function and $\otimes$ is the element-wise product. The output of the gated convolutional network is the final representation of the protein sequence.
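A sketch of one such layer in PyTorch, realizing the two affine maps of Eq (2.5) as Conv1D branches with an element-wise sigmoid gate; the kernel size and the names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatedConvLayer(nn.Module):
    """Eq (2.5): L_i(X) = (X ω1 + s) ⊗ σ(X ω2 + t), with the two affine
    maps realized as 1D convolutions over the n-gram embeddings."""
    def __init__(self, d_in, d_out, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.value = nn.Conv1d(d_in, d_out, kernel_size, padding=pad)
        self.gate = nn.Conv1d(d_in, d_out, kernel_size, padding=pad)

    def forward(self, x):
        # x: (batch, seq_len, d_in); Conv1d expects (batch, d_in, seq_len)
        x = x.transpose(1, 2)
        out = self.value(x) * torch.sigmoid(self.gate(x))  # element-wise gate
        return out.transpose(1, 2)  # back to (batch, seq_len, d_out)

layer = GatedConvLayer(d_in=10, d_out=10)
protein = torch.randn(2, 100, 10)  # batch of 2 proteins, 100 n-gram words
print(layer(protein).shape)        # torch.Size([2, 100, 10])
```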
We extract the feature vectors of drug and protein sequences with the drug and protein modules and input them into the Transformer decoder. The decoder learns mutual features and outputs drug and protein sequences enriched with interaction features. Since the order of the feature vectors has no effect on DTI modeling, we remove the positional embedding from the Transformer. The key component of the decoder is the multi-head self-attention layer, which consists of several scaled dot-product attention layers that extract the interaction information between the two input sequences. The attention layer accepts three inputs, a query Q, a key K and a value V, and computes attention as follows:
$$ \mathrm{Self\_attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^T}{\sqrt{d_k}} \right) V \quad (2.6) $$
where $d_k$ is a scaling factor equal to the dimension of the keys. Considering the complex reaction processes involving non-covalent chemical bonds between drugs and proteins, we add a module that extracts interaction features with the multi-head self-attention layers in the Transformer decoder. The decoder takes the drug and protein sequences as inputs, enabling the simultaneous extraction of drug-dominated and protein-dominated interaction features, and further extracts the complex interaction feature vectors between atoms and amino acids within the sequences, as depicted in Figure 1.
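The following sketch implements Eq (2.6) and illustrates our reading of the dual-pathway idea, with each sequence attending to the other as queries against keys and values; it is a hedged sketch, not the authors' exact module:

```python
import torch
import torch.nn.functional as F

def scaled_dot_attention(q, k, v):
    """Eq (2.6): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Dual pathway: drug-dominated and protein-dominated interaction features.
drug = torch.randn(2, 50, 10)    # (batch, n1 atoms, d)
prot = torch.randn(2, 300, 10)   # (batch, n2 words, d)
drug_out = scaled_dot_attention(drug, prot, prot)  # drugs attend to protein
prot_out = scaled_dot_attention(prot, drug, drug)  # proteins attend to drug
```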
After the decoder extracts the interaction features, we obtain the interaction feature matrices $D \in \mathbb{R}^{b \times n_1 \times d}$ and $P \in \mathbb{R}^{b \times n_2 \times d}$ for drugs and proteins, where $b$ is the batch size, $n_1$ and $n_2$ are the numbers of tokens and $d$ is the feature dimension. We average each feature matrix over the token dimension:
$$ D_a = \mathrm{mean}(D, 1) \quad (2.7) $$

$$ P_a = \mathrm{mean}(P, 1) \quad (2.8) $$

where $\mathrm{mean}(input, dim)$ returns the mean over the given dimension $dim$. The resulting feature vectors $D_a$ and $P_a$ are concatenated and fed to the classification block.
The classification module consists of a multilayer fully connected neural network with ReLU activations, whose final layer outputs the interaction probability $\hat{y}$. As this is a binary classification task, we train Mutual-DTI with the binary cross-entropy loss:
$$ \mathrm{Loss} = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big] \quad (2.9) $$
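Eqs (2.7)-(2.9) can be put together in a few lines of PyTorch; the layer sizes in this sketch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, d=10):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid())  # interaction probability ŷ

    def forward(self, D, P):
        Da = D.mean(dim=1)                   # Eq (2.7)
        Pa = P.mean(dim=1)                   # Eq (2.8)
        return self.mlp(torch.cat([Da, Pa], dim=-1)).squeeze(-1)

head = ClassifierHead()
y_hat = head(torch.randn(4, 50, 10), torch.randn(4, 300, 10))
loss = nn.BCELoss()(y_hat, torch.tensor([1., 0., 1., 0.]))  # Eq (2.9)
```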
Mutual-DTI was implemented in PyTorch 1.10.0. The original Transformer has 6 layers and a hidden dimension of 512; we reduced the number of decoder layers from 6 to 3 and the hidden dimension from 512 to 10, set the dimensions of the protein representation, the atom representation and the interaction features to 10, and used 2 attention heads, as this configuration generalizes well. During training, we used the Adam optimizer [33] with the learning rate set to 0.005 and the batch size set to 128. All settings and hyperparameters of Mutual-DTI are shown in Table 2, and a minimal sketch of the training setup follows the table.
Table 2. Settings and hyperparameters of Mutual-DTI.

| Name | Value |
| --- | --- |
| Dimension of atom representation | 10 |
| Dimension of protein representation | 10 |
| Number of decoder layers | 3 |
| Number of hidden layers | 10 |
| Number of attention heads | 2 |
| Learning rate | 5e-3 |
| Weight decay | 1e-6 |
| Dropout | 0.1 |
| Batch size | 128 |
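Putting Table 2 together, a minimal training-setup sketch; the stand-in model here only makes the snippet self-contained and is not the real network:

```python
import torch
import torch.nn as nn

# Stand-in for the full Mutual-DTI network, just to exercise the setup.
model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Dropout(0.1),
                      nn.Linear(10, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=1e-6)

x = torch.randn(128, 20)                         # one batch of 128 pairs
y = torch.randint(0, 2, (128,)).float()          # binary interaction labels
for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(x).squeeze(-1), y)
    loss.backward()
    optimizer.step()
```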
The human and C. elegans datasets were created by Liu et al. [34]. Both comprise compound-protein pairs, including highly credible negative samples as well as positive samples. The human dataset contains 3369 positive interactions between 1052 unique compounds and 852 unique proteins, while the C. elegans dataset contains 4000 positive interactions between 1434 unique compounds and 2504 unique proteins, as shown in Table 3. We randomly divided each dataset into training, validation and test sets in the ratio 8:1:1 (a sketch of this split follows Table 3). In addition, we used AUC, precision and recall as evaluation metrics for Mutual-DTI and compared it with several traditional machine learning methods on both datasets: k-NN, RF, L2-logistic (L2) and SVM, whose results are taken from the original paper [34]. The main results are shown in Figure 2: Mutual-DTI outperforms the machine learning methods on both benchmark datasets.
Table 3. Statistics of the datasets.

| Dataset | Drugs | Proteins | Interactions | Positive | Negative |
| --- | --- | --- | --- | --- | --- |
| Human | 1052 | 852 | 6728 | 3369 | 3359 |
| C.elegans | 1434 | 2504 | 7786 | 4000 | 3786 |
| GPCR | 5359 | 356 | 15343 | 7989 | 7354 |
| Davis | 68 | 379 | 25772 | 7320 | 18452 |
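The 8:1:1 split can be reproduced along these lines; the seed value and the pair format are our assumptions:

```python
import random

def split_811(pairs, seed=42):
    """Randomly split a list of (drug, protein, label) pairs 8:1:1."""
    rng = random.Random(seed)
    pairs = pairs[:]
    rng.shuffle(pairs)
    n = len(pairs)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# Toy example with 100 synthetic pairs
train, val, test = split_811([(f'd{i}', f'p{i}', i % 2) for i in range(100)])
print(len(train), len(val), len(test))  # 80 10 10
```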
In further experiments, we compared the proposed method with recent deep learning methods for DTI prediction: GNN-CPI [22], GNN-PT [27] and TransformerCPI [28]. Their main hyperparameters were set as follows:

GNN-CPI: vector dimensionality of vertices, edges and n-grams = 10, number of GNN layers = 3, window size = 11, number of CNN layers = 2, number of output layers = 3.

GNN-PT: number of GNN layers = 3, number of output layers = 1, attention heads = 2.

TransformerCPI: atom dimension = 34, protein dimension = 100, hidden dimension = 64, number of hidden layers = 3, attention heads = 8.
All parameter settings followed the original papers, and we applied the same preprocessing to the initial drug and protein data as in the previous experiments. We repeated each experiment three times with different random seeds; for each repetition, we randomly split the dataset and saved the model parameters corresponding to the best validation-set AUC, then evaluated on the test set (a sketch of the metric computation follows Table 5). The main results on the human and C. elegans datasets are shown in Tables 4 and 5. On the human dataset, the average AUC, precision and recall of Mutual-DTI are 0.984, 0.962 and 0.943, respectively, outperforming the other methods. On the C. elegans dataset, the average AUC, precision and recall are 0.987, 0.948 and 0.949, respectively, mostly outperforming the other models. The results suggest that Mutual-DTI can effectively learn informative features for predicting interactions from one-dimensional protein sequences and two-dimensional molecular graphs alike, demonstrating its generalizability across datasets.
Table 4. Results on the human dataset.

| Methods | AUC | Precision | Recall |
| --- | --- | --- | --- |
| GNN-CPI | 0.917 ± 0.072 | 0.783 ± 0.061 | 0.889 ± 0.096 |
| GNN-PT | 0.978 ± 0.006 | 0.939 ± 0.010 | 0.934 ± 0.006 |
| TransformerCPI | 0.972 ± 0.005 | 0.938 ± 0.018 | 0.932 ± 0.001 |
| Mutual-DTI | 0.984 ± 0.001 | 0.962 ± 0.019 | 0.943 ± 0.016 |
Table 5. Results on the C. elegans dataset.

| Methods | AUC | Precision | Recall |
| --- | --- | --- | --- |
| GNN-CPI | 0.899 ± 0.104 | 0.850 ± 0.132 | 0.778 ± 0.192 |
| GNN-PT | 0.984 ± 0.007 | 0.940 ± 0.024 | 0.933 ± 0.014 |
| TransformerCPI | 0.984 ± 0.004 | 0.943 ± 0.025 | 0.951 ± 0.016 |
| Mutual-DTI | 0.987 ± 0.004 | 0.948 ± 0.018 | 0.949 ± 0.013 |
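The reported metrics can be computed with scikit-learn as sketched below on toy data; the 0.5 decision threshold for precision and recall is our assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {'AUC': roc_auc_score(y_true, y_score),
            'Precision': precision_score(y_true, y_pred),
            'Recall': recall_score(y_true, y_pred)}

# Toy example: three repetitions, reported as mean ± standard deviation.
rng = np.random.default_rng(0)
runs = []
for seed in range(3):
    y_true = rng.integers(0, 2, size=200)
    y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)
    runs.append(evaluate(y_true, y_score))
aucs = [r['AUC'] for r in runs]
print(f"AUC {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
```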
To evaluate the importance of the mutual feature module, we compared two sub-models: no-mutual-DTI, which removes the mutual feature module, and the full network with the module. To make the evaluation more rigorous, we used the more strictly partitioned GPCR dataset [28], summarized in Table 3. The key property of this dataset is that each drug in the training set appears in only one class of samples (positive or negative DTI pairs), while in the test set it appears only in the opposite class. This forces the model to use protein information to learn interaction patterns and make opposite predictions for the selected drugs, which is more realistic.
Table 6 shows the prediction performance of the two models on the GPCR dataset. Comparing the models with and without the mutual feature module shows that the interaction features indeed bring improvements, suggesting the need to establish correlations between drug and protein information when extrapolating DTI predictions. We also conducted experiments on the Davis dataset, created by Zhao et al. [35], which contains 7320 positive and 18,452 negative interactions. As shown in Table 7, the model with the interaction module also performs better on this unbalanced dataset.
Table 6. Ablation results on the GPCR dataset.

| Methods | AUC | Precision | Recall |
| --- | --- | --- | --- |
| no-mutual-DTI | 0.810 ± 0.023 | 0.704 ± 0.014 | 0.768 ± 0.030 |
| Mutual-DTI | 0.820 ± 0.014 | 0.699 ± 0.010 | 0.796 ± 0.046 |
Table 7. Ablation results on the Davis dataset.

| Methods | AUC | Precision | Recall |
| --- | --- | --- | --- |
| no-mutual-DTI | 0.886 ± 0.005 | 0.728 ± 0.023 | 0.654 ± 0.005 |
| Mutual-DTI | 0.900 ± 0.002 | 0.767 ± 0.013 | 0.680 ± 0.027 |
In this section, we use a three-dimensional surface plot to analyze the impact of two of the most important model hyperparameters, the atom and protein embedding dimensions, on DTI prediction performance. We sampled both dimensions from 10 to 40 in steps of 10; the atom/protein dimension pairs were 10/10, 10/20, 10/30, ..., 20/10, 20/20, ..., 40/30, 40/40, for a total of 16 settings (a sketch of this sweep follows), and experiments were conducted several times with different random seeds. All other settings were the same as in the previous experiments. As shown in Figure 3, the x-axis is the atom dimension, the y-axis is the protein dimension and the z-axis is the AUC on the test set. The surfaces are very smooth under the different dimension settings, indicating that the model is robust.
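The sweep amounts to a 4 × 4 grid; in the sketch below, train_and_eval is a hypothetical helper standing in for a full training run:

```python
import itertools
import numpy as np

# The 16 atom-dimension / protein-dimension settings swept for Figure 3.
dims = [10, 20, 30, 40]
auc_surface = np.zeros((len(dims), len(dims)))
for (i, atom_dim), (j, prot_dim) in itertools.product(enumerate(dims), repeat=2):
    # train_and_eval would build Mutual-DTI with these embedding sizes
    # and return the test AUC; here we only mark the grid point.
    auc_surface[i, j] = 0.0  # auc_surface[i, j] = train_and_eval(atom_dim, prot_dim)
print(auc_surface.shape)  # (4, 4) grid plotted as the surface in Figure 3
```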
To demonstrate that the mutual feature module not only enhances performance but also offers deeper interpretability, we conducted a case study. First, we applied a Frobenius-norm score to the protein feature matrix obtained from the Transformer decoder; next, we used the Softmax function to derive attention weights over the protein sequence, which were then mapped onto the 3D structure of the complex to visualize the regions most relevant to the drug-protein reaction. The attention weights for the crystal structure of gw0385-bound HIV protease D545701 (PDB: 2FDD) are shown in Figure 4. The complex has 12 binding sites in total. Marking the regions with attention scores above 0.75 in red, we find that 4 of the 12 binding sites receive high attention, namely ASP-25, ALA-28, PRO-81 and ALA-82. The results show that Mutual-DTI helps narrow down the search space of binding sites.
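Our reading of this pipeline, as a hedged sketch: score each residue by the norm of its decoder feature vector, softmax-normalize, then threshold; the min-max rescaling that makes the 0.75 cutoff meaningful is our assumption:

```python
import torch

def residue_attention(P, threshold=0.75):
    """P: (seq_len, d) protein features from the Transformer decoder.
    Score each residue by the norm of its feature vector, normalize with
    softmax, then min-max rescale to [0, 1] before thresholding (the
    rescaling is our assumption, so that the 0.75 cutoff is meaningful)."""
    scores = P.norm(dim=-1)                # per-residue Frobenius/L2 norm
    weights = torch.softmax(scores, dim=0)
    scaled = (weights - weights.min()) / (weights.max() - weights.min())
    return scaled, (scaled > threshold).nonzero().flatten()

P = torch.randn(99, 10)                    # toy protein of 99 residues
weights, highlighted = residue_attention(P)  # residues to color red in the 3D view
```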
Our robustness experiments show that the model's AUC varies only slightly when atoms and proteins are embedded in various dimensions. We infer that the GNN module, which learns drug molecule features, and the gated convolutional module, which learns protein features, extract feature information effectively in Mutual-DTI. This indicates that constructing subgraphs of drug molecules via local breadth-first search and constructing words via n-grams is reasonable. In the ablation experiment, we found that removing the module that extracts mutual reaction features significantly decreased prediction accuracy. We speculate that, without it, the model only learns each sequence's individual features, while DTI is a dynamic process. The mutual feature module lets the drug and protein features dynamically attend to each other's key parts in the learning layers, directly capturing the interaction features of the two sequences. By learning these interaction features, the model gains a deeper understanding of the DTI process and can more easily capture the parts likely to contribute to a reaction when handling unknown drugs and proteins, leading to superior prediction performance.
From the perspective of model complexity, Mutual-DTI is more complex than networks that use only a GNN (e.g., GNN-CPI), because the self-attention mechanism we use captures long-range dependencies between tokens in a sequence. Compared with TransformerCPI, which is also Transformer-based, Mutual-DTI has lower complexity: although it contains two parallel multi-head attention pathways, the number of attention heads is reduced from 8 to 2 and the hidden dimension is smaller, which markedly reduces the number of parameters. These design choices help Mutual-DTI fit the training data while avoiding overfitting due to excessive complexity.
In this paper, we presented a Transformer-based network model for predicting DTI and introduced a module that extracts sequence interaction features to model the complex reaction processes between atoms and amino acids. To validate its effectiveness, we compared Mutual-DTI with recent baselines on two benchmark datasets, where it outperformed them. We also evaluated Mutual-DTI on the label reversal dataset and observed a significant improvement from introducing the mutual feature module. Finally, we mapped the attention weights obtained by the mutual feature module onto protein sequences, which helps interpret the model and assess the reliability of its predictions.
Although Mutual-DTI performs effectively in predicting DTI, there is still room for improvement. The experimental results show a marked drop in performance on the strictly partitioned label reversal dataset compared with the human and C. elegans datasets. This suggests that sequence-based feature extraction is quite limited, and adding 3D representations of molecules or proteins may help extract more information.
The work was supported by Open Funding Project of the State Key Laboratory of Biocatalysis and Enzyme Engineering under grant No. SKLBEE2021020 and SKLBEE2020020, the High-level Talents Fund of Hubei University of Technology under grant No. GCRC2020016, Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province under grant No. 2020E10010-02 and Natural Science Foundation of Hubei Province under grant No. 2021CFB282.
All authors declare no conflicts of interest in this paper.