Drug-target binding affinity prediction method based on a deep graph neural network

Dong Ma; Shuang Li; Zhihua Chen

doi:10.3934/mbe.2023012

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 1: 269-282. doi: 10.3934/mbe.2023012

Research article

1. Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
2. Beidahuang Industry Group General Hospital, Harbin, China

Academic Editor: Leyi Wei
† The authors contributed equally to this work.

Received: 01 August 2022 Revised: 06 September 2022 Accepted: 07 September 2022 Published: 30 September 2022

The development of new drugs is a long and costly process. Computer-aided drug design reduces development costs and shortens the drug development cycle, and DTA (drug-target binding affinity) prediction is a key step in screening potential drugs. With the development of deep learning, various types of deep learning models have achieved notable performance in a wide range of fields. Most current related studies focus on extracting the sequence features of molecules while ignoring valuable structural information; they employ sequence data that represent only the elemental composition of molecules without considering the molecular structure graphs that contain structural information. In this paper, we use graph neural networks to predict DTA based on the corresponding graph data of drugs and proteins, and we achieve competitive performance on two benchmark datasets, Davis and KIBA. In particular, an MSE of 0.227 and a CI of 0.895 were obtained on Davis, and an MSE of 0.127 and a CI of 0.903 were obtained on KIBA.

Keywords:

Citation: Dong Ma, Shuang Li, Zhihua Chen. Drug-target binding affinity prediction method based on a deep graph neural network[J]. Mathematical Biosciences and Engineering, 2023, 20(1): 269-282. doi: 10.3934/mbe.2023012

1. Introduction

Enteric viruses are harmful pathogens that have been linked to several waterborne disease outbreaks in humans. These viruses can be found in water and food that have been contaminated with fecal waste, either directly or indirectly. They can survive and remain attached to soil sediments, with high rates of resuspension and redistribution in flowing groundwater, particularly during flood runoff events. This poses a significant risk to human health, as it can lead to severe contamination of potable wells: changes in the ionic strength of flowing water may cause high detachment rates of virions from surface collectors (i.e., soil particles) ^[2]. Given these risks, accurate assessments of groundwater quality are crucial, especially in the aftermath of floods ^[3,4,5]. In both the USA (e.g., Big Horn Lodge, WY; Atlantic City, WY; Coeur d'Alene, ID; Island Park, ID) ^[6] and Italy (Salento peninsula) ^[7], there have been several outbreaks caused by contaminated drinking water from fractured bedrock aquifers, such as limestone aquifers. These aquifers have been shown to be particularly susceptible to microbial contamination by norovirus (NoV), hepatitis A virus (HAV), rotavirus (RoV), enterovirus (EV), and adenovirus (AdV) ^[8,9,10].

Quantitative microbial health risk assessment (RA) during outbreaks, particularly of hepatitis A and gastrointestinal infections such as those observed in the Salento Peninsula, Italy ^[7], caused by ingesting contaminated food and water, is crucial for guiding public health policies for disease prevention and control. RA results are vital for community water management systems (i.e., policymakers) to evaluate water supply quality and set appropriate performance targets for wastewater treatment plants. In risk assessment, dose-response (DR) models are employed to quantify infections through various pathways of pathogenic agents delivered to the host (target). Probabilistic models and coefficient updates can reduce uncertainty and management costs in health risk assessments. DR models are based on pathogen challenges, involving experiments on volunteers, animals, or cells infected with a pathogen under controlled conditions. These experiments are necessary to study infectivity and immunogenicity in human hosts via mechanistic equations, similar to clinical trials of new vaccines. The infection risk is calculated from challenge experiments involving volunteers by determining the percentage of exposed nonimmune individuals who tested positive. The DR model extrapolates the infection probability curve for increasing doses to better approach the experimental results.

The physical meaning of "infectious dose" and of the coefficients ^[11] in the most commonly applied exponential and approximate beta-Poisson models is often poorly explained in studies on RA. The term "infectious" is a requirement of the single-hit theory of the probability of infection, which posits that every single pathogen, Poisson distributed in the dose, must have the potential to infect the host, thereby making it "infectious." Moreover, certain studies on RA, such as those conducted by Ayuso-Gabella et al. ^[12] and Pecson et al. ^[13], utilize established literature coefficients ^[11] obtained from specific challenge experiments without clearly explaining potential discrepancies between the inoculum volume of the dose in the studied experiment and that found in the literature. These discrepancies often arise because, with an exponential or approximate beta-Poisson DR model, scaling the inoculum volume of the dose from 1 to 100 ml leads to a significant horizontal shift in the curve of the predicted infection risk ^[14]. As suggested by Schmidt ^[15], when applying DR models in RA, particularly exponential and approximate beta-Poisson models, it is crucial to carefully investigate exposure assessment, specifically the estimation of the mean infectious dose. This refers to the product of the infectious pathogen concentration and the volume (or weight) of the contaminated inoculum supplied to the host. Therefore, a strong link exists between exposure assessment and the subsequent computation of the DR model.

Reliable RA requires in-depth studies in different scientific fields, such as biomolecular microbiology for pathogenic agent identification and assays, and medicine for understanding the pathogenesis of illness and the health impact of infections, including host immunity and the mechanistic processes of cell infection. Most of this knowledge goes beyond the scope of environmental science and mathematics. The complexity involved may explain the frequent uncertainty in the results of RA ^[16,17].

Another concern is the identification of aggregation or non-aggregation of infectious pathogen particles in the dose delivered to the host. Gerba and Betancourt ^[18] explained the importance of viral aggregation for viral survival in wastewater. They demonstrated that most enteric viruses in polluted water samples appeared in aggregated forms, which increased their survival or resistance to environmental conditions and wastewater disinfection treatments. Therefore, studying how the aggregation of NoV can affect human health during environmental host exposure could lead to a more reliable estimate of the probability of infection. The uncertainty of exposure assessment, resulting from challenges in accurately predicting the number of infectious particles in the mean dose, coupled with the limited availability of specifically designed human challenge experiments in the literature, can lead to less reliable results of applied RA methods ^[14]. Teunis et al. ^[19] utilized various data sources, including outbreak data, to estimate the mean infection risk for a host exposed to a single dose of 1-NoV, resulting in values of 0.28 for NoV GI.1 and 0.076 for the less infective NoV GII in nonimmune (Se+) subjects. Their research indicated that GII NoV is associated with more severe infections, despite GI NoV being preferred in human challenge experiments due to its higher infectivity.

In this study, we sought to establish a relationship between the minimum infectious dose (MID) and the infectious dose (ID) having a 50% probability of initiating host infection (ID₅₀) in the same challenge experiment, in order to estimate the coefficients of individual infection risks via conventional mechanistic DR models, specifically for host inoculation with NoV GI.1, the pooled enterovirus group, Poliovirus 1/SM, or Echo-12 virus. To validate the proposed coefficients, we compared the solutions of our DR model with the results of human challenge trials conducted by Teunis et al. ^[20], Atmar ^[21], Leon ^[22], and Mateo ^[23].

2. Materials and methods

In this review of dose-infection challenge experiments, we conducted a thorough reassessment of existing DR models and coefficients for the probability of individual host infection caused by NoV GI.1. Our analysis focused on establishing a suitable relationship between the minimum infectious dose (MID) and the corresponding infectious dose (ID) with a 50% probability of initiating a host infection, known as ID₅₀.

Since 1967, numerous studies have been carried out to develop DR models for RA. The primary method of estimating the probability of infections has been based on the single-hit probability model (SHPM) initially proposed by Furumoto and Mickey ^[1]. Recently, Nilsen and Wyller ^[24] integrated SHPM into a stochastic framework. Subsequent methodological improvements ^[11,24] have recommended the utilization of functions complementary to the beta-Poisson probability distribution of SHPM for individual infections, such as the negative binomial (NB) or the gamma threshold probability distribution, among others ^[26]. These DR models consider variations in host-to-host susceptibility by combining single- or multi-hit Poisson probability and the conditional probability distribution of the minimum count of ingested infectious agents required to infect a host. However, the definition of the MID implies the presence of a threshold dose, which has not yet been comprehensively investigated in viral challenge experiments ^[17] involving NoV GI.1 inoculation ^[23]. For instance, Caul's experiment ^[27] yielded data indicating an MID in the range of 10 to 100 infectious particles for widespread aerosols.

Furthermore, certain critical aspects of the SHPM theory, as discussed by van Abel et al. ^[28], Messner ^[29], and Schmidt ^[15], have not been thoroughly examined in specific DR experiments. These aspects include the effects of individual host susceptibilities to infections caused by the same transmitted pathogen. Moreover, van Abel et al. ^[28] observed that in some experiments, non-secretor (Se–) individuals were infected by NoV genotype GII.4; that is, they became susceptible to infection. Similarly, in a study conducted by Mateo et al. ^[23], one immune host exhibited severe gastroenteritis, similar to a Se-positive challenge host. Despite these findings, the literature lacks well-designed viral challenge experiments for accurately estimating the coefficients of DR models, even when considering methods that encompass all possible infectious pathogens.

To address these gaps, Rahman et al. ^[30] developed a mechanistic method for DR models to investigate foodborne host infection by Listeria monocytogenes. Their model describes the process of the host cell's resistance against infectious pathogens and considers the possibility that plasma in host cells could facilitate the release of antibodies to eliminate pathogens. Integrating such mechanistic alternative models into the single-hit theory can significantly improve health risk estimations, as they are based on the operating environmental conditions and clinical data of a single host during viral challenge infection experiments.

2.1. Conventional methods for infection probability estimation

In various RA methods, the α and β coefficients of the beta-Poisson DR model ^[16] are derived from past pathogen challenge experiments. These experiments were conducted by different researchers for various pathogens, such as HAV infection by Ward ^[31], RoV by Ward ^[32], Echovirus-11/12 by Schiff ^[33], and Coxsackievirus (CV) and AdV by Couch et al. ^[34] (refer to Table 1). Recent experiments focusing on DR curves for NoV were conducted by Atmar et al. ^[20], Frenck et al. ^[35], and Seitz et al. ^[36]. Additionally, Teunis et al. ^[37] proposed a redefinition of the coefficients of a DR model for AdV (AdV4, AdV7, and AdV16) by grouping data from various types of dose-infection challenge experiments found in the literature and the results of infection tests on pig kidney cells.

Table 1. Coefficients of dose-response probability of infection models scaled to 1 g (or 1 ml) of inoculum (i.e., contaminated food or water).

Pathogen	Dose-response model	α and β (or β₁ for the exponential model)	Source
HAV	Exponential	1.8229	Haas and Eisenberg ^[39]
AdV	Exponential	2.397	Crabtree et al. ^[40]
NoV	Exponential	2.375	Sokolova et al. ^[41]
RoV	Approximate beta-Poisson	0.253 and 0.422	Teunis and Havelaar ^[14]
EV group	Approximate beta-Poisson	0.167 and 0.191	de Man et al. ^[42]
Echovirus-12	Approximate beta-Poisson	0.401 and 227.2	Teunis et al. ^[43]
	Exponential	78.3	McBride et al. ^[44]; Haas et al. ^[45]
	Approximate beta-Poisson
CV	Exponential	129	Mena et al. ^[46]

In contrast, Strachan et al. ^[38] combined infection data collected from numerous global outbreaks, using data from both animal and human cells, to define a DR probability model of infection for E. coli O157. They applied binomial and beta-binomial distributions in maximum likelihood estimation (MLE) to determine the α and β coefficients. Strachan et al. also proposed using DR challenge tests with surrogates of infectious pathogens, such as E. coli O157 and Shigella ^[38]. However, it is important to note that the availability of data from pathogen challenge experiments suitable for implementing comprehensive new DR models remains limited (refer to Table 1). Furthermore, using surrogate pathogens or animal cells in challenge experiments has resulted in probability-of-infection curves that differ significantly from those obtained using known DR model coefficients.

The approximate beta-Poisson model, as described in the literature, is expressed as follows.

$P = 1-{\left(1+\frac{ID}{\beta }\right)}^{-\alpha } .$

(1)

However, this formulation leads to an overestimation of the NoV GI.1 probability of infection at low infectious doses (ID) compared to the "exact" beta-Poisson infection model solution ^[15,20,28]:

$P = 1-{}_{1}{F}_{1}\left(\alpha ;\alpha +\beta ;-ID\right) .$

(2)

The model coefficients α (= 0.04) and β (= 0.055) ^[20] were obtained from NoV GI.1 (8fIIa + 8fIIb) challenge experiments for disaggregated virions in the doses. Nevertheless, Teunis et al. ^[43] suggested that the DR model (1) can provide an acceptable probability of infection approaching the exact solution (2) when reliable model coefficients are applied, although some uncertainty may be expected in the result at low infectious doses ^[28] of pathogens in volunteers (i.e., hosts).
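Eqs (1) and (2) can be compared numerically with a short script. The following is a minimal Python sketch under stated assumptions, not the Excel implementation used in this study: the function names are illustrative, and the Kummer function is evaluated through Kummer's transformation ₁F₁(a; b; −x) = e^(−x)·₁F₁(b − a; b; x), a choice made here so that all series terms are positive. It is intended for moderate doses only, since the intermediate sum grows like e^x.

```python
import math


def p_approx_beta_poisson(dose, alpha, beta):
    """Approximate beta-Poisson probability of infection, Eq (1)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


def hyp1f1_neg(a, b, x, tol=1e-15, max_terms=100000):
    """Kummer function 1F1(a; b; -x) for x >= 0 and b > a > 0.

    Evaluated as exp(-x) * 1F1(b - a; b; x) (Kummer's transformation),
    so every term of the power series is positive.
    """
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms of 1F1(b - a; b; x).
        term *= (b - a + n) * x / ((b + n) * (n + 1))
        total += term
        if term < tol * total:
            break
    return math.exp(-x) * total


def p_exact_beta_poisson(dose, alpha, beta):
    """Exact beta-Poisson probability of infection, Eq (2)."""
    return 1.0 - hyp1f1_neg(alpha, alpha + beta, dose)


if __name__ == "__main__":
    alpha, beta = 0.04, 0.055  # coefficients quoted in the text from ^[20]
    for dose in (0.01, 0.1, 1.0, 10.0):
        print(dose,
              p_approx_beta_poisson(dose, alpha, beta),
              p_exact_beta_poisson(dose, alpha, beta))
```

Printing both columns side by side shows where the two formulations diverge for a given coefficient pair; the doses in the loop are arbitrary illustration values.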

2.2. New relationships in beta-Poisson DR models

The main theoretical contribution of the mechanistic method proposed in this study is to establish relationships between ID/β and ID/ID₅₀ that enhance microbial risk assessment methods. The ID₅₀ in the approximate beta-Poisson model can be calculated as [39, p. 163]

${ID}_{50} = \beta \left({2}^{1/\alpha }-1\right) .$

(3)

as derived from specific pathogen challenge experiments. Of note, in RA, the α and β model coefficients provided in existing DR models are applied correctly only when ID, ID₅₀, and β are defined for the same challenge experiment. This ensures that the mean pathogen dose supplied to the host and the model coefficients applied in the DR model refer to the same infection event ^[11]. In the present work, we demonstrate that using relationships between ID, the coefficient β, and the minimum infectious dose (MID) rather than ID₅₀ can lead to more reliable estimates for the probability of infection in RA, particularly for the transmission of enteric viruses to hosts. To establish this relationship, we collected ID₅₀ and MID data (Table 2) from various challenge experiments involving infections caused by host inoculation with different enteric viruses ^[47,48].
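For reference, Eq (3) can be recovered from the approximate model in Eq (1) by setting P = 1/2 at ID = ID₅₀. The short derivation below is a sketch added for clarity and uses only the symbols already defined:

$\frac{1}{2} = 1-{\left(1+\frac{{ID}_{50}}{\beta }\right)}^{-\alpha }\;\; \Rightarrow \;\;{\left(1+\frac{{ID}_{50}}{\beta }\right)}^{\alpha } = 2\;\; \Rightarrow \;\;1+\frac{{ID}_{50}}{\beta } = {2}^{1/\alpha }\;\; \Rightarrow \;\;{ID}_{50} = \beta \left({2}^{1/\alpha }-1\right) .$

Taking the base-2 logarithm of the second step instead gives $\alpha \cdot {\log }_{2}\left(1+{ID}_{50}/\beta \right) = 1$, which is the identity inverted later in Eq (4).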

Table 2. MID and ID₅₀ values collected from challenge infection experiments on enteric viruses.

MID	ID₅₀	Pathogen	Source
1	1.26	HAV	Ward et al. ^[31]
1	6.17	RoV	Graham et al. ^[49]; Teunis and Havelaar ^[14]
0.83	1.66	AdV	Couch et al. ^[34]
		EV
1	2	Poliovirus 1/SM	Schiff et al. ^[33]
17	78.3	Echo-12	Schiff et al. ^[33,51]
30	69.1	Coxsackie (CV) B4-A21	Health Canada ^[50]; Mena et al. ^[46]

Table 2 presents the collected MID and ID₅₀ values from challenge infection experiments involving enteric viruses. The best fit (R² = 0.91) of the MID vs. ID₅₀ values provided the relationship shown in Figure 1 on a semi-log plane in combination with the uncertainty intervals. Microsoft Excel was utilized to derive the following regression equations:

$MID = {c}_{1}\cdot \mathit{log}\left({ID}_{50}\right)-{c}_{2},$

(3a)

$\mathit{log}\left({ID}_{50}\right) = \frac{1}{{c}_{1}}\left(MID+{c}_{2}\right) ,$

(3b)

where c₁ = 2 and c₂ = 6 are the best-fit coefficients.

Figure 1. Best-fit relationship between MID-ID₅₀ values from the collected challenge-controlled experiments and uncertainty intervals of estimations.

Our proposed mechanistic method favors simple physically-based relationships between MID and ID₅₀, rather than complex equations obtained through advanced best-fit methods and the Akaike Information Criterion (second order) ^[52]. To further enhance these relationships, additional ID₅₀/MID data from enteric virus challenge trials could be included, potentially leading to the definition of novel coefficients for beta-Poisson infection probability models derived from well-designed challenge studies.

In this study, we propose new models and coefficients to reduce overestimation and uncertainties in microbial risk assessments. By imposing an ID₅₀ of 18, as estimated by Teunis et al. ^[20] from a NoV GI.1 challenge experiment on volunteers inoculated with the strain NoV GI.1 8fIIb, we calculated the MID of 15.3 ± 3 for NoV GI.1 using Eq (3b) (refer to Figure 1). We defined β_new as equal to MID of 15.3 (i.e., ≫ 1) in 100 ml (or 1.5, scaled to 1 ml of inoculated volume). Subsequently, we calculated the corresponding model coefficient α_new as 0.89 (i.e., ≪ 15.3) by inverting the known relationship (3) as follows:

${\alpha }_{new} = \frac{1}{{log}_{2}\left(\frac{{ID}_{50}}{{\beta }_{new}}+1\right)}$

(4)
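The inversion in Eq (4) can be checked with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the inputs ID₅₀ = 18 and β_new = 15.3 are the values quoted in the text above.

```python
import math


def id50_from_coeffs(alpha, beta):
    """Eq (3): ID50 = beta * (2**(1/alpha) - 1)."""
    return beta * (2.0 ** (1.0 / alpha) - 1.0)


def alpha_from_id50(id50, beta):
    """Eq (4): alpha = 1 / log2(ID50 / beta + 1)."""
    return 1.0 / math.log2(id50 / beta + 1.0)


# Values quoted in the text: ID50 = 18 and beta_new = MID = 15.3.
alpha_new = alpha_from_id50(18.0, 15.3)
print(round(alpha_new, 2))                          # 0.89
# Round trip through Eq (3) recovers the imposed ID50.
print(round(id50_from_coeffs(alpha_new, 15.3), 6))  # 18.0
```

Because Eq (4) depends only on the ratio ID₅₀/β_new, both quantities must be expressed for the same inoculum volume before the function is called.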

3. Results

3.1. Disaggregated NoV GI.1 inoculation DR models

The proposed model's coefficients were validated through comparison, as depicted in Figure 2. We considered the DR model (1) together with the exact beta-Poisson solution (2) to determine the probability of infection based on the α (0.04) and β (0.055) coefficients proposed by Teunis et al. to predict the probability of infections caused by disaggregated NoV GI.1 (8fIIa + 8fIIb). The exact beta-Poisson solution was obtained using Microsoft Excel, employing the Kummer confluent hypergeometric function ^[53], which is given by the following series:

${}_{1}{F}_{1} = \sum _{n = 0}^{\infty }\frac{{\left(\alpha \right)}_{n}{\left(-ID\right)}^{n}}{{\left(\alpha +\beta \right)}_{n}n!},$

(5)

where (α)_n = α⋅(α+1)⋅(α+2)⋅…⋅(α+n-1) and, similarly, (α+β)_n = (α+β)⋅(α+β+1)⋅…⋅(α+β+n-1). The series was truncated after 15 terms in the calculations. The integral form of the beta-Poisson model probability of infection is given by ^[11]

$P = 1-{\int }_{0}^{1}\left(\frac{\Gamma \left(\mathrm{\alpha }+\mathrm{\beta }\right)}{\Gamma \left(\mathrm{\alpha }\right)\cdot \Gamma \left(\mathrm{\beta }\right)}\cdot {r}^{\alpha -1}\cdot {\left(1-r\right)}^{\beta -1}\right)\cdot {e}^{-r\cdot ID}dr,$

(6)

where "r" is the single-hit value of the beta-probability of infection (i.e., host susceptibility) and Γ() represents the gamma function.
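The truncated series in Eq (5) is straightforward to reproduce. The sketch below is a hypothetical Python rendering, not the spreadsheet used in this study; `n_terms` is an assumed parameter name, and the sum is taken over n = 0, …, n_terms − 1. It evaluates the exact beta-Poisson probability for the proposed coefficients (α = 0.89, β = 1.53) at a dose of one virion.

```python
import math


def pochhammer(x, n):
    """Rising factorial (x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    result = 1.0
    for k in range(n):
        result *= x + k
    return result


def hyp1f1_series(alpha, beta, dose, n_terms=15):
    """Partial sum of Eq (5) for 1F1(alpha; alpha+beta; -dose)."""
    return sum(
        pochhammer(alpha, n) * (-dose) ** n
        / (pochhammer(alpha + beta, n) * math.factorial(n))
        for n in range(n_terms)
    )


def p_exact_series(dose, alpha, beta, n_terms=15):
    """Exact beta-Poisson probability, Eq (2), via the truncated series."""
    return 1.0 - hyp1f1_series(alpha, beta, dose, n_terms)


# Proposed coefficients, dose of a single virion.
print(p_exact_series(1.0, 0.89, 1.53))
```

At a dose of 1 the 15-term sum is already converged and agrees with the value of about 0.285 reported for Se+ hosts. Because the series alternates and its terms grow with the dose, the truncation error increases with ID; in this sketch the 15-term sum has not yet converged at doses near 10, so `n_terms` should be raised there.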

Figure 2. Probability of infection for the ingestion of NoV GI.1 8fIIb for ID < 10 (disaggregated in 1 ml of inoculum volume): i) the exact (1−₁F₁ (α, α + β, −ID)) solution provided by Teunis et al. using coefficients from cumulative infections in a given challenge for strains (8fIIa + 8fIIb), ii) the approximated beta-Poisson model using new coefficients (α = 0.89 and β = 1.53), and iii) the exponential model using the proposed coefficient (β₁ = 2.597), which approaches the exact beta-Poisson solution given by Eq (2).

Additionally, Figure 2 presents the infection probability obtained by the exponential model using the following expression:

$P = 1-{e}^{-\frac{ID}{{\beta }_{1}}} ,$

(7)

where β₁ is the proposed coefficient with a value of 2.597. With β₁ = 2.597, the exponential model improves on the approximate beta-Poisson solution (Eq 1), closely aligning with the exact beta-Poisson infection probabilities in the considered challenge experiment at low doses. This enhances prediction accuracy and model reliability.

Figure 3 demonstrates that the exact beta-Poisson (2) obtained using the values (0.89; 1.53) for the proposed coefficients also fits well with the infection probabilities obtained via maximum likelihood estimation (MLE) (Table 3) for NoV GI.1 8fIIb in the challenge experiment conducted by Teunis et al.

Figure 3. Exact solutions of single-hit individual infection probabilities from GI.1 NoV 8fIIb challenge experiments, considering the 56% host nonimmune fraction, using the values provided in the literature with α = 0.04 and β = 0.055 ^[20] (dash-dotted line), and the new coefficients α = 0.89 and β = 1.53, using Eqs (2) (solid line) and (6) (dashed line), respectively; MLE results from Teunis et al. ^[20] (green square dots) with ± 95% interval (green dotted lines); and infection data from Atmar et al. ^[21] (grey triangle dots), Leon ^[22] (violet circle dots), and Mateo ^[23] (red rhombus dots).

Table 3. Infections/doses from the MLE of NoV GI.1 8fIIb challenge trial ^[20].

ID	0.01	0.1	0.3	0.5	0.8	1	3	10	18	100	1000
P	0.01	0.03	0.1	0.15	0.2	0.28	0.44	0.48	0.5	0.54	0.56

The model probabilities (see Figure 3) are compared with the MLE results from the Teunis et al. ^[20] challenge experiment and with infection data from Atmar et al. ^[21], Leon ^[22], and Mateo ^[23], all adjusted for the 56% nonimmune host fraction.

The infection risk curves shown in Figure 3 confirm the appropriateness of the proposed mechanistic coefficients: the exact beta-Poisson probabilities of infection (6) approach both the MLE curve from the challenge test conducted by Teunis et al. ^[20] and the infection data from the NoV GI.1 challenge trials conducted by Atmar et al. ^[21], Leon ^[22], and Mateo ^[23].

The exact beta-Poisson solutions given by Eqs (2) and (6) were calculated using Microsoft Excel and MATHCAD (https://www.mathcad.com), respectively. The DR models in Figure 3 provide a mean infection risk of 0.285 ( = 0.16/0.56) for Se+ host secretors exposed to only 1-NoV GI.1, which is very close to the infection risk of 0.28 recently estimated by Teunis et al. ^[19].

3.1.1. "Aggregate" NoV GI.1 inoculation DR models

In this section, we present three theoretical formulations of the DR model of infection from the literature, predicting the infection risk due to norovirus aggregation in doses supplied to volunteers.

The underlying experimental work by Teunis et al. aimed to explain how the aggregation ^[17,45] of noroviruses might affect the individual probability of infection. The infection probability for nonimmune hosts exposed to aggregated NoV virions can be expressed as ^[28]

${P}_{i} = \left[1-\vartheta \cdot {}_{2}{F}_{1}\left(\beta , b;\alpha +\beta ;a\right)\right]$

(8)

This probability depends, via the Gauss hypergeometric function ₂F₁(), on the degree of virion aggregation (or percentage) "a" in each aggregate present in the inoculated dose given to volunteers. Various challenge experiments in the literature have suggested a "log-series" probability distribution of "a" among aggregations (Poisson distributed) of inocula with a corresponding mean size (i.e., number of NoV virions) µ in the mean percentages of "a." Note that ϑ and b in Eq (8) represent transformation variables of the mean aggregate infectious dose (id).

In this study, we have revised the "beta-binomial" infection probability model and applied the Euler transformation ^[53] to Eq (8) to obtain the following:

${P}_{i} = \left(1-\varphi \right)\left\{1-\left[\vartheta \frac{\Gamma \left(c\right)}{\Gamma \left(b\right)\Gamma \left(c-b\right)}{\int }_{0}^{1}{r}^{b-1}\cdot {\left(1-r\right)}^{c-b-1}\cdot {\left(1-r\cdot a\right)}^{-\beta }dr\right]\right\},$

(9)

where

$\vartheta = {e}^{-\frac{\stackrel{-}{id}}{a\mu }}, \;\; \;\; \;\;\mu = \frac{-a}{\left(1-a\right)\cdot log\left(1-a\right)},$

(9a)

$\mathrm{b} = \frac{\stackrel{-}{id}\cdot \left(1-a\right)}{a}, \;\; \;\;c = \alpha +\beta ,$

(9b)

In Eq (9), the term (1 - φ) denotes the fraction of individuals who are fully susceptible (i.e., r = 1) to NoV infection caused by the aggregate dose id, whereas $\stackrel{-}{id} = -\mathrm{log}\left[P\left(id\right)/\left(1-P\left(id\right)\right)\right]$ is the log transformation variable of the mean aggregate infectious dose of NoV, where P is the beta-Poisson probability given by Eq (6) for nonimmune hosts.

It is important to note that Eq (9) represents the "single hit" beta-Poisson probability of infection only for the extreme cases of a = 1 or a = 0. Thus, the probability of infection caused by aggregated NoV GI.1 combines the beta (continuous) probability distribution of the host-to-host susceptibility with the negative binomial probability of infection from "clumped" virions within each dose ^[11].

Teunis et al. ^[20] estimated the model coefficients (α = 5.35⋅10⁻³ and β = 2.51⋅10⁻³) using the MLE approach based on experimental infection probabilities from the aggregated NoV 8fIIa human challenge trial. In this study, we propose different coefficients (α = 0.89 and β = 1.53) for NoV GI.1. Additionally, Messner et al. ^[29] reconsidered Eq (8) from the Teunis model and proposed the fractional Poisson (FP) probability distribution, which can be expressed as

${P}_{i} = P\left(1-\varphi \right)\cdot \left(1-{e}^{-\frac{id}{\mu }}\right)$

(10)

where P represents the corresponding beta-Poisson single-hit probability of infection at the same infectious dose. This equation is based on the Bernoulli probability distribution, where r can take values 1 or 0, and it provides an alternative simplified solution to the exact beta-binomial probability form given in Eqs (8) and (9) by quantifying a constant immunity host fraction.

Following Schmidt ^[15], the integral formulation of the single-hit "adapted" beta-Poisson (ABP) conditional probability of infection from aggregated NoV can be obtained by reconsidering the same challenge dataset used by Messner et al. ^[29]. This single-hit DR model assumes that all ingested aggregate virions are completely disaggregated in the host. Therefore, the ABP model accounts for the beta distribution of the host-to-host susceptibility caused by "clumped" ingested virions. The aggregates of NoV are modeled with a Poisson distribution in every single dose supplied to the host, and since virions in every aggregate follow a log-distributed pattern ^[20], the resulting virion distribution in the administered mean dose $\stackrel{-}{id}$ follows a negative binomial probability distribution. The ABP probability of infection model, considering a constant host immunity fraction $\varphi$, can then be expressed as ^[15]

${P}_{i} = \left(1-\varphi \right)\left\{1-{\int }_{0}^{1}\left(\frac{\Gamma \left(\alpha +\beta \right)}{\Gamma \left(\alpha \right)\cdot \Gamma \left(\beta \right)}\cdot {r}^{\alpha -1}\cdot {\left(1-r\right)}^{\beta -1}\right)\cdot \left[1-{\left(1-r\right)}^{\stackrel{-}{id}}\right]dr\right\}.$

(11)

In this study, we propose estimating $\stackrel{-}{id}$ via the extended negative binomial (ENB) (Pólya) probability distribution (www.vosesoftware.com/riskwiki) of the aggregate virus count in the infectious dose, as given in the following equation.

$\stackrel{-}{id} = \frac{1}{\left[\frac{\Gamma \left({\bf a}+z\right)}{\Gamma \left({\bf a}\right)\cdot \Gamma \left({z}_{i}\right)}\cdot {{P}_{i}\left(id\right)}^{{\bf a}}\cdot {\left(1-{P}_{i}\left(id\right)\right)}^{z}\right]} .$

(11a)

In the above Eq (11a), the mean of the ENB distribution is defined as

$z = -\frac{\lambda }{\left[\mu \cdot log\left(1-{\bf a}\right)\right]}$

(11b)

where λ (the ENB parameter) is the count of the mean ingested aggregates in the dose, μ is the mean size of the aggregates as in Eq (9a), and z_i is the initial value of $\stackrel{-}{id}$ in the ENB distribution.

3.1.2. Validation of proposed aggregated NoV GI.1 DR models

Figure 4 illustrates the results of a numerical validation of the proposed infection probability models (9) (solid line) and (11) for aggregated NoV infections. The validation was conducted using the MLE of infection counts provided by Teunis et al. ^[20] from a challenge experiment involving aggregate NoV GI.1 8fIIa infections. To ensure accurate representation, we set the product μ × a equal to 400 virions (i.e., μ = 400 virions and a ≅ 1) based on the findings of Teunis et al. This approach allowed us to depict the proposed probabilities using Eqs (9) and (11) in Figure 4(a) and (b), respectively, for dose sizes of aggregated NoV GI.1 8fIIa id < 1 and id > 1. Specifically, for Eq (9), we fixed a = 0.07 (i.e., 7%) and mean μ = 1 for one aggregate dose id of 5,714 virions to obtain μ × a = 400 virions, as in the Teunis et al. NoV 8fIIa challenge experiment. Using the gamma-binomial distribution, we derived the probability (9), which accurately fitted the MLE of infections from the aggregated NoV 8fIIa challenge trial by setting a = 0.07, host immune fraction φ = 37%, α = 0.89, and β = 1.53. The predicted mean infectious dose from Eq (9a) was 1.037, corresponding to an id of 5,927 virions and μ × a equal to 415 virions, showing a minimal computational discrepancy of 4% with respect to the 400 virions of the Teunis et al. challenge study.
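The arithmetic in this paragraph can be retraced with the expression for μ in Eq (9a). The Python sketch below is illustrative only; it assumes that "log" in Eq (9a) denotes the natural logarithm (an assumption that reproduces the quoted value of 1.037), and the function name is hypothetical.

```python
import math


def log_series_mean(a):
    """Mean from Eq (9a): mu = -a / ((1 - a) * ln(1 - a)), for 0 < a < 1.

    Assumes the natural logarithm.
    """
    return -a / ((1.0 - a) * math.log(1.0 - a))


a = 0.07          # degree of aggregation fixed in the text
dose = 5714       # virions in one aggregate dose, as quoted in the text

mu = log_series_mean(a)
print(round(mu, 3))            # 1.037
print(round(a * dose))         # 400 virions for mu = 1
print(round(mu * a * dose))    # 415 virions with the predicted mean
print(mu * a * dose / (a * dose) - 1.0)  # relative discrepancy, about 4%
```

The last line shows that the relative discrepancy equals μ − 1, so it is set by the degree of aggregation a alone and does not depend on the dose.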

Figure 4. DR model of infection probabilities caused by aggregated NoV GI.1 8fIIa, assuming α = 0.89, β = 1.53, and a host immune fraction φ of 37%, using Eq (9) (solid line), Eq (11) (dash-dotted line), and Eq (14) (dashed line): a) id < 1; b) id > 1, with 95% confidence intervals (dotted lines).


In this study, we applied the DR model (11) by setting the degree of aggregation a to 0.2, i.e., one aggregate id = 1980.2 virions, and the mean aggregation size μ to 1.122 (from Eq (9a)). Furthermore, based on the information provided, z_i = 13.5, and consequently μ × a = 449 virions (an 11% discrepancy relative to the input data). Finally, we set the mean count of ingested aggregate doses, λ, to 4.

The infection probability (9) aligned well with the infection data from the experiment conducted by Teunis et al. The ABP model (11), by contrast, closely approached the probability of infection caused by NoV GI.1 8fIIa, as calculated by Teunis et al. using MLE, for id > 2. Additionally, λ (= 4) and a (= 0.2) were not independent parameters in the latter model. To establish the relationship (12) between these parameters, we started from the log transformation of the mean aggregate dose in the Teunis et al. dataset (refer to Eqs (9a) and (9b)). Using the single-hit beta-Poisson probability of infection P given by Eq (6) and a dose ratio of λ/(0.9⋅id₅₀) = 1.9, we obtained an infection probability of approximately 45% for nonimmune hosts at id₅₀ = 2.34 (refer to Figure 2).

$\boldsymbol{a} = -\log\left[\frac{P\left(\frac{\lambda}{0.9\cdot id_{50}}\right)}{1-P\left(\frac{\lambda}{0.9\cdot id_{50}}\right)}\right]$

(12)

The relationship presented in (12) establishes a link between the degree of virion aggregation in doses, a, and the mean count of the aggregates in the doses, λ. This relationship is particularly valuable in practical applications of Eqs (9) and (11) when the percentage of NoV aggregation was not measured at every dose during the challenge trial. By using Eq (12), the uncertainty in MRA can be reduced, addressing the indeterminacy in measurements of a.
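As a worked check of Eq (12), the sketch below plugs in the values quoted above (λ = 4, id₅₀ = 2.34, P ≈ 45%). It assumes that the logarithm in Eq (12) is the natural logarithm, which is the reading that reproduces a ≈ 0.2; the function name is illustrative.

```python
import math


def aggregation_degree(p_infection: float) -> float:
    """Eq (12): a = -log(P / (1 - P)), with the natural log assumed."""
    if not 0.0 < p_infection < 1.0:
        raise ValueError("p_infection must lie strictly between 0 and 1")
    return -math.log(p_infection / (1.0 - p_infection))


lam = 4.0      # mean count of ingested aggregates, as set in the text
id50 = 2.34    # id50 quoted in the text
dose_ratio = lam / (0.9 * id50)   # argument of P in Eq (12), about 1.9

p = 0.45       # single-hit beta-Poisson probability quoted for this ratio
a = aggregation_degree(p)

print(f"dose ratio = {dose_ratio:.2f}, a = {a:.3f}")
```

With P = 0.45 the result is close to the a = 0.2 used in model (11).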

In Figure 4, at low doses (id < 0.8), the DR model (9) fits the Teunis et al. MLE of infection data better than the ABP model (11). The ABP model accounts for overdispersion through the conditional probability, specifically the negative binomial distribution ^[25], and assumes that aggregated virions disengage in the host. The resulting overestimation of the probability can be partially mitigated (for id > 0.3) by combining the DR model (11) with a probability distribution that accounts for clumped virions that may not completely disengage after the host's challenge. This combination gives the following expression:

${P}_{{i}_{rev}} = {P}_{i}\times {P\left(id\right)}^{-a} \quad \text{for } id > 0.3$

(13)

where a = 0.2, P_i is the probability given by (11), and P is the single-hit beta-Poisson probability derived from Eq (6).

Finally, the probabilities of infection from NoV 8fIIa, represented by the DR models shown in Figure 4, fall within the 95% confidence interval of the MLE infection probabilities obtained from the challenge trial conducted by Teunis et al.

3.2. Models of infection risk from disaggregated EV

Contaminated water or food may contain multiple enteroviruses, including Poliovirus 1/SM, Echo-12, and CV, simultaneously. In such cases, the probability of infection by pooled EV due to the ingestion of contaminated drinking water, for instance, can be defined as follows:

${P}_{i} = 1-\prod _{i}^{nv}\left(1-{P}_{i, v}\right)$

(14)

where P_{i,v} represents the probability of infection by the specific enterovirus v, and nv is the total number of EVs detected in the water. Accurate measurement of the specific volume of the inoculum for the infectious dose ^[15] of each EV is essential during exposure assessment.
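Eq (14) can be sketched in a few lines of code. The per-virus probabilities in the example are hypothetical placeholders, not values from this study, and the function name is illustrative.

```python
import math
from typing import Iterable


def pooled_infection_probability(per_virus: Iterable[float]) -> float:
    """Eq (14): probability of infection by at least one of the detected EVs."""
    probs = list(per_virus)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    # Product of the probabilities of escaping infection by each virus.
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical per-virus infection probabilities for three detected EVs.
example = [0.10, 0.20, 0.05]
print(f"pooled probability = {pooled_infection_probability(example):.3f}")
```

The product form treats the infections by the individual viruses as independent events, which is the assumption implicit in Eq (14).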

Alternatively, some studies have used the approximate beta-Poisson model with optimized coefficients, α = 0.167 and β = 0.191 (Table 1), for the infectious doses of the pooled EV group ^[42]. This pooled DR probability of infection, whose coefficients are similar to those reported by Teunis et al. ^[43] for Poliovirus 1 LSc2ab (α = 0.114 and β = 0.159), may lead to significant overestimations of the probability of EV infection at low doses (Figure 5).

Figure 5. Exponential and approximate beta-Poisson models of infections (inocula of 1 ml) from the EV group, Echo-12, Poliovirus 1 LSc2ab, and Poliovirus 1/SM using revised model coefficients provided in the literature (refer to Table 1) and the approximate beta-Poisson (2) or exponential (1) model.


Therefore, in cases where challenge trial studies are not available for MRA, we propose β_new = 3.82, which is obtained by downscaling the literature coefficient 0.191 defined by de Man et al. ^[42] for the probability of infection from EV, i.e., β_new = 0.191 × 20. This shift in the EV group probability curve to the left results in a higher infection probability than the sum of the probabilities given by each single virus (refer to Figure 5).
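To see the effect of the rescaled coefficient numerically, the following sketch evaluates the approximate beta-Poisson model with the literature value β = 0.191 and with β_new = 3.82, keeping α = 0.167. It assumes that Eq (2) has the standard form P(d) = 1 − (1 + d/β)^(−α); the dose values are arbitrary illustrations.

```python
def approx_beta_poisson(dose: float, alpha: float, beta: float) -> float:
    """Standard approximate beta-Poisson form, assumed here to match Eq (2)."""
    if dose < 0.0:
        raise ValueError("dose must be non-negative")
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


alpha = 0.167
beta_lit = 0.191            # literature coefficient for the pooled EV group
beta_new = beta_lit * 20    # proposed coefficient, 3.82

# Arbitrary illustrative doses.
for dose in (0.1, 1.0, 10.0):
    p_lit = approx_beta_poisson(dose, alpha, beta_lit)
    p_new = approx_beta_poisson(dose, alpha, beta_new)
    print(f"dose = {dose:5.1f}: P(beta = 0.191) = {p_lit:.3f}, P(beta_new) = {p_new:.3f}")
```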

Moreover, the exponential DR model with the coefficient defined by McBride et al. ^[44] and Haas et al. ^[45] (refer to Table 1) was used to determine the Echo-12 infection probability trend in Figure 5, whereas for the exponential probability of infection from PV 1/SM, we propose a coefficient of 112.73, i.e.,

${\beta }_{1} = \frac{\left({ID}_{50}+MID\right)}{MID}\cdot 40 ,$

(15)

where ID₅₀ is 2 (refer to Table 2) and MID is 1.1 ^[34], whereas 40 is the value of the proposed downscaling coefficient.

Finally, we suggest setting the value of β_new to 17 for calculating the probability of infection by the Echo-12 virus using the approximate beta-Poisson model. To arrive at this value, we matched the model coefficient with the MID value presented in Table 2. Additionally, we obtained the α_new value from Eq (4) by setting ID₅₀ to 78.3, as determined by McBride et al. ^[44] and Haas et al. ^[45].

4. Discussion

We presented suitable probability models to estimate the risk of infection from disaggregated or aggregated NoV GI.1 in doses. We showed that our proposed approach compared favorably with the MLE obtained from the results of the challenge trial conducted by Teunis et al. ^[20], which involved volunteers as nonimmune hosts. The model coefficients, α = 0.89 and β = 1.53, were determined using a novel ID₅₀ = f(MID) relationship derived from the best fit of MID and ID₅₀ values collected from several challenge experiments involving enteric viruses. Our results showed that the proposed relationship is effective in estimating the exact beta-Poisson probability of NoV virion infection. Importantly, the coefficients of the norovirus model were derived from ID₅₀ values estimated by Teunis et al. ^[20] in their challenge experiments involving disaggregated NoV GI.1 8fIIb. Applying the new model coefficients yielded an ID₅₀ of 23, deviating from the ID₅₀ of 18 suggested by Teunis et al. This result was consistent with the findings of van Abel et al. ^[28], and it approached the ID₅₀ of 26 suggested by Teunis et al. ^[20] for their aggregated challenge experiment with NoV GI.1, using inocula of strain NoV GI.1 8fIIa supplied to volunteers.

Nevertheless, experimental determination of the MID values of human viruses is challenging because of the specific operating conditions of each challenge infection experiment, which lead to significant uncertainty in the estimates. Even so, the successful comparison of the predicted infection curves using the proposed coefficients for norovirus indicates the reliability of the MID/ID₅₀ relationship. It is nonetheless essential to acknowledge the limitations in applying DR models in this study, which stem from data scarcity and the need for further challenge infection experiments. Further research should not only focus on virus exposure but also incorporate experimental data to validate the accuracy of the proposed formulas.

For infectious dose sizes < 1 virion, the proposed revised exact beta-Poisson probabilities provided the same values as those determined by Teunis et al. ^[20] for disaggregated NoV (8fIIa + 8fIIb) infections, using coefficients α = 0.04 and β = 0.055. However, with the latter coefficients, deviations in the solution were evident for doses > 1 due to the numerical degeneration of the series values given by Eq (5), and the integral (6) did not converge. Similarly, Eq (5) degenerated when using α = 0.631 and β = 6.5⋅10⁵ provided by Teunis et al. for infection probability, owing to host exposure to disaggregated NoV GI.1 8fIIb in doses. On the other hand, Eq (6) performed well when utilizing the proposed values, α = 0.89 and β = 1.53. Practical examples illustrating the usefulness of the proposed dose-infection models can be found in QMRAs applied at the large-scale population level ^[3]. These results impact groundwater management and policy-making decisions ^[54] concerning drinking water supplies and crop irrigation using reclaimed water.
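One way to evaluate the exact beta-Poisson probability numerically is sketched below. It assumes that Eqs (5) and (6) correspond to the standard exact form P(d) = 1 − ₁F₁(α; α+β; −d), and it is not necessarily the scheme used in this study. Kummer's transformation, ₁F₁(α; α+β; −d) = e^(−d) ₁F₁(β; α+β; d), replaces the alternating series with a series of positive terms and so avoids cancellation at moderate doses. The function name and tolerances are illustrative.

```python
import math


def exact_beta_poisson(dose: float, alpha: float, beta: float,
                       tol: float = 1e-12, max_terms: int = 10_000) -> float:
    """P(d) = 1 - 1F1(alpha; alpha + beta; -d), computed as
    1 - exp(-d) * 1F1(beta; alpha + beta; d) (Kummer's transformation)."""
    if dose < 0.0:
        raise ValueError("dose must be non-negative")
    b = alpha + beta
    term = 1.0    # n = 0 term of the series for 1F1(beta; b; dose)
    total = 1.0
    for n in range(1, max_terms + 1):
        # Ratio of consecutive terms: (beta + n - 1) / (b + n - 1) * dose / n.
        term *= (beta + n - 1) / (b + n - 1) * dose / n
        total += term
        if term < tol * total:
            break
    # Suitable for moderate doses; the partial sums overflow for doses of several hundred.
    return 1.0 - math.exp(-dose) * total


# Proposed coefficients and the dose ratio of 1.9 quoted in Section 3.1.2.
p = exact_beta_poisson(1.9, alpha=0.89, beta=1.53)
print(f"P(1.9) = {p:.3f}")
```

Under these assumptions, the value at a dose ratio of 1.9 is about 0.44, in line with the infection probability of roughly 45% quoted in Section 3.1.2.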

5. Conclusions

In this study, we presented probability models for estimating health risks of infection from disaggregated or aggregated NoV GI.1 virions at varying doses. These models serve as valuable tools for quantifying microbial health risk assessments during outbreaks, guiding public health policies for disease prevention and control. By providing reliable models and coefficients, the uncertainty and cost of management actions in health risk assessments can be reduced. Our updated infection models compared favorably with the results of the MLE method adopted by Teunis et al. ^[20] in a challenge trial performed on volunteers with a nonimmune host fraction. Furthermore, we proposed a relationship (12) between the degree of virion aggregation in every single dose, a, and the mean count of the aggregates in infectious doses. This relationship simplifies the practical application of DR models (9) and (11) in risk assessments by reducing the uncertainty stemming from the indeterminacy in the measurement of a.

Enteric viruses are harmful pathogens associated with numerous outbreaks of waterborne disease in humans. They can typically be isolated from water or food directly or indirectly contaminated by fecal waste, and their ability to survive in flowing groundwater poses significant health risks, especially after flood-runoff groundwater infiltration. Therefore, after floods, it is crucial to conduct accurate assessments of the health risk associated with the quality of water supplied by public water systems. In this study, we proposed a new mechanistic approach to reliably estimate the coefficients of DR models for predicting the risk of infection with human enteric viruses.

Experimental determination of the MID of human viruses is highly uncertain, as each MID estimate depends on the specific operating conditions of a particular challenge infection experiment. Although well-designed human virus challenge experiments would enhance research on health risk assessment, they are often not available for practical risk assessments. These limitations contribute to uncertainty in the risk assessment results for models derived from the single-hit probability theory. Thus, complementary investigations are needed to support those based on the immunity theory of individual secretors (Se) currently applied. Further research is necessary to understand the heterogeneous behavior of individual host-to-host susceptibility to the same dose of supplied pathogens and to elucidate the specific virus-cell mechanisms responsible for infectivity and pathogenesis. For instance, Rahman et al. developed a mechanistic DR model to investigate foodborne host infections caused by Listeria monocytogenes, offering an alternative approach to the single-hit theory that may reduce uncertainties in estimating infection risk by incorporating both the operating environmental conditions and the clinical data of every host during a given challenge trial.

Use of AI tools declaration

The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Acknowledgments

The author would like to acknowledge the support provided by National Research Fund FOE 2020 for the CNR project "Capitale naturale e risorse per il futuro dell' Italia" (B55F21001990001).

Conflict of interest

The author declares that there is no conflict of interest.

References

[1]	Y. Zhang, Artificial intelligence for bioinformatics and biomedicine, Curr. Bioinf., 15 (2020), 801–802. https://doi.org/10.2174/157489361508201221092330 doi: 10.2174/157489361508201221092330
[2]	B. Jena, S. Saxena, G. K. Nayak, L. Saba, N. Sharma, J. S. Suri, Artificial intelligence-based hybrid deep learning models for image classification: The first narrative review, Comput. Biol. Med., 137 (2021), 104803. https://doi.org/10.1016/j.compbiomed.2021.104803 doi: 10.1016/j.compbiomed.2021.104803
[3]	H. Lin, Development and application of artificial intelligence methods in biological and medical data, Curr. Bioinf., 15 (2020), 515–516. https://doi.org/10.2174/157489361506200610112345 doi: 10.2174/157489361506200610112345
[4]	R. C. Andrade, M. Boroni, M. K. Amazonas, F. R. Vargas, New drug candidates for osteosarcoma: Drug repurposing based on gene expression signature, Comput. Biol. Med., 134 (2021), 104470. https://doi.org/10.1016/j.compbiomed.2021.104470 doi: 10.1016/j.compbiomed.2021.104470
[5]	J. Wang, Y. Shi, X. Wang, H. Chang, A drug target interaction prediction based on LINE-RF learning, Curr. Bioinf., 15 (2020), 750–757. https://doi.org/10.2174/1574893615666191227092453 doi: 10.2174/1574893615666191227092453
[6]	M. Aslam, M. Shehroz, F. Ali, A. Zia, S. Pervaiz, M. Shah, et al., Chlamydia trachomatis core genome data mining for promising novel drug targets and chimeric vaccine candidates identification, Comput. Biol. Med., 136 (2021), 104701. https://doi.org/10.1016/j.compbiomed.2021.104701 doi: 10.1016/j.compbiomed.2021.104701
[7]	J. Yan, J. Huang, C. Zhang, H. Huo, F. Chen, Virtual screening of acetylcholinesterase inhibitors based on machine learning combined with molecule docking methods, Curr. Bioinf., 16 (2021), 963–971. https://doi.org/10.2174/1574893615999200719234045 doi: 10.2174/1574893615999200719234045
[8]	F. F. Ahmed, M. Khatun, M. Mosharaf, M. N. Mollah, Prediction of protein-protein interactions in Arabidopsis thaliana using partial training samples in a machine learning framework, Curr. Bioinf., 16 (2021), 865–879. https://doi.org/10.2174/1574893616666210204145254 doi: 10.2174/1574893616666210204145254
[9]	D. P. Boso, D. D. Mascolo, R. Santagiuliana, P. Decuzzi, B. A. Schrefler, Drug delivery: Experiments, mathematical modelling and machine learning, Comput. Biol. Med., 123 (2020), 103820. https://doi.org/10.1016/j.compbiomed.2020.103820 doi: 10.1016/j.compbiomed.2020.103820
[10]	Y. Ding, J. Tang, F. Guo, Q. Zou, Identification of drug-target interactions via multiple kernel-based triple collaborative matrix factorization, Briefings Bioinf., 23 (2022). https://doi.org/10.1093/bib/bbab582 doi: 10.1093/bib/bbab582
[11]	R. Su, X. Liu, L. Wei, Q. Zou, Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response, Methods, 166 (2019), 91–102. https://doi.org/10.1016/j.ymeth.2019.02.009 doi: 10.1016/j.ymeth.2019.02.009
[12]	Q. Bai, S. Liu, Y. Tian, T. Xu, A. J. Banegas-Luna, H. Pérez-Sánchez, Application advances of deep learning methods for de novo drug design and molecular dynamics simulation, Wiley Interdiscip. Rev.: Comput. Mol. Sci., 12 (2022), e1581. https://doi.org/10.1002/wcms.1581 doi: 10.1002/wcms.1581
[13]	Q. Bai, S. Tan, T. Xu, H. Liu, J. Huang, X. Yao, MolAICal: A soft tool for 3D drug design of protein targets by artificial intelligence and classical algorithm, Briefings Bioinf., 22 (2021). https://doi.org/10.1093/bib/bbaa161 doi: 10.1093/bib/bbaa161
[14]	J. Li, A. Fu, L. Zhang, An overview of scoring functions used for protein-ligand interactions in molecular docking, Interdiscip. Sci.: Comput. Life Sci., 11 (2019), 320–328. https://doi.org/10.1007/s12539-019-00327-w doi: 10.1007/s12539-019-00327-w
[15]	Y. Ding, J. Tang, F. Guo, Protein crystallization identification via fuzzy model on linear neighborhood representation, IEEE/ACM Trans. Comput. Biol. Bioinf., 18 (2019), 1986–1995. https://doi.org/10.1109/TCBB.2019.2954826 doi: 10.1109/TCBB.2019.2954826
[16]	Y. Ding, J. Tang, F. Guo, Human protein subcellular localization identification via fuzzy model on kernelized neighborhood representation, Appl. Soft Comput., 96 (2020), 106596. https://doi.org/10.1016/j.asoc.2020.106596 doi: 10.1016/j.asoc.2020.106596
[17]	T. Nguyen, H. Le, T. P. Quinn, T. Nguyen, T. D. Le, S. Venkatesh, GraphDTA: Predicting drug-target binding affinity with graph neural networks, Bioinformatics, 37 (2021), 1140–1147. https://doi.org/10.1093/bioinformatics/btaa921 doi: 10.1093/bioinformatics/btaa921
[18]	M. Jiang, Z. Li, S. Zhang, S. Wang, X. Wang, Q. Yuan, et al., Drug-target affinity prediction using graph neural network and contact maps, RSC Adv., 10 (2020), 20701–20712. https://doi.org/10.1039/D0RA02297G doi: 10.1039/D0RA02297G
[19]	T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, preprint, arXiv: 1609.02907.
[20]	P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention networks, preprint, arXiv: 1710.10903.
[21]	M. I. Davis, J. P. Hunt, S. Herrgard, P. Ciceri, L. M. Wodicka, G. Pallares, et al., Comprehensive analysis of kinase inhibitor selectivity, Nat. Biotechnol., 29 (2011), 1046–1051. https://doi.org/10.1038/nbt.1990 doi: 10.1038/nbt.1990
[22]	R. Wang, X. Fang, Y. Lu, S. Wang, The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures, J. Med. Chem., 47 (2004), 2977–2980. https://doi.org/10.1021/jm030580l doi: 10.1021/jm030580l
[23]	R. Wang, X. Fang, Y. Lu, Y. C. Yang, S. Wang, The PDBbind database: Methodologies and updates, J. Med. Chem., 48 (2005), 4111–4119. https://doi.org/10.1021/jm048957q doi: 10.1021/jm048957q
[24]	D. Weininger, SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, J. Chem. Inf. Comput. Sci., 28 (1988), 31–36. https://doi.org/10.1021/ci00057a005 doi: 10.1021/ci00057a005
[25]	M. Michel, D. Menéndez Hurtado, A. Elofsson, PconsC4: Fast, accurate and hassle-free contact predictions, Bioinformatics, 35 (2019), 2677–2679. https://doi.org/10.1093/bioinformatics/bty1036 doi: 10.1093/bioinformatics/bty1036
[26]	Q. Wu, Z. Peng, I. Anishchenko, Q. Cong, D. Baker, J. Yang, Protein contact prediction using metagenome sequence data and residual neural networks, Bioinformatics, 36 (2020), 41–48. https://doi.org/10.1093/bioinformatics/btz477 doi: 10.1093/bioinformatics/btz477
[27]	J. C. Jeong, X. Lin, X. W. Chen, On position-specific scoring matrix for protein function prediction, IEEE/ACM Trans. Comput. Biol. Bioinf., 8 (2010), 308–315. https://doi.org/10.1109/TCBB.2010.93 doi: 10.1109/TCBB.2010.93
[28]	Y. Ding, P. Tiwari, Q. Zou, F. Guo, H. M. Pandey, C-loss based higher-order fuzzy inference systems for identifying DNA N4-methylcytosine sites, IEEE Trans. Fuzzy Syst., 2022 (2022). https://doi.org/10.1109/TFUZZ.2022.3159103 doi: 10.1109/TFUZZ.2022.3159103
[29]	X. Hu, L. Chu, J. Pei, W. Liu, J. Bian, Model complexity of deep learning: A survey, Knowl. Inf. Syst., 63 (2021), 2585–2619. https://doi.org/10.1007/s10115-021-01605-0 doi: 10.1007/s10115-021-01605-0
[30]	Q. Li, Z. Han, X. M. Wu, Deeper insights into graph convolutional networks for semi-supervised learning, in Thirty-Second AAAI conference on artificial intelligence, AAAI, New Orleans, USA, (2018), 3538–3545. https://doi.org/10.1609/aaai.v32i1.11604
[31]	G. Taubin, A signal processing approach to fair surface design, in Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, ACM, (1995), 351–358. https://doi.org/10.1145/218380.218473
[32]	Y. Ding, W. He, J. Tang, Q. Zou, F. Guo, Laplacian regularized sparse representation based classifier for identifying DNA N4-methylcytosine Sites via L2, 1/2-matrix Norm, IEEE/ACM Trans. Comput. Biol. Bioinf., 2021 (2021). https://doi.org/10.1109/TCBB.2021.3133309 doi: 10.1109/TCBB.2021.3133309
[33]	Y. Ding, J. Tang, F. Guo, Identification of drug-target interactions via dual laplacian regularized least squares with multiple kernel fusion, Knowledge-Based Syst., 204 (2020), 106254. https://doi.org/10.1016/j.knosys.2020.106254 doi: 10.1016/j.knosys.2020.106254
[34]	P. Tiwari, S. Dehdashti, A. K. Obeid, P. Marttinen, P. Bruza, Kernel method based on non-linear coherent states in quantum feature space, J. Phys. A: Math. Theor., 55 (2022), 355301. https://doi.org/10.1088/1751-8121/ac818e doi: 10.1088/1751-8121/ac818e
[35]	J. Klicpera, S. Weißenberger, S. Günnemann, Diffusion improves graph learning, preprint, arXiv: 1911.05485.
[36]	L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Stanford InfoLab., 1999 (1999).
[37]	F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, K. Weinberger, Simplifying graph convolutional networks, in International conference on machine learning, PMLR, 97 (2019), 6861–6871. https://doi.org/10.48550/arXiv.1902.07153
[38]	H. Zhu, P. Koniusz, Simple spectral graph convolution, in International Conference on Learning Representations, (2020).
[39]	F. Fouss, K. Francoisse, L. Yen, A. Pirotte, M. Saerens, An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification, Neural networks, 31 (2012), 53–72. https://doi.org/10.1016/j.neunet.2012.03.001 doi: 10.1016/j.neunet.2012.03.001
[40]	A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, Pytorch: An imperative style, high-performance deep learning library, in Advances in neural information processing systems, 32 (2019).
[41]	M. Fey, J. E. Lenssen, Fast graph representation learning with PyTorch Geometric, preprint, arXiv: 1903.02428.
[42]	C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, et al., Weisfeiler and leman go neural: Higher-order graph neural networks, in Proceedings of the AAAI conference on artificial intelligence, AAAI, Honolulu, USA, 33 (2019), 4602–4609. https://doi.org/10.1609/aaai.v33i01.33014602
[43]	W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, in Advances in neural information processing systems, 30 (2017).
[44]	D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, et al., Convolutional networks on graphs for learning molecular fingerprints, in Advances in neural information processing systems, 28 (2015). https://doi.org/10.48550/arXiv.1509.09292
[45]	M. Gönen, G. Heller, Concordance probability and discriminatory power in proportional hazards regression, Biometrika, 92 (2005), 965–970. https://doi.org/10.1093/biomet/92.4.965 doi: 10.1093/biomet/92.4.965
[46]	D. M. Allen, Mean square error of prediction as a criterion for selecting variables, Technometrics, 13 (1971), 469–475. https://doi.org/10.1080/00401706.1971.10488811 doi: 10.1080/00401706.1971.10488811
[47]	Z. Xu, S. Wang, F. Zhu, J. Huang, Seq2seq fingerprint: An unsupervised deep molecular embedding for drug discovery, in Proceedings of the 8th ACM international conference on bioinformatics, computational biology, and health informatics, ACM, Boston, USA, (2017), 285–294. https://doi.org/10.1145/3107411.3107424
[48]	E. Asgari, M. R. Mofrad, Continuous distributed representation of biological sequences for deep proteomics and genomics, PloS one, 10 (2015), e0141287. https://doi.org/10.1371/journal.pone.0141287 doi: 10.1371/journal.pone.0141287
[49]	J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, preprint, arXiv: 1412.3555.
[50]	T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, ACM, San Francisco, USA, (2016), 785–794. https://doi.org/10.1145/2939672.2939785
[51]	G. Fu, Y. Ding, A. Seal, B. Chen, Y. Sun, E. Bolton, Predicting drug target interactions using meta-path-based semantic network analysis, BMC Bioinf., 17 (2016), 1–10. https://doi.org/10.1186/s12859-016-1005-x doi: 10.1186/s12859-016-1005-x
[52]	Y. Pu, J. Li, J. Tang, F. Guo, DeepFusionDTA: Drug-target binding affinity prediction with information fusion and hybrid deep-learning ensemble model, IEEE/ACM Trans. Comput. Biol. Bioinf., 2021 (2021). https://doi.org/10.1109/TCBB.2021.3103966 doi: 10.1109/TCBB.2021.3103966
[53]	H. Öztürk, E. Ozkirimli, A. Özgür, WideDTA: Prediction of drug-target binding affinity. preprint, arXiv: 1902.04166.
[54]	M. A. Thafar, M. Alshahrani, S. Albaradei, T. Gojobori, M. Essack, X. Gao, Affinity2Vec: Drug-target binding affinity prediction through representation learning, graph mining, and machine learning, Sci. Rep., 12 (2022), 1–18. https://doi.org/10.1038/s41598-022-08787-9 doi: 10.1038/s41598-022-08787-9

This article has been cited by:

Xiaodong Wang, Yang Lv, Danyang Guo, Xianghao Duan, Analysis of microbial contamination and risk assessment model construction at critical public congregation areas of apartment buildings, 2025, 267, 03601323, 112232, 10.1016/j.buildenv.2024.112232

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
