Citation: Vasiliy N. Leonenko, Sergey V. Ivanov. Prediction of influenza peaks in Russian cities: Comparing the accuracy of two SEIR models[J]. Mathematical Biosciences and Engineering, 2018, 15(1): 209-232. doi: 10.3934/mbe.2018009
Acute respiratory infections (ARIs) are among the oldest and most widespread human infectious diseases. The most notorious of them, influenza, causes recurrent epidemic outbreaks during which ARI incidence dramatically exceeds the average seasonal level. Outbreaks of influenza result in 3 to 5 million cases of severe illness and 250 to 500 thousand deaths worldwide each year [29]. Influenza also increases the incidence of heart attacks and strokes [4], as well as other disease complications. Even during an epidemic outbreak, only 15 to 20% of the total ARI cases are attributed to influenza [22], and distinguishing influenza from other acute respiratory infections with similar symptoms is possible only through laboratory testing [3]. For these reasons, the common clinical diagnosis 'influenza-like illness' (ILI) is often used, which includes all severe ARI cases fitting a certain description. The criteria of ILI vary slightly between national healthcare systems. According to the WHO, ILI is an acute respiratory infection with measured fever of
The earliest attempts at mathematical modeling of influenza-like illness outbreaks took place in the late 1960s, and this area of mathematical epidemiology remains active today. Despite the efforts of various scientific groups to clarify the mechanisms of flu propagation, many unresolved questions remain. Today researchers try to enhance the descriptive abilities of their models by taking into account external factors such as weather [11], [24], [31], variation of virus strains [26], individual contact patterns, and others. A thorough review of the corresponding research papers and the flu-related factors considered can be found in [25].
One particular application of calibrated flu dynamics models is outbreak prediction. For that purpose, in addition to variations of the classical Kermack-McKendrick SEIR model [10], [32], various other approaches are used, including agent-based modeling [13], metapopulation modeling [7], and social media data analysis [8], [12]. A detailed review of influenza forecasting can be found in [5]. One of the issues that researchers face is the lack of reliable long-term flu incidence data provided by well-established influenza surveillance systems. As a result, the majority of studies are performed on data from Western Europe ([9], [27]) and North America ([11], [24], [28]). There are several exceptions, such as [2], where flu incidence in Chile is considered. The use of data from a limited set of geographical areas sometimes leads researchers to disputable assumptions. For instance, when flu incidence data from tropical regions became available in the last decade, it demonstrated that some well-established hypotheses on the nature of ILI dynamics, derived from temperate-region data, have limited applicability [25]. Therefore, it is fruitful to expand the number of ILI incidence data sources.
Russia is one of the countries that can potentially contribute to the field. Influenza is considered a reportable disease by the Russian healthcare system. In Saint Petersburg (formerly Leningrad), ARI incidence has been collected since 1935, which gives one of the longest flu surveillance periods known [15]. In 1957, the all-USSR surveillance center was established. Its aim was to collect weekly and daily reports on ARI incidence from the local healthcare units throughout the country, with the number of covered cities constantly increasing over the years. The efficiency of the Soviet system of ARI case registration made it possible to predict flu outbreaks in Soviet cities with fairly high accuracy [6]. The employed model, created by Baroyan and Rvachev, was used by the specialists of the Research Institute of Influenza [14] specifically for modeling flu propagation within the USSR [1]. Later it was applied to the worldwide propagation of pandemic flu [23]. The Baroyan-Rvachev model was a combination of the discrete Kermack-McKendrick SEIR model ("the local model") and a linear model of inter-city migration flows ("the transport model"). Although the structure itself was not novel, it matched real disease dynamics and made it possible to achieve accurate forecasts of the starting moments and peaks of influenza outbreaks in Soviet cities. For instance, among all the epidemic outbreaks of the 1970s, the day of the outbreak start was predicted without error in 56.1% of cases and with a bias of less than a week in 92.2% of cases; the corresponding numbers for the day of the outbreak peak were 53.0% and 87.4%, respectively [15].
However, from the early 1980s the Soviet modeling framework for flu forecasting showed signs of growing inconsistency with the epidemic outbreak patterns observed in Soviet cities. According to [15], its malfunction was caused by the growing levels of herd immunity to flu due to the increasing speed of its circulation around the globe. The core idea of the Baroyan-Rvachev approach was that the fraction of non-immune individuals was the same in all Soviet cities and depended only on the currently circulating virus strain. Since herd immunity levels grow at a rate that depends on various factors, including the structure of contact networks within an urban area, this assumption is less applicable now than in Soviet times. Professor Ivannikov, who was in charge of operating the Soviet forecasting system, claimed that a new model for influenza peak prediction in Russia should be built to provide accurate forecasts [16]. Unlike the Baroyan-Rvachev model, it should rely heavily on the analysis of local urban incidence data, thus making it possible to overcome the issue of unequal herd immunity levels within the country. Nevertheless, this idea has never been tested on actual data.
In this paper, the authors aim at assessing the accuracy of the peak predictions with the help of two different SEIR models calibrated with local incidence data, and discussing the ways of improving it.
In [17] we formulated the continuous SEIR ("Susceptible-Exposed-Infected-Recovered") model for flu outbreak dynamics and calibrated it to the long-term Russian ARI incidence data. It was shown that the classical SEIR model, without any modifications accounting for the influence of external factors, may provide a satisfactory fit for the majority of influenza outbreak incidence datasets (the value of the coefficient of determination was
• to assess the possibility of accurately predicting the epidemic outbreak peaks in any Russian city relying only on the incomplete incidence data for the current outbreak in that city, unlike the Baroyan-Rvachev approach, where the ARI incidence in all Soviet cities along with migration flow data was used to obtain predictions;
• to find out whether it is possible to obtain the desired accuracy without incorporating external factors into the model;
• to assess the number of incidence points needed for model calibration and, consequently, the average lead time before the actual peak that is required to obtain an accurate prediction.
The particular steps we had to take to reach the aforementioned goals consisted of the following:
• to perform the retrospective forecast of influenza dynamics in three Russian cities (Moscow, Saint Petersburg and Novosibirsk) with the help of the SEIR model calibrated on incomplete data using long-term incidence data from Research Institute of Influenza;
• to assess the accuracy of the epidemic peak parameters prediction, particularly the day of the peak and its prospected height, and its dependence on the number of incidence points used for the model calibration.
Assuming that the accuracy of prediction may largely depend on the model serving as a core of the fitting algorithm, we have decided to employ the local submodel of Baroyan-Rvachev modeling framework [1], [15] in addition to our continuous SEIR model, and to compare their predictive abilities.
To describe the dynamics of the influenza epidemic process, we use a simple population model represented by a system of ordinary differential equations. Since the flu has an incubation period, and recovered individuals acquire immunity to the particular virus strain [29], the population of the urban area under consideration is represented by a set of four groups of individuals: susceptible (vulnerable to flu infection), exposed (asymptomatic and non-infectious), infectious (symptomatic, spreading the flu) and removed (immune to the flu). The sizes of groups are measured in fractions of total population
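$$\frac{dS}{dt} = -\beta S I, \tag{1}$$
$$\begin{aligned}
\frac{dE}{dt} &= \beta S I - \gamma E, \qquad \frac{dI}{dt} = \gamma E - \delta I, \qquad \frac{dR}{dt} = \delta I, \\
S(t_0) &= S_0 \ge 0, \quad E(t_0) = E_0 \ge 0, \quad I(t_0) = I_0 \ge 0, \\
S_0 + E_0 + I_0 &= \alpha, \quad R(t_0) = 1 - \alpha.
\end{aligned} \tag{2}$$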
Since the duration of the epidemic process is relatively short, we consider the influence of birth and migration processes on the disease dynamics negligible and do not include these processes into the model. The description of parameters used for fitting the model is given in Table 1. Further in the text we consider
Definition | Description | Value | Unit |
Epidemiological parameters | | | |
$\alpha$ | Initial ratio of susceptible individuals in the population | Estimated | - |
$\beta$ | Intensity of infection | Estimated | 1/(person |
$\gamma$ | Intensity of transition to infective form of the disease | 0.39 | 1/day |
$\delta$ | Intensity of recovery | 0.133 | 1/day |
$I_0$ | Initial ratio of the infected | 0.0001 | - |
Curve positioning parameters | | | |
$k_{inc}$ | Relative vertical bias of the modeled incidence curve position | Estimated | - |
$\Delta$ | Absolute horizontal bias of the modeled incidence curve epidemic start position compared to the data | Estimated | day |
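As an illustration of how system (1)-(2) can be solved numerically, the following minimal sketch integrates it with a fixed-step fourth-order Runge-Kutta scheme. The choice of solver, the assumption $E_0 = 0$, the function names, and the example value of $\beta$ are ours and are not taken from the original implementation; $\gamma$, $\delta$ and $I_0$ use the fixed values from Table 1.

```python
def seir_rhs(state, beta, gamma, delta):
    """Right-hand side of (1)-(2); state = (S, E, I, R) in population fractions."""
    s, e, i, _ = state
    return (-beta * s * i,
            beta * s * i - gamma * e,
            gamma * e - delta * i,
            delta * i)


def rk4_step(state, h, beta, gamma, delta):
    """One classical Runge-Kutta step of size h (in days)."""
    def shifted(k, factor):
        return tuple(x + factor * h * dx for x, dx in zip(state, k))

    k1 = seir_rhs(state, beta, gamma, delta)
    k2 = seir_rhs(shifted(k1, 0.5), beta, gamma, delta)
    k3 = seir_rhs(shifted(k2, 0.5), beta, gamma, delta)
    k4 = seir_rhs(shifted(k3, 1.0), beta, gamma, delta)
    return tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_seir(alpha, beta, gamma=0.39, delta=0.133, i0=0.0001,
                  days=120, steps_per_day=10):
    """Return daily (S, E, I, R) tuples, starting at t0.

    Assumption of this sketch: E0 = 0, hence S0 = alpha - I0.
    """
    state = (alpha - i0, 0.0, i0, 1.0 - alpha)
    h = 1.0 / steps_per_day
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, beta, gamma, delta)
        daily.append(state)
    return daily


# Hypothetical parameter values, chosen only to produce an outbreak.
trajectory = simulate_seir(alpha=0.5, beta=1.0)
```

Because the right-hand sides sum to zero, the four fractions keep summing to one along the trajectory, which is a convenient sanity check for any solver used here.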
The local submodel used in the Baroyan-Rvachev prediction framework was represented by the system of difference equations, with the time step equal to one day. Following the notations introduced in [15], let
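$$\bar{y}_t = \sum_{\tau=0}^{T} y_{t-\tau}\, g_\tau, \tag{3}$$
$$y_{t+1} = \frac{\beta}{\rho}\, x_t\, \bar{y}_t, \qquad x_{t+1} = x_t - y_{t+1}, \qquad x_0 = \alpha\rho. \tag{4}$$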
The piecewise constant function
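The recurrence (3)-(4) can be sketched in a few lines of Python. The values of `g`, the initial number of infected `i0 * rho`, the treatment of days before the start as having zero incidence, and all parameter values in the example are assumptions made for illustration only.

```python
def simulate_baroyan_rvachev(alpha, beta, rho, i0, g, days):
    """Iterate the difference equations (3)-(4) with a one-day time step.

    g[tau] is the fraction of infectious individuals among those infected
    tau days ago (tau = 0..T). Returns the list y of newly infected per day.
    Assumptions of this sketch: y[0] = i0 * rho and y[t] = 0 for t < 0.
    """
    T = len(g) - 1
    x = alpha * rho              # x_0: susceptible individuals
    y = [i0 * rho]               # y_0: initially infected individuals
    for t in range(days):
        # Equation (3): infectious individuals accumulated over T + 1 days.
        y_bar = sum(y[t - tau] * g[tau] for tau in range(T + 1) if t - tau >= 0)
        # Equation (4): new infections and the update of susceptibles.
        y_next = beta / rho * x * y_bar
        x -= y_next
        y.append(y_next)
    return y


# Hypothetical inputs: a city of one million and an illustrative g profile.
g_example = [0.0, 0.3, 0.6, 0.6, 0.3, 0.1]
incidence = simulate_baroyan_rvachev(alpha=0.8, beta=1.0, rho=1_000_000,
                                     i0=0.0001, g=g_example, days=200)
peak_day = incidence.index(max(incidence))
```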
The original dataset provided by the Research Institute of Influenza [14] contains weekly cumulative incidence for all ARI types (including flu) in three Russian cities from 1986 to 2014. Before fitting the model, we have to refine the incidence data by restoring the missing values and correcting for under-reporting. We also need to extract flu incidence from the cumulative ARI incidence data. The corresponding algorithms are described in detail in [18]; here we briefly outline the sequence of operations.
• Under-reporting correction. Since infected people avoid visiting healthcare facilities during holidays, the corresponding weekly prevalence is lower than the actual number of newly infected. This under-reporting bias can be corrected by means of cubic interpolation [1] using the incidence registered in the adjacent weeks. The sporadic gaps in incidence data are filled in the same fashion.
• Bringing the incidence data to daily format. The daily incidence is found with the help of cubic interpolation of weekly incidence. We assume that
• Extracting data on influenza outbreak from the cumulative seasonal ARI data with the help of a separate epidemic curve allocation algorithm. At first, the algorithm finds higher non-flu ARI incidence level
Let
$$F\bigl(Z^{(mod)}, Z^{(dat)}\bigr) = \sum_{i=0}^{t_1} \bigl(z^{(mod)}_i - z^{(dat)}_i\bigr)^2, \tag{5}$$
Here
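For concreteness, the distance (5) is a plain sum of squared differences between the modeled and observed incidence points. A minimal sketch follows; the function name and the length check are our own additions.

```python
def fit_distance(z_mod, z_dat):
    """Sum of squared differences between model and data incidence, as in (5)."""
    if len(z_mod) != len(z_dat):
        raise ValueError("model and data samples must have the same length")
    return sum((m - d) ** 2 for m, d in zip(z_mod, z_dat))


# Hypothetical three-point samples: only the last point differs (by 2).
distance = fit_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```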
Before optimizing the model parameters, we need to match accurately the model timeline (
• Aligning the timelines by outbreak starting day. We assume that the moment
• Aligning the timelines by peak day. We assume that the peak moment of the modeled epidemic curve coincides with the epidemic peak day from the dataset.
The first one is a part of a fitting algorithm for the continuous SEIR model, the second one is used for the Baroyan-Rvachev model calibration.
The issue that affects timeline alignment is the inaccuracy of the procedure input. The detected outbreak starting day depends on the curve extraction algorithm employed (fig. 2) and cannot be established accurately, because influenza is not diagnosed separately from other acute respiratory illnesses. The peak moment is known only when we fit the model to data on past epidemic outbreaks, as opposed to making predictions for an ongoing outbreak; but even then, biases in incidence registration (like the aforementioned under-reporting during holidays) can lead to incorrect determination of the peak moment.
In this paper we compensate for the uncertainty in the input (the outbreak starting day and peak day obtained from the dataset) by introducing curve positioning parameters. These parameters are part of the fitting algorithms, not of the models themselves. Their function is not to change the shape of the model curves, but rather to adjust the position of the modeled curve relative to the epidemic incidence data. The main drawback of using curve positioning parameters is that they give additional degrees of freedom to the fitting algorithm. Thus, they make it possible to fit various model curves to the incidence data and expand the range of possible model parameter values.
The details on the curve positioning parameters used in each of two fitting algorithms are given in the subsequent section.
The list of parameters involved in the fitting procedure (table 1), apart from five model parameters (
Relying on conclusions made from earlier numerical experiments with the model fitting [17], we have decided to fix the values of
The fitting algorithm for the continuous SEIR model was introduced for the first time in [17]. The algorithm operations are performed as follows. For each
• For each fixed combination of values {
1. Find the numerical solution of the model (1)-(2) with the initial conditions
2. Calculate the modeling flu incidence in relative numbers:
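$$E(t) = E(t-1) + N_{S \to E}(t) - N_{E \to I}(t)$$
and
$$N_{S \to E}(t) = S(t-1) - S(t),$$
we achieve:
$$y^{(mod,rel)}(t) = -\Delta S(t) - \Delta E(t), \quad t = 1, 2, \dots, \qquad \Delta S(t) = S(t) - S(t-1), \quad \Delta E(t) = E(t) - E(t-1),$$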
3. As we are working with disease incidence attributed only to influenza outbreaks, excluding the non-epidemic cases of ARI infections, we need to subtract the non-epidemic incidence from the overall ARI incidence data. For that purpose we need to derive the baseline level for the modeled outbreak start
$$y_{base} := k_{inc} \cdot a_2, \qquad y^{(dat)}_i := y^{(dat)}_i - y_{base}, \quad i \in \overline{0, T-1}$$
4. Consider that the data incidence points from the dataset are shifted by
$$Y^{(dat)} = \bigl\{y^{(dat)}_0, y^{(dat)}_1, \dots, y^{(dat)}_{T-1}\bigr\}, \qquad Y^{(mod)} = \bigl\{y^{(mod)}(\Delta), y^{(mod)}(\Delta+1), \dots, y^{(mod)}(\Delta+T-1)\bigr\}.$$
5. Convert the relative model incidence values to absolute values:
y(mod)i=y(mod,rel)i⋅NL(m), | (6) |
where
6. Calculate the value of the fit function
In the described manner the BFGS algorithm finds the least distance
After the optimization algorithm has established the best fitting model parameter values, the model can be used to estimate the dynamics of population groups
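The data-handling steps of the algorithm above (deriving relative incidence from the solution, subtracting the baseline, shifting by $\Delta$ and rescaling) can be sketched as follows. The function names are ours, `scale` stands in for the population factor of formula (6), and `shift` is counted in positions of the returned incidence list (position 0 holds $t = 1$); these conventions are assumptions of the sketch, not the original code.

```python
def relative_incidence(s, e):
    """Step 2: y(t) = -dS(t) - dE(t) for t = 1, 2, ... (position 0 holds t = 1)."""
    return [-(s[t] - s[t - 1]) - (e[t] - e[t - 1]) for t in range(1, len(s))]


def subtract_baseline(y_dat, a2, k_inc):
    """Step 3: remove the non-epidemic level y_base = k_inc * a2 from the data."""
    y_base = k_inc * a2
    return [y - y_base for y in y_dat]


def align_model_to_data(y_mod_rel, n_points, shift, scale):
    """Steps 4-5: take n_points model values starting at `shift` and rescale them."""
    window = y_mod_rel[shift:shift + n_points]
    if len(window) < n_points:
        raise ValueError("model curve is too short for this shift")
    return [scale * y for y in window]


# Hypothetical toy values, used only to show the call sequence.
y_rel = relative_incidence([0.9, 0.8, 0.7], [0.0, 0.05, 0.1])
y_dat = subtract_baseline([10.0, 20.0], a2=4.0, k_inc=0.5)
y_mod = align_model_to_data([1.0, 2.0, 3.0, 4.0], n_points=2, shift=1, scale=10.0)
```

The aligned samples `y_mod` and `y_dat` are then the two arguments of the fit function evaluated in step 6.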
The parameter description of the fitting algorithm which corresponds to Baroyan-Rvachev model is given in Table 2. In addition to the parameters taken from the model (3)-(4), a curve positioning parameter
Definition | Description | Value | Unit |
Model parameters | |||
Initial ratio of susceptible individuals in the population | Estimated | - | |
| Intensity of infection | Estimated | - |
Initial ratio of infected in the population | Estimated | - | |
Duration of infection | Fixed | day | |
A fraction of infectious individuals among those who were infected | Fixed | - | |
Population size | Fixed | persons | |
Curve positioning parameters | |||
Absolute horizontal bias of the modeled incidence curve peak position compared to the data | Estimated | day |
The important advantage of the Baroyan-Rvachev model fitting algorithm compared to the continuous SEIR model fitting is that it has fewer parameters to be varied. Moreover, it has been proven [1] that without the loss of fit quality we can vary the sole auxiliary value
Another benefit of the algorithm is that it relies on the peak day alignment rather than starting day alignment, so it is not affected by incorrect outbreak starting day detection and we do not need to add a vertical positioning parameter.
The description of the algorithm follows.
• For each fixed combination of values {
1. Derive the value of
$$s = \frac{k^{T+1}}{\sum_{\tau=0}^{T} k^{T-\tau} g_\tau} \tag{7}$$
2. Set the preliminary model parameter values,
$$\alpha' = 1, \qquad \beta' = s$$
3. Find the preliminary numerical solution of the model (3)-(4) with the parameter values
4. Derive the preliminary number of newly infected each day from the model output:
5. Derive the baseline level for the modeled outbreak start
$$z^{(dat)}_i := z^{(dat)}_i - a_2, \quad i \in \overline{0, t_1 - 1}$$
6. According to the algorithm, we need to match in time the model peak with the incidence data peak. For that purpose we find the shift
$$\delta_{adj} = t^{(mod)}_{peak} - \bigl(t^{(dat)}_{peak} + \Delta_p\bigr), \tag{8}$$
where
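$$Z^{(dat)} = \bigl\{z^{(dat)}_0, z^{(dat)}_1, \dots, z^{(dat)}_{N-1}\bigr\}, \qquad Z^{(mod)\prime} = \bigl\{z^{(mod)\prime}(\delta_{adj}), z^{(mod)\prime}(\delta_{adj}+1), \dots, z^{(mod)\prime}(\delta_{adj}+N-1)\bigr\},$$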
where
7. Assigning optimal values to
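$$\min_{\alpha,\beta} F\bigl(Z^{(mod)}(\alpha,\beta,\Delta_p), Z^{(dat)}\bigr) = F\bigl(Z^{(mod)}(\tilde{\alpha},\tilde{\beta},\Delta_p), Z^{(dat)}\bigr), \qquad \max Z^{(mod)}(\tilde{\alpha},\tilde{\beta},\Delta_p) = \max Z^{(dat)} + \Delta_p;$$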
where
$$a = \frac{\sum_{i=0}^{t_1} z^{(dat)}_i}{\sum_{i=0}^{t_1} \bigl(z^{(mod)\prime}_i\bigr)^2}.$$
To avoid launching the simulation for the second time, now with the values
$$z^{(mod)}_i = a\, z^{(mod)\prime}_i, \qquad z^{(mod)}_i \in Z^{(mod)}(\tilde{\alpha},\tilde{\beta},\Delta_p), \quad z^{(mod)\prime}_i \in Z^{(mod)\prime}.$$
In that manner we find the optimal parameter values and the corresponding model curve
It is worth mentioning that
8. Calculate the value of the fit function
The BFGS algorithm finds the least distance
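The peak alignment of step 6 can be illustrated with a short sketch: the shift (8) is computed from the model peak day, the data peak day and the positioning parameter $\Delta_p$, and the model curve is then cut to the same length as the data sample. The function name, the use of list positions as days, and the bounds check are our assumptions.

```python
def peak_aligned_window(z_mod_prelim, t_peak_dat, delta_p, n_points):
    """Return (delta_adj, window) following formula (8).

    z_mod_prelim: preliminary modeled incidence, position = model day.
    t_peak_dat:   peak day in the data timeline.
    delta_p:      horizontal curve positioning parameter (days).
    n_points:     number of data points to be matched.
    """
    t_peak_mod = max(range(len(z_mod_prelim)), key=z_mod_prelim.__getitem__)
    delta_adj = t_peak_mod - (t_peak_dat + delta_p)
    if delta_adj < 0 or delta_adj + n_points > len(z_mod_prelim):
        raise ValueError("model curve does not cover the shifted data window")
    return delta_adj, z_mod_prelim[delta_adj:delta_adj + n_points]


# Hypothetical model curve with its peak on day 4.
shift, window = peak_aligned_window([0, 1, 3, 7, 12, 9, 4, 1],
                                    t_peak_dat=2, delta_p=1, n_points=4)
```

In this toy example the model peak lands at position `t_peak_dat + delta_p` of the window, which is the intended effect of the shift.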
The described fitting algorithm for Baroyan-Rvachev model originates from [15], with several modifications that were made to unify it with the same procedure for the SEIR model. Particularly:
• The curve positioning parameter
• The iteration over the values of variable
• The value of
The modifications described enhanced both the accuracy and the performance of the algorithm and made it more suitable for our task of peak prediction.
Both algorithms are implemented as scripts collection written in Python programming language (Python 3.x with
Definition | Description | Value | Variation type |
Continuous SEIR model | |||
Initial ratio of susceptible individuals in the population | BFGS optimization | ||
Intensity of infection | BFGS optimization | ||
Relative vertical bias of the modeled incidence curve position | BFGS optimization | ||
Absolute horizontal bias of the modeled incidence curve epidemic start position compared to the data | Iteration | ||
Baroyan-Rvachev model | |||
The service parameter defining the product of | BFGS optimization | ||
Initial ratio of infected in the population | BFGS optimization | ||
Absolute horizontal bias of the modeled incidence curve peak position compared to the data | Iteration | ||
Prospected incidence curve peak day | Iteration | ||
* For complete incidence data ** For incomplete incidence data |
We have conducted the numerical experiments on weekly ARI incidence data for three Russian cities (Moscow, Saint Petersburg, and Novosibirsk) from July 1986 to June 2014. By means of epidemic curve allocation algorithm, we extracted the incidence data for the epidemic outbreaks, which gave us 67 epidemic outbreaks in total (there were no epidemics during some seasons).
To test the Baroyan-Rvachev fitting algorithm, we applied it to the complete outbreak data and compared the resulting accuracy of fit with the one achieved by the continuous SEIR model. The values of
After comparing the two models and the corresponding fitting algorithms on complete incidence data, we compared the predictive power of the models for outbreak peak forecasting. For this purpose we truncated the incidence datasets for each epidemic season, reproducing the case of incomplete incidence data (Figure 4). The sample sizes were varied starting from 5 incidence points (which corresponds to an attempt at peak prediction on the fifth day of the outbreak, assuming that the actual incidence data is reported by healthcare units by the end of each day).
At that stage of the experiment an important issue of the Baroyan-Rvachev fitting algorithm described in section 5.3 was revealed. Unlike the algorithm from section 5.2, it relies on explicit knowledge of the day of the outbreak peak. When the Baroyan-Rvachev modeling framework was in operational use, the local submodel was calibrated on the 'half-wave' of flu incidence (i.e. on the data from the outbreak start until its peak), so the outbreak peak day was always available. When the fitting was made to complete data in the previous experiment, we knew this value too. In the current experiment, on the contrary, the day of the outbreak peak is meant to be an output of the algorithm, along with the peak height. Hence, we had to modify the initial algorithm to make it suitable for prediction purposes. We modified formula (8) in the following way:
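The construction of incomplete samples can be sketched as below. The lower bound of 5 points comes from the text; stopping at the day before the observed peak is an assumption of this sketch, as are the function name and the toy data.

```python
def truncated_samples(incidence, min_points=5):
    """Yield (n, first n incidence points) for n from min_points up to the
    day before the observed peak (upper bound is an assumption of this sketch)."""
    peak_day = max(range(len(incidence)), key=incidence.__getitem__)
    for n in range(min_points, peak_day + 1):
        yield n, incidence[:n]


# Hypothetical daily incidence curve with its peak on day 6.
samples = list(truncated_samples([1, 2, 4, 7, 11, 16, 22, 19, 12, 6]))
```

Each sample would then be passed to the fitting algorithm, and the peak of the fitted curve compared with the actual peak.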
$$\delta_{adj} = t^{(mod)}_{peak} - \theta^{(dat)}_{peak}.$$
The varied parameter
Let
• the prediction bias of the peak day
• the ratio between the modeled and real outbreak peak heights
To assess the accuracy of peak prediction results, we have used the 1970's Soviet flu outbreak prediction framework criteria [15] already applied by the authors in [19]:
• 'Square'. The prediction is thought to be accurate if
• 'Vertical stripe'. The accurate prediction should have
• 'Horizontal stripe'. The accurate prediction should have
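A check of the three criteria for a single prediction might look like the sketch below. The height band of 0.7 to 1.5 is the one quoted later in the text for peak height; the day tolerance `max_day_bias` is a placeholder of ours and not a value taken from [15], and the function name is hypothetical.

```python
def accuracy_flags(day_bias, height_ratio,
                   max_day_bias=7, min_ratio=0.7, max_ratio=1.5):
    """Classify one prediction by the three criteria.

    day_bias:     predicted peak day minus actual peak day.
    height_ratio: modeled peak height divided by actual peak height.
    max_day_bias: placeholder tolerance in days (assumption of this sketch).
    """
    vertical = abs(day_bias) <= max_day_bias               # peak day is accurate
    horizontal = min_ratio <= height_ratio <= max_ratio    # peak height is accurate
    return {
        "vertical_stripe": vertical,
        "horizontal_stripe": horizontal,
        "square": vertical and horizontal,
    }


# Hypothetical prediction: 10 days late, peak height overestimated by 20%.
flags = accuracy_flags(day_bias=10, height_ratio=1.2)
```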
For every fixed outbreak, we have calculated the sample size
The obtained values of
As one can see, the compliance of the predictions with the 'horizontal stripe' criterion, i.e. the quality of peak height prediction, is satisfactory for the Baroyan-Rvachev model (98% compliance for the predictions made one day before the peak) and unsatisfactory for the continuous SEIR model (51% compliance). We believe that the Baroyan-Rvachev model, being more 'rigid', tends to reproduce the overall trend of the incidence data better, whereas the continuous SEIR model is more prone to reacting to outliers in the data. Thus, the modeled incidence curves tend to change their slope to a greater extent in the latter case, resulting in larger biases of the peak predictions.
The prediction compliance with the accuracy criteria may still be low near the peak because of the peculiar shape of some outbreak incidence curves, which cannot be properly fitted by a one-peaked incidence model (see Figure 3 and [17] for more details).
The accuracy percentage for the 'vertical stripe' and, consequently, 'square' criteria is generally unsatisfactory for both models. The SEIR model gives slightly better forecasts in the initial stages of the outbreak, whereas the Baroyan-Rvachev model fits better to big samples (when we almost have the epidemic "half-wave").
The numerical experiments have shown that the prediction methods demonstrated in the paper may be applied to assess the height of the peak, but are incapable of predicting the peak time. (Interestingly, the same issue, although caused by other reasons, also holds true for the last Baroyan and Rvachev forecasts performed in the early 1980s [15] and for our attempts to modify their algorithm to be used without the transport data [19]). Apparently, we cannot expect great accuracy from a prediction obtained in such a straightforward manner, especially if the number of incidence points used to calibrate the model is not large. In this case, an incidence point sample can be fitted by various model curves with almost equal goodness of fit
Among the technical limitations that we faced while applying the algorithms, the most important one was connected with the fitting algorithm performance. Because of the long execution time of the algorithm on the amount of incidence data employed, we had to limit the number of runs with different initial input values. In some cases, this may lead to an unsatisfactory fit, since the optimization function may have several local minima. We hope to increase the algorithm speed and reach a higher fit accuracy by employing parallel techniques, such as distributing threads over the computer cores, as we did in earlier works for a number of epidemic model algorithms [20].
A drawback of this work, which was already mentioned in [17], is that we did not consider the bias introduced by converting the weekly incidence data to daily data. Although the "synthetic" daily data is surely smoother than the original, we presume that our set of algorithms will be suitable for handling a real daily incidence dataset. Our assumption is supported by the fact that, after filtering out the fluctuations caused by the weekly cycle of individuals, the daily epidemic curves resemble the synthetic data we work with (see [1]).
The authors are thankful to the two anonymous referees who helped to improve significantly the quality of the paper. This paper is financially supported by The Russian Scientific Foundation, Agreement #14-21-00137.
Figures 7-9 demonstrate the prediction accuracy according to three accuracy criteria for Saint Petersburg, Moscow, and Novosibirsk. For comparison purposes, we have added prospected prediction quality for the same cities obtained by calibrating the models to the earlier occurred outbreaks of the same epidemic season (see [19] for more details).
Figure 7 shows that the prediction accuracy of the peak height ('horizontal stripe' criterion) is better for the method employed in [19] than for the forecasting method described in this article. At the same time, the former method can be employed only in a limited number of cases. In particular, it requires a city where the climax of the epidemic outbreak has already been reached; otherwise we cannot calibrate the model to make a prediction. Thus, peak forecasting based on incomplete data is more versatile, although less accurate. Note that the prediction accuracy of the method from [19] depends to a large extent on the city that was used to calibrate the model. For instance, the reader can see that the predictions for Moscow obtained by calibrating the model to Novosibirsk data are significantly worse than the predictions obtained by using the incidence data from Saint Petersburg.
The prediction of peak days is unsatisfactory for both methods, with the accuracy percentage of incomplete data forecasting becoming higher than that of the method from [19] on average at "day -3" (i.e. approximately three days before the peak).
Comparing the forecasting methods, we have come to the following question: could it be that the majority of the predictions are accurate simply because the peak data (day and height) has very limited variance over the years? For instance, could we satisfy the 'horizontal stripe' accuracy criterion (a peak height prediction is 0.7 to 1.5 of the real peak height) by taking the average height of the previous epidemic peaks in that city? To answer this question, we have used two statistical approaches:
• "Prediction by last peak data". We "predict" that the peak in the current season is likely to have exactly the same height and will happen after the same number of days from the epidemic outbreak start, as in the previous one.
• "Prediction by average peak data". We "predict" that the peak height and day are to coincide with the average height and average day calculated from the peak data over all the previous years.
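The two baseline "predictors" described above are simple enough to state directly in code. In this sketch, `history` is a hypothetical list of `(peak_day, peak_height)` pairs for the previous seasons of one city, with the peak day counted from the outbreak start; the names and the example numbers are ours.

```python
def predict_by_last_peak(history):
    """Repeat the peak day and height of the most recent previous season."""
    return history[-1]


def predict_by_average_peak(history):
    """Average the peak day and height over all previous seasons."""
    days = [day for day, _ in history]
    heights = [height for _, height in history]
    return sum(days) / len(days), sum(heights) / len(heights)


# Hypothetical peak records for three earlier seasons.
history = [(30, 1000.0), (40, 2000.0), (35, 1500.0)]
last_guess = predict_by_last_peak(history)
average_guess = predict_by_average_peak(history)
```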
The accuracy of the predictions obtained in the described way was compared with the modeling prediction obtained by the method from [19] (for each city we took the best accuracy from the two predictions based on the models calibrated on two cities; for instance, for Saint Petersburg we took the accuracy obtained on Moscow data, etc.) and by the method described in this article (we took the accuracy obtained by Baroyan-Rvachev model calibrated on the dataset correspondent to "day -1", i.e. the day before the actual peak). The comparison results are shown in Figure 10.
As one can see, in the case of peak height prediction ("horizontal stripe" criterion) the accuracy of the modeling methods is significantly higher than that of the primitive statistical approaches mentioned above. In the case of peak day prediction, the modeling method from [19] demonstrates accuracy equal to or worse than that of the statistical approaches. This result supports our assumption that we cannot use the described modeling methods to assess the peak days. The fact that the Baroyan-Rvachev prediction accuracy is rather high for both the "vertical stripe" and "square" criteria does not prove this assumption wrong because, as we can see in Figures 8-9, the accuracy of the method falls quickly when we take fewer incidence points for the model calibration.
[1] O. Baroyan, U. Basilevsky, V. Ermakov, K. Frank, L. Rvachev and V. Shashkov, Computer modelling of influenza epidemics for large-scale systems of cities and territories, in Proc. WHO Symposium on Quantitative Epidemiology, Moscow, 1970.
[2] R. Bürger, G. Chowell, P. Mulet and L. Villada, Modelling the spatial-temporal progression of the 2009 A/H1N1 influenza pandemic in Chile, Mathematical Biosciences and Engineering, 13 (2016), 43-65.
[3] CDC, Influenza signs and symptoms and the role of laboratory diagnostics, [online], http://www.cdc.gov/flu/professionals/diagnosis/labrolesprocedures.htm.
[4] CDC, People with heart disease and those who have had a stroke are at high risk of developing complications from influenza (the flu), [online], http://www.cdc.gov/flu/heartdisease/.
[5] J.-P. Chretien, D. George, J. Shaman, R. A. Chitale and F. E. McKenzie, Influenza forecasting in human populations: A scoping review, PLoS ONE, 9 (2014), e94130.
[6] A. D. Cliff, P. Haggett and J. K. Ord, Spatial Aspects of Influenza Epidemics, Routledge, 1986.
[7] V. Colizza, A. Barrat, M. Barthelemy, A.-J. Valleron and A. Vespignani, Modeling the worldwide spread of pandemic influenza: Baseline case and containment interventions, PLoS Med, 4 (2007), e13.
[8] S. Cook, C. Conrad, A. L. Fowlkes and M. H. Mohebbi, Assessing Google Flu Trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic, PLoS ONE, 6 (2011), e23610.
[9] N. Goeyvaerts, L. Willem, K. Van Kerckhove, Y. Vandendijck, G. Hanquet, P. Beutels and N. Hens, Estimating dynamic transmission model parameters for seasonal influenza by fitting to age and season-specific influenza-like illness incidence, Epidemics, 13 (2015), p1.
[10] I. Hall, R. Gani, H. Hughes and S. Leach, Real-time epidemic forecasting for pandemic influenza, Epidemiology and Infection, 135 (2007), 372-385.
[11] D. He, J. Dushoff, R. Eftimie and D. J. Earn, Patterns of spread of influenza A in Canada, Proceedings of the Royal Society of London B: Biological Sciences, 280 (2013), 20131174.
[12] K. S. Hickmann, G. Fairchild, R. Priedhorsky, N. Generous, J. M. Hyman, A. Deshpande and S. Y. Del Valle, Forecasting the 2013-2014 influenza season using Wikipedia, PLoS Comput Biol, 11 (2015), e1004239.
[13] A. Hyder, D. L. Buckeridge and B. Leung, Predictive validation of an influenza spread model, PLoS ONE, 8 (2013), e65459.
[14] F. Institute, Research Institute of Influenza website, [online], http://influenza.spb.ru/en/.
[15] Y. G. Ivannikov and A. T. Ismagulov, Epidemiologiya Grippa (The Epidemiology of Influenza), Almaty, Kazakhstan, 1983, In Russian.
[16] Y. Ivannikov and P. Ogarkov, An experience of mathematical computing forecasting of the influenza epidemics for big territory, Journal of Infectology, 4 (2012), 101-106.
[17] V. N. Leonenko and S. V. Ivanov, Fitting the SEIR model of seasonal influenza outbreak to the incidence data for Russian cities, Russian Journal of Numerical Analysis and Mathematical Modelling, 31 (2016), 267-279.
[18] V. N. Leonenko, S. V. Ivanov and Y. K. Novoselova, A computational approach to investigate patterns of acute respiratory illness dynamics in the regions with distinct seasonal climate transitions, Procedia Computer Science, 80 (2016), 2402-2412.
[19] V. N. Leonenko, Y. K. Novoselova and K. M. Ong, Influenza outbreaks forecasting in Russian cities: Is Baroyan-Rvachev approach still applicable?, Procedia Computer Science, 101 (2016), 282-291.
[20] V. N. Leonenko, N. V. Pertsev and M. Artzrouni, Using high performance algorithms for the hybrid simulation of disease dynamics on CPU and GPU, Procedia Computer Science, 51 (2015), 150-159.
[21] D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming, 45 (1989), 503-528.
[22] A. Romanyukha, T. Sannikova and I. Drynov, The origin of acute respiratory epidemics, Herald of the Russian Academy of Sciences, 81 (2011), 31-34.
[23] L. A. Rvachev and I. M. Longini, A mathematical model for the global spread of influenza, Mathematical Biosciences, 75 (1985), 1-22.
[24] J. Shaman, V. E. Pitzer, C. Viboud, B. T. Grenfell and M. Lipsitch, Absolute humidity and the seasonal onset of influenza in the continental United States, PLoS Biol, 8 (2010), e1000316.
[25] J. Tamerius, M. I. Nelson, S. Z. Zhou, C. Viboud, M. A. Miller and W. J. Alonso, Global influenza seasonality: Reconciling patterns across temperate and tropical regions, Environmental Health Perspectives, 119 (2011), 439-445.
[26] J. Truscott, C. Fraser, S. Cauchemez, A. Meeyai, W. Hinsley, C. A. Donnelly, A. Ghani and N. Ferguson, Essential epidemiological mechanisms underpinning the transmission dynamics of seasonal influenza, Journal of The Royal Society Interface, 9 (2011), 304-312.
[27] S. P. van Noort, R. Águas, S. Ballesteros and M. G. M. Gomes, The role of weather on the relation between influenza and influenza-like illness, Journal of Theoretical Biology, 298 (2012), 131-137.
[28] C. Viboud, O. N. Bjornstad, D. L. Smith, L. Simonsen, M. A. Miller and B. T. Grenfell, Synchrony, waves, and spatial hierarchies in the spread of influenza, Science, 312 (2006), 447-451.
[29] WHO, Influenza (seasonal). Fact sheet No. 211, March 2014, [online], http://www.who.int/mediacentre/factsheets/fs211/en/.
[30] WHO, Surveillance case definitions for ILI and SARI, [online], http://www.who.int/influenza/surveillance_monitoring/ili_sari_surveillance_case_definition/en/.
[31] R. Yaari, G. Katriel, A. Huppert, J. Axelsen and L. Stone, Modelling seasonal influenza: The role of weather and punctuated antigenic drift, Journal of The Royal Society Interface, 10 (2013), 20130298.
[32] W. Yang, B. J. Cowling, E. H. Lau and J. Shaman, Forecasting influenza epidemics in Hong Kong, PLoS Comput Biol, 11 (2015), e1004383.
1. Nikita E. Seleznev, Vasiliy N. Leonenko, Absolute humidity anomalies and the influenza onsets in Russia: a computational study, 2017, 119, 18770509, 224, 10.1016/j.procs.2017.11.180
2. V. N. Leonenko, Yu. K. Novoselova, 2018, Chapter 26, 978-3-319-91091-8, 375, 10.1007/978-3-319-91092-5_26
3. Vasiliy Leonenko, Alexander Lobachev, Georgiy Bobashev, 2019, Chapter 36, 978-3-030-22733-3, 492, 10.1007/978-3-030-22734-0_36
4. Nikita E. Seleznev, Vasiliy N. Leonenko, 2017, Chapter 32, 978-3-319-69783-3, 374, 10.1007/978-3-319-69784-0_32
5. Shi Yin, Nan Zhang, Prevention schemes for future pandemic cases: mathematical model and experience of interurban multi-agent COVID-19 epidemic prevention, 2021, 0924-090X, 10.1007/s11071-021-06385-4
6. Nobuo Tomizawa, Kanako K. Kumamaru, Koh Okamoto, Shigeki Aoki, Multi-agent system collision model to predict the transmission of seasonal influenza in Tokyo from 2014–2015 to 2018–2019 seasons, 2021, 7, 24058440, e07859, 10.1016/j.heliyon.2021.e07859
7. Zhaofu Hong, Yingjie Li, Yeming Gong, Wanying Chen, A data-driven spatially-specific vaccine allocation framework for COVID-19, 2022, 0254-5330, 10.1007/s10479-022-05037-z
8. Yongdong Shi, Rongsheng Huang, Hanwen Cui, Prediction and Analysis of Tourist Management Strategy Based on the SEIR Model during the COVID-19 Period, 2021, 18, 1660-4601, 10548, 10.3390/ijerph181910548
9. Vasiliy N. Leonenko, Herd immunity levels and multi-strain influenza epidemics in Russia: a modelling study, 2021, 36, 0927-6467, 279, 10.1515/rnam-2021-0023
10. A. A. Kosova, V. I. Chalapa, O. P. Kovtun, Methods for modellind and forecasting dynamics of infectious diseases, 2023, 22, 2071-5943, 102, 10.52420/2071-5943-2023-22-4-102-112
Definition | Description | Value | Unit |
Epidemiological parameters | |||
| Initial ratio of susceptible individuals in the population | Estimated | - |
| Intensity of infection | Estimated | 1/(person |
| Intensity of transition to infective form of the disease | 0.39 | 1/day |
| Intensity of recovery | 0.133 | 1/day |
| Initial ratio of the infected | 0.0001 | - |
Curve positioning parameters | |||
| Relative vertical bias of the modeled incidence curve position | Estimated | - |
| Absolute horizontal bias of the modeled incidence curve epidemic start position compared to the data | Estimated | day |
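To show how the fixed values in the table above enter a simulation, here is a minimal SEIR sketch in population fractions. The transition intensity (0.39 1/day), the recovery intensity (0.133 1/day) and the initial ratio of the infected (0.0001) come from the table. The symbol names, the forward Euler scheme, the fractional scaling of the infection intensity, the definition of daily incidence as the flow into the infective compartment, and the sample values `alpha=0.5` and `beta=0.6` are assumptions made for illustration. In the article the initial susceptible ratio and the infection intensity are estimated from data.

```python
def simulate_seir(alpha, beta, gamma=0.39, delta=0.133, i0=1e-4,
                  days=120, dt=0.1):
    """Integrate a fractional SEIR model with forward Euler steps.

    alpha: initial susceptible fraction (hypothetical value supplied by caller)
    beta:  infection intensity, 1/day in fraction form (assumed scaling)
    gamma: intensity of transition to the infective form, 1/day
    delta: intensity of recovery, 1/day
    i0:    initial fraction of the infected
    Returns daily incidence as the fraction of the population that
    becomes infective on each day (an assumed definition of incidence).
    """
    s, e, i, r = alpha, 0.0, i0, 1.0 - alpha - i0
    steps_per_day = round(1 / dt)
    daily_incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infection = beta * s * i * dt   # S -> E
            onset = gamma * e * dt          # E -> I
            recovery = delta * i * dt       # I -> R
            s -= infection
            e += infection - onset
            i += onset - recovery
            r += recovery
            new_cases += onset
        daily_incidence.append(new_cases)
    return daily_incidence


# Hypothetical values for the two estimated parameters.
incidence = simulate_seir(alpha=0.5, beta=0.6)
peak_day = max(range(len(incidence)), key=incidence.__getitem__)
print(f"peak day: {peak_day}, peak incidence: {incidence[peak_day]:.4f}")
```

The vertical and horizontal bias parameters of the table would then scale and shift this curve before it is compared with the observed incidence. That step is omitted here.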
Definition | Description | Value | Unit |
Model parameters | |||
| Initial ratio of susceptible individuals in the population | Estimated | - |
| Intensity of infection | Estimated | - |
| Initial ratio of infected in the population | Estimated | - |
| Duration of infection | Fixed | day |
| A fraction of infectious individuals among those who were infected | Fixed | - |
| Population size | Fixed | persons |
Curve positioning parameters | |||
| Absolute horizontal bias of the modeled incidence curve peak position compared to the data | Estimated | day |
Definition | Description | Value | Variation type |
Continuous SEIR model | |||
| Initial ratio of susceptible individuals in the population | | BFGS optimization |
| Intensity of infection | | BFGS optimization |
| Relative vertical bias of the modeled incidence curve position | | BFGS optimization |
| Absolute horizontal bias of the modeled incidence curve epidemic start position compared to the data | | Iteration |
Baroyan-Rvachev model | |||
| The service parameter defining the product of | | BFGS optimization |
| Initial ratio of infected in the population | | BFGS optimization |
| Absolute horizontal bias of the modeled incidence curve peak position compared to the data | | Iteration |
| Prospected incidence curve peak day | | Iteration |
* For complete incidence data
** For incomplete incidence data