Whereas the prevention and treatment of Ebola virus disease (EVD) have been well studied after the 2013–16 outbreak in West Africa, the emergence of human outbreaks and their mechanisms have yet to be explored in detail. In particular, it has yet to be clarified whether the emergence records offer any theoretical insight into the changing interface between humans and animal reservoirs. Here we explore the epidemiological record of emergence, investigating predominant causes of the introduction to the human population, their characteristics, and frequencies. We retrieved data of every outbreak that can be traced back to a single zoonotic spillover. Through statistical analysis, we have shown that (ⅰ) the leading cause of emergence was eating and hunting habits, (ⅱ) primates act as the main source of zoonotic spillover, and (ⅲ) Zaire ebolavirus is the most virulent type. Moreover, the trend of emergence was demonstrated not to be a Poisson process, indicating that some unknown, underlying, non-random mechanisms are likely to govern the spillover event. In the Democratic Republic of Congo, an increasing emergence trend was favored compared with a purely random emergence model. Outbreak event data and their causative viruses should be explored biologically and epidemiologically to possibly predict future outbreak events.
1. Introduction
Ebola virus disease (EVD), also referred to as Ebola hemorrhagic fever, is caused in humans by four ebolaviruses [1,2,3]. Following an incubation period of 3–21 days, clinical symptoms are initially non-specific and include fever, muscle pain, sore throat and headache, typically followed by vomiting, diarrhea, and rash. In severe cases, hemorrhagic symptoms, including internal and external bleeding, as well as coma may occur. The case fatality risk (CFR) of EVD is very high, with 25–90% of diagnosed individuals dying [2,3]. The virus is transmitted via bodily fluids, such as blood, from infected humans [2]. Approved specific preventive measures (e.g. vaccines) and treatments have yet to be fully established, but they have been studied extensively in recent years, with research accelerated by the large EVD epidemic in West Africa in 2013–16.
The first recorded outbreak of EVD occurred in a cotton factory in Sudan from June to November 1976 [1]. Beginning in late 2013, the largest ever epidemic of EVD occurred in West Africa, mainly in Guinea, Sierra Leone and Liberia, involving about 30,000 cases and 13,000 deaths. To date, there have been 23 EVD outbreak events in Central and West Africa, all of which arose from zoonotic spillover from wild animals to humans. Zoonotic spillover is defined as the introduction of a pathogen into a human population from a nonhuman animal host [4]. However, some outbreaks could not be traced back to a single source animal [2,3,4]. Despite enhanced studies on the prevention and treatment of EVD in recent years, the emergence of human outbreaks and their mechanisms have yet to be explored in detail.
The emergence records could offer key insights into the changing interface between humans and animal reservoirs. For instance, we have yet to understand whether EVD outbreaks are increasing in frequency. It also remains to be understood how the route of emergence from animals to humans is established, given that the animal reservoirs of this disease are always found in wildlife. It would be fruitful to understand the underlying relationships between the infectiousness and virulence of EVD and the modes of transmission, source animals, and viral taxa (i.e. Zaire ebolavirus, Sudan ebolavirus, Bundibugyo ebolavirus). Here we explore the epidemiological record of EVD outbreaks, aiming to clarify the predominant animal sources of emergence, the epidemiological characteristics of each outbreak (e.g. the risk of death), and the frequency of emergence. We collected data on every outbreak that can be traced back to a single zoonotic spillover, anticipating that knowing the main causes and sources of ebolavirus emergence will contribute to predicting future emergences and help us to react swiftly to outbreaks.
2. Materials and methods
2.1. Collection of the outbreak emergence data
We conducted a literature review and collected all information with respect to the animal source of ebolavirus spillover in every recorded outbreak [1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. In principle, we adhered to the chronological list of outbreaks as described by the World Health Organization [8], and the animal source and other details were traced in the literature [1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. To characterize each outbreak, we classified the information obtained into: (1) the date of the outbreak, (2) the country where it occurred, (3) the number of reported cases, (4) the number of deaths, (5) the identification method, (6) whether the human case of zoonotic source was laboratory confirmed or not, (7) the animal source of infection, (8) the CFR, (9) the mode of transmission to the human index case, and (10) the viral taxon. Subsequently, we coded sub-categories as follows: (ⅰ) animal source of infection: bats = 0, primates = 1, others = 2, and unknown = 3, (ⅱ) viral taxa: Sudan ebolavirus = 1, Zaire ebolavirus = 2, and Bundibugyo ebolavirus = 3, and (ⅲ) mode of transmission: eating = 0, environmental = 1, unknown = 2. We excluded iatrogenic events from our analysis, because they did not arise from animal-to-human contact and would skew the average CFR. The collected data are summarized in Table 1. It has not yet been confirmed whether bats are infected with ebolavirus, but this is a leading hypothesis, so here we list bats as sources of infection under the assumption that they are actively infected with ebolavirus.
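The coding scheme above can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical, the example record is invented rather than taken from Table 1, and only the numeric codes follow the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class AnimalSource(IntEnum):
    BATS = 0
    PRIMATES = 1
    OTHERS = 2
    UNKNOWN = 3


class ViralTaxon(IntEnum):
    SUDAN = 1
    ZAIRE = 2
    BUNDIBUGYO = 3


class TransmissionMode(IntEnum):
    EATING = 0
    ENVIRONMENTAL = 1
    UNKNOWN = 2


@dataclass(frozen=True)
class Outbreak:
    """One outbreak record (hypothetical field names)."""
    year: int
    country: str
    cases: int
    deaths: int
    source: AnimalSource
    taxon: ViralTaxon
    mode: TransmissionMode

    @property
    def cfr(self) -> float:
        # Case fatality risk of this outbreak: deaths divided by cases.
        return self.deaths / self.cases


# Invented record for illustration only; not a row of Table 1.
example = Outbreak(2000, "Exampleland", 100, 60,
                   AnimalSource.PRIMATES, ViralTaxon.ZAIRE,
                   TransmissionMode.EATING)
print(example.cfr, int(example.source), int(example.taxon), int(example.mode))
```

Keeping the integer codes inside enumerations makes the later binary recoding (for example, bats versus not bats) a one-line comparison rather than a comparison against bare numbers.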
2.2. Descriptive analysis
First, we examined statistical relationships among the CFR, suspected mode of transmission, source animals, and viral taxa, with the CFR calculated as Pr(Di; Ci) for a sub-category i, where Di and Ci stand for the numbers of deaths and cases of sub-category i, respectively. Visual comparison as well as statistical testing of independence were conducted. For the visual assessment, we plotted the mean CFR for each sub-category with its 95% confidence interval (CI). Because there were only 22 data points, we calculated the confidence intervals by bootstrap resampling with 1,000 realizations. In addition, we performed Fisher's exact test of independence for each pair of variables to examine the statistical association. Second, we tested the null hypothesis that the mean CFR did not differ among sub-categories. Because there were more than two groups, one-way analysis of variance (ANOVA) was employed to test the hypothesis. In addition, because of the small number of outbreaks, we collapsed each variable into two sub-categories in order to create 2 × 2 contingency tables and use Fisher's exact test more reliably, as follows: (ⅰ) animal source of infection: not bats = 0, bats = 1, (ⅱ) viral taxa: Zaire ebolavirus = 1, others = 2, and (ⅲ) mode of transmission: non-eating = 0, eating = 1. Third, we examined whether the above-mentioned variables were associated with the cumulative number of cases in each outbreak. Following the F-tests, Student's t-tests were implemented to identify the statistical associations. Because the 2013–2016 outbreak in West Africa involved a far greater number of cases than any other outbreak, which skews the mean and variance calculations, we used the logarithm of the cumulative number of cases.
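As a minimal sketch of the bootstrap step, the following computes a percentile confidence interval for a mean CFR. The percentile method, the function name, and the CFR values are assumptions made for illustration; the text does not specify which bootstrap interval was used, and the numbers below are not the study data.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample mean.
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-outbreak CFRs for one sub-category (not from Table 1).
cfr_by_outbreak = [0.25, 0.50, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90]
low, high = bootstrap_mean_ci(cfr_by_outbreak)
print(f"mean = {statistics.fmean(cfr_by_outbreak):.3f}, "
      f"95% CI = ({low:.3f}, {high:.3f})")
```

With 1,000 realizations, as in the text, the interval endpoints are simply the 2.5th and 97.5th percentiles of the resampled means.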
2.3. Test for randomness
In addition to the relationships among characteristic variables of outbreak emergence, we examined the inter-epidemic periods, i.e. the time intervals between two consecutive outbreaks. If EVD outbreaks emerge in a truly random manner, the inter-epidemic period should be exponentially distributed, with its mean equal to its standard deviation. Statistically speaking, such a process is referred to as a Poisson process, as typically seen in the time intervals between natural disaster events (e.g. floods). The test of a Poisson process has also been applied to influenza A pandemics elsewhere [32]. We plotted each epidemic along a timeline with 1-month intervals, starting from the month of the first epidemic (June 1976), and divided the epidemics by country, animal source of infection, and mode of transmission to create a visual representation of the inter-epidemic time intervals.
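The property that an exponentially distributed interval has its mean equal to its standard deviation can be illustrated with a short simulation. This is a sketch only: the 24-month mean and the regularly spaced comparison series are arbitrary, hypothetical choices.

```python
import random
import statistics


def coefficient_of_variation(intervals):
    """Standard deviation divided by mean; close to 1 for exponential intervals."""
    return statistics.stdev(intervals) / statistics.fmean(intervals)


rng = random.Random(1)
# Simulated inter-event times from a Poisson process (hypothetical mean: 24 months).
random_intervals = [rng.expovariate(1 / 24) for _ in range(10_000)]
# Nearly periodic events, for contrast (hypothetical values).
regular_intervals = [22, 24, 26, 23, 25, 24, 23, 25]

cv_random = coefficient_of_variation(random_intervals)
cv_regular = coefficient_of_variation(regular_intervals)
print(f"random: {cv_random:.2f}, regular: {cv_regular:.2f}")
```

A ratio well below 1 points to events that are more regular than random, and a ratio well above 1 points to clustering; the likelihood-based test in this section formalizes that comparison.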
A gamma distribution is regarded as a more general representation of an exponential distribution, as it contains the exponential distribution as a special case. Let $\pi_i$ be the time between the $(i-1)$th and $i$th outbreaks. This is assumed to follow a gamma distribution $\Gamma(a, b)$, where $a$ and $b$ are the shape and rate parameters, respectively. If $a = 1$, the gamma distribution reduces to the exponential distribution. Let $f$ be the likelihood function of the inter-epidemic periods. Using the observed distribution of inter-epidemic periods, we tested whether the emergence of Ebola outbreaks follows a Poisson process. We used the likelihood ratio test, in which twice the log-likelihood difference is:
$$2\left[\sum_i \ln f(\pi_i; \hat{a}, \hat{b}) - \sum_i \ln f(\pi_i; 1, \hat{b}')\right],$$
which is to follow a chi-squared distribution, where $\hat{a}$, $\hat{b}$, and $\hat{b}'$ represent the maximum likelihood estimates for the gamma and exponential distributions. The p-value was obtained with 1 degree of freedom.
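A minimal sketch of this likelihood ratio test, using only the Python standard library, is given below. The function names and the example intervals are hypothetical, and the gamma shape is found by a golden-section search on the profile likelihood, which is one possible numerical approach and not necessarily the one used in JMP or R. The sketch assumes strictly positive intervals, because the gamma log-density is undefined at zero.

```python
import math


def exp_nll(x):
    """Negative log-likelihood of the exponential model at its MLE (rate = 1/mean)."""
    n, mean = len(x), sum(x) / len(x)
    return n * math.log(mean) + n


def gamma_nll(x, a):
    """Negative log-likelihood of Gamma(a, b) with the rate profiled out (b = a/mean)."""
    n, mean = len(x), sum(x) / len(x)
    b = a / mean
    sum_log = sum(math.log(v) for v in x)
    loglik = n * (a * math.log(b) - math.lgamma(a)) + (a - 1) * sum_log - b * sum(x)
    return -loglik


def fit_gamma_shape(x, lo=1e-3, hi=1e3, tol=1e-8):
    """Golden-section search over log(a) for the shape minimizing the profile NLL."""
    phi = (math.sqrt(5) - 1) / 2
    low, high = math.log(lo), math.log(hi)
    while high - low > tol:
        m1 = high - phi * (high - low)
        m2 = low + phi * (high - low)
        if gamma_nll(x, math.exp(m1)) < gamma_nll(x, math.exp(m2)):
            high = m2
        else:
            low = m1
    return math.exp((low + high) / 2)


def lrt_gamma_vs_exponential(x):
    """Return (shape estimate, test statistic, p-value with 1 degree of freedom)."""
    a_hat = fit_gamma_shape(x)
    stat = max(2 * (exp_nll(x) - gamma_nll(x, a_hat)), 0.0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return a_hat, stat, p_value


# Hypothetical, nearly regular intervals in months (not the study data).
regular = [10, 11, 12, 11, 10, 12, 11, 10]
a_hat, stat, p_value = lrt_gamma_vs_exponential(regular)
print(f"shape = {a_hat:.1f}, statistic = {stat:.1f}, p = {p_value:.2g}")
```

A shape estimate far from 1 together with a small p-value, as for these deliberately regular intervals, is the pattern that would lead to rejecting the exponential (Poisson process) model.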
Similarly, we tested whether there has been a time trend in the occurrence of outbreaks. In this instance, the shape parameter $a$ is fixed at 1, so that the inverse of the rate parameter, $1/b$, gives the mean of the exponentially distributed inter-epidemic period. We replaced this mean with a linear function $q_0 + q_1(t - 1976)$, where $q_0$ is the intercept and $q_1$ is the rate of change in the frequency of outbreaks. Twice the log-likelihood difference is given as
$$2\left[\sum_i \ln f\left(\pi_i; 1, \frac{1}{\hat{q}_0 + \hat{q}_1 (t_i - 1976)}\right) - \sum_i \ln f(\pi_i; 1, \hat{b}')\right],$$
where $t_i$ is the time of the $i$th outbreak and $\hat{q}_0$, $\hat{q}_1$, and $\hat{b}'$ are the maximum likelihood estimates for exponential distributions with and without time trends.
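The time-trend comparison can be sketched in the same way. The code below assumes that t is the calendar year of each outbreak, fits the linear-mean exponential model with a simple compass (pattern) search started from the no-trend fit, and uses invented data; the function names, the optimizer, and the data are all assumptions for illustration and not the authors' implementation.

```python
import math


def trend_nll(c0, q1, centred, intervals):
    """NLL of exponential intervals whose mean is c0 + q1 * (centred year)."""
    total = 0.0
    for dt, x in zip(centred, intervals):
        mu = c0 + q1 * dt
        if mu <= 0:  # the mean interval must stay positive
            return math.inf
        total += math.log(mu) + x / mu
    return total


def fit_trend(years, intervals, max_iter=100_000):
    """Compass search; returns (q0, q1, nll_trend, nll_null)."""
    t_bar = sum(years) / len(years)
    centred = [t - t_bar for t in years]  # centring reduces parameter correlation
    mean = sum(intervals) / len(intervals)
    c0, q1 = mean, 0.0  # start at the no-trend maximum likelihood fit
    nll_null = best = trend_nll(c0, q1, centred, intervals)
    step0, step1 = mean / 2, mean / 40
    for _ in range(max_iter):
        if max(step0, step1) < 1e-9:
            break
        improved = False
        for d0, d1 in ((step0, 0), (-step0, 0), (0, step1), (0, -step1)):
            cand = trend_nll(c0 + d0, q1 + d1, centred, intervals)
            if cand < best:
                c0, q1, best, improved = c0 + d0, q1 + d1, cand, True
        if not improved:
            step0, step1 = step0 / 2, step1 / 2
    q0 = c0 - q1 * (t_bar - 1976)  # intercept on the 1976 origin used in the text
    return q0, q1, best, nll_null


# Hypothetical outbreak years and preceding intervals in months (not the study data).
years = [1977, 1980, 1990, 1995, 2000, 2005, 2010, 2015]
intervals = [60, 50, 40, 30, 20, 15, 8, 4]
q0, q1, nll_trend, nll_null = fit_trend(years, intervals)
stat = 2 * (nll_null - nll_trend)
p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared, 1 degree of freedom
print(f"q0 = {q0:.1f}, q1 = {q1:.2f}, statistic = {stat:.2f}, p = {p_value:.3f}")
```

In this parameterization a negative q1 means that the mean inter-epidemic period shortens over time, that is, outbreaks become more frequent.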
The level of statistical significance was set at α = 0.05. All statistical analyses were conducted using JMP Pro 13 (SAS Institute Inc., Cary, NC, USA) and statistical package R (https://cran.r-project.org/) [33].
3. Results
Table 1 shows the list of existing outbreak records of EVD. Since the first report of EVD in 1976 [34], there have been 23 documented outbreaks in total, with the median numbers of cases and deaths at 59 and 35 persons, respectively. One outbreak in the Democratic Republic of Congo (DRC) was ongoing and was thus excluded from the analysis except for its inter-epidemic period. In addition, there were three iatrogenic events, reported from the United Kingdom and Russia. The most frequent location of animal-to-human emergence was the DRC with n = 9 outbreaks, followed by Congo, Gabon, Sudan, and Uganda with n = 3 each. The index case was identified by genetic information in only three outbreaks and by serological testing in two outbreaks, while the remaining 14 outbreaks relied on contact history. The source animal was laboratory confirmed in only three outbreaks (13.0%). As for the source animals, primates were the most common with n = 10, but the species were highly variable, e.g., monkey, baboon, chimpanzee, and gorilla. Bats followed primates with n = 6, divided between fruit and insectivorous bats (n = 3 for each). The most frequent opportunity for transmission was the hunting and eating of those animals (n = 12).
We did not identify any significant difference in the mean CFR by suspected mode of transmission or by suspected animal source, but we did identify a difference by viral taxa (Figure 1). Overall, the CFR by outbreak ranged from a minimum of 24.8% to a maximum of 89.5%. The median (and 25th to 75th percentiles) of the CFR was 66.2% (50.7% to 78.8%). Summing the cumulative numbers of cases and deaths across outbreaks, the crude CFR was estimated at 41.7%. Comparing the CFR among outbreaks attributed to eating/hunting behavior, environmental, and unknown modes of transmission yielded a p-value of 0.287 (ANOVA, Figure 1A). Similarly, comparing the CFR among outbreaks traced back to bats, primates, others, and unknown sub-categories yielded a p-value of 0.343 (ANOVA, Figure 1B). When we examined the relationship between the CFR and viral taxa, the p-value was 0.02 (ANOVA, Figure 1C). Zaire ebolavirus appeared to be more fatal than Sudan ebolavirus and others, with the estimated CFR at 69.1% (95% CI: 61.2%–77.2%), and Sudan ebolavirus was the second most virulent (56.6% (95% CI: 47.0%–65.0%)). However, this ranking is not confirmed, since a t-test comparing the CFR of Zaire ebolavirus and Sudan ebolavirus resulted in a p-value of 0.093, which means that we cannot reject the hypothesis that their mean CFRs are equal.
Employing Fisher's exact test, we found that the source animal and mode of transmission were significantly associated (p < 0.01, Fisher's exact test), reflecting the fact that primates represent the transmission via hunting and eating. The mode of transmission was not significantly associated with viral taxa (p = 0.07, Fisher's exact test), and similarly, the source animal and viral taxa were not significantly associated (p = 0.24, Fisher's exact test).
Figure 2 shows the comparison of the cumulative number of cases by source animals, viral taxa, and the mode of transmission. F-tests were performed for the logarithm of the cumulative number of cases by characteristic variables, and we did not identify significant differences in variances between comparison groups (p = 0.05, 0.42, and 0.26 for mode of transmission, viral taxa and source animal, respectively). ANOVA tests for those same pairs of categories resulted in p-values of 0.56, 0.72, and 0.25, respectively, indicating that there were no significant differences in the cumulative number of cases by these variables.
We also ran similar hypothesis testing for the binary-subdivided categories. Student's t-tests for the association between the cumulative number of cases and the variables (ⅰ) source animal, (ⅱ) mode of transmission, and (ⅲ) viral taxa did not identify any significant association with p = 0.30, 0.34, and 0.73, respectively. Using the 2 × 2 contingency tables of source animal and mode of transmission, source animal and viral taxa, and mode of transmission and viral taxa, we obtained p-values of 0.33, 0.12, and 0.16, respectively, using Fisher's exact tests.
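For readers who wish to reproduce this kind of 2 × 2 comparison, a two-sided Fisher's exact test can be sketched with the standard library alone. The function name and the counts below are hypothetical, and the two-sided p-value is computed by summing the probabilities of all tables that are no more likely than the observed one, which is a common convention but an assumption about how the software used here defines it.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as probable as the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = bats / not bats, columns = eating / non-eating.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```

The small relative tolerance in the comparison guards against floating-point ties between tables that are equally probable in exact arithmetic.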
Figure 3 shows the inter-epidemic period by country of origin, source animal and mode of transmission. The distribution of the inter-epidemic period was highly skewed, ranging from 0 to 196 months, with a median (and interquartile range) of 12.0 (6.5 to 30.5) months. Overall, an exponential distribution yielded a greater negative log-likelihood (NLL) value (85.3) than a gamma distribution (81.9), and the difference was significant (p = 0.01, likelihood ratio test). That is, emergence on the whole was indicated to take place in a non-random manner. A time trend was not identified (NLL = 84.9).
Analyzing the inter-epidemic period by country (Figure 3, top), we did not identify any significant deviation of the periods from an exponential distribution. Nevertheless, when we examined the presence of a time trend, the inter-epidemic period in the DRC was shown to have shortened (p = 0.03, likelihood ratio test) with the rate 0.56 (95% CI: 0.07, 1.79) per month. The inter-epidemic period was also examined by suspected source animal and suspected mode of transmission (Figure 3, middle and bottom). For outbreaks caused by bats, a gamma-distributed inter-epidemic period (NLL = 18.1) was preferred over an exponential distribution (NLL = 23.1), with a p-value smaller than 0.01 (likelihood ratio test). The same result was obtained for the environmental transmission route (NLL = 23.4 and 18.4 for exponential and gamma distributions, respectively, with p < 0.01, likelihood ratio test), which mostly reflects transmission from bats to humans. No indication of a time trend was found for the inter-epidemic period by source animal or mode of transmission.
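The p-values quoted with the NLL values above can be checked directly, since the likelihood ratio statistic is twice the NLL difference and is referred to a chi-squared distribution with 1 degree of freedom. The helper name is hypothetical; the NLL values are those reported in the text.

```python
import math


def lrt_p_value(nll_null, nll_alt):
    """Likelihood ratio statistic and chi-squared (1 df) p-value from two NLLs."""
    stat = 2 * (nll_null - nll_alt)
    return stat, math.erfc(math.sqrt(stat / 2))


# NLL values as reported in the text (exponential first, alternative second).
stat_all, p_all = lrt_p_value(85.3, 81.9)      # all outbreaks, gamma
stat_trend, p_trend = lrt_p_value(85.3, 84.9)  # all outbreaks, time trend
stat_bats, p_bats = lrt_p_value(23.1, 18.1)    # bat-attributed outbreaks, gamma
stat_env, p_env = lrt_p_value(23.4, 18.4)      # environmental route, gamma

for label, p in [("all", p_all), ("trend", p_trend),
                 ("bats", p_bats), ("environmental", p_env)]:
    print(f"{label}: p = {p:.4f}")
```

The overall gamma comparison gives a p-value of roughly 0.009, in line with the reported p = 0.01, while the time-trend comparison for all outbreaks is far from significant.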
4. Discussion
The present study examined descriptive epidemiological features of ebolavirus outbreaks that have been documented to date, in the hope that reviewing such datasets would offer critical insights into prediction of future outbreaks. Outbreak events were most frequently seen in the DRC. The most frequent animal source was primates (n = 10) associated with eating/hunting behaviors, and there were 4 outbreaks attributed to bats. As already known, Zaire ebolavirus was shown to be the most lethal [35,36]. Additionally, by exploring inter-epidemic periods of all documented outbreaks, we have shown that a gamma distribution was preferred over an exponential distribution, indicating some dependence of the emergence event with respect to time. In particular, we have shown that (ⅰ) a non-Poisson process is the case for bat-originating outbreaks and (ⅱ) the inter-epidemic period was shortened with time in the DRC. To our knowledge, the present study is the first to clarify these features.
While the emergence of ebolavirus outbreaks has been considered memoryless, or random, on the basis of a Bayesian model selection algorithm [32,37], we have shown that applying a classical likelihood ratio test to the existing data suggests non-random emergence. In particular, it is critical that the inter-epidemic period of outbreaks caused by bats was shown not to follow a memoryless process. The fruit bat, or forest-dwelling bat, is believed to be the natural reservoir host of ebolavirus in wildlife [38,39,40,41,42,43,44]. Considering that non-randomness could arise from (ⅰ) the spread of virus in bat populations and also from (ⅱ) time-dependent variations in the chance of human exposure to bats, our finding implies a need to directly explore the environmental route from bats to humans in more detail. Given that transmission from primates was usually caused by eating habits, the characteristics of environmental transmission would reflect exposures to bats, and our finding is indicative of some dependence in that human-animal interface.
It is also remarkable that the frequency of EVD outbreaks was shown to have increased in the DRC, while no such indication was identified in other countries. The DRC is the country where the most recent outbreaks, one in Equateur Province and the other in North Kivu, have been identified. Given that the North Kivu outbreak occurred in the easternmost part of the DRC, it is a concern that the geographic foci of reservoir exposure may have expanded compared to the past. Although the animal source of the latest outbreak has not been traced, our finding offers an important clue for future monitoring. For instance, if the ongoing increase of emergence in the DRC can be traced back to certain species of animal reservoirs, virus surveillance not only in humans but also in the corresponding animal species may realistically be considered. As the present study has shown, a simple statistical model can be employed to test whether there is a non-negligible time trend in the emergence of EVD outbreaks in humans, and that model can potentially be extended to spatiotemporal analyses in the DRC.
Five limitations of the present study must be noted. First, we cannot exclude the possibility of ascertainment bias, especially in the early period of observation. Both epidemiological surveillance and laboratory testing technologies have progressed considerably in the last 40 years, and the time trend should be interpreted carefully. Second, the sample size is limited to 23 outbreaks over 40 years, and thus sampling bias may have been introduced. Despite this fact, we believe that our findings add insights to existing publications [32,35,37], and in fact, we identified statistically significant results. Third, from a virologic point of view, the epidemiological characteristics of a more detailed virus cluster (e.g. a cluster in a phylogenetic tree) or genotype should ideally be explored. Fourth, while the present study focused on the documented outbreaks, we did not analyze additional datasets in either humans or animals. Specifically, information on the spread of virus in wildlife is of utmost importance to clarify the ecological dynamics of ebolavirus [45]. In addition, we acknowledge that Ebola cases are often under-reported, and further work that explicitly takes this fact into account may be desirable [46]. Lastly, while we analyzed the CFR for different suspected modes of transmission and different animal sources of infection, the CFR is most likely skewed depending on the size of the epidemic. Therefore, the most important parameter is virus type.
Furthermore, future work on Ebola could consider differences in cultural behavior. For example, a small subpopulation with exceptionally high contact rates could be considered "superspreaders", which alters the dynamics of the spread of EVD [47,48]. Such analysis is deemed essential, considering the persistence of Ebola once it develops into a major epidemic [49]. Nonetheless, we believe that the present study offers an important insight into the non-random emergence of EVD in humans, especially through bats. An increasing time trend of outbreak frequency was identified in the DRC. In addition to descriptively characterizing the emergence, mode of transmission and animal source, we believe that the present study offers critical insight into the future prediction of EVD outbreaks.
5. Conclusion
The present study examined descriptive epidemiological features of Ebola virus disease outbreaks. We have shown that (ⅰ) the leading cause of emergence was eating and hunting habits, (ⅱ) primates act as the main source of zoonotic spillover, and (ⅲ) Zaire ebolavirus is the most virulent type. Moreover, the emergence was demonstrated not to be a Poisson process, indicating that some unknown, underlying, and non-random mechanisms are likely to govern the spillover event. In addition, statistical fit to the emergence data with an increasing trend of outbreaks was favored in the Democratic Republic of Congo compared with a purely random emergence model. Outbreak event data and their causative viruses should be explored biologically and epidemiologically to possibly predict future outbreak events.
Acknowledgments
We would like to thank Professor Bryan Grenfell for offering the opportunity for LP to study in Japan. Additionally, we would like to thank each of our funders: LP, RK and HN were supported by the University of Tokyo-Princeton Strategic Partnership (Co-PIs: Bryan Grenfell and Hiroshi Nishiura). HN received funding from the Japan Agency for Medical Research and Development (AMED; JP18fk0108050), Japan Society for the Promotion of Science (JSPS) KAKENHI (grant numbers 16KT0130, 16K15356, 17H05808 and 17H04701) and Japan Science and Technology Agency (JST) CREST program (JPMJCR1413). RK acknowledges the Japan Society for the Promotion of Science (JSPS) Fellowship with the funding KAKENHI (18J21587). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Conflict of interest
All authors declare no conflicts of interest in this paper.