1.
Introduction
The Omicron variant spreads fastest as ever among the severe acute respiratory syndrome coronaviruses 2 (SARS-CoV-2) we had so far. The variant was designated as a variant of concern by the World Health Organization (WHO) on November 26, 2021 [1]. As of February 7, 2022, infections by Omicron were reported to the WHO from official sources of 159 countries [2]. The rapid replacement of Delta by Omicron followed by a steep rise in SARS-CoV-2 infections has been observed after the introduction of Omicron in many countries, indicating that Omicron has considerably higher transmission advantages than Delta in the vaccinated population [3]. Furthermore, the BA.2 sublineage of Omicron is replacing the BA.1 sublineage in Denmark [4]. Although attenuated disease severity of Omicron was reported [5,6], there is an urgent need to evaluate the transmissibility of these Omicron sublineages.
Transmissibility of a new variant should be characterized by two factors associated with the transmission of the virus. The first factor is the effective reproduction number (Rt), which measures how many secondary cases are generated by a single primary case at time t. The second factor is the generation time distribution f(τ), which describes how secondary infections are distributed as a function of time τ since the infection of primary case [7].
Selby [8] has estimated the generation time of Omicron using case counts in England in 2021. In a logistic regression assuming that the generation time of Omicron is the same of that of Delta, the log odds of Omicron and Delta frequencies should be on a straight-line relationship. However, the case counts in England showed a bend in the log odds of observations that deviated from its theoretical straight line. Using this bend, Selby estimated the ratio of the mean generation time of Omicron to the mean generation time of Delta to be 0.46 (95% confidence intervals (CI): 0.38–0.61). Comparing transmission pairs of infections with S gene target failure (SGTF) viruses with those of non-SGTF viruses, Backer et al. [9] showed that Omicron BA.1 has shorter serial intervals that non-SGFT viruses such as Delta.
Leung et al. [10,11] developed a mathematical model describing trajectories of variant frequencies, assuming that ratios among variants' effective reproduction numbers were constant. This method, however, needs daily numbers of infections of each variant. In our previous paper, we have proposed an approximated form of the model which is free from using daily numbers of infections [12]. Using the proposed model in [12], we have analyzed nucleotide sequences of SARS-CoV-2 collected in Denmark [13]. This model, however, assumed that generation times of variants followed the same probability distribution and that effective reproduction numbers of Omicron BA.1 and BA.2 were the same. In this paper, we propose a new model in which the generation times of variants may be different from each other. Analyzing latest variant count data observed in Denmark using the new model, we compare generation times and effective reproduction numbers of Omicron BA.1 and BA.2 with those of Delta.
2.
Materials and methods
2.1. Model of variant frequencies
Consider a situation where viruses of variant a are circulating in the population at the beginning of the target period of analysis and variants A1 and A2 are newly introduced to the same population at calendar times tA1 and tA2, respectively. Let qa(t), qA1(t), and qA2(t) be frequencies of variants a, A1, and A2 in the viral population at calendar time t, respectively. Since there is no infection of A1 before its introduction, qA1(t) = 0 for a calendar time t < tA1, and the same is true for A2 for t < tA2.
The dynamics on the frequencies of infections by a, A1, and A2 is determined by the length of generation times and the transmissibility, i.e., effective reproduction numbers of the three variants. We assume that the generation times (GT) of infections by A1 and A2 are c1 and c2 times longer than that of a, respectively. We call the value of ci the relative generation time of Ai with respect to (w.r.t.) a. Let fa(τ), fA1(τ), and fA2(τ) be probability density functions of generation time distributions of infections by a, A1, and A2, respectively. The assumption on generation times can be described by the following equation:
for any generation time τ ≥ 0. To deal with the calendar time system, we truncate the generation time distributions at τ = 1 and τ = l and discretize it as follows:
for each variant χ in a, A1, and A2.
Let I(t) be the total number of new infections by either a, A1, or A2 at calendar time t, and let Ra(t), RA1(t), and RA2(t) be effective reproduction numbers of variants a, A1, and A2 at calendar time t, respectively. From the definition of the instantaneous reproduction number [7], Ra(t), RA1(t), and RA2(t) can be written as follows:
We assume that a patient infected by A1 and A2 generates respectively k1 and k2 times as many secondary transmissions as those of a patient infected by a regardless of time t. Using effective reproduction numbers of a, A1, and A2, this assumption can be described by the following equation:
for each calendar time t ≥ tAi. The value of the constant ki is called relative reproduction number of Ai w.r.t. a.
Under the two epidemiological assumptions described by Equations (1) and (5), the frequency of variant Ai in the viral population at time t is now modeled as
Equation (6) needs to use I(t), which are difficult to be quantified accurately in real time. To allow statistical estimation of ci and ki without knowing I(t), we consider an assumption that the number of new infections does not greatly vary within a single generation of transmission from time t–1 to t–1, i.e.
for each calendar time t. Then we obtain our alternative formula:
Note that Equation (7) is used just for approximating Equation (6) with Equation (8) for each calendar time t. We do not assume that I(t) is constant during the entire period of the analysis, although a chain of the assumptions leads such a consequence. Equation (6) and Equation (8) will be compared using sequence data from Denmark.
2.2. Model of observation counts
Let Na(t), NA1(t), and NA2(t) be the number of variants a, A1, and A2 observed at calendar time t. We assume that variants a, A1, and A2 are sampled following a multinomial distribution with probabilities qa(t), qA1(t), and qA2(t) in Na(t) + NA1(t) + NA2(t) trials. Then parameters ci and ki and the initial variant frequencies of qA1(tA1) and qA2(tA2) can be estimated by maximizing the likelihood function of the multinomial distributions, which is described as follows:
Note that Equation (8) does not use I(t) even if they are given.
2.3. Observation of variants in Denmark
As of February 28, 2022, the earliest Danish Omicron sequences registered in the GISAID database [14] were sampled on November 22, 2021. We downloaded a total of 150, 164 nucleotide sequences of SARS-CoV-2 viruses sampled in Denmark during November 22, 2021 to February 22, 2022 from the GISAID database on February 28, 2022. Of these, 47, 698 were labeled as PANGO lineages corresponding to Delta [15] and 102, 222 were labeled as lineages corresponding to Omicron. Of sequences corresponding to Omicron, 36, 118 were labeled as BA.1, 66, 076 were BA.2, seven were BA.3, and 21 were B.1.1.529. The other 244 sequences were labeled as lineages corresponding to neither Delta nor Omicron. The 272 sequences that were not labeled as Delta, BA.1 or BA.2 were ignored in the subsequent analyses. We consider Delta as the baseline variant a and Omicron BA.1 and BA.2 as A1 and A2 in our model. We set tA1 and tA2 to November 22, 2021 and December 5, 2021, which are the sample collection dates of the earliest Omicron BA.1 and BA.2, respectively. For each date t from November 22, 2021 to February 22, 2022, the number of sequences labeled as Delta collected at that day was used as Na(t), that of Omicron BA.1 as NA1(t) and that of Omicron BA.2 as NA2(t). Dates of sample collection and PANGO lineages assigned to sequences are provided in Supplementary Table 1.
2.4. Numbers of confirmed cases in Denmark
Cumulative numbers of confirmed cases in Denmark were retrieved from Johns Hopkins Coronavirus Resource Center [16] on May 10, 2022. Daily changes in the cumulative numbers were recorded as raw daily confirmed cases Iraw(t). For each date t during the period of the analysis, the 7-day rolling average of Iraw(t) centered at t was recorded as average daily confirmed cases Iave(t). Some of the models used either Iraw(t) or Iave(t) as the daily confirmed cases I(t) in Equation (6).
2.5. Generation times of the Delta variant
Hart et al. reported that the distribution of the generation time of the Delta variant had a mean of 4.7 days with a standard deviation of 3.3 days by analyzing household transmission data from UK Health Security Agency [17]. Based on the mean and the standard deviation, the generation times of infections by Delta were assumed to follow the gamma distribution with a shape parameter of 2.03 and a scale parameter of 2.32. We used the probability density function of this gamma distribution as fa(τ). The l was chosen to be 16, where is the 99th percentile of the generation time distribution of Delta.
2.6. Maximum likelihood estimation of parameters
We estimated the parameters of our models from the dataset using three assumptions on the generation time: The Three-GT, Two-GT, and One-GT assumptions. The Three-GT models assume that the GT distribution of Omicron BA.1 and BA.2 may be different from Delta in a manner described by Equation (1) and that the reproduction number of Omicron BA.1 and BA.2 may be different in a manner described by Equation (5). More precisely, c1 may be different from c2 and k1 may be different from k2. The Two-GT models assume that BA.1 and BA.2 share an identical generation time distribution. In other words, c1 must be equal to c2 and k1 may be different from k2. The One-GT models assume that the GT distributions of Omicron BA.1 and BA.2 infections are the same as that of Delta, i.e. c1 = c2 = 1, and that the reproduction number of Omicron may be different in a manner described by Equation (5). Note that c1 = c2 in the Three-GT assumption implies the Two-GT assumption, and that c1 = c2 = 1 in the Two-GT assumption and Three-GT assumption implies the One-GT assumption. The maximum likelihood estimation and the 95% CI calculation were done using daily count data from November 22, 2021 to February 22, 2022. The Three-GT-raw-I model, Two-GT-raw-I model, and One-GT-raw-I model use Iraw(t) as I(t) in Equation (6) for the calculation of qχ(t), under Three-GT, Two-GT, and One-GT assumptions, respectively. The Three-GT-ave-I model, Two-GT-ave-I model, and One-GT-ave-I model use Iave(t) as I(t) in Equation (6), under Three-GT, Two-GT, and One-GT assumptions, respectively. The Three-GT-no-I model, Two-GT-no-I model, and One-GT-no-I model use Equation (8), which calculates qχ(t) without I(t), under Three-GT, Two-GT, and One-GT assumptions, respectively. The 95% CIs of parameters are derived from the profile likelihood method [18]. We used the augmented Lagrangian algorithm implemented in the NLopt module of the Julia language [19] for the maximum likelihood estimation and the calculation of 95% CIs of parameters. Models were compared using the Akaike information criterion (AIC) [20].
3.
Results
The relative generation time of Omicron with respect to (w.r.t.) Delta, ci, and the relative reproduction number ki, were estimated from the daily count of Delta and Omicron BA.1 and BA.2 in Denmark using raw daily case information (Table 1), using 7-day rolling average case information (Table 2), and using no daily case information (Table 3). In all three Tables, the Three-GT assumption estimated smaller k1 and k2 compared to those of the Two-GT and One-GT assumptions.
Table 4 shows AIC of the nine models tested in this study. For each observation setting in raw-I, ave-I, and no-I, the Three-GT assumption has better AIC than the Two-GT assumption, and the Two-GT assumption has better AIC than the One-GT assumption. The values of AIC suggested that The Three-GT-no-I model, which is based on the Three-GT assumption and does not use I(t), is the best model to represent the sequence data in Denmark, followed by the Three-GT-ave-I model. The AIC of Three-GT-raw-I model was larger than those of Two-GT-no-I and Two-GT-ave-I models, indicating that the noise in raw case numbers made the likelihood of the model lower. From values of c1, c2, k1, and k2 in the Three-GT-no-I and Three-GT-ave-I models, it is suggested that the generation times of Omicron BA.2 is 0.76–0.80 times that of BA.1 and that the effective reproduction number of Omicron BA.2 is 1.25–1.27 times larger than that of Omicron BA.1 under the same epidemiological condition.
Figure 1 shows observed frequencies and estimated frequencies of Delta, Omicron BA.1, and Omicron BA.2 estimated using the Three-GT-no-I model. For most dates in the analyzed period, the trajectories of variant frequencies estimated by the model were within 95% confidence intervals of population variant frequencies estimated by binomial tests using observed variant counts. This indicates that Equation (8) represents the time evolution of variant frequencies in the population quite accurately.
Figure 2 shows the population average of the relative generation time and relative reproduction number of SARS-CoV-2 infections w.r.t Delta estimated from variant observations in Denmark using the Three-GT-no-I model. The replacement of Delta by Omicron BA.1 speeded up around December 7, 2021. From then, the population average of the generation times started decreasing and the population average of the relative reproduction numbers started increasing due to the Delta–Omicron BA.1 replacement. The decrease in the GT was saturated because Delta was being replaced by Omicron BA.1 and BA.2. However, population averages of relative reproduction numbers continued to increase due to the Omicron BA.1–Omicron BA.2 replacement.
4.
Discussion
We have developed a mathematical model describing the time evolution of variant frequencies using both the ratio of generation times and the ratio of effective reproduction number among variants. Analyzing variant counts of SARS-CoV-2 sampled in Denmark from November 22, 2021 to February 22, 2022 using the model, we have shown that the selective advantage of Omicron BA.1 and BA.2 over Delta was decomposed into advantage in the speed of transmission and advantage in the number of transmissions. We found that the mean generation time of Omicron BA.1 is 0.44–0.46 times that of Delta and the effective reproduction number of Omicron BA.1 is 1.88–2.19 times larger than that of Delta under the epidemiological conditions at the time. We also found that the generation times of Omicron BA.2 is 0.76–0.80 times that of BA.1 and that the effective reproduction number of Omicron BA.2 is 1.25–1.27 times larger than that of Omicron BA.1.
The estimation of these parameters has important implications to the control of current infections by Omicron BA.2. First, Omicron BA.2 generates 1.25–1.27 times more secondary transmissions than Omicron BA.1 in a partially immuned population in Denmark. Suppose BA.1 and BA.2 have effective reproduction numbers of RBA.1(t) and RBA.2(t) at time t, respectively. Control measures against Omicron BA.2 need to reduce 20%–21% of secondary infections to reduce RBA.2(t) to RBA.1(t). Second, generation times of Omicron BA.2 is 0.76–0.80 times that of Omicron BA.1. The contact tracing for Omicron BA.2 infections must be done more quickly than that for BA.1 to stop further infections by quarantine. Although the mean generation time of Omicron BA.2 is shorter than that of Omicron BA.1, it is still necessary to investigate the infectious period of both variants to shorten the quarantine period for Omicron BA.2 infections.
In our previous paper, we have estimated the relative reproduction number of Omicron w.r.t. Delta to be 3.19 (95%CI: 2.82–3.61) using variant frequencies observed in Denmark from November 1, 2021 to December 9, 2021 [13]. The discrepancy in the relative reproduction numbers from this study is attributed to the following reason. Ito et al. 2021 assumed the same generation time for Omicron and Delta. In fact, the estimate using the One-GT-no-I model estimated relative reproduction numbers of 2.84 (2.81–2.86) for BA.1 and 4.70 (4.64–4.74) for the BA.2 w.r.t Delta (Table 3). Both BA.1 and BA.2 were circulating in Denmark in December 2021 [4], and the relative reproduction number estimated by our previous study is consistent with results of this study.
Table 4 suggested that the Three-GT-no-I model, which does not use I(t), best represents observed data among the nine models tested in this study. It is known that the estimation of instantaneous reproduction number is sensitive to the noise in I(t) [21]. In fact, the likelihood of Three-GT-raw-I model was lower than those of Two-GT-no-I and Two-GT-ave-I models. Use of the approximated version in Equation (8) instead of Equation (6) may have contributed to the elimination of the observation noise of I(t). Use of the 7-day rolling average in I(t) has also contributed to the noise reduction. Although the AIC supported the Three-GT-no-I model, the estimated values in Table 3 may be biased to some extent due to the approximation in Equation (7). On the other hand, the estimated values in Table 2 may also be biased due to the averaging operation for I(t). It is difficult to judge whether estimates using the Three-GT-no-I model is more accurate than those using the Three-GT-ave-I model or not. One advantage to use the Three-GT-no-I model over the Three-GT-ave-I model is that one can predict the time course of variant replacement in future [22]. However, further studies are required to evaluate the bias introduced by the approximation in Equation (7).
We assume that the observational noise follows the multinomial distribution in Equation (9). This implies that the multinomial sampling from the population is assumed to be the only source of the noise. However, the noise may come from other possible sources, such as the regional difference in variant distributions. The narrow confidence intervals of estimated parameters may be attributed to the simple error model, and the true uncertainty may be larger than our confidence intervals. Such extra noise can be evaluated by using Dirichlet-multinomial distribution [22].
In our analyses, we set the introduction date of a variant to the date when the variant was first observed in the sequence dataset. We could estimate 95% confidence intervals of the introduction dates of variants; however, we have not done this because the estimation of introduction dates is not the purpose of this study. Undetected cases before the first detection do not affect largely because the effect of undetected cases can be compensated by increasing variant frequencies at the dates of their detections. To confirm this, we estimated parameters of the Three-GT-no-I model under assumptions that introduction dates of variants are 10 days and 20 days earlier than the dates of their detections. The estimated values are almost the same as Table 3 except the initial frequencies (Supplementary Table 2 and Supplementary Table 3).
The estimations using the Three-GT-no-I model completely depend on the variant counts based on the nucleotide sequences submitted to the GISAID database from Denmark. Most sequences from Denmark were submitted by member laboratories of Danish Covid-19 Genome Consortium, which analyzed positive specimen from hospitals [23]. As of February 24, the sequencing rate from the 47th week of 2021 to the 7th week of 2022 is 7.2% (157, 049 / 2, 156, 159). Despite of such a high sequencing rate, there may be some extent of sequencing bias toward new variants. Surveillance data that ensure random sampling of variants are necessary to calculate relative generation times and reproduction numbers accurately.
Acknowledgement
We gratefully acknowledge the laboratories responsible for obtaining the specimens and the laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based. The information on originating laboratories, submitting laboratories, and authors of SARS-CoV-2 sequence data can be found in Supplementary Table 1. We thank Brad Suchoski, Heidi Gurung, and Prasith Baccam from IEM, Inc. in the United States of America for their help converting our R program into the Julia language, which produced an approximate 50x speedup of our computations. This work was supported by the Japan Agency for Medical Research and Development (grant numbers JP20fk0108535, JP21wm0125008). K.I. received funding JSPS KAKENHI (21H03490). C.P. was supported by the World-leading Innovative and Smart Education Program (1801) from the Ministry of Education, Culture, Sports, Science, and Technology, Japan. H.N. received funding from Health and Labor Sciences Research Grants (20CA2024, 20HA2007, and 21HB1002); the Japan Agency for Medical Research and Development (JP20fk0108140); JSPS KAKENHI (21H03198) and the Japan Science and Technology Agency (JST) SICORP program (JPMJSC20U3 and JPMJSC2105). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conflict of interest
The authors declare that there is no conflict of interest.