1. Introduction
On 14th December 2020, the UK reported a new variant of SARS-CoV-2, designated SARS-CoV-2 VOC 202012/01 (Variant of Concern, year 2020, month 12, variant 01) [1,2,3] or B.1.1.7. Several studies have examined the emergence of this variant [4,5]. The variant [6] has caused great concern across the globe [7] and has spread to other continents. Known epidemiological, modelling, and clinical findings suggest that this variant has increased overall transmissibility.
On the other hand, the same findings indicate no change in severity [8], as measured by length of hospitalization and 28-day case fatality, and no change in the occurrence of reinfection in the UK [9], despite some scientists' dire warnings [10]. Although the evaluation of the variant is still ongoing [11], a systematic method is needed to examine the fundamental features of this mutant variant. One concern is whether 202012/01 diminishes the potency of the developed vaccines [12].
To trace the fundamental features of this new variant, we sample daily COVID-19 data for 9 regions of the UK from 27th March 2020 to 8th January 2021 [13] for the preliminary study, and from 28th July 2020 to 11th May 2021 for the follow-up study. We then create batches of data vectors whose dimension is set by the number of days chosen; in this research, we choose 7 days as the datum dimension. To find the fundamental features of B.1.1.7, we introduce 3-valued feature vectors whose elements are only −1, 0 and 1. These features are easy to apply and serve as our database for further feature recognition. We use the concept of vector rejection to measure the similarity between the data vectors and the features, based on the total norm of the vector rejections, and record these similarities in distance tensor products. For each batch, we retrieve the minimal total norm and its corresponding feature. Since the extracted features vary from batch to batch, we trace their path via the cosine values between consecutive features. This path reveals the fundamental trend and dynamical structure of the features of B.1.1.7 in the UK. The preliminary study compares the dynamical structures of the COVID-19 data with London (9 regions), where B.1.1.7 is thought to have first emerged, and without London (8 regions).
Our results show no obvious shift in the main features before and after the emergence of the variant, which indicates no fundamental change in behaviour attributable to the variant. These findings indicate that B.1.1.7 contains no leading features and, in principle, support the efficacy of the vaccines.
2. Methodology
We devise a rejection distance, derived from the concept of vector rejection, and a three-valued feature database. Both serve as our fundamental tools for tracking the features of B.1.1.7.
2.1. Distance tensor product
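Claim 1. (rejection distance) The norm of $\vec{c}$, the vector rejection of $\vec{a}$ on $\vec{b}$, is $\|\vec{c}\|=\dfrac{\sqrt{(\|\vec{a}\|\cdot\|\vec{b}\|)^2-(\vec{a}\cdot\vec{b})^2}}{\|\vec{b}\|}$. We call this quantity the rejection distance between $\vec{a}$ and $\vec{b}$, written RejDist($\vec{a}$, $\vec{b}$).
Proof. By the property that $\vec{c}=\vec{a}-\|\vec{a}\|\cos\theta\cdot\dfrac{\vec{b}}{\|\vec{b}\|}=\vec{a}-\dfrac{\vec{a}\cdot\vec{b}}{\|\vec{b}\|^2}\,\vec{b}$, where $\theta$ is the angle between $\vec{a}$ and $\vec{b}$, and the operation of the inner product $\cdot$, the result follows immediately.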
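The step that the proof leaves implicit can be written out as follows. This is a sketch that uses only the expression for the rejection given above; expanding the inner product of the rejection with itself gives

```latex
\begin{aligned}
\|\vec{c}\|^2 = \vec{c}\cdot\vec{c}
  &= \|\vec{a}\|^2
     - 2\,\frac{(\vec{a}\cdot\vec{b})^2}{\|\vec{b}\|^2}
     + \frac{(\vec{a}\cdot\vec{b})^2}{\|\vec{b}\|^4}\,\|\vec{b}\|^2 \\
  &= \|\vec{a}\|^2 - \frac{(\vec{a}\cdot\vec{b})^2}{\|\vec{b}\|^2}
   = \frac{(\|\vec{a}\|\,\|\vec{b}\|)^2 - (\vec{a}\cdot\vec{b})^2}{\|\vec{b}\|^2}.
\end{aligned}
```

Taking the square root yields the stated norm. Since $\vec{a}\cdot\vec{b}=\|\vec{a}\|\,\|\vec{b}\|\cos\theta$, the same quantity equals $\|\vec{a}\|\,|\sin\theta|$, so it does not depend on the length or the sign of $\vec{b}$.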
The rejection distance measures the distance between features, whether empirical or theoretical (3-valued features, for example). This measurement underlies our search for optimal features.
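As an illustration, the rejection distance of Claim 1 could be computed as in the following minimal sketch. The function name `rej_dist` and the use of plain Python sequences are assumptions made here for illustration; they are not taken from the original implementation.

```python
import math


def rej_dist(a, b):
    """Norm of the rejection of a on b, i.e. the part of a orthogonal to b.

    Hypothetical helper: implements ||a||^2 - (a.b)^2 / ||b||^2 under a
    square root, which is algebraically the formula of Claim 1.
    """
    dot = sum(x * y for x, y in zip(a, b))
    a_sq = sum(x * x for x in a)
    b_sq = sum(y * y for y in b)
    if b_sq == 0:
        raise ValueError("b must be a non-zero vector")
    # Clamp at zero to guard against tiny negative rounding errors.
    return math.sqrt(max(a_sq - dot * dot / b_sq, 0.0))


print(rej_dist((3, 4), (1, 0)))  # 4.0: only the second coordinate is rejected
print(rej_dist((2, 2), (1, 1)))  # 0.0: a is parallel to b
```

A distance of 0 means the datum vector is parallel to the feature, which is why smaller values are read as a closer match.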
Definition 2.1. (Distance Tensor Product, DTP) For any two ordered sets of vectors of the same dimension, $V=(\vec{v}_1,\vec{v}_2,\cdots,\vec{v}_m)$ and $W=(\vec{w}_1,\vec{w}_2,\cdots,\vec{w}_n)$, we define their distance tensor product as the m-by-n matrix DTP(V,W) whose (i,j) element is RejDist($\vec{v}_i$, $\vec{w}_j$).
The DTP records the norms of the vector rejections, which give the distances between a set of given datum vectors and the feature vectors, or between feature vectors.
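Definition 2.1 translates almost directly into code. The sketch below is illustrative only: `rej_dist` and `dtp` are hypothetical helper names, and the two small vector sets are made up to keep the output easy to check by hand.

```python
import math


def rej_dist(a, b):
    """Norm of the rejection of a on b (hypothetical helper, see Claim 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    a_sq = sum(x * x for x in a)
    b_sq = sum(y * y for y in b)
    return math.sqrt(max(a_sq - dot * dot / b_sq, 0.0))


def dtp(V, W):
    """m-by-n matrix (list of rows) whose (i, j) entry is rej_dist(V[i], W[j])."""
    return [[rej_dist(v, w) for w in W] for v in V]


# Made-up example: two datum vectors against three candidate features.
V = [(3, 4), (1, 1)]
W = [(1, 0), (0, 1), (1, 1)]
D = dtp(V, W)
for row in D:
    print(row)
```

Row i of the result lists how far datum vector i lies from each candidate feature, so the smallest entry in a row marks the best-matching feature for that vector.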
2.2. 3-valued features
Unlike Principal Component Analysis (PCA) [14], which relies heavily on statistical properties, the 3-valued features are applied directly to trace the trend of mutant variants of COVID-19. Each element of a 3-valued feature takes a value in the set $\{-1,0,1\}$, which is also used in wavelet analysis [15]. In total, there are $3^n-1$ features (or vectors), where $n$ denotes the dimension of the datum vector; the zero vector is excluded from the potential features.
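The feature set is small enough to enumerate explicitly. The following sketch generates all non-zero vectors over {−1, 0, 1}; the function name `three_valued_features` is hypothetical, and the ordering (first coordinate varying fastest) is an assumption, chosen because it reproduces the entries f3, f4 and f2183 to f2186 listed in section 3.3.

```python
from itertools import product


def three_valued_features(n):
    """All non-zero vectors in {-1, 0, 1}^n, first coordinate varying fastest."""
    feats = []
    for t in product((-1, 0, 1), repeat=n):
        if any(t):                  # skip the zero vector
            feats.append(t[::-1])   # reverse so the first coordinate varies fastest
    return feats


fdb = three_valued_features(7)
print(len(fdb))   # 2186, i.e. 3**7 - 1
print(fdb[-1])    # (1, 1, 1, 1, 1, 1, 1)
```

Any fixed ordering would serve equally well for matching; only the indexing of the features depends on it.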
3. Implementation procedures
3.1. Preliminary settings
In this section, we specify the implementation of the methods introduced in section 2. From the database, we select three metrics: newCasesBySpecimenDate (M1, or 1), newDeaths28DaysByDeathDate (M2, or 2), and newDeaths60DaysByDeathDate (M3, or 3); these indexes 1, 2, 3 will be used later on. We select Area Type "Region", which gives 9 regions: Yorkshire and The Humber (R1), East of England (R2), North East (R3), North West (R4), South East (R5), East Midlands (R6), South West (R7), London (R8), and West Midlands (R9). The data are sorted by date, from 27th March 2020 to 8th January 2021, and stored in three matrices, one each for M1, M2, and M3. Then we calculate the first difference (change) of the new cases, 28-day deaths, and 60-day deaths, as shown in the following (the length of each vector is 287, since we sample 288 points in time). The value in interval i is the original value at i+1 minus the one at i. When compared with the original data, these values are second-order differences.
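The differencing rule, value at i+1 minus value at i, can be sketched as below. The function name `first_difference` is hypothetical, and the sample counts are made up for illustration; they are not taken from the dataset.

```python
def first_difference(series):
    """Return [x[1]-x[0], x[2]-x[1], ...]; the result is one element shorter."""
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]


# Made-up daily counts, for illustration only.
toy_counts = [5, 8, 6, 6, 11]
toy_diff = first_difference(toy_counts)
print(toy_diff)  # [3, -2, 0, 5]

# 288 sampled dates give 287 intervals, matching the vector length in the text.
n_intervals = len(first_difference([0] * 288))
print(n_intervals)  # 287
```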
3.2. Data description
The data come in two parts. The first is for the preliminary study of the onset of the variant's spread in the UK. It covers the period from 27th March 2020 to 8th January 2021 and lists the daily new cases, 28-day deaths and 60-day deaths of COVID-19 for 9 regions in the UK. Specifically, the metrics (or indicators) from the database are M1, M2, and M3. M2 is the number of new deaths of people who had a positive test result for COVID-19 and died within 28 days of the first positive test, up to the date of death. M3 is the number of new deaths of people who had a positive test result for COVID-19 and died within 60 days of the first positive test, up to the date of death.
For the preliminary study, the imported data are further separated into two sets: one covers only 8 regions, excluding London, and the other covers all 9 regions, including London. The second part is a follow-up study covering the period from 28th July 2020 to 11th May 2021 [16]. These data are not separated and are treated as a whole, i.e., all 9 regions.
3.3. Procedures
We set 7 days as the dimension of one datum vector and 9 regions as the number of data vectors. In other words, there are 287/7=41 ordered batches of data; each batch contains 9 points (datum sub-vectors, or simply datum vectors), and each point is a 7-dimensional datum vector. We form these data for cases, 28-day deaths and 60-day deaths over the 287 intervals. The implementation proceeds in the following steps. They are designed for the preliminary study, and the follow-up study follows the same steps.
1. Download the data related to daily COVID-19 cases and deaths.
2. Sort the data by the 288 dates, 9 regions and 3 metrics, according to our analytical purposes. The results are presented in Table 1. Let us still call them raw data, though they are actually processed to fit our purpose.
3. Compute the 287 first-order differences of the 288-point time series for each of the 9 regions and 3 metrics. This reveals the acceleration in the spread of B.1.1.7. We call these the difference data. The data are then extracted by region, and we call the results difference datum vectors. They are presented in Appendix B: difference datum vectors.
4. Split each 287-element difference datum vector into 41 sub-vectors, each of which contains 7 elements (daily data). We use $\{\vec{R}^k_{ij}\}$ to denote the j-th 7-day sub-vector for region i, where $k\in\{1,2,3\}$ (k=1 denotes "cases", k=2 denotes "28-day deaths", and k=3 denotes "60-day deaths"), $1\le i\le 9$ and $1\le j\le 41$. For example, $\vec{R}^1_{11}=(54,21,57,34,-22,-55,51)$.
5. Specify the 3-valued features (7-element vectors), which are described in section 2.2, as our potential feature database. For a 7-day period, there are $3^7-1=2186$ three-valued features. The one excluded is the zero vector. The resulting feature database is as follows ($\vec{f}_i$ stands for the i-th feature):
$\vec{f}_1=(-1,1,1,-1,-1,-1,-1)$; $\vec{f}_2=(0,1,-1,-1,-1,-1,-1)$; $\vec{f}_3=(1,-1,-1,-1,-1,-1,-1)$; $\vec{f}_4=(-1,0,-1,-1,-1,-1,-1)$; $\cdots$; $\vec{f}_{2183}=(1,0,1,1,1,1,1)$; $\vec{f}_{2184}=(-1,1,1,1,1,1,1)$; $\vec{f}_{2185}=(0,1,1,1,1,1,1)$; $\vec{f}_{2186}=(1,1,1,1,1,1,1)$.
Let us use FDB to denote the 3-valued feature database, i.e., $\mathrm{FDB}=\{\vec{f}_1,\vec{f}_2,\cdots,\vec{f}_{2186}\}$.
6. Calculate the distance tensor product (DTP) defined in Definition 2.1, where each element is given by $\mathrm{DTP}^k_{i,j,h}=\mathrm{RejDist}(\vec{R}^k_{ij},\vec{f}_h)$, with $1\le i\le 9$; $1\le j\le 41$; $1\le h\le 2186$, and $k\in\{1,2,3\}$. The generated DTP can be regarded as three (one per k) three-dimensional arrays (indexed by i, j, h). The results are presented in Table 2.
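7. Compute $\mathrm{DTP}^k_{j,h}=\sum_{i=1}^{9}\mathrm{DTP}^k_{i,j,h}$ for all $1\le j\le 41$ and $1\le h\le 2186$. The results are shown in Table 3.
8. Compute $\max^k_j=\max\{\mathrm{DTP}^k_{j,h}:1\le h\le 2186\}$, $\min^k_j=\min\{\mathrm{DTP}^k_{j,h}:1\le h\le 2186\}$, $\operatorname{argmax}^k_j:=\operatorname{argmax}\{\mathrm{DTP}^k_{j,h}:1\le h\le 2186\}$, and $\operatorname{argmin}^k_j:=\operatorname{argmin}\{\mathrm{DTP}^k_{j,h}:1\le h\le 2186\}$ for all $1\le j\le 41$. The results are shown in Table 4. Meanwhile, plot $\{\min^k_j:1\le j\le 41\}$ and $\{\max^k_j:1\le j\le 41\}$ for $k\in\{1,2,3\}$, as presented in Figure 1.
9. Compute $\{\cos(\operatorname{argmin}^k_v,\operatorname{argmin}^k_{v+1}):1\le v\le 40\}$ and $\{\cos(\operatorname{argmax}^k_v,\operatorname{argmax}^k_{v+1}):1\le v\le 40\}$ for $k\in\{1,2,3\}$, i.e., the similarity indexes (cosine values) between the extracted minimal (or maximal) features, to yield a dynamical trend of the representative features. The results are presented in Figures 2–4.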
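Steps 4 to 9 can be strung together as in the sketch below for a single metric k. It is a toy illustration under stated assumptions, not the original implementation: all function names are hypothetical, the data are made up, the batch dimension is 2 rather than 7, there are 2 regions rather than 9, and ties between features are broken by taking the first index.

```python
import math
from itertools import product


def rej_dist(a, b):
    """Norm of the rejection of a on b (Claim 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    a_sq = sum(x * x for x in a)
    b_sq = sum(y * y for y in b)
    return math.sqrt(max(a_sq - dot * dot / b_sq, 0.0))


def features(n):
    """Non-zero vectors in {-1, 0, 1}^n (step 5); the ordering is an assumption."""
    return [t[::-1] for t in product((-1, 0, 1), repeat=n) if any(t)]


def split_batches(vec, size):
    """Step 4: cut one difference datum vector into consecutive sub-vectors."""
    return [tuple(vec[i:i + size]) for i in range(0, len(vec), size)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def trace_features(regions, size):
    """regions: one difference datum vector per region, all of equal length."""
    fdb = features(size)
    batches = [split_batches(r, size) for r in regions]  # batches[i][j]
    n_batches = len(batches[0])
    min_feats, max_feats = [], []
    for j in range(n_batches):
        # Steps 6-7: rejection distance to every feature, summed over regions.
        totals = [sum(rej_dist(b[j], f) for b in batches) for f in fdb]
        # Step 8: features attaining the minimal and maximal totals.
        min_feats.append(fdb[totals.index(min(totals))])
        max_feats.append(fdb[totals.index(max(totals))])
    # Step 9: cosine between the features of consecutive batches.
    cos_min = [cosine(min_feats[v], min_feats[v + 1]) for v in range(n_batches - 1)]
    cos_max = [cosine(max_feats[v], max_feats[v + 1]) for v in range(n_batches - 1)]
    return min_feats, max_feats, cos_min, cos_max


# Made-up difference data: 2 regions, 4 intervals, batches of 2 days.
toy_regions = [[2, -2, 1, 1], [3, -3, 4, 4]]
min_feats, max_feats, cos_min, cos_max = trace_features(toy_regions, size=2)
print(min_feats)  # [(1, -1), (-1, -1)]
print(cos_min)    # [0.0]
```

One consequence of Claim 1 worth keeping in mind when reading the cosine paths: a feature and its negative always give the same rejection distance, so the argmin and argmax are determined only up to sign, and the tie-breaking rule decides which of the two is reported.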
4. Results
In this article, we explore the trend of the underlying features of COVID-19 new cases, 28-day deaths and 60-day deaths to reveal whether the new mutant variant of COVID-19 in the UK has produced fundamental features, as measured against the 3-valued feature vectors. The results are presented in two parts: the preliminary study and the follow-up study.
4.1. Preliminary results
The preliminary study covers the period from 27th March 2020 to 8th January 2021. At its onset, the mutant variant spread mainly in London. By comparing the dynamical trends of the COVID-19 time series with and without London, we can see the role of B.1.1.7 in the preliminary stage.
1. From Figure 1, we observe that the maximal and minimal rejection distances move almost together, an indication that the chosen features are able to capture the dynamical behaviour of B.1.1.7. Furthermore, for new cases, 28-day deaths and 60-day deaths alike, when the mutant variant emerges (around the tail of the three graphs), the diluting features are reversed in comparison with previous intervals. This indicates that the new mutant variant changes the usual course of the collective viruses. In some sense, at this stage it has not yet taken a leading role in the pandemic, but it did affect the course of its development.
2. From the left-hand sides of Figures 2–4, the minimal similarities for the raw data almost reach 1, while the maximal similarities are fairly random. This indicates that the optimal features are representative of the behaviour of B.1.1.7.
3. From the right-hand graphs in Figures 2 and 3, there is no obvious shift in the features when comparing the 8 regions (without London) and the 9 regions (with London). This preliminary indication shows that B.1.1.7 has no clear feature leading the pandemic. At the very beginning, the mutant variant was located mainly in London [18]; the absence of a clear feature change between the two situations therefore suggests that the variant is still gaining momentum and has not yet produced a leading feature. This might also indicate that the vaccines for other COVID-19 viruses should work for this mutant variant.
4.2. Follow-up results
From Figure 4, we observe that the feature similarities fluctuate around 0. This indicates that the leading features are largely independent of each other. In some sense, the underlying behaviour of B.1.1.7 is versatile and less predictable. Combining the preliminary and follow-up results, we conclude that B.1.1.7 has no clear feature and might adapt very easily to new environments.
5. Conclusions and future work
In this study, we devise 3-valued features to track the dynamical behaviours of SARS-CoV-2 VOC 202012/01. These features serve as the feature database for matching. Through the preliminary study and the follow-up research, our study shows that there are no clear leading features for B.1.1.7. It also indicates that the properties of the virus are hard to pin down, and that it is versatile and hard to predict. The advantages of this approach to the features are:
a. It is a complete set of features and serves as a matching database;
b. It could be coupled with machine learning and big data analysis to train and locate the features;
c. It could serve as a pilot study for understanding the potential features of the virus.
There are also some disadvantages:
a. As the length of each datum vector increases, the computational complexity increases exponentially;
b. It is hard to practically link the 3-valued features with real natural or scientific properties.
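To make the first disadvantage concrete, the sketch below counts the rejection distances needed per metric for a few datum dimensions. The function name `workload` is hypothetical, and dropping any incomplete trailing batch (integer division of the 287 intervals) is an assumption made here for illustration.

```python
def workload(n_points=287, n_regions=9, dims=(7, 14, 21)):
    """Return (dimension, number of features, rejection distances per metric)."""
    rows = []
    for dim in dims:
        n_features = 3 ** dim - 1
        n_batches = n_points // dim  # assumption: incomplete batches are dropped
        rows.append((dim, n_features, n_regions * n_batches * n_features))
    return rows


for dim, n_features, n_dist in workload():
    print(dim, n_features, n_dist)
# 7 2186 806634
# 14 4782968 860934240
# 21 10460353202 1223861324634
```

Doubling the dimension from 7 to 14 halves the number of batches but multiplies the feature count by roughly 2188, so the feature database dominates the cost.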
As for future research, we could apply machine learning techniques to search for and select the optimal datum dimension. We could also extend the 3-valued features by adding more features to the candidate feature vectors. We could further associate the 3-valued features with other variables to yield a meaningful interpretation of the extracted optimal features, or associate them with the randomness of genetic codes [17]. Another issue is whether a virus containing a leading feature would increase the severity or death rate of COVID-19 [18].
Acknowledgements
This work is supported by the Humanities and Social Science Research Planning Fund Project under the Ministry of Education of China (Grant No. 20XJAGAT001).
Conflict of interest
No potential conflict of interest was reported by the authors.
Appendix
Appendix A: Time-series raw data for 9 regions and 3 metrics
Appendix B: Difference datum vectors
(1) Difference datum vector for cases: (the length of each $\vec{R}^k_i$ is 287)
● $\vec{R}^1_1=(54,21,57,34,\cdots,-742,-386,-779,-1227)$
● $\vec{R}^1_2=(-61,25,92,22,\cdots,-2407,-2193,-1094,-3133)$
● ⋮
● $\vec{R}^1_8=(-240,20,271,66,\cdots,-3310,-4615,-3079,-4994)$
● $\vec{R}^1_9=(44,43,65,-18,\cdots,-1429,-871,-2526,-1961)$
(2) Difference datum vector for 28-day death:
● $\vec{R}^2_1=(3,6,1,7,\cdots,-3,2,-5,-13)$
● $\vec{R}^2_2=(-3,12,0,24,\cdots,-2,-9,-10,-59)$
● ⋮
● $\vec{R}^2_8=(-6,21,10,27,\cdots,-4,-25,-4,-66)$
● $\vec{R}^2_9=(17,0,4,17,\cdots,0,6,-7,-44)$
(3) Difference datum vector for 60-day death:
● $\vec{R}^3_1=(3,6,1,7,\cdots,-7,1,-3,-16)$
● $\vec{R}^3_2=(-3,12,0,24,\cdots,-5,-12,-5,-63)$
● ⋮
● $\vec{R}^3_8=(-6,22,9,27,\cdots,-4,-20,-12,-67)$
● $\vec{R}^3_9=(17,0,4,17,\cdots,-7,6,-9,-45)$
Appendix C: Distance tensor product matrices $\mathrm{DTP}^k_{i,j,h}$
Appendix D: List of $\mathrm{DTP}^k_{j,h}$
Appendix E: Optimal rejection distances and features