In this work, deep bidirectional recurrent neural network (BRNN) models were implemented with both Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells to distinguish the genome sequences of SARS-CoV-2 from those of other coronavirus strains, such as SARS-CoV and MERS-CoV, and from common cold and other acute respiratory infection (ARI) viruses. The hyper-parameters, including the optimizer type and the number of unit cells, were also investigated to obtain the best performance from each BRNN model. The GRU BRNN model discriminated between SARS-CoV-2 and the other virus classes with an overall classification accuracy of 96.8%, higher than the 95.8% achieved by the LSTM BRNN model. For both models, the best performance was obtained with the SGD optimizer and 80 unit cells. These results show that the proposed GRU BRNN model classifies SARS-CoV-2 more accurately than its LSTM counterpart, providing an efficient tool to help contain the disease and to support more accurate clinical decisions.
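The abstract does not specify the input encoding, the number of stacked layers, or the framework used, so the following is only a minimal, dependency-free Python sketch of how a bidirectional GRU classifier turns a genome fragment into class probabilities. It assumes one-hot nucleotide encoding, a single bidirectional layer whose final forward and backward hidden states are concatenated, and five hypothetical output classes inferred from the virus groups named above; all names (`one_hot`, `GRUCell`, `BiGRUClassifier`, `CLASSES`) are illustrative, not the authors' code. The weights are random and no SGD training is performed, so the printed probabilities carry no diagnostic meaning.

```python
import math
import random

NUCLEOTIDES = "ACGT"
# Hypothetical class labels, inferred from the virus groups named in the abstract.
CLASSES = ["SARS-CoV-2", "SARS-CoV", "MERS-CoV", "Common cold", "Other ARI"]


def one_hot(seq):
    """One-hot encode a DNA string; ambiguous bases such as N become all-zero vectors."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU cell (Cho et al. formulation) with small random, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {
            gate: (rand_matrix(hidden_size, input_size, rng),
                   rand_matrix(hidden_size, hidden_size, rng),
                   [0.0] * hidden_size)
            for gate in ("z", "r", "n")
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        # The reset gate scales the previous state before the candidate is computed.
        n = [math.tanh(v) for v in self._affine("n", x, [ri * hi for ri, hi in zip(r, h)])]
        # Interpolate between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, xs):
        h = [0.0] * self.hidden_size
        for x in xs:
            h = self.step(x, h)
        return h


class BiGRUClassifier:
    """Single-layer bidirectional GRU followed by a softmax output layer."""

    def __init__(self, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(len(NUCLEOTIDES), hidden_size, rng)
        self.bwd = GRUCell(len(NUCLEOTIDES), hidden_size, rng)
        self.W_out = rand_matrix(num_classes, 2 * hidden_size, rng)
        self.b_out = [0.0] * num_classes

    def predict_proba(self, seq):
        xs = one_hot(seq)
        # Concatenate the final states of a left-to-right and a right-to-left pass.
        h = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    # 80 hidden units mirrors the unit count reported above; the fragment is arbitrary.
    model = BiGRUClassifier(hidden_size=80, num_classes=len(CLASSES))
    for name, p in zip(CLASSES, model.predict_proba("ACGTTGCAAGGCTTAC")):
        print(f"{name}: {p:.3f}")
```

In practice the weights would be learned with SGD on labelled sequences, typically through a framework layer such as PyTorch's `torch.nn.GRU(..., bidirectional=True)` with `torch.optim.SGD`, or Keras `Bidirectional(GRU(80))`; which framework the authors used is not stated here. Replacing `GRUCell` with an LSTM cell would give the LSTM BRNN variant that the abstract compares against.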
Citation: Mohanad A. Deif, Ahmed A. A. Solyman, Mehrdad Ahmadi Kamarposhti, Shahab S. Band, Rania E. Hammam. A deep bidirectional recurrent neural network for identification of SARS-CoV-2 from viral genome sequences[J]. Mathematical Biosciences and Engineering, 2021, 18(6): 8933-8950. doi: 10.3934/mbe.2021440