Data augmentation based semi-supervised method to improve COVID-19 CT classification

Xiangtao Chen; Yuting Bai; Peng Wang; Jiawei Luo

doi:10.3934/mbe.2023294

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 4: 6838-6852. doi: 10.3934/mbe.2023294

Research article

1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
2. College of Computer Science and Engineering, Hunan Institute of Technology, Hengyang 421002, China
Correction on: Mathematical Biosciences and Engineering 21: 7854–7855.

Received: 29 November 2022 Revised: 16 January 2023 Accepted: 29 January 2023 Published: 06 February 2023

The coronavirus disease (COVID-19) outbreak of December 2019 has become a serious threat to people around the world, creating a health crisis that has affected millions of lives and severely damaged the global economy. Early detection and diagnosis are essential to prevent further transmission. Detecting COVID-19 in computed tomography (CT) images is one of the important approaches to rapid diagnosis. Many branches of deep learning have played an important role in this area, including transfer learning, contrastive learning and ensemble strategies. However, these methods require a large number of samples with expensive manual labels, so, to save costs, researchers have adopted semi-supervised learning, which uses only a few labels to classify COVID-19 CT images. Nevertheless, existing semi-supervised methods focus primarily on class imbalance and pseudo-label filtering rather than on pseudo-label generation. Accordingly, in this paper, we constructed a semi-supervised classification framework based on data augmentation to classify COVID-19 CT images. We revised the classic teacher-student framework and introduced the popular data augmentation method Mixup, which widened the distribution of high confidence to improve the accuracy of the selected pseudo-labels and ultimately obtain a model with better performance. On the COVID-CT dataset, our method raised precision, F1 score, accuracy and specificity by 21.04%, 12.95%, 17.13% and 38.29%, respectively, over the average values of the other methods. On the SARS-COV-2 dataset, the corresponding increases were 8.40%, 7.59%, 9.35% and 12.80%. On the Harvard Dataverse dataset, they were 17.64%, 18.89%, 19.81% and 20.20%. The code is available at https://github.com/YutingBai99/COVID-19-SSL.
Keywords: COVID-19; semi-supervised; pseudo-labels; Mixup; teacher-student framework
Citation: Xiangtao Chen, Yuting Bai, Peng Wang, Jiawei Luo. Data augmentation based semi-supervised method to improve COVID-19 CT classification[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6838-6852. doi: 10.3934/mbe.2023294
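
The abstract names two building blocks: Mixup augmentation and the selection of high-confidence pseudo-labels produced by a teacher model. The following is a minimal sketch of those two generic steps, not the authors' implementation (which is in the repository linked above). The function names `mixup_pair` and `select_pseudo_labels`, the Beta parameter `alpha=0.4`, and the confidence threshold `0.95` are assumptions chosen for illustration. How the paper combines the two steps inside its teacher-student framework is not stated in the abstract and is not modelled here.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with a Beta-distributed weight.

    x_a, x_b: flat lists of pixel values of equal length.
    y_a, y_b: one-hot (or soft) label vectors of equal length.
    alpha is an illustrative default, not a value taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def select_pseudo_labels(teacher_probs, threshold=0.95):
    """Keep unlabeled samples whose highest class probability meets the threshold.

    teacher_probs: list of per-sample class-probability lists.
    Returns (sample_index, predicted_class) pairs for the retained samples.
    The threshold is a hypothetical value for this sketch.
    """
    selected = []
    for idx, probs in enumerate(teacher_probs):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((idx, probs.index(confidence)))
    return selected
```

In a self-training loop of this kind, the retained pairs would typically be added to the labeled pool used to train the student model, and samples below the threshold would stay unlabeled for the next round.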

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)