Accurate classification and segmentation of polyps are two important tasks in the diagnosis and treatment of colorectal cancers. Existing models perform segmentation and classification separately and do not fully make use of the correlation between the two tasks. Furthermore, polyps exhibit random regions and varying shapes and sizes, and they often share similar boundaries and backgrounds. However, existing models fail to consider these factors and thus are not robust because of their inherent limitations. To address these issues, we developed a multi-task network that performs both segmentation and classification simultaneously and can cope with the aforementioned factors effectively. Our proposed network possesses a dual-branch structure, comprising a transformer branch and a convolutional neural network (CNN) branch. This approach enhances local details within the global representation, improving both local feature awareness and global contextual understanding, thus contributing to the improved preservation of polyp-related information. Additionally, we have designed a feature interaction module (FIM) aimed at bridging the semantic gap between the two branches and facilitating the integration of diverse semantic information from both branches. This integration enables the full capture of global context information and local details related to polyps. To prevent the loss of edge detail information crucial for polyp identification, we have introduced a reverse attention boundary enhancement (RABE) module to gradually enhance edge structures and detailed information within polyp regions. Finally, we conducted extensive experiments on five publicly available datasets to evaluate the performance of our method in both polyp segmentation and classification tasks. The experimental results confirm that our proposed method outperforms other state-of-the-art methods.
Citation: Chenqian Li, Jun Liu, Jinshan Tang. Simultaneous segmentation and classification of colon cancer polyp images using a dual branch multi-task learning network[J]. Mathematical Biosciences and Engineering, 2024, 21(2): 2024-2049. doi: 10.3934/mbe.2024090
Accurate classification and segmentation of polyps are two important tasks in the diagnosis and treatment of colorectal cancers. Existing models perform segmentation and classification separately and do not fully make use of the correlation between the two tasks. Furthermore, polyps exhibit random regions and varying shapes and sizes, and they often share similar boundaries and backgrounds. However, existing models fail to consider these factors and thus are not robust because of their inherent limitations. To address these issues, we developed a multi-task network that performs both segmentation and classification simultaneously and can cope with the aforementioned factors effectively. Our proposed network possesses a dual-branch structure, comprising a transformer branch and a convolutional neural network (CNN) branch. This approach enhances local details within the global representation, improving both local feature awareness and global contextual understanding, thus contributing to the improved preservation of polyp-related information. Additionally, we have designed a feature interaction module (FIM) aimed at bridging the semantic gap between the two branches and facilitating the integration of diverse semantic information from both branches. This integration enables the full capture of global context information and local details related to polyps. To prevent the loss of edge detail information crucial for polyp identification, we have introduced a reverse attention boundary enhancement (RABE) module to gradually enhance edge structures and detailed information within polyp regions. Finally, we conducted extensive experiments on five publicly available datasets to evaluate the performance of our method in both polyp segmentation and classification tasks. The experimental results confirm that our proposed method outperforms other state-of-the-art methods.
[1] | A. Leufkens, M. G. H. Van Oijen, F. P. Vleggaar, P. D. Siersema, Factors influencing the miss rate of polyps in a back-to-back colonoscopy study, Endoscopy, 44 (2012), 470–475. https://doi.org/10.1055/s-0031-1291666 doi: 10.1055/s-0031-1291666 |
[2] | C. F. Chen, Z. J. Du, L. He, Y. J. Shi, J. Q. Wang, W. Dong, A novel gait pattern recognition method based on LSTM-CNN for lower limb exoskeleton, J. Bionic Eng., 18 (2021), 1059–1072. https://doi.org/10.1007/s42235-021-00083-y doi: 10.1007/s42235-021-00083-y |
[3] | S. Tian, J. Zhang, X. Y. Shu, L. Y. Chen, X. Niu, Y. Wang, A novel evaluation strategy to artificial neural network model based on bionics, J. Bionic Eng., 19 (2022), 1–16. https://doi.org/10.1007/s42235-021-00136-2 doi: 10.1007/s42235-021-00136-2 |
[4] | L. Xu, R. Maggar, A. B. Farimani, Forecasting COVID-19 new cases using deep learning methods, Comput. Biol. Med., 144 (2022), 105342. https://doi.org/10.1016/j.compbiomed.2022.105342 doi: 10.1016/j.compbiomed.2022.105342 |
[5] | J. X. Xie, B. Yao, Physics-constrained deep active learning for spatiotemporal modeling of cardiac electrodynamics, Comput. Biol. Med., 146 (2022), 105586–105594. https://doi.org/10.1016/j.compbiomed.2022.105586 doi: 10.1016/j.compbiomed.2022.105586 |
[6] | Q. Guan, Y. Z. Chen, Z. H. Wei, A. A. Heidari, H. G. Hu, X. H. Yang, et al., Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN, Comput. Biol. Med., 145 (2022), 105444–105449. https://doi.org/10.1016/j.compbiomed.2022.105444 doi: 10.1016/j.compbiomed.2022.105444 |
[7] | R. K. Zhang, Y. L. Zheng, T. W. C. Mak, R. Yu, S. H. Wong, J. Y. W. Lau, et al., Automatic detection and classification of colorectal polyps by transferring low-level CNN features from nonmedical domain, IEEE J. Biomed. Health Inf., 21 (2016), 41–47. https://doi.org/10.1109/JBHI.2016.2635662 doi: 10.1109/JBHI.2016.2635662 |
[8] | M. F. Byme, N. Chapados, F. Soudan, C. Oertel, M. L. Pérez, R. Kelly, et al., Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model, Gut, 68 (2017), 94–100. https://doi.org/10.1136/gutjnl-2017-314547 doi: 10.1136/gutjnl-2017-314547 |
[9] | F. Younas, M. Usman, W. Q. Yan, A deep ensemble learning method for colorectal polyp classification with optimized network parameters, Appl. Intell., 53 (2023), 2410–2433. https://doi.org/10.1007/s10489-022-03689-9 doi: 10.1007/s10489-022-03689-9 |
[10] | O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention, Springer, (2015), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28 |
[11] | Z. W. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, J. Liang, Unet++: A nested u-net architecture for medical image segmentation, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, (2018), 3–11. https://doi.org/10.1007/978-3-030-00889-5_1 |
[12] | D. Jha, P. H. Smedsurd, D. Johansen, T. D. Lange, H. D. Johansen, P. Halvorsen, et al., A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation, IEEE J. Biomed. Health Inf., 25 (2021), 2029–2040. https://doi.org/10.1109/JBHI.2021.3049304 doi: 10.1109/JBHI.2021.3049304 |
[13] | D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, et al., Resunet++: An advanced architecture for medical image segmentation, in 2019 IEEE International Symposium on Multimedia (ISM), (2019), 225–2255. https://doi.org/10.1109/ISM46123.2019.00049 |
[14] | D. P. Fan, G. P. Ji, T. Zhou, G. Chen, H. Z. Fu, J. B. Shen, et al., Pranet: Parallel reverse attention network for polyp segmentation, in International Conference on Medical Image Computing and Computer-assisted Intervention, Springer, (2020), 263–273. https://doi.org/10.1007/978-3-030-59725-2_26 |
[15] | R. F. Zhang, G. B. Li, Z. Li, S. G. Cui, D. H. Qian, Y. Z. Yu, Adaptive context selection for polyp segmentation, in Medical Image Computing and Computer Assisted Intervention, Springer, (2020), 253–262. https://doi.org/10.1007/978-3-030-59725-2_25 |
[16] | G. P. Ji, G. B. Xiao, Y. C. Chou, D. P. Fan, K. Zhao, G. Chen, et al., Video polyp segmentation: A deep learning perspective, Mach. Intell. Res., 19 (2022), 531–549. https://doi.org/10.1007/s11633-022-1371-y doi: 10.1007/s11633-022-1371-y |
[17] | Y. Lin, J. C. Wu, G. B. Xiao, J. W. Guo, G. Chen, J. Y. Ma, BSCA-Net: Bit slicing context attention network for polyp segmentation, Pattern Recogn., 132 (2022), 108917. https://doi.org/10.1016/j.patcog.2022.108917 doi: 10.1016/j.patcog.2022.108917 |
[18] | Y. D. Zhang, H. Y. Liu, Q. Hu, Transfuse: Fusing transformers and CNNs for medical image segmentation, in Medical Image Computing and Computer Assisted-Intervention, Springer, (2021), 14–24. https://doi.org/10.1007/978-3-030-87193-2_2 |
[19] | A. Galdran, G. Carneiro, M. A. G. Ballester, Double encoder-decoder networks for gastrointestinal polyp segmentation, in Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, Springer, (2021), 293–307. https://doi.org/10.1007/978-3-030-68763-2_22 |
[20] | A. Amyar, B. Modzelewski, H. Li, S. Ruan, Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation, Comput. Biol. Med., 126 (2020), 104037. https://doi.org/10.1016/j.compbiomed.2020.104037 doi: 10.1016/j.compbiomed.2020.104037 |
[21] | Z. Wu, R. J. Ge, M. L. Wen, G. S. Liu, Y. Chen, P. Z. Zhang, et al., ELNet: Automatic classification and segmentation for esophageal lesions using convolutional neural network, Comput. Biol. Med., 67 (2021), 101838. https://doi.org/10.1016/j.media.2020.101838 doi: 10.1016/j.media.2020.101838 |
[22] | C. Chen, W. J. Bai, D. Rueckert, Multi-task learning for left atrial segmentation on GE-MRI, in Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, Springer, (2018), 292–301. https://doi.org/10.1007/978-3-030-12029-0_32 |
[23] | R. Zhang, X. Y. Xiao, Z. Liu, Y. J. Li, S. Li, MRLN: Multi-task relational learning network for mrivertebral localization, identification, and segmentation, IEEE J. Biomed. Health Inf., 24 (2020), 2902–2911. https://doi.org/10.1109/JBHI.2020.2969084 doi: 10.1109/JBHI.2020.2969084 |
[24] | Y. Zhou, H. J. Chen, Y. F. Li, Q. Liu, X. A. Xu, S. Wang, et al., Multi-task learning for segmentation and classification of tumors in 3D automated breast ultrasound images, Med. Image Anal., 70 (2021), 101918–101920. https://doi.org/10.1016/j.media.2020.101918 doi: 10.1016/j.media.2020.101918 |
[25] | K. Liu, N. Uplavikar, W. Jiang, Y. J. Fu, Privacy-preserving multi-task learning, in 2018 IEEE International Conference on Data Mining (ICDM), IEEE, (2018), 1128–1133. https://doi.org/10.1109/ICDM.2018.00147 |
[26] | G. K. Zhang, X. A. Shen, Y. D. Zhang, Y. Luo, D. D. Zhu, H. M. Yang, et al., Cross-modal prostate cancer segmentation via self-attention distillation, IEEE J. Biomed. Health Inf., 26 (2021), 5298–5309. https://doi.org/10.1109/JBHI.2021.3127688 doi: 10.1109/JBHI.2021.3127688 |
[27] | C. Wang, M. Gan, Tissue self-attention network for the segmentation of optical coherence tomography images on the esophagus, Biomed. Opt. Express, 12 (2021), 2631–2646. https://doi.org/10.1364/BOE.419809 doi: 10.1364/BOE.419809 |
[28] | A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. H. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929. |
[29] | N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in European Conference on Computer Vision, Springer, (2020), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13 |
[30] | J. N. Chen, Y. Y. Lu, Q. H. Yu, X. D. Luo, E. Adeil, Y. Wang, et al., TransUNet: Transformers make strong encoders for medical image segmentation, preprint, arXiv: 2102.04306. |
[31] | J. W. Wang, S. W. Tian, L. Yu, Z. C. Zhou, F. Wang, Y. T. Wang, HIGF-Net: Hierarchical information-guided fusion network for polyp segmentation based on transformer and convolution feature learning, Comput. Biol. Med., 161 (2023), 107038. https://doi.org/10.1016/j.compbiomed.2023.107038 doi: 10.1016/j.compbiomed.2023.107038 |
[32] | Y. L. Huang, D. H. Tan, Y. Zhang, X. Y. Li, K. Hu, TransMixer: A hybrid transformer and CNN architecture for polyp segmentation, in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, (2020), 1558–1561. https://doi.org/10.1109/BIBM55620.2022.9995247 |
[33] | K. B. Park, J. Y. Lee, SwinE-Net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and swin transformer, J. Comput. Des. Eng., 9 (2022), 616–633. https://doi.org/10.1093/jcde/qwac018 doi: 10.1093/jcde/qwac018 |
[34] | E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, SegFormer: Simple and efficient design for semantic segmentation with transformers, preprint, arXiv: 2105.15203. |
[35] | S. Woo, J. Park, J. Y. Lee, I. S. Kweon, CBAM: Convolutional block attention module, preprint, arXiv: 1807.06521. |
[36] | F. I. Diakogiannis, F. Waldner, P. Caccetta, C. Wu, ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data, ISPRS J. Photogramm. Remote Sens., 162 (2020), 94–114. https://doi.org/10.1016/j.isprsjprs.2020.01.013 doi: 10.1016/j.isprsjprs.2020.01.013 |
[37] | Z. Ma, Y. L. Qi, C. Xu, W. Zhao, M. Lou, Y. M. Wang, et al., ATFE-Net: Axial transformer and feature enhancement-based CNN for ultrasound breast mass segmentation, Comput. Biol. Med., 153 (2023), 106533–106545. https://doi.org/10.1016/j.compbiomed.2022.106533 doi: 10.1016/j.compbiomed.2022.106533 |
[38] | D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. D. lange, D. Johansen, et al., Kvasir-seg: A segmented polyp dataset, in MultiMedia Modeling: 26th International Conference, Springer, (2020), 451–462. https://doi.org/10.1007/978-3-030-37734-237 |
[39] | J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, D. Gil, C. Rodríguez, F. Vilariño, WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians, Comput. Med. Imaging Graphics, 43 (2015), 99–111. https://doi.org/10.1016/j.compmedimag.2015.02.007 doi: 10.1016/j.compmedimag.2015.02.007 |
[40] | J. Bernal, F. J. Sánchez, F. Vilariño, Towards automatic polyp detection with a polyp appearance model, Pattern Recogn., 45 (2012), 3166–3182. https://doi.org/10.1016/j.patcog.2012.03.002 doi: 10.1016/j.patcog.2012.03.002 |
[41] | J. Silva, A. Histace, O. Romain, X. Dray, B. Granado, Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer, Int. J. Comput. Assisted Radiol. Surg., 9 (2014), 283–293. https://doi.org/10.1007/s11548-013-0926-3 doi: 10.1007/s11548-013-0926-3 |
[42] | D. Vázquez, J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, A. M. López, A. Romero, et al., A benchmark for endoluminal scene segmentation of colonoscopy images, J. Healthcare Eng., 2017 (2017). https://doi.org/10.1155/2017/4037190 doi: 10.1155/2017/4037190 |
[43] | N. Shussman, S. D. Wexner, Colorectal polyps and polyposis syndromes, Gastroenterol. Rep., 2 (2014), 1–15. https://doi.org/10.1093/gastro/got041 doi: 10.1093/gastro/got041 |
[44] | M. Wang, X. W. An, Y. H. Li, N. Li, W. Hang, G. Liu, EMS-Net: Enhanced Multi- Scale Network for Polyp Segmentation, in 2021 43rd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC), IEEE, (2021), 2936–2939. https://doi.org/10.1109/EMBC46164.2021.9630787 |
[45] | Z. Qiu, Z. H. Wang, M. M. Zhang, Z. Y. Xu, J. Fan, L. F. Xu, BDG-Net: boundary distribution guided network for accurate polyp segmentation, in Medical Imaging 2022: Image Processing, SPIE, (2022), 792–799. https://doi.org/10.1117/12.2606785 |
[46] | R. Margolin, L. Zelnik-Manor, A. Tal, How to evaluate foreground maps, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), IEEE, (2014), 248–255. https://doi.org/10.1109/CVPR.2014.39 |
[47] | D. P. Fan, M. M. Cheng, Y. Liu, T. Li, A. Borji, Structure-measure: A new way to evaluate foreground maps, in Proceedings of the IEEE International Conference on Computer Vision, IEEE, (2017), 4548–4557. https://doi.org/10.1109/ICCV.2017.487 |
[48] | D. P. Fan, C. Gong, Y. Cao, B. Ren, M. M. Cheng, A. Borji, Enhanced-alignment measure for binary foreground map evaluation, preprint, arXiv: 1805.10421. |
[49] | C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2016), 1063–6919. https://doi.org/10.1109/CVPR.2016.308 |
[50] | A. Howard, M. Sandler, G. Chu, L. C. Chen, B. Chen, M. X. Tan, et al., Searching for MobileNetV3, in Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, (2019), 1314–1324. https://doi.org/10.1109/ICCV.2019.00140 |
[51] | G. Huang, Z. Liu, L. V. D. Maaten, K. Q. Weinberger, Densely connected convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2017), 4700–4708. https://doi.org/10.1109/CVPR.2017.243 |
[52] | K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2016), 700–778. https://doi.org/10.1109/CVPR.2016.90 |
[53] | M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, preprint, arXiv: 1905.11946v5 |
[54] | P. Tang, X. T. Yan, Y. Nan, S. Xiang, S. Krammer, T. Lasser, FusionM4Net: A multi-stage multimodal learning algorithm for multi-label skin lesion classification, Med. Image Anal., 76 (2022), 102307. https://doi.org/10.1016/j.media.2021.102307 doi: 10.1016/j.media.2021.102307 |
[55] | J. Tang, S. Millington, S. T. Acton, J. Crandall, S. Hurwitz, Ankle cartilage surface segmentation using directional gradient vector flow snakes, in 2004 International Conference on Image Processing, 2004. ICIP'04, Singapore, 4 (2004), 2745–2748, https://doi.org/10.1109/ICIP.2004.1421672 |
[56] | J. Tang, S. Guo, Q. Sun, Y. Deng, D. Zhou, Speckle reducing bilateral filter for cattle follicle segmentation, BMC Genomics, 11 (2010), 1–9. https://doi.org/10.1186/1471-2164-11-S2-S9 doi: 10.1186/1471-2164-11-S2-S9 |
[57] | J. Tang, X. Liu, H. Cheng, K. M. Robinette, Gender recognition using 3-D human body shapes, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 41 (2011), 898–908. https://doi.org/10.1109/TSMCC.2011.2104950 doi: 10.1109/TSMCC.2011.2104950 |
[58] | J. Xu, Y. Y. Cao, Y. Sun, J. Tang, Absolute exponential stability of recurrent neural networks with generalized activation function, IEEE Trans. Neural Networks, 19 (2008), 1075–1089. https://doi.org/10.1109/TNN.2007.2000060 doi: 10.1109/TNN.2007.2000060 |