Research article

SegT: Separated edge-guidance transformer network for polyp segmentation


  • Received: 19 June 2023; Revised: 15 August 2023; Accepted: 22 August 2023; Published: 18 September 2023
  • Abstract: Accurate segmentation of colonoscopic polyps is a fundamental step in medical image analysis and surgical intervention. Many recent studies have improved on the encoder-decoder framework, which can effectively segment diverse polyps; these improvements mainly enhance local features by incorporating global features and attention mechanisms. However, relying only on the global information of the final encoder block can discard local regional features computed in the intermediate layers. In addition, delineating the edges between benign regions and polyps is challenging. To address these issues, we propose a novel separated edge-guidance transformer (SegT) network for effective polyp segmentation. Specifically, we apply a transformer encoder that learns a more robust representation than existing convolutional neural network-based approaches. For precise polyp segmentation, we use a separated edge-guidance module consisting of separator and edge-guidance blocks. The separator block is a two-stream operator that highlights edges between the background and foreground, while the edge-guidance block lies behind both streams to strengthen the model's understanding of the edge. Finally, an innovative cascade fusion module fuses the refined multi-level features. To evaluate the effectiveness of SegT, we conducted experiments on five challenging public datasets, and the proposed model achieved state-of-the-art performance.

    Citation: Feiyu Chen, Haiping Ma, Weijia Zhang. SegT: Separated edge-guidance transformer network for polyp segmentation[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 17803-17821. doi: 10.3934/mbe.2023791
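
    To make the pipeline in the abstract concrete, here is a minimal PyTorch sketch of how a two-stream separator block, an edge-guidance block and a cascade fusion module could be wired together. It is an illustration inferred from the abstract alone: the class names, channel sizes and the sigmoid/reverse-attention split are assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeparatorBlock(nn.Module):
    """Two-stream operator: a coarse prediction map splits features into a
    foreground stream and a background stream, whose contrast highlights the
    edge between polyp and background (an assumed mechanism)."""

    def __init__(self, channels: int):
        super().__init__()
        self.fg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bg_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, coarse_map):
        attn = torch.sigmoid(coarse_map)           # foreground probability in [0, 1]
        fg = self.fg_conv(feat * attn)             # foreground stream
        bg = self.bg_conv(feat * (1.0 - attn))     # background (reverse) stream
        return fg, bg


class EdgeGuidanceBlock(nn.Module):
    """Lies behind both streams and injects an explicit edge feature so the
    fused representation is sharper along the boundary."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, fg, bg, edge_feat):
        # Align the edge feature to the stream resolution before fusing.
        edge_feat = F.interpolate(edge_feat, size=fg.shape[2:],
                                  mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([fg, bg, edge_feat], dim=1))


class CascadeFusion(nn.Module):
    """Fuses refined multi-level features top-down: each deeper feature is
    upsampled and merged with the next shallower one."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        self.merge = nn.ModuleList(
            nn.Conv2d(channels * 2, channels, 3, padding=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, feats):                      # feats ordered deep -> shallow
        out = feats[0]
        for conv, skip in zip(self.merge, feats[1:]):
            out = F.interpolate(out, size=skip.shape[2:],
                                mode="bilinear", align_corners=False)
            out = conv(torch.cat([out, skip], dim=1))
        return out


if __name__ == "__main__":
    feat = torch.randn(1, 64, 44, 44)              # toy mid-level feature map
    coarse = torch.randn(1, 1, 44, 44)             # toy coarse prediction logits
    edge = torch.randn(1, 64, 88, 88)              # toy edge feature map
    fg, bg = SeparatorBlock(64)(feat, coarse)
    refined = EdgeGuidanceBlock(64)(fg, bg, edge)
    print(refined.shape)                           # torch.Size([1, 64, 44, 44])
    fused = CascadeFusion(64, num_levels=2)([refined, torch.randn(1, 64, 88, 88)])
    print(fused.shape)                             # torch.Size([1, 64, 88, 88])
```

    The `__main__` block uses toy tensor shapes purely to show that the pieces compose; in SegT itself, the feature maps, coarse map and edge features would come from the transformer encoder and its prediction heads.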



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
