Automated organ segmentation in anatomical sectional images of canines is crucial for clinical applications and the study of sectional anatomy. The manual delineation of organ boundaries by experts is a time-consuming and laborious task. However, semi-automatic segmentation methods have shown low segmentation accuracy. Deep learning-based CNN models lack the ability to establish long-range dependencies, leading to limited segmentation performance. Although Transformer-based models excel at establishing long-range dependencies, they face a limitation in capturing local detail information. To address these challenges, we propose a novel ECA-TFUnet model for organ segmentation in anatomical sectional images of canines. ECA-TFUnet model is a U-shaped CNN-Transformer network with Efficient Channel Attention, which fully combines the strengths of the Unet network and Transformer block. Specifically, The U-Net network is excellent at capturing detailed local information. The Transformer block is equipped in the first skip connection layer of the Unet network to effectively learn the global dependencies of different regions, which improves the representation ability of the model. Additionally, the Efficient Channel Attention Block is introduced to the Unet network to focus on more important channel information, further improving the robustness of the model. Furthermore, the mixed loss strategy is incorporated to alleviate the problem of class imbalance. Experimental results showed that the ECA-TFUnet model yielded 92.63% IoU, outperforming 11 state-of-the-art methods. To comprehensively evaluate the model performance, we also conducted experiments on a public dataset, which achieved 87.93% IoU, still superior to 11 state-of-the-art methods. Finally, we explored the use of a transfer learning strategy to provide good initialization parameters for the ECA-TFUnet model. We demonstrated that the ECA-TFUnet model exhibits superior segmentation performance on anatomical sectional images of canines, which has the potential for application in medical clinical diagnosis.
Citation: Yunling Liu, Yaxiong Liu, Jingsong Li, Yaoxing Chen, Fengjuan Xu, Yifa Xu, Jing Cao, Yuntao Ma. ECA-TFUnet: A U-shaped CNN-Transformer network with efficient channel attention for organ segmentation in anatomical sectional images of canines[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 18650-18669. doi: 10.3934/mbe.2023827
Automated organ segmentation in anatomical sectional images of canines is crucial for clinical applications and the study of sectional anatomy. The manual delineation of organ boundaries by experts is a time-consuming and laborious task. However, semi-automatic segmentation methods have shown low segmentation accuracy. Deep learning-based CNN models lack the ability to establish long-range dependencies, leading to limited segmentation performance. Although Transformer-based models excel at establishing long-range dependencies, they face a limitation in capturing local detail information. To address these challenges, we propose a novel ECA-TFUnet model for organ segmentation in anatomical sectional images of canines. ECA-TFUnet model is a U-shaped CNN-Transformer network with Efficient Channel Attention, which fully combines the strengths of the Unet network and Transformer block. Specifically, The U-Net network is excellent at capturing detailed local information. The Transformer block is equipped in the first skip connection layer of the Unet network to effectively learn the global dependencies of different regions, which improves the representation ability of the model. Additionally, the Efficient Channel Attention Block is introduced to the Unet network to focus on more important channel information, further improving the robustness of the model. Furthermore, the mixed loss strategy is incorporated to alleviate the problem of class imbalance. Experimental results showed that the ECA-TFUnet model yielded 92.63% IoU, outperforming 11 state-of-the-art methods. To comprehensively evaluate the model performance, we also conducted experiments on a public dataset, which achieved 87.93% IoU, still superior to 11 state-of-the-art methods. Finally, we explored the use of a transfer learning strategy to provide good initialization parameters for the ECA-TFUnet model. We demonstrated that the ECA-TFUnet model exhibits superior segmentation performance on anatomical sectional images of canines, which has the potential for application in medical clinical diagnosis.
[1] | K. Karasawa, M. Oda, T. Kitasaka, K. Misawa, M. Fujiwara, C. W. Chu, et al., Multi-atlas pancreas segmentation: Atlas selection based on vessel structure, Med. Image Anal., 39 (2017), 18–28. https://doi.org/10.1016/j.media.2017.03.006 doi: 10.1016/j.media.2017.03.006 |
[2] | P. F. Li, P. Liu, C. L. Chen, H. Duan, W. J. Qiao, O. H. Ognami, The 3D reconstructions of female pelvic autonomic nerves and their related organs based on MRI: a first step towards neuronavigation during nerve-sparing radical hysterectomy, Eur. Radiol., 28 (2018), 4561–4569. https://doi.org/10.1007/s00330-018-5453-8 doi: 10.1007/s00330-018-5453-8 |
[3] | H. S. Park, D. S. Shin, D. H. Cho, Y. W. Jung, J. S. Park, Improved sectioned images and surface models of the whole dog body, Ann. Anat., 196 (2014), 352–359. https://doi.org/10.1016/j.aanat.2014.05.036 doi: 10.1016/j.aanat.2014.05.036 |
[4] | J. S. Park, Y. W. Jung, Software for browsing sectioned images of a dog body and generating a 3D model, Anat. Rec., 299 (2016), 81–87. https://doi.org/10.1002/ar.23200 doi: 10.1002/ar.23200 |
[5] | K. Czeibert, G. Baksa, A. Grimm, S. A. Nagy, E. Kubinyi, Ö. Petneházy, MRI, CT and high resolution macro-anatomical images with cryosectioning of a Beagle brain: creating the base of a multimodal imaging atlas, PLoS One, 14 (2019), e0213458. https://doi.org/10.1371/journal.pone.0213458 doi: 10.1371/journal.pone.0213458 |
[6] | X. Shu, Y. Y. Yang, B. Y. Wu, A neighbor level set framework minimized with the split Bregman method for medical image segmentation, Signal Process., 189 (2021), 108293. https://doi.org/10.1016/j.sigpro.2021.108293 doi: 10.1016/j.sigpro.2021.108293 |
[7] | X. Shu, Y. Y. Yang, J. Liu, X. J. Chang, B. Y. Wu, ALVLS: Adaptive local variances-Based levelset framework for medical images segmentation, Pattern Recogn., 136 (2023), 109257. https://doi.org/10.1016/j.patcog.2022.109257 doi: 10.1016/j.patcog.2022.109257 |
[8] | S. K. Zhou, H. Greenspan, C. Davatzikos, J. S. Duncan, B. Van Ginneken, A. Madabhushi, et al., A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises, Proc. IEEE, 109 (2021), 820–838. https://doi.org/10.1109/JPROC.2021.3054390 doi: 10.1109/JPROC.2021.3054390 |
[9] | A. Majumdar, L. Brattain, B. Telfer, C. Farris, J. Scalera, Detecting intracranial hemorrhage with deep learning, in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, (2018), 583–587. https://doi.org/10.1109/EMBC.2018.8512336 |
[10] | J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965 |
[11] | G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 4700–4708. https://doi.org/10.1109/CVPR.2017.243 |
[12] | L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in Proceedings of the European Conference on Computer Vision (ECCV), (2018), 801–818. |
[13] | O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-assisted Intervention, Springer, (2015), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28 |
[14] | D. Schmid, V. B. Scholz, P. R. Kircher, I. E. Lautenschlaeger, Employing deep convolutional neural networks for segmenting the medial retropharyngeal lymph nodes in CT studies of dogs, Vet. Radiol. Ultrasound, 63 (2022), 763–770. https://doi.org/10.1111/vru.13132 doi: 10.1111/vru.13132 |
[15] | J. Park, B. Choi, J. Ko, J. Chun, I. Park, J. Lee, et al., Deep-learning-based automatic segmentation of head and neck organs for radiation therapy in dogs, Front. Vet. Sci., 8 (2021), 721612. https://doi.org/10.3389/fvets.2021.721612 doi: 10.3389/fvets.2021.721612 |
[16] | H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, et al., Swin-unet: Unet-like pure transformer for medical image segmentation, in European Conference on Computer Vision, (2021), 205–218. https://doi.org/10.1007/978-3-031-25066-8_9 |
[17] | Y. Xu, X. He, G. Xu, G. Qi, K. Yu, L. Yin, et al., A medical image segmentation method based on multi-dimensional statistical features, Front. Neurosci., 16 (2022), 1009581. https://doi.org/10.3389/fnins.2022.1009581 doi: 10.3389/fnins.2022.1009581 |
[18] | A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929. |
[19] | N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in European Conference on Computer Vision, Springer, (2020), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13 |
[20] | S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, et al., Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 6881–6890. https://doi.org/10.1109/CVPR46437.2021.00681 |
[21] | J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, et al., Transunet: Transformers make strong encoders for medical image segmentation, preprint, arXiv: 2102.04306. |
[22] | B. Li, S. Liu, F. Wu, G. Li, M. Zhong, X. Guan, RT‐Unet: An advanced network based on residual network and transformer for medical image segmentation, Int. J. Intell. Syst., 37 (2022), 8565–8582. https://doi.org/10.1002/int.22956 doi: 10.1002/int.22956 |
[23] | H. Wang, P. Cao, J. Wang, O. R. Zaiane, Uctransnet: rethinking the skip connections in u-net from a channel-wise perspective with transformer, in Proceedings of the AAAI Conference on Artificial Intelligence, 36 (2022), 2441–2449. https://doi.org/10.1609/aaai.v36i3.20144 |
[24] | Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11534–11542. https://doi.org/10.1109/CVPR42600.2020.01155 |
[25] | A. E. Kavur, N. S. Gezer, M. Barış, S. Aslan, P. H. Conze, V. Groza, et al., CHAOS challenge-combined (CT-MR) healthy abdominal organ segmentation, Med. Image Anal. , 69 (2021), 101950. https://doi.org/10.1016/j.media.2020.101950 doi: 10.1016/j.media.2020.101950 |
[26] | K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770–778. |
[27] | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Advances in Neural Information Processing Systems, 30 (2017). |
[28] | H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 2881–2890. https://doi.org/10.1109/CVPR.2017.660 |
[29] | J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, et al., Dual attention network for scene segmentation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 3146–3154. |
[30] | Y. Cao, J. Xu, S. Lin, F. Wei, H. Hu, Gcnet: Non-local networks meet squeeze-excitation networks and beyond, in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019. https://doi.org/10.1109/ICCVW.2019.00246 |
[31] | Y. Yuan, X. Chen, J. Wang, Object-contextual representations for semantic segmentation, in European Conference on Computer Vision, Springer, (2020), 173–190. https://doi.org/10.1007/978-3-030-58539-6_11 |
[32] | Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, et al., Swin transformer: Hierarchical vision transformer using shifted windows, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), 10012–10022. https://doi.org/10.1109/ICCV48922.2021.00986 |
[33] | E. Z. Xie, W. H. Wang, Z. D. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, SegFormer: Simple and efficient design for semantic segmentation with transformers, in Advances in Neural Information Processing Systems, 34 (2021), 12077–12090. |
[34] | M. D. Alahmadi, Medical image segmentation with learning semantic and global contextual representation, Diagnostics, 12 (2022), 1548. https://doi.org/10.3390/diagnostics12071548 doi: 10.3390/diagnostics12071548 |
[35] | J. Fang, C. Yang, Y. Shi, N. Wang, Y. Zhao, External attention based TransUNet and label expansion strategy for crack detection, IEEE Trans. Intell. Transp. Syst., 23 (2022), 19054–19063. https://doi.org/10.1109/TITS.2022.3154407 doi: 10.1109/TITS.2022.3154407 |
[36] | M. H. Guo, C. Z. Lu, Q. Hou, Z. Liu, M. M. Cheng, S. M. Hu, SegNeXt: Rethinking convolutional attention design for semantic segmentation, in Advances in Neural Information Processing Systems, 35 (2022), 1140–1156. |
[37] | H. Bao, L. Dong, S. Piao, F. Wei, BEiT: BERT pre-training of image transformers, preprint, arXiv: 2106.08254. |