Research article

DST-Net: Dual self-integrated transformer network for semi-supervised segmentation of optic disc and optic cup in fundus image

  • Published: 16 April 2025
  • Abstract: Accurate and efficient optic disc (OD) and optic cup (OC) segmentation from fundus images is significant for glaucoma screening. However, current neural-network-based OD and OC segmentation methods tend to prioritize an image's local edge features, which limits their capacity to model long-range dependencies and leads to errors in delineating the boundaries. To address this issue, we propose a semi-supervised dual self-integrated transformer network (DST-Net) for joint segmentation of the OD and OC. First, we introduce a dual-view co-training mechanism that builds the encoder and decoder of the self-integrated network from mutually enhanced Vision Transformer (ViT) and convolutional neural network (CNN) feature-learning modules, which are co-trained on dual views to adaptively learn the global and local features of the image. Moreover, we employ a dual self-integrated teacher-student framework that exploits large amounts of unlabeled fundus images through semi-supervised learning, thereby refining the OD and OC segmentation results. Finally, we use a boundary difference over union loss (BDoU-loss) to further optimize boundary prediction. We conducted comparative experiments on the publicly available RIGA+ dataset. The OD and OC Dice values of the proposed DST-Net reached 95.12 ± 0.14 and 85.69 ± 0.27, respectively, outperforming other state-of-the-art (SOTA) methods. In addition, DST-Net generalizes well to the DRISHTI-GS1 and RIM-ONE-v3 datasets, demonstrating its promise for OD and OC segmentation. Minimal illustrative sketches of the dual-branch encoder block, the mean-teacher update, and a BDoU-style loss appear below.
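The encoder described above pairs a convolutional view, which captures local edge detail, with a ViT-style self-attention view, which captures long-range context. Below is a minimal PyTorch sketch of one possible dual-branch block; the class name DualBranchEncoderBlock, the additive fusion, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DualBranchEncoderBlock(nn.Module):
    """Toy dual-view block: a CNN branch for local edges and a
    ViT-style self-attention branch for long-range context.
    Fusion by addition is an illustrative assumption."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local view: standard 3x3 convolution
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global view: multi-head self-attention over flattened positions
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.cnn_branch(x)
        # Flatten the spatial grid into a token sequence for attention
        tokens = self.norm(x.flatten(2).transpose(1, 2))      # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)           # (B, H*W, C)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return local + glob                                   # additive fusion

# Quick shape check
block = DualBranchEncoderBlock(channels=64)
out = block(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

A real encoder would stack several such blocks with downsampling between stages; the shape check only confirms that the fusion preserves the spatial grid.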

    Citation: Yanxia Sun, Tianze Xu, Jing Wang, Jinke Wang. DST-Net: Dual self-integrated transformer network for semi-supervised segmentation of optic disc and optic cup in fundus image. Electronic Research Archive, 2025, 33(4): 2216–2245. https://doi.org/10.3934/era.2025097
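The dual self-integrated teacher-student framework named in the abstract follows the familiar mean-teacher recipe: the teacher's weights track an exponential moving average (EMA) of the student's, and a consistency term penalizes student-teacher disagreement on unlabeled images. A minimal sketch under those standard assumptions (the function names, sigmoid activation, and MSE consistency term are illustrative choices, not taken from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.99) -> None:
    """Teacher weights become a slowly moving average of student weights."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)

def consistency_loss(student: nn.Module, teacher: nn.Module,
                     unlabeled: torch.Tensor) -> torch.Tensor:
    """Penalize student-teacher disagreement on unlabeled fundus images."""
    with torch.no_grad():
        teacher_prob = torch.sigmoid(teacher(unlabeled))  # soft pseudo-targets
    student_prob = torch.sigmoid(student(unlabeled))
    return F.mse_loss(student_prob, teacher_prob)

# Typical training step on a mixed batch (hypothetical names):
#   loss = supervised_loss + lambda_u * consistency_loss(student, teacher, batch)
#   loss.backward(); optimizer.step(); ema_update(teacher, student)
```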

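BDoU-loss penalizes the boundary difference over union so that residual errors concentrated along the OD/OC rims cost more than a plain IoU loss would charge. The sketch below uses a simplified fixed-weight form, (union − intersection) / (union − α · intersection); the published BDoU-loss (Sun et al., MICCAI 2023) chooses α adaptively from the boundary-to-area ratio, so treat this as an approximation, not the exact loss.

```python
import torch

def boundary_dou_loss(pred: torch.Tensor, target: torch.Tensor,
                      alpha: float = 0.8, eps: float = 1e-6) -> torch.Tensor:
    """Simplified BDoU-style loss.

    Shrinking the denominator by a fraction of the intersection makes
    boundary-region mistakes relatively more expensive than in plain IoU.
    The fixed alpha here is an assumption; the published loss adapts it.
    pred: probabilities in [0, 1]; target: binary mask of the same shape.
    """
    inter = (pred * target).sum(dim=(-2, -1))
    union = (pred + target - pred * target).sum(dim=(-2, -1))
    return ((union - inter) / (union - alpha * inter + eps)).mean()
```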





  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)