Accurate and efficient optic disc (OD) and optic cup (OC) segmentation from fundus images is essential for glaucoma screening. However, current neural-network-based OD and OC segmentation methods tend to prioritize an image's local edge features, which limits their capacity to model long-range dependencies and leads to errors in delineating the boundaries. To address this issue, we propose a semi-supervised dual self-integrated transformer network (DST-Net) for joint segmentation of the OD and OC. First, we introduce a dual-view co-training mechanism that builds the encoder and decoder of the self-integrated network from mutually enhanced feature-learning modules of a Vision Transformer (ViT) and a convolutional neural network (CNN); the two views are co-trained to learn the global and local features of the image adaptively. Moreover, we employ a dual self-integrated teacher-student framework that exploits large amounts of unlabeled fundus images through semi-supervised learning, thereby refining the OD and OC segmentation results. Finally, we use a boundary difference over union loss (BDoU-loss) to further optimize boundary prediction. We conducted comparative experiments on the publicly available RIGA+ dataset. The OD and OC Dice values of the proposed DST-Net reached 95.12 ± 0.14 and 85.69 ± 0.27, respectively, outperforming other state-of-the-art (SOTA) methods. In addition, DST-Net generalizes well to the DRISHTI-GS1 and RIM-ONE-v3 datasets, demonstrating its promise for OD and OC segmentation.
Citation: Yanxia Sun, Tianze Xu, Jing Wang, Jinke Wang. DST-Net: Dual self-integrated transformer network for semi-supervised segmentation of optic disc and optic cup in fundus image[J]. Electronic Research Archive, 2025, 33(4): 2216-2245. doi: 10.3934/era.2025097
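The teacher-student component described in the abstract follows the familiar mean-teacher pattern for semi-supervised learning: the teacher's weights track an exponential moving average (EMA) of the student's, and a consistency term aligns the two models' predictions on unlabeled images. A minimal sketch in plain Python follows; the flat parameter dicts, the `alpha` value, and the MSE consistency term are illustrative assumptions, not the authors' implementation.

```python
def ema_update(teacher, student, alpha=0.99):
    """Blend student weights into teacher: t <- alpha * t + (1 - alpha) * s.

    `teacher` and `student` are flat dicts mapping parameter names to
    scalar weights (a stand-in for real network parameter tensors).
    """
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}


def consistency_loss(p_teacher, p_student):
    """Mean squared error between teacher and student soft predictions
    on an unlabeled image (lists of per-pixel probabilities)."""
    n = len(p_teacher)
    return sum((t - s) ** 2 for t, s in zip(p_teacher, p_student)) / n


# One semi-supervised step, schematically: after the student takes a
# gradient step, the teacher drifts toward it via EMA.
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, alpha=0.9)  # teacher["w"] -> 0.9
```

Because the EMA averages the student over many steps, the teacher produces smoother, more stable pseudo-targets for the unlabeled images than the raw student would.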