Open-world semi-supervised learning (OWSSL) has received significant attention because it addresses the problem of unlabeled data containing classes absent from the labeled data. Unfortunately, existing OWSSL methods still rely on a large amount of labeled data for the seen classes, overlooking the reality that abundant labels are difficult to obtain in real-world scenarios. In this paper, we explore a new setting called open-world barely-supervised learning (OWBSL), in which only a single label is provided for each seen class, greatly reducing labeling costs. To tackle OWBSL, we propose a novel framework that leverages augmented pseudo-labels generated for the unlabeled data. Specifically, we first generate initial pseudo-labels for the unlabeled data using vision-language models. Then, to keep the pseudo-labels reliable as they are updated during training, we enhance them with the model's predictions under weak data augmentation, yielding the augmented pseudo-labels. Additionally, to fully exploit the information in the unlabeled data, we incorporate consistency regularization between strong and weak augmentations into our framework. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our method.
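The two core steps of the abstract — blending initial vision-language pseudo-labels with weak-augmentation predictions, and enforcing consistency under strong augmentation — can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the blending weight `alpha`, the confidence threshold `tau`, and the function names are illustrative assumptions, and in practice the probabilities would come from a CLIP-style model and the trained classifier rather than fixed arrays.

```python
import numpy as np

def augment_pseudo_labels(vlm_probs, weak_probs, alpha=0.7):
    """Blend initial vision-language-model pseudo-labels with the current
    model's predictions on weakly augmented views.

    alpha weighs the evolving model predictions; (1 - alpha) retains the
    initial VLM prior so the pseudo-labels stay anchored while updating.
    Both inputs are (N, C) arrays of class probabilities.
    """
    blended = alpha * weak_probs + (1.0 - alpha) * vlm_probs
    # Renormalize each row to a valid probability distribution.
    return blended / blended.sum(axis=1, keepdims=True)

def consistency_loss(strong_probs, pseudo_labels, tau=0.5, eps=1e-12):
    """FixMatch-style consistency term: cross-entropy between predictions
    on strongly augmented views and hard augmented pseudo-labels, applied
    only to samples whose pseudo-label confidence exceeds tau."""
    confidence = pseudo_labels.max(axis=1)
    hard_targets = pseudo_labels.argmax(axis=1)
    mask = confidence >= tau
    ce = -np.log(strong_probs[np.arange(len(hard_targets)), hard_targets] + eps)
    # Average only over the confident (masked-in) samples.
    return float((ce * mask).sum() / max(mask.sum(), 1))
```

A design note on the blend: because the VLM prior never vanishes from the mixture, a drifting classifier cannot fully overwrite the initial pseudo-labels, which is the hedge against confirmation bias that the abstract alludes to.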
Citation: Zhongnian Li, Yanyan Ding, Meng Wei, Xinzheng Xu. Open-world barely-supervised learning via augmented pseudo labels[J]. Electronic Research Archive, 2024, 32(10): 5804-5818. doi: 10.3934/era.2024268