Complementary label learning (CLL) is a weakly supervised learning method that learns from complementary labels, each of which specifies a class that a sample does not belong to, in order to infer the sample's true class. However, current CLL methods mainly rely on rewriting classification losses without fully leveraging the supervisory information in complementary labels. Therefore, enhancing the supervisory information in complementary labels is a promising approach to improving the performance of CLL. In this paper, we propose a novel framework called Complementary Label Enhancement based on Knowledge Distillation (KDCL) to address this underuse of complementary labels. KDCL consists of two deep neural networks: a teacher model and a student model. The teacher model softens complementary labels to enrich the supervisory information they carry, while the student model learns from the softened complementary labels produced by the teacher. Both the teacher and student models are trained on data that contain only complementary labels. To evaluate the effectiveness of KDCL, we conducted experiments on four datasets, namely MNIST, F-MNIST, K-MNIST and CIFAR-10, using two teacher-student pairs (LeNet-5+MLP and DenseNet-121+ResNet-18) and three CLL algorithms (PC, FWD and SCL-NL). Our experimental results demonstrate that models optimized with KDCL achieve higher accuracy than models trained directly on the original complementary labels.
Citation: Peng Ying, Zhongnian Li, Renke Sun, Xinzheng Xu. Complementary label learning based on knowledge distillation[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 17905-17918. doi: 10.3934/mbe.2023796
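The abstract describes the KDCL pipeline only at a high level. As a rough, non-authoritative illustration, the sketch below shows one way a teacher-student step over complementary labels might look in PyTorch. The specific choices here are assumptions rather than details from the paper: the teacher is trained with an SCL-NL-style complementary loss, the student combines a temperature-scaled distillation loss on the teacher's softened outputs with the same complementary loss, and all function names (`scl_nl_loss`, `distillation_loss`, `train_step`) and hyperparameters (e.g., temperature 4.0) are hypothetical.

```python
# Minimal, illustrative sketch of a KDCL-style training step (PyTorch).
# Assumptions (not taken from the paper): teacher trained with an SCL-NL-style
# complementary loss; student trained on the teacher's softened outputs via a
# temperature-scaled KL distillation loss plus the same complementary loss.
import torch
import torch.nn.functional as F


def scl_nl_loss(logits, comp_labels):
    """SCL-NL-style complementary loss: -log(1 - p_{complementary label})."""
    probs = F.softmax(logits, dim=1)
    p_comp = probs.gather(1, comp_labels.view(-1, 1)).squeeze(1)
    return -torch.log(1.0 - p_comp + 1e-12).mean()


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=1)
    log_student = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)


def train_step(teacher, student, opt_teacher, opt_student, x, comp_labels):
    # 1) Update the teacher using only complementary labels.
    teacher.train()
    loss_t = scl_nl_loss(teacher(x), comp_labels)
    opt_teacher.zero_grad()
    loss_t.backward()
    opt_teacher.step()

    # 2) Update the student on the teacher's softened labels, keeping the
    #    complementary-label loss on the student's own predictions.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss_s = distillation_loss(student_logits, teacher_logits) \
             + scl_nl_loss(student_logits, comp_labels)
    opt_student.zero_grad()
    loss_s.backward()
    opt_student.step()
    return loss_t.item(), loss_s.item()
```

In this sketch the temperature-softened teacher outputs play the role of the "softened complementary labels" mentioned in the abstract; how KDCL actually constructs and weights them is defined in the paper itself.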