Research article

Zero-shot learning via visual-semantic aligned autoencoder


  • Received: 25 April 2023 Revised: 29 May 2023 Accepted: 04 June 2023 Published: 25 June 2023
  • Zero-shot learning recognizes unseen-class samples with a model learned from seen-class samples and semantic features. Because the training set contains no information about the unseen classes, some researchers have proposed generating unseen-class samples with generative models. However, such a generative model is first trained on the seen-class samples and only then used to synthesize unseen-class samples, so the synthesized features tend to be biased toward the seen classes and may deviate substantially from real unseen-class samples. To tackle this problem, we generate unseen-class samples with an autoencoder and combine the semantic features of the unseen classes with the generated sample features to construct the loss function. The proposed method is validated on three datasets and achieves good results.

    Citation: Tianshu Wei, Jinjie Huang, Cong Jin. Zero-shot learning via visual-semantic aligned autoencoder[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 14081-14095. doi: 10.3934/mbe.2023629
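    The core idea in the abstract can be illustrated with a short sketch: two autoencoders, one for visual features and one for class attributes, are trained on seen classes with their latent codes aligned, and visual features for unseen classes are then synthesized by passing unseen-class attributes through the semantic encoder and the visual decoder. The PyTorch code below is a minimal reconstruction of this pipeline under our own assumptions, not the authors' implementation: the architecture, the equal loss weighting, and the dimensions (2048-d visual features as produced by ResNet-101, 85-d attributes as in AwA2) are illustrative.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """One-hidden-layer encoder/decoder pair for a single modality."""
        def __init__(self, in_dim, latent_dim, hidden=1024):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, in_dim))

        def forward(self, x):
            z = self.encoder(x)
            return z, self.decoder(z)

    LATENT = 64
    visual_ae = Autoencoder(2048, LATENT)   # visual branch (e.g., ResNet-101 features)
    semantic_ae = Autoencoder(85, LATENT)   # attribute branch (e.g., AwA2 attributes)
    mse = nn.MSELoss()

    def training_loss(x_seen, a_seen):
        # x_seen: seen-class visual features; a_seen: their class attributes
        zv, x_rec = visual_ae(x_seen)
        zs, a_rec = semantic_ae(a_seen)
        recon = mse(x_rec, x_seen) + mse(a_rec, a_seen)  # per-modality reconstruction
        align = mse(zv, zs)                              # pull the two latent codes together
        # Cross-reconstruction: a semantic code should decode to a plausible
        # visual feature, so unseen attributes can later generate visual samples.
        cross = mse(visual_ae.decoder(zs), x_seen) + mse(semantic_ae.decoder(zv), a_seen)
        return recon + align + cross

    @torch.no_grad()
    def generate_unseen(a_unseen, n_per_class=100, noise_std=0.1):
        # Synthesize visual features for unseen classes straight from their
        # attributes; no generator is ever fit to seen-class visual data alone.
        a = a_unseen.repeat_interleave(n_per_class, dim=0)
        z = semantic_ae.encoder(a + noise_std * torch.randn_like(a))
        return visual_ae.decoder(z)

    In a generalized zero-shot setting, the synthesized unseen-class features would be pooled with real seen-class features to train an ordinary softmax classifier over all classes; the small noise added to the attributes before encoding is one simple way to obtain more than one distinct sample per unseen class.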


  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
