Existing pedestrian re-identification models often retrieve pedestrians with low accuracy under changes in posture and occlusion, because the network cannot fully express pedestrian feature information. To address this problem, this paper proposes a method that combines the attention mechanism with multi-scale feature fusion. The proposed cross-attention module is integrated with the ResNet50 backbone network, which significantly improves the network's ability to extract strongly salient features. At the same time, a multi-scale feature fusion module extracts multi-scale features from different depths of the network and makes them complementary through feature addition, feature concatenation and feature weight selection. In addition, a feature enhancement method and an efficient pedestrian retrieval strategy are proposed to jointly improve retrieval accuracy at both the training and testing stages. On the occluded pedestrian recognition datasets Partial-REID and Partial-iLIDS, the method reaches Rank-1 accuracies of 70.1% and 65.6% and Rank-3 accuracies of 82.2% and 80.5%, respectively. It also achieves high recognition accuracy on the Market1501 and DukeMTMC-reid datasets, reaching 95.9% and 89.9% on Rank-1, 89.1% and 80.3% on mAP, and 67% and 46.2% on mINP, respectively. These results indicate that the method is effective in addressing the above problems.
Citation: Songlin Liu, Shouming Zhang, Zijian Diao, Zhenbin Fang, Zeyu Jiao, Zhenyu Zhong. Pedestrian re-identification based on attention mechanism and Multi-scale feature fusion[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 16913-16938. doi: 10.3934/mbe.2023754
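The abstract reports results on the Rank-1, Rank-3, mAP and mINP indicators. As a reading aid, the sketch below shows how these indicators are commonly computed from a ranked gallery list per query. It is a minimal illustration, not the authors' evaluation code: the function names and the toy data are hypothetical, and the same-camera filtering that standard re-identification protocols apply before ranking is assumed to have been done already.

```python
from typing import Dict, List, Sequence


def rank_k(matches: Sequence[bool], k: int) -> float:
    """Return 1.0 if a correct match appears in the top-k results, else 0.0."""
    return 1.0 if any(matches[:k]) else 0.0


def average_precision(matches: Sequence[bool]) -> float:
    """Mean of the precision values taken at each correct-match position."""
    hits = 0
    precisions: List[float] = []
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def inverse_negative_penalty(matches: Sequence[bool]) -> float:
    """Number of correct matches divided by the rank of the hardest (last) one."""
    positions = [p for p, is_match in enumerate(matches, start=1) if is_match]
    return len(positions) / positions[-1] if positions else 0.0


def evaluate(ranked_matches: Sequence[Sequence[bool]], ks=(1, 3)) -> Dict[str, float]:
    """Average each indicator over all queries.

    ranked_matches[i][j] is True when the j-th ranked gallery image
    shares the identity of query i (hypothetical input format).
    """
    n = len(ranked_matches)
    scores = {f"Rank-{k}": sum(rank_k(m, k) for m in ranked_matches) / n for k in ks}
    scores["mAP"] = sum(average_precision(m) for m in ranked_matches) / n
    scores["mINP"] = sum(inverse_negative_penalty(m) for m in ranked_matches) / n
    return scores


# Toy example: two queries, four ranked gallery images each.
toy_ranked_matches = [
    [True, False, True, False],   # correct matches at ranks 1 and 3
    [False, True, False, False],  # single correct match at rank 2
]
scores = evaluate(toy_ranked_matches)
print(scores)
```

In this toy case only the first query has a correct match at rank 1, so Rank-1 is 0.5, while both queries have a match within the top three, so Rank-3 is 1.0. mINP penalizes a query whose last correct match is ranked deep in the list, which is why it is typically much lower than Rank-1 or mAP.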