Deep learning has provided powerful support for person re-identification (person re-id) over the years, and state-of-the-art methods have achieved superior performance. However, in practical application scenarios such as public surveillance, camera resolutions are usually limited to 720p, so the captured pedestrian areas tend to be small, close to $ 128\times 64 $ pixels. Research on person re-id at this small pixel size is constrained by the scarcity of effective pixel information: frame image quality is degraded, and complementing information across frames requires a more careful selection of beneficial frames. Meanwhile, person images exhibit large variations, such as misalignment and image noise, which are harder to separate from identity information at the small size, and eliminating any single sub-variation alone is still not robust enough. The Person Feature Correction and Fusion Network (FCFNet) proposed in this paper introduces three sub-modules that strive to extract discriminative video-level features from the perspectives of "exploiting complementary valid information between frames" and "correcting large variations in person features". An inter-frame attention mechanism is introduced through frame quality assessment, guiding informative features to dominate the fusion process and generating a preliminary frame quality score to filter low-quality frames. Two further feature correction modules are fitted to optimize the model's ability to perceive information from small-sized images. Experiments on four benchmark datasets confirm the effectiveness of FCFNet.
Citation: Liang She, Meiyue You, Jianyuan Wang, Yangyan Zeng. Video-based Person re-identification with parallel correction and fusion of pedestrian area features[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 3504-3527. doi: 10.3934/mbe.2023164
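To make the fusion step concrete: the abstract describes frame quality assessment that turns per-frame scores into attention weights, letting informative frames dominate and filtering low-quality ones. The following is a minimal, hypothetical PyTorch sketch of such quality-guided fusion; the module name `QualityAwareFusion`, the small scoring head, and the fixed filtering threshold are illustrative assumptions, not FCFNet's actual architecture.

```python
import torch
import torch.nn as nn


class QualityAwareFusion(nn.Module):
    """Sketch of quality-guided inter-frame fusion (illustrative, not FCFNet's design):
    each frame feature gets a scalar quality score, scores are normalized into
    attention weights, sub-threshold (low-quality) frames are masked out, and the
    remaining features are weight-averaged into a video-level feature."""

    def __init__(self, feat_dim: int = 2048, quality_threshold: float = 0.1):
        super().__init__()
        # Assumed lightweight scoring head; the paper's assessment module may differ.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )
        self.quality_threshold = quality_threshold

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim) frame-level features.
        scores = self.scorer(frame_feats).squeeze(-1)   # (batch, num_frames)
        weights = torch.softmax(scores, dim=1)          # inter-frame attention weights
        # Filter low-quality frames: zero their weights, then renormalize the rest.
        mask = (weights >= self.quality_threshold).float()
        weights = weights * mask
        weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
        # Informative frames dominate the fused video-level feature.
        return (weights.unsqueeze(-1) * frame_feats).sum(dim=1)


# Example: fuse an 8-frame tracklet of 2048-d frame features for a batch of 4.
fusion = QualityAwareFusion()
video_feat = fusion(torch.randn(4, 8, 2048))  # -> (4, 2048)
```

In this sketch, filtering is realized by zeroing sub-threshold attention weights and renormalizing the survivors, so the fused representation is driven only by frames judged informative enough.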