Correlation filter object tracking algorithms have attracted extensive attention in the tracking community because of their excellent tracking performance and efficiency. However, the relationships among the mathematical models underlying correlation filter tracking frameworks remain unclear, so the many forms of the correlation filter are easily confused and misused. To address this problem, we reviewed the various forms of the correlation filter and discussed their intrinsic connections. First, we reviewed the basic definitions of the circulant matrix, convolution, and correlation operations, and then discussed the relationships among the three. On this basis, we listed four mathematical modeling forms of correlation filter object tracking from the literature and theoretically proved their equivalence. Next, we discussed the fast solution of the correlation filter from the perspectives of the diagonalization property of the circulant matrix and the convolution theorem. In addition, we examined the difference between the one-dimensional and two-dimensional correlation filter responses and the reasons this difference arises. Numerical experiments were conducted to verify these perspectives. The results showed that the filters calculated using the diagonalization property of the circulant matrix and those calculated using the convolution theorem were completely equivalent. The experimental code of this paper is available at https://github.com/110500617/Correlation-filter/tree/main.
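The equivalence claimed above can be illustrated with a minimal one-dimensional sketch. This is not the authors' experimental code (that is in the linked repository); it assumes a single-channel ridge-regression filter, a data matrix whose rows are the cyclic shifts of a base sample, and NumPy's unnormalized DFT convention. The function names `circulant_rows`, `filter_dense`, and `filter_fft`, the Gaussian label, and the value of `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def circulant_rows(x):
    """Build a circulant data matrix: row i is x cyclically shifted right by i."""
    return np.stack([np.roll(x, i) for i in range(len(x))])


def filter_dense(x, y, lam):
    """Ridge-regression filter from the explicit circulant matrix.

    Solves (X^T X + lam * I) w = X^T y with a dense linear solver.
    """
    X = circulant_rows(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)


def filter_fft(x, y, lam):
    """The same filter computed element-wise in the Fourier domain.

    With rows defined as above, X^T y is the circular convolution of x and y,
    and X^T X acts as multiplication by |fft(x)|^2, so the normal equations
    decouple into one scalar equation per frequency.
    """
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))


# Hypothetical toy data: a random base sample and a Gaussian-shaped label.
rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)
y = np.exp(-0.5 * ((np.arange(n) - n // 2) / 2.0) ** 2)
lam = 1e-2

w_dense = filter_dense(x, y, lam)
w_fft = filter_fft(x, y, lam)
print("max abs difference:", np.max(np.abs(w_dense - w_fft)))
```

The two filters should agree up to floating-point rounding. Note that where the complex conjugate appears in the Fourier-domain formula depends on how the shifts and the DFT are defined, which is one reason the different modeling forms are easy to confuse.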
Citation: Yingpin Chen, Kaiwei Chen. Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter[J]. Electronic Research Archive, 2024, 32(7): 4684-4714. doi: 10.3934/era.2024213