Research article

Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter

  • Received: 23 April 2024 Revised: 04 July 2024 Accepted: 10 July 2024 Published: 29 July 2024
  • The correlation filter object tracking algorithm has gained extensive attention from scholars in the field of tracking because of its excellent tracking performance and efficiency. However, the mathematical modeling relationships of correlation filter tracking frameworks are unclear, so the many forms of the correlation filter are susceptible to confusion and misuse. To solve these problems, we review various forms of the correlation filter and discuss their intrinsic connections. First, we review the basic definitions of the circulant matrix, convolution, and correlation operations. Then, the relationships among the three operations are discussed. On this basis, four mathematical modeling forms of correlation filter object tracking from the literature are listed, and the equivalence of the four modeling forms is theoretically proven. Next, the fast solution of the correlation filter is discussed from the perspectives of the diagonalization property of the circulant matrix and the convolution theorem. In addition, we delve into the difference between the one-dimensional and two-dimensional correlation filter responses as well as the reasons for their generation. Numerical experiments were conducted to verify the proposed perspectives. The results showed that the filters calculated based on the diagonalization property and the convolution property of the circulant matrix are completely equivalent. The experimental code of this paper is available at https://github.com/110500617/Correlation-filter/tree/main.

    Citation: Yingpin Chen, Kaiwei Chen. Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter[J]. Electronic Research Archive, 2024, 32(7): 4684-4714. doi: 10.3934/era.2024213




    Object tracking [1,2,3] technology has become a research hotspot in the field of computer vision [4] and it is widely employed in intelligent traffic management [5,6], unmanned aerial vehicle tracking [7,8], and human-computer interactions [9,10]. Correlation filter object tracking algorithms [11,12,13,14,15] have gained increasing attention in the field of tracking, owing to their excellent tracking performance and efficiency. These methods have become mainstream for visual tracking [16,17,18,19,20].

    The correlation operator is a signal processing operator that measures the similarity of signals. Thus, it is widely employed in the field of object tracking. The correlation operator was first introduced into the field of object tracking by Bolme et al. [21] in 2010. In 2015, Henriques et al. [11] proposed a correlation filter model in the form of a circulant matrix to train a classifier through dense sampling by the cyclic shift. In 2016, Bertinetto et al. [22] introduced the correlation operator into a two-branch weight-shared deep learning network and proposed SiamFC, a fully convolutional Siamese network. In 2017, Galoogahi et al. [23] proposed the background-aware correlation filter (BACF) in the form of vector multiplication, which cleverly avoids the boundary effect existing in correlation filter tracking methods. In 2020, Li et al. [24] proposed a correlation filter model in the form of convolution operations, which uses local and global information of the response maps to achieve adaptive spatio-temporal regularization. In 2022, Song et al. [25] proposed a Transformer tracker with cyclic shifting window attention, which is computed via the correlation operator. In 2024, Chen et al. [26] regarded the correlation operator as a convolution operation and proposed an asymmetrical background-aware correlation filter for object tracking by exploring the shape information of the object. Also in 2024, Chen et al. [27] introduced deep-convolutional-neural-network-based features into the correlation filter framework to further improve the tracking performance of BACF.

    The correlation filter object tracking method in the form of a circulant matrix utilizes the cyclic shift matrix [28,29] to generate many virtual samples, thereby expanding the sample richness to improve algorithm performance. Specifically, the algorithm reshapes the training sample into a row vector, and a matrix with a row circulant structure is subsequently formed via continuous cyclic shifts. The filter is then designed using this matrix. There are two drawbacks in directly solving the correlation filter in the spatial domain: 1) The spatial domain operation involves the inversion of a large circulant matrix, resulting in high computational complexity; and 2) the matrix formed by the cyclic shift contains a large amount of redundant information, which occupies a large amount of storage while calculating the filter. Therefore, the property that a circulant matrix can be diagonalized by the Fourier transform matrix is invoked [30,31,32] to transform the correlation operation into an entry-wise multiplication in the frequency domain, avoiding the inversion of the large spatial matrix. Notably, the single sample in the frequency domain replaces the virtual samples generated by the cyclic shift, effectively reducing the complexity and storage requirements of the correlation operation.

    The discrete convolution operation [33] is important in signal processing. In a discrete convolution operation, the signal is reversed and shifted. This moving signal is multiplied entry-wise with another stationary signal and summed to obtain the convolution result. The difference between the correlation and convolution operations is that the correlation operation does not perform the reverse operation on the moving signal. Rather, the correlation operation directly moves the signal. Therefore, the correlation operation is a special type of convolution. Given the convolution operation, the translation, multiplication, and summation calculations of the spatial domain can be transformed into a frequency domain entry-wise multiplication operation based on the convolution theorem [33] and Parseval's theorem [34,35] to avoid the high storage and computation requirements involved with moving the signal in the spatial domain. Researchers have understood the correlation object tracking framework from the perspective of convolution.

    The two approaches previously described (the diagonalization of the circulant matrix [36,37] and the transformation of the correlation operator into a convolution) yield the same form of computation, namely, the calculation of the correlation operation via frequency-domain entry-wise multiplication, albeit from different perspectives. Hence, there must be a close internal relationship among the different mathematical modeling approaches of correlation filters (CFs). With the improvement and perfection of correlation filter tracking theory [38,39,40,41,42], various forms of object tracking algorithms have been proposed. From a mathematical modeling perspective, correlation filter object tracking algorithms can be classified into four forms: correlation operations [21], vector multiplication operations [23], circulant matrix operations [29], and convolution operations [24]. These four modeling methods are expressed differently but are essentially equivalent.

    The motivation of this paper is to sort out the four mathematical modeling methods for the correlation filter object tracking algorithm by exploring the properties of the circulant matrix, convolution, and correlation operations. First, we review the definitions of these four modeling methods. Then, the internal relations of the four modeling methods are discussed in detail. Based on the properties of and relationships among the circulant matrix, convolution, and correlation operations, two fast correlation filter calculation methods are presented. Both theoretical derivation and experimental results prove the equivalence of the two methods, and numerical experiments verify the proposed viewpoints. In addition, most existing studies on the correlation filter [16,23] investigated filter calculation in the form of a one-dimensional filter; only a few studies have presented a solution to the correlation filter in the form of a two-dimensional matrix [26,37]. Thus, we further discuss the relationship and difference between the one-dimensional and two-dimensional filters.

    The main contributions of this study are as follows. 1) We comprehensively describe the definitions of the circulant matrix, convolution, and correlation operations and then theoretically prove the four theorems of the circulant matrix. Based on these theorems, the relationships of four modeling approaches for the correlation filter are further discussed. 2) The fast calculation of the correlation filter is discussed from two perspectives: the diagonalization property of the circulant matrix and the convolution theorem. The multiplication and inversion operations of the large-scale matrix are transformed into entry-wise multiplication and entry-wise division operations of the vector to improve the efficiency of the filter solution. 3) We convert a one-dimensional correlation filter into a two-dimensional correlation filter, present the calculation flow of the two filter methods, analyze the differences and connections between the two filter methods, and discuss the reasoning behind these relationships.

    The rest of this paper is organized as follows. In Section 2, we present the definitions of the three operations of correlation, circulant matrix, and convolution; argue the four theorems of the circulant matrix; and discuss the relationship among the three operations in depth. In Section 3, we enumerate the four forms of correlation filter tracking modeling. In Section 4, we present the solution to the filter from the perspectives of the diagonalization of circulant matrix and the convolution theorem. In Section 5, we discuss the differences and connections between one-dimensional and two-dimensional filters in detail. In Section 6, we present the verification of the viewpoints presented in this study through numerical experimentation and response plots to verify the equivalence of the two methods for solving the filter. Finally, in Section 7, we draw conclusions and present the outlook for future work.

    Suppose the first column vector of the matrix is $\mathbf{x}=(x_0,x_1,x_2,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$, where the superscript $T$ denotes the transpose operation. $\mathbf{x}$ is cyclically shifted by one step to obtain the second column vector $\mathbf{v}=(x_{N-1},x_0,x_1,\dots,x_{N-2})^T\in\mathbb{R}^{N\times 1}$ of the column-vector-based circulant matrix. The $N$ column vectors obtained after $N$ cyclic shifts form the column circulant matrix
    $$C(\mathbf{x})=\begin{pmatrix}x_0&x_{N-1}&x_{N-2}&\cdots&x_1\\x_1&x_0&x_{N-1}&\cdots&x_2\\x_2&x_1&x_0&\cdots&x_3\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&x_{N-3}&\cdots&x_0\end{pmatrix}\in\mathbb{R}^{N\times N}.$$

    Similarly, the vector $\mathbf{x}^T=(x_0,x_1,x_2,\dots,x_{N-1})\in\mathbb{R}^{1\times N}$ is taken as the base vector and cyclically shifted $N$ times to obtain $N$ row vectors. These vectors form the row-vector-based circulant matrix
    $$C(\mathbf{x}^T)=\begin{pmatrix}x_0&x_1&x_2&\cdots&x_{N-1}\\x_{N-1}&x_0&x_1&\cdots&x_{N-2}\\x_{N-2}&x_{N-1}&x_0&\cdots&x_{N-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_1&x_2&x_3&\cdots&x_0\end{pmatrix}\in\mathbb{R}^{N\times N}.$$

    The patches obtained by the traditional correlation filter through N cyclic shifts form a circulant matrix. Among the samples generated by the cyclic shift operation, only the first row represents the real sample.
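    As a concrete illustration (our own sketch, independent of the authors' released code), the two circulant constructions can be written in Python with NumPy; the helper names `col_circulant` and `row_circulant` are our own:

```python
import numpy as np

def col_circulant(x):
    """Column-vector-based circulant matrix C(x): the k-th column is x cyclically shifted down by k steps."""
    N = len(x)
    return np.stack([np.roll(x, k) for k in range(N)], axis=1)

def row_circulant(x):
    """Row-vector-based circulant matrix C(x^T): the k-th row is x cyclically shifted right by k steps."""
    N = len(x)
    return np.stack([np.roll(x, k) for k in range(N)], axis=0)

x = np.array([1.0, 2.0, 3.0, 4.0])
Cx = col_circulant(x)    # first column is x itself
CxT = row_circulant(x)   # first row is x itself (the only real sample)
```

    Note that `row_circulant(x)` is exactly `col_circulant(x).T`, the transpose relationship between $C(\mathbf{x})$ and $C(\mathbf{x}^T)$ used in the following sections.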

    Discrete convolution is given by
    $$(\mathbf{x}*\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(n-m), \tag{1}$$

    where $(\mathbf{x}*\mathbf{h})\in\mathbb{R}^{N\times 1}$, $(\mathbf{x}*\mathbf{h})(n)$ is the $n$th element of the vector $\mathbf{x}*\mathbf{h}$, $*$ is the one-dimensional convolution operator, $n=0,1,\dots,N-1$, and the signals $\mathbf{x}=(x_0,x_1,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$ and $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$ satisfy the periodic boundary conditions. Notably, in the correlation filter tracking framework, $\mathbf{h}$ is the correlation filter, whereas in the Siamese tracking framework, $\mathbf{h}$ is the test sample.

    The correlation operation is defined as
    $$(\mathbf{x}\otimes\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(m-n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n], \tag{2}$$

    where $\otimes$ is the one-dimensional correlation operator and $\mathbf{h}[\Delta\tau_n]=\mathrm{circshift}(\mathbf{h},n)$; $\mathrm{circshift}(\mathbf{h},n)$ denotes the cyclic shift operator that shifts the signal by $n$ $(n=0,1,\dots,N-1)$ steps.
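    The two definitions can be checked numerically. The sketch below (our own illustration, assuming `np.roll` plays the role of the $\mathrm{circshift}$ operator) implements Eq (1) by its defining sum and Eq (2) as inner products with the shifted filter, and also checks that the correlation equals convolution with the reversed filter, a relationship formalized later in Eq (26):

```python
import numpy as np

def circ_conv(x, h):
    """Discrete circular convolution, Eq (1): (x*h)(n) = sum_m x(m) h(n-m)."""
    N = len(x)
    return np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])

def circ_corr(x, h):
    """Correlation operation, Eq (2): (x ⊗ h)(n) = x^T h[Δτ_n]."""
    N = len(x)
    return np.array([x @ np.roll(h, n) for n in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, -1.0, 2.0, 0.0])
hbar = np.roll(h[::-1], 1)   # reversed filter (h0, h_{N-1}, ..., h1)
```

    With these definitions, `circ_corr(x, h)` coincides with `circ_conv(x, hbar)`, and `circ_conv` itself matches the FFT-based evaluation of Eq (19).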

    A circulant-matrix structure can effectively capture the motion characteristics of an object and provide accurate prediction information during tracking. However, there is redundancy in circulant matrix data, resulting in a large number of computations when operating in the spatial domain. To solve this problem, the computational complexity must be reduced using Theorem 1.

    Theorem 1: Let the column-vector-based circulant matrix $C(\mathbf{x})=\begin{pmatrix}x_0&x_{N-1}&\cdots&x_1\\x_1&x_0&\cdots&x_2\\\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&\cdots&x_0\end{pmatrix}$ be known, let the discrete Fourier transform matrix be
    $$F_N=\begin{pmatrix}1&1&\cdots&1&1\\1&e^{-j\frac{2\pi\times 1\times 1}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times 1}{N}}&e^{-j\frac{2\pi\times(N-1)\times 1}{N}}\\\vdots&\vdots&\ddots&\vdots&\vdots\\1&e^{-j\frac{2\pi\times 1\times(N-2)}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times(N-2)}{N}}&e^{-j\frac{2\pi\times(N-1)\times(N-2)}{N}}\\1&e^{-j\frac{2\pi\times 1\times(N-1)}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times(N-1)}{N}}&e^{-j\frac{2\pi\times(N-1)\times(N-1)}{N}}\end{pmatrix},$$
    let the inverse Fourier transform matrix be
    $$F_N^{-1}=\frac{1}{N}\begin{pmatrix}1&1&\cdots&1&1\\1&e^{j\frac{2\pi\times 1\times 1}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times 1}{N}}&e^{j\frac{2\pi\times(N-1)\times 1}{N}}\\\vdots&\vdots&\ddots&\vdots&\vdots\\1&e^{j\frac{2\pi\times 1\times(N-2)}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times(N-2)}{N}}&e^{j\frac{2\pi\times(N-1)\times(N-2)}{N}}\\1&e^{j\frac{2\pi\times 1\times(N-1)}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times(N-1)}{N}}&e^{j\frac{2\pi\times(N-1)\times(N-1)}{N}}\end{pmatrix},$$
    and let $\hat{\mathbf{x}}=F_N\mathbf{x}$ be the one-dimensional Fourier transform of the vector $\mathbf{x}=(x_0,x_1,x_2,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$. Then, we obtain
    $$F_N C(\mathbf{x})F_N^{-1}=\mathrm{Diag}(\hat{\mathbf{x}})=\begin{pmatrix}\hat{x}_0&0&\cdots&0\\0&\hat{x}_1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\hat{x}_{N-1}\end{pmatrix},$$
    where $\mathrm{Diag}$ is the operator that places the elements of a column vector on the diagonal of a diagonal matrix.

    Proof: In $C(\mathbf{x})F_N^{-1}$, the product of the first row of $C(\mathbf{x})$ and the $(k+1)$th $(k=0,1,\dots,N-1)$ column of $F_N^{-1}$ is denoted as $f(0,k)$. Then, we have
    $$f(0,k)=\frac{1}{N}\left(x_0+x_{N-1}e^{kj\frac{2\pi\times 1}{N}}+x_{N-2}e^{kj\frac{2\pi\times 2}{N}}+\cdots+x_1 e^{kj\frac{2\pi\times(N-1)}{N}}\right). \tag{3}$$

    Using the Euler relation $e^{2\pi kj}=\cos(2\pi k)+j\sin(2\pi k)=1$, we have
    $$e^{kj\frac{2\pi}{N}}=e^{kj\frac{2\pi}{N}}e^{-2\pi kj}=e^{kj\frac{2\pi}{N}-2\pi kj}=e^{-kj\frac{2\pi(N-1)}{N}}. \tag{4}$$

    Likewise, we have
    $$e^{kj\frac{2\pi(N-1)}{N}}=e^{kj\frac{2\pi(N-1)}{N}}e^{-2\pi kj}=e^{kj\frac{2\pi(N-1)}{N}-2\pi kj}=e^{-kj\frac{2\pi}{N}}. \tag{5}$$

    According to the period invariance of the complex exponential signals, Eq (3) can be rewritten as
    $$f(0,k)=\frac{1}{N}\left(x_0+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}+\cdots+x_1 e^{-kj\frac{2\pi\times 1}{N}}\right)=\frac{1}{N}\left(x_0+x_1 e^{-kj\frac{2\pi\times 1}{N}}+x_2 e^{-kj\frac{2\pi\times 2}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}\right)=\frac{1}{N}\hat{x}_k, \tag{6}$$

    where $\hat{x}_k$ is the $(k+1)$th $(k=0,1,\dots,N-1)$ element of $\hat{\mathbf{x}}=\mathrm{fft}_1(\mathbf{x})=F_N\mathbf{x}$ ($\mathrm{fft}_1$ is the one-dimensional fast Fourier transform operator), that is, the product of the $(k+1)$th row of $F_N$ and the vector $\mathbf{x}$.

    The product of the second row of $C(\mathbf{x})$ and the $(k+1)$th $(k=0,1,\dots,N-1)$ column of $F_N^{-1}$ is denoted as $f(1,k)$. Since the second row of $C(\mathbf{x})$ is the right-shifted version of the first row of $C(\mathbf{x})$, we have
    $$f(1,k)=\frac{1}{N}\left(x_1+x_0 e^{kj\frac{2\pi\times 1}{N}}+\cdots+x_2 e^{kj\frac{2\pi\times(N-1)}{N}}\right)=\frac{1}{N}\left(x_0 e^{kj\frac{2\pi\times 1}{N}}+x_1+x_2 e^{-kj\frac{2\pi\times 1}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-2)}{N}}\right)=\frac{e^{kj\frac{2\pi\times 1}{N}}}{N}\left(x_0+x_1 e^{-kj\frac{2\pi\times 1}{N}}+x_2 e^{-kj\frac{2\pi\times 2}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}\right)=e^{kj\frac{2\pi\times 1}{N}}f(0,k). \tag{7}$$

    Thus, generalizing to $f(m,k)=e^{kj\frac{2\pi m}{N}}f(0,k)$, we have
    $$C(\mathbf{x})F_N^{-1}=\begin{pmatrix}f(0,0)&f(0,1)&\cdots&f(0,N-1)\\f(1,0)&f(1,1)&\cdots&f(1,N-1)\\\vdots&\vdots&\ddots&\vdots\\f(N-1,0)&f(N-1,1)&\cdots&f(N-1,N-1)\end{pmatrix}=\frac{1}{N}\begin{pmatrix}\hat{x}_0&\hat{x}_1&\cdots&\hat{x}_{N-1}\\e^{j\frac{2\pi\times 0\times 1}{N}}\hat{x}_0&e^{j\frac{2\pi\times 1\times 1}{N}}\hat{x}_1&\cdots&e^{j\frac{2\pi\times(N-1)\times 1}{N}}\hat{x}_{N-1}\\\vdots&\vdots&\ddots&\vdots\\e^{j\frac{2\pi\times 0\times(N-1)}{N}}\hat{x}_0&e^{j\frac{2\pi\times 1\times(N-1)}{N}}\hat{x}_1&\cdots&e^{j\frac{2\pi\times(N-1)\times(N-1)}{N}}\hat{x}_{N-1}\end{pmatrix}=F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}}). \tag{8}$$

    Therefore, we have
    $$F_N C(\mathbf{x})F_N^{-1}=F_N F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}})=\mathrm{Diag}(\hat{\mathbf{x}}). \tag{9}$$

    Hence, Theorem 1 is proven.
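    Theorem 1 can also be verified numerically. The sketch below (our own illustration) builds $F_N$, $F_N^{-1}$, and $C(\mathbf{x})$ explicitly; NumPy's `np.fft.fft` uses the same unnormalized convention as $F_N$:

```python
import numpy as np

N = 8
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)         # DFT matrix F_N
Finv = np.exp(2j * np.pi * np.outer(idx, idx) / N) / N   # inverse DFT matrix F_N^{-1}

x = np.random.default_rng(0).standard_normal(N)
Cx = np.stack([np.roll(x, k) for k in range(N)], axis=1)  # column circulant C(x)

lhs = F @ Cx @ Finv            # F_N C(x) F_N^{-1}
rhs = np.diag(np.fft.fft(x))   # Diag(x_hat), x_hat = F_N x
```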

    According to the diagonalization theorem of the column circulant matrix, the diagonalization theorem of the row circulant matrix is derived as follows.

    Theorem 2: Let the row-vector-based circulant matrix $C(\mathbf{x}^T)=\begin{pmatrix}x_0&x_1&x_2&\cdots&x_{N-1}\\x_{N-1}&x_0&x_1&\cdots&x_{N-2}\\x_{N-2}&x_{N-1}&x_0&\cdots&x_{N-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_1&x_2&x_3&\cdots&x_0\end{pmatrix}$ be known, and let $F_n=\frac{F_N}{\sqrt{N}}$ denote the normalized discrete Fourier transform (DFT) matrix. Then, the row-vector-based circulant matrix satisfies $C(\mathbf{x}^T)=F_n\,\mathrm{Diag}(\hat{\mathbf{x}})\,F_n^H$ (where $F_n^H$ denotes the conjugate transpose of $F_n$).

    Proof: The diagonalization theorem of the row-vector-based circulant matrix can be proven using that of the column-vector-based circulant matrix. According to Theorem 1, $C(\mathbf{x})=F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}})F_N$. Transposing both sides of this equation yields
    $$C(\mathbf{x}^T)=C(\mathbf{x})^T=F_N^T(\mathrm{Diag}(\hat{\mathbf{x}}))^T(F_N^{-1})^T=F_N^T(\mathrm{Diag}(\hat{\mathbf{x}}))^T\left(\frac{F_N^*}{N}\right)^T, \tag{10}$$

    where $F_N^{-1}=\frac{1}{N}F_N^*$ and $F_N^T=F_N$. By decomposing $N$ into $\sqrt{N}\sqrt{N}$, we have
    $$C(\mathbf{x}^T)=\left(\frac{F_N}{\sqrt{N}}\right)(\mathrm{Diag}(\hat{\mathbf{x}}))^T\left(\frac{F_N^*}{\sqrt{N}}\right)^T. \tag{11}$$

    As the normalized DFT matrix $F_n=\frac{F_N}{\sqrt{N}}$ satisfies $F_n^T=F_n$ and $F_n^H=F_n^{-1}$, we obtain
    $$C(\mathbf{x}^T)=F_n(\mathrm{Diag}(\hat{\mathbf{x}}))^T F_n^{-1}=F_n\,\mathrm{Diag}(\hat{\mathbf{x}})\,F_n^H. \tag{12}$$

    Hence, Theorem 2 is proven.
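    A numerical check of Theorem 2 (our own sketch) builds the normalized DFT matrix $F_n$ and reconstructs $C(\mathbf{x}^T)$ from the spectrum of $\mathbf{x}$:

```python
import numpy as np

N = 8
idx = np.arange(N)
Fn = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # normalized DFT matrix F_n

x = np.random.default_rng(1).standard_normal(N)
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)       # row circulant C(x^T)

recon = Fn @ np.diag(np.fft.fft(x)) @ Fn.conj().T               # F_n Diag(x_hat) F_n^H
```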

    Directly performing operations with the circulant matrix in the spatial domain leads to high computational complexity. The correlation filter tracking algorithm therefore also utilizes the relationship between the convolution operation and the circulant matrix: the convolution theorem transforms the spatial-domain operation into an entry-wise operation in the frequency domain to circumvent large matrix multiplications and inversions, effectively reducing the number of operations and improving the computational efficiency.

    Theorem 3: Multiplying the column-vector-based circulant matrix $C(\mathbf{x})$ by a signal $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$ yields
    $$C(\mathbf{x})\mathbf{h}=\begin{pmatrix}x_0&x_{N-1}&x_{N-2}&\cdots&x_1\\x_1&x_0&x_{N-1}&\cdots&x_2\\x_2&x_1&x_0&\cdots&x_3\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&x_{N-3}&\cdots&x_0\end{pmatrix}\begin{pmatrix}h_0\\h_1\\h_2\\\vdots\\h_{N-1}\end{pmatrix}=\mathbf{x}*\mathbf{h}. \tag{13}$$

    Proof: By multiplying both sides of Eq (9) with $\hat{\mathbf{h}}$ simultaneously, we have
    $$F_N C(\mathbf{x})F_N^{-1}\hat{\mathbf{h}}=\mathrm{Diag}(\hat{\mathbf{x}})\hat{\mathbf{h}}. \tag{14}$$

    Since $F_N^{-1}\hat{\mathbf{h}}=\mathbf{h}$, this gives
    $$F_N C(\mathbf{x})\mathbf{h}=\hat{\mathbf{x}}\odot\hat{\mathbf{h}}, \tag{15}$$

    where $\odot$ denotes the entry-wise multiplication operation.

    According to the convolution theorem, the entry-wise multiplication of the spectra of two signals equals the spectrum of their spatial convolution; thus, we have
    $$\hat{\mathbf{x}}\odot\hat{\mathbf{h}}=F_N(\mathbf{x}*\mathbf{h}). \tag{16}$$

    By combining Eqs (15) and (16), we obtain
    $$F_N C(\mathbf{x})\mathbf{h}=F_N(\mathbf{x}*\mathbf{h}). \tag{17}$$

    Then, it is seen that

    $$C(\mathbf{x})\mathbf{h}=\mathbf{x}*\mathbf{h}. \tag{18}$$

    Hence, Theorem 3 is proven.
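    A minimal numerical check of Theorem 3 (our own sketch) compares the circulant matrix product with the FFT-based circular convolution:

```python
import numpy as np

N = 16
rng = np.random.default_rng(2)
x, h = rng.standard_normal(N), rng.standard_normal(N)

Cx = np.stack([np.roll(x, k) for k in range(N)], axis=1)    # column circulant C(x)
spatial = Cx @ h                                            # C(x) h
freq = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))  # x * h via the convolution theorem
```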

    According to the convolution theorem, spatial convolution can be calculated in the frequency domain. The specific calculation method is

    $$\mathbf{x}*\mathbf{h}=\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}})), \tag{19}$$

    where $\mathrm{real}$ is the real-part-taking operator and $\mathrm{ifft}_1$ is the one-dimensional inverse Fourier transform operator.

    Comment 1: If calculated directly in the spatial domain, $\mathbf{x}*\mathbf{h}$ can be expressed as $C(\mathbf{x})\mathbf{h}$. The memory space occupied by $C(\mathbf{x})\mathbf{h}$ is $N^2+N$ floating-point units, and its multiplication complexity is $O(N^2)$. Notably, $\mathbf{x}*\mathbf{h}$ can instead be calculated in the frequency domain using the convolution theorem, that is, $\mathbf{x}*\mathbf{h}=\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$. The memory space occupied by $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ is $4N$ floating-point units, and its multiplication complexity is $O(8N\log_2 N+4N)$. (This count includes two fast Fourier transforms, one inverse fast Fourier transform, and the entry-wise multiplication of complex numbers in the frequency domain: the two forward FFTs require $2N\log_2 N$ complex-by-real multiplications, involving $4N\log_2 N$ floating-point multiplications; the inverse FFT requires $N\log_2 N$ complex multiplications, involving $4N\log_2 N$ floating-point multiplications; and the entry-wise multiplication of $N$ complex numbers in the frequency domain requires $4N$ floating-point multiplications.)

    Table 1 presents the occupied memory space and the floating-point multiplication complexity of the $C(\mathbf{x})\mathbf{h}$ and $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operations. For example, when $N$ is 4, the $C(\mathbf{x})\mathbf{h}$ operation occupies $N^2+N|_{N=4}=4^2+4=20$ floating-point units, and the number of floating-point multiplications is $N^2|_{N=4}=4^2=16$. The $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operation occupies $4N|_{N=4}=4\times 4=16$ floating-point units, and the number of floating-point multiplications is $8N\log_2 N+4N|_{N=4}=8\times 4\times\log_2 4+4\times 4=80$. When $N$ is 256, the $C(\mathbf{x})\mathbf{h}$ operation occupies $N^2+N|_{N=256}=256^2+256=65792$ floating-point units, and the number of floating-point multiplications is $N^2|_{N=256}=256^2=65536$. By contrast, the $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operation occupies $4N|_{N=256}=4\times 256=1024$ floating-point units, and the number of floating-point multiplications is $8N\log_2 N+4N|_{N=256}=8\times 256\times\log_2 256+4\times 256=17408$. The results show that when the signal size is large, the multiplication with the large circulant matrix in the spatial domain can be transformed into the entry-wise multiplication in the frequency domain according to Theorem 3 and the convolution theorem, effectively reducing the number of operations.

    Table 1. Memory footprint and computational complexity analysis of the one-dimensional operations.

    Operation (signal in $\mathbb{R}^{N\times 1}$) | Memory space occupied (floating-point units) | Floating-point multiplication complexity
    $C(\mathbf{x})\mathbf{h}$ | $N^2+N$ | $O(N^2)$
    $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ | $4N$ | $O(8N\log_2 N+4N)$


    Theorem 4: The row-vector-based circulant matrix satisfies $C(\mathbf{x}^T)\mathbf{h}=C(\bar{\mathbf{x}})\mathbf{h}=\bar{\mathbf{x}}*\mathbf{h}$.

    Proof: By observing each column of the row-vector-based circulant matrix $C(\mathbf{x}^T)$, we find that each column is obtained by a cyclic shift of the previous column. The first column $\bar{\mathbf{x}}=(x_0,x_{N-1},\dots,x_1)^T\in\mathbb{R}^{N\times 1}$ of this matrix is the reverse signal of $\mathbf{x}=(x_0,x_1,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$ (e.g., if $\mathbf{x}=[1,2,3,4]^T$, then $\bar{\mathbf{x}}=[1,4,3,2]^T$). Then, we have $C(\mathbf{x}^T)=C(\bar{\mathbf{x}})$. Combining this result with Eq (13), we obtain
    $$C(\mathbf{x}^T)\mathbf{h}=C(\bar{\mathbf{x}})\mathbf{h}=\bar{\mathbf{x}}*\mathbf{h}. \tag{20}$$

    Hence, Theorem 4 is proven.
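    Theorem 4 can be verified numerically as well; the sketch below (our own) uses `np.roll(x[::-1], 1)` to form the reverse signal $\bar{\mathbf{x}}$ that keeps $x_0$ in place:

```python
import numpy as np

N = 4
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, -1.0, 2.0, 0.25])

CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)      # row circulant C(x^T)
xbar = np.roll(x[::-1], 1)                                     # reverse signal of x

lhs = CxT @ h                                                  # C(x^T) h
rhs = np.real(np.fft.ifft(np.fft.fft(xbar) * np.fft.fft(h)))   # xbar * h
```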

    Comment 2: We can regard $\mathbf{h}$ in $\bar{\mathbf{x}}*\mathbf{h}$ as a static signal and $\bar{\mathbf{x}}$ as a dynamic signal. According to the definition of discrete convolution, the dynamic signal should be reversed as $\bar{\bar{\mathbf{x}}}$, which is equal to $\mathbf{x}$. The reversed signal $\bar{\bar{\mathbf{x}}}$ is then cyclically shifted to form the shifted vectors. The vector formed by the inner products of these shifted vectors with the static signal is the result of the discrete convolution. The shifted vectors can be stacked into the row-vector-based circulant matrix $C(\mathbf{x}^T)$. Thus, $\bar{\mathbf{x}}*\mathbf{h}=C(\mathbf{x}^T)\mathbf{h}$.

    Comment 3: $\bar{\mathbf{x}}$ satisfies $\mathrm{fft}_1(\bar{\mathbf{x}})=\hat{\mathbf{x}}^*$ (where $\hat{\mathbf{x}}^*$ is the conjugate signal of $\hat{\mathbf{x}}$).

    Proof: If the spectral signal $\hat{\mathbf{x}}$ is the Fourier transform of the signal $\mathbf{x}$, then the spectral elements of $\mathbf{x}$ are expressed as
    $$\hat{x}(k)=\sum_{n=0}^{N-1}x(n)e^{-kj\frac{2\pi n}{N}}=\sum_{n=1}^{N-1}x(n)e^{-kj\frac{2\pi n}{N}}+x(0). \tag{21}$$

    Similarly, the spectral elements of $\bar{\mathbf{x}}$ are given by
    $$\mathrm{fft}_1(\bar{\mathbf{x}})(k)=\sum_{n=0}^{N-1}\bar{x}(n)e^{-kj\frac{2\pi n}{N}}=\sum_{n=1}^{N-1}x(N-n)e^{-kj\frac{2\pi n}{N}}+x(0), \tag{22}$$

    where $\mathrm{fft}_1(\bar{\mathbf{x}})(k)$ is the $(k+1)$th $(k=0,1,\dots,N-1)$ element of $\mathrm{fft}_1(\bar{\mathbf{x}})$.

    Based on the time-shifting property of the discrete Fourier transform and Euler's formula $e^{2\pi kj}=\cos(2\pi k)+j\sin(2\pi k)=1$, Eq (22) can be rewritten as follows
    $$\mathrm{fft}_1(\bar{\mathbf{x}})(k)=\sum_{n=1}^{N-1}x(N-n)e^{-kj\frac{2\pi n}{N}}e^{2\pi kj}+x(0)=\sum_{n=1}^{N-1}x(N-n)e^{kj\frac{2\pi(N-n)}{N}}+x(0)\overset{t=N-n}{=}\sum_{t=1}^{N-1}x(t)e^{kj\frac{2\pi t}{N}}+x(0)=\sum_{t=0}^{N-1}x(t)e^{kj\frac{2\pi t}{N}}. \tag{23}$$

    Combining Eqs (21) and (23) yields
    $$\mathrm{fft}_1(\bar{\mathbf{x}})=\hat{\mathbf{x}}^*. \tag{24}$$

    Hence, the proof is complete.
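    Comment 3 is easy to confirm numerically for a real-valued signal (our own sketch):

```python
import numpy as np

x = np.random.default_rng(5).standard_normal(8)  # real-valued signal
xbar = np.roll(x[::-1], 1)                       # reverse signal keeping x(0) in place

lhs = np.fft.fft(xbar)                           # fft_1(x_bar)
rhs = np.conj(np.fft.fft(x))                     # conjugate spectrum of x
```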

    According to Eq (20), a second proof of Theorem 3 is provided as follows.

    Proof:

    We observe that $C(\mathbf{x})=C(\bar{\mathbf{x}}^T)$, since the first row of $C(\mathbf{x})$ is exactly $\bar{\mathbf{x}}^T=(x_0,x_{N-1},\dots,x_1)$. Thereby, applying Theorem 4 to $\bar{\mathbf{x}}$, we obtain
    $$C(\mathbf{x})\mathbf{h}=C(\bar{\mathbf{x}}^T)\mathbf{h}=\bar{\bar{\mathbf{x}}}*\mathbf{h}=\mathbf{x}*\mathbf{h}, \tag{25}$$

    where $\bar{\bar{\mathbf{x}}}=\mathbf{x}$.

    A tracking algorithm based on a correlation filter significantly improves the tracking speed by transforming complex correlation operations in the spatial domain into simple entry-wise multiplication operations in the frequency domain. Utilizing the relationship between the correlation and convolution operations, we rewrite the correlation operator in the convolution form
    $$(\mathbf{x}\otimes\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(m-n)=\sum_{m=0}^{N-1}x(m)\bar{h}(n-m)=(\mathbf{x}*\bar{\mathbf{h}})(n), \tag{26}$$

    where $\bar{\mathbf{h}}=(h_0,h_{N-1},\dots,h_1)^T\in\mathbb{R}^{N\times 1}$ is the reverse signal of $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$, and $\bar{\mathbf{h}}$ satisfies the one-dimensional periodic boundary conditions.

    By combining Eqs (20) and (26), the relationship between the correlation operation and the row-vector-based circulant matrix is given by
    $$C(\mathbf{x}^T)\mathbf{h}=\overline{\mathbf{x}\otimes\mathbf{h}}. \tag{27}$$
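    The reverse relationship of Eq (27) can be checked numerically; the sketch below (our own, again using `np.roll` for the cyclic shift operator) computes the correlation response as inner products with the shifted filter and compares it with $C(\mathbf{x}^T)\mathbf{h}$:

```python
import numpy as np

N = 8
rng = np.random.default_rng(6)
x, h = rng.standard_normal(N), rng.standard_normal(N)

corr = np.array([x @ np.roll(h, n) for n in range(N)])     # (x ⊗ h)(n) = x^T h[Δτ_n]
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)  # row circulant C(x^T)
corr_rev = np.roll(corr[::-1], 1)                          # reversed correlation response
```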

    A traditional discriminative tracking algorithm distinguishes between an object and its background by training a classifier. The background information and the object are used as negative and positive samples, respectively, and the candidate sample with the highest response is selected as the prediction result. The correlation filter uses ridge regression to design the filter $\mathbf{h}$, with a regularization term added to prevent overfitting. The correlation operation form of the correlation filter is given by Eq (28):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{28}$$

    where $\mathbf{x}\in\mathbb{R}^{N\times 1}$ is the column-vector form of the object sample after weighting by the cosine window and $N$ is the number of pixels occupied by the object sample. $\mathbf{y}\in\mathbb{R}^{N\times 1}$ is the desired correlation response, $\mathbf{h}\in\mathbb{R}^{N\times 1}$ is the filter, and $\lambda$ is the balancing parameter, which balances the fidelity term $\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2$ and the ridge regression regularization term $\frac{\lambda}{2}\|\mathbf{h}\|_2^2$.

    The vector multiplication form of the correlation filter is given by Eq (29):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(y(n)-\mathbf{x}^T\mathbf{h}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(\bar{y}(n)-\mathbf{h}^T\mathbf{x}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{29}$$

    where, if $\mathbf{r}=C(\mathbf{h}^T)\mathbf{x}$, then $\mathbf{r}\in\mathbb{R}^{N\times 1}$ and $r(n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n]$.

    A key focus of the correlation filter tracking algorithm is improving computational efficiency using the characteristics of the circulant matrix. The circulant matrix operation form of the correlation filter is given by Eq (30):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-C(\mathbf{h}^T)\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\|\bar{\mathbf{y}}-C(\mathbf{x}^T)\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{30}$$

    where $C(\mathbf{h}^T)$ is the row-vector-based circulant matrix. $C(\mathbf{h}^T)$ satisfies $C(\mathbf{h}^T)\mathbf{x}=\bar{\mathbf{h}}*\mathbf{x}=\mathbf{x}*\bar{\mathbf{h}}=\mathbf{x}\otimes\mathbf{h}$ and $C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.

    According to Eq (26), the correlation form in Eq (28) can be rewritten in the convolutional form of Eq (31), that is
    $$E(\mathbf{h})=\min_{\bar{\mathbf{h}}}\frac{1}{2}\|\mathbf{y}-\bar{\mathbf{h}}*\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\bar{\mathbf{h}}\|_2^2, \tag{31}$$

    where $*$ denotes the convolution operator, which satisfies $\mathbf{x}\otimes\mathbf{h}=\mathbf{x}*\bar{\mathbf{h}}$.

    According to the convolution theorem and Parseval's theorem, Eq (31) can be written in the frequency-domain form as follows
    $$E(\mathbf{h})=\min_{\hat{\mathbf{h}}^*}\frac{1}{2N}\|\hat{\mathbf{y}}-\hat{\mathbf{h}}^*\odot\hat{\mathbf{x}}\|_2^2+\frac{\lambda}{2N}\|\hat{\mathbf{h}}\|_2^2, \tag{32}$$

    where $\hat{\mathbf{h}}$ is the Fourier transform of $\mathbf{h}$, and $\hat{\mathbf{h}}^*$ is the conjugate signal of $\hat{\mathbf{h}}$.

    Based on the above discussion, we determine the relationship among the four mathematical modeling forms of correlation filter tracking as follows
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(y(n)-\mathbf{x}^T\mathbf{h}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-C(\mathbf{h}^T)\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\bar{\mathbf{h}}}\frac{1}{2}\|\mathbf{y}-\bar{\mathbf{h}}*\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\bar{\mathbf{h}}\|_2^2. \tag{33}$$

    Comment 4: $C(\mathbf{h}^T)\mathbf{x}$ and $C(\mathbf{x}^T)\mathbf{h}$ are confused in some studies. There is a reverse relationship between $C(\mathbf{h}^T)\mathbf{x}$ and $C(\mathbf{x}^T)\mathbf{h}$, that is, $C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.

    Comment 5: The definition of the correlation operation $\mathbf{x}\otimes\mathbf{h}$ differs across studies in the literature. If the element of the correlation result is defined as $(\mathbf{x}\otimes\mathbf{h})(n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n]$, we obtain $\mathbf{x}\otimes\mathbf{h}=C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.
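    The reverse relationship in Comments 4 and 5 can be checked directly (our own sketch):

```python
import numpy as np

N = 8
rng = np.random.default_rng(8)
x, h = rng.standard_normal(N), rng.standard_normal(N)

ChT = np.stack([np.roll(h, k) for k in range(N)], axis=0)  # row circulant C(h^T)
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)  # row circulant C(x^T)

a = ChT @ x   # C(h^T) x
b = CxT @ h   # C(x^T) h; a should be the reverse signal of b
```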

    In the spatial domain, by computing the first-order derivative of $\mathbf{h}$ in Eq (30) and setting it equal to zero, we obtain
    $$\frac{dE(\mathbf{h})}{d\mathbf{h}}=C(\mathbf{x}^T)^H\left(C(\mathbf{x}^T)\mathbf{h}-\bar{\mathbf{y}}\right)+\lambda\mathbf{h}=\mathbf{0}. \tag{34}$$

    Then, the spatial-domain optimal solution for $\mathbf{h}$ is given by
    $$\mathbf{h}=\left(C(\mathbf{x}^T)^H C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^H\bar{\mathbf{y}}=\left(C(\mathbf{x}^T)^T C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^T\bar{\mathbf{y}}. \tag{35}$$

    Because the introduction of a circulant matrix generates numerous virtual samples, considerable computation is required. The sample matrix can be transformed into a diagonal matrix for processing based on the diagonalization property of the row-vector-based circulant matrix. This method significantly accelerates the matrix calculations and reduces the computational complexity of directly computing solutions in the spatial domain, that is

    $$\begin{aligned}\mathbf{h}&=\left(C(\mathbf{x}^T)^T C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^T\bar{\mathbf{y}}\\&=\left(F_n\,\mathrm{Diag}(\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*)F_n^H+\lambda F_n\,\mathrm{Diag}(\boldsymbol{\delta})F_n^H\right)^{-1}C(\mathbf{x})\bar{\mathbf{y}}\\&=\left(F_n\left(\mathrm{Diag}(\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*)+\lambda\,\mathrm{Diag}(\boldsymbol{\delta})\right)F_n^H\right)^{-1}C(\mathbf{x})\bar{\mathbf{y}}\\&=F_n\,\mathrm{Diag}\!\left(\frac{1}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)F_n^H F_n\,\mathrm{Diag}(\hat{\mathbf{x}}^*)F_n^H\bar{\mathbf{y}}\\&=F_n\,\mathrm{Diag}\!\left(\frac{\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)F_n^H\bar{\mathbf{y}}\\&=C(\mathbf{u}^T)\bar{\mathbf{y}}\Big|_{\mathbf{u}=\mathrm{ifft}_1\left\{\frac{\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right\}}\\&=\bar{\mathbf{u}}*\bar{\mathbf{y}},\end{aligned} \tag{36}$$

    where $\boldsymbol{\delta}$ is the column vector whose elements are all 1, that is, $\boldsymbol{\delta}=(1,\dots,1)^T\in\mathbb{R}^{N\times 1}$.

    According to the convolution theorem, Eq (36) can be transformed into the frequency domain for calculation:
    $$\mathbf{h}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{u}}^*\odot\hat{\mathbf{y}}^*\right)\right)=\mathrm{real}\left(\mathrm{ifft}_1\left(\frac{\hat{\mathbf{x}}\odot\hat{\mathbf{y}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)\right). \tag{37}$$
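    The equivalence of the spatial-domain solution in Eq (35) and the frequency-domain solution in Eq (37) can be checked numerically. The sketch below (our own, with an arbitrary sample, desired response, and $\lambda=0.1$) solves the ridge regression both ways:

```python
import numpy as np

rng = np.random.default_rng(10)
N = 16
x = rng.standard_normal(N)   # training sample
y = rng.standard_normal(N)   # desired correlation response
lam = 0.1                    # balancing parameter λ

# Spatial-domain solution, Eq (35): h = (C(x^T)^T C(x^T) + λI)^{-1} C(x^T)^T y_bar
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)
ybar = np.roll(y[::-1], 1)
h_spatial = np.linalg.solve(CxT.T @ CxT + lam * np.eye(N), CxT.T @ ybar)

# Frequency-domain solution, Eq (37): h = real(ifft(x_hat ⊙ y_hat^* / (x_hat ⊙ x_hat^* + λ)))
xf, yf = np.fft.fft(x), np.fft.fft(y)
h_freq = np.real(np.fft.ifft(xf * np.conj(yf) / (np.abs(xf) ** 2 + lam)))
```

    The frequency-domain route replaces an $N\times N$ matrix inversion with entry-wise division of length-$N$ spectra, which is the efficiency gain discussed above.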

    For a new test sample $\mathbf{z}$, we have
    $$\bar{\mathbf{r}}=C(\mathbf{z}^T)\mathbf{h}, \tag{38}$$

    where $\bar{\mathbf{r}}$ is the reverse signal of the spatial-domain response $\mathbf{r}$.

    By reversing both sides of Eq (38), the spatial-domain response $\mathbf{r}$ is expressed as
    $$\mathbf{r}=\overline{C(\mathbf{z}^T)\mathbf{h}}=C(\mathbf{h}^T)\mathbf{z}. \tag{39}$$

    As $C(\mathbf{z}^T)\mathbf{h}=\bar{\mathbf{z}}*\mathbf{h}$, Eq (39) can be written in the convolutional form, as shown in Eq (40):
    $$\mathbf{r}=\overline{C(\mathbf{z}^T)\mathbf{h}}=\overline{\bar{\mathbf{z}}*\mathbf{h}}=\mathbf{z}*\bar{\mathbf{h}}. \tag{40}$$

    According to Eqs (19) and (24), Eq (40) can be rewritten as
    $$\mathbf{r}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{h}}^*\odot\hat{\mathbf{z}}\right)\right). \tag{41}$$

    In the frequency domain, by computing the first-order derivative with respect to $\hat{\mathbf{h}}^*$ in Eq (32) and setting it to zero, that is, $\frac{dE(\mathbf{h})}{d\hat{\mathbf{h}}^*}=\frac{\hat{\mathbf{x}}\odot(\hat{\mathbf{h}}\odot\hat{\mathbf{x}}^*-\hat{\mathbf{y}}^*)}{2N}+\frac{\lambda\hat{\mathbf{h}}}{2N}=\mathbf{0}$, we obtain
    $$\hat{\mathbf{h}}^*=\frac{\hat{\mathbf{y}}\odot\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}^*\odot\hat{\mathbf{x}}+\lambda\boldsymbol{\delta}}, \tag{42}$$

    where the division in Eq (42) denotes entry-wise division.

    For the new sample $\mathbf{z}$, the corresponding spatial response is
    $$\mathbf{r}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{h}}^*\odot\hat{\mathbf{z}}\right)\right). \tag{43}$$

    Comment 6: Eqs (41) and (43) show that the results obtained by solving the filter using the diagonalization property of the row-vector-based circulant matrix and the convolution theorem are completely consistent.
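    The consistency stated in Comment 6 extends to the response computation itself: the spatial-domain product with the circulant matrix in Eq (39) and the frequency-domain form of Eqs (41)/(43) give the same response (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(11)
N = 16
z = rng.standard_normal(N)   # test sample
h = rng.standard_normal(N)   # trained filter

# Spatial response via the row circulant, Eq (39): r = C(h^T) z
ChT = np.stack([np.roll(h, k) for k in range(N)], axis=0)
r_spatial = ChT @ z

# Frequency-domain response, Eqs (41)/(43): r = real(ifft(h_hat^* ⊙ z_hat))
r_freq = np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(z)))
```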

    For object tracking, the image being processed is a two-dimensional signal, whereas all the signals discussed in the previous section are one-dimensional. Hence, we generalize the one-dimensional convolution form in Eq (20) to the two-dimensional convolution form, as shown in Eq (44):

$$\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)=\mathrm{mat}\left(\mathrm{Cat}\left(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}}\right)\mathrm{vec}(\mathbf{I}_2)\right)=\mathrm{real}\left(\mathrm{ifft}_2\left(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2\right)\right),\tag{44}$$

where $\mathrm{mat}$ is an operator that transforms a column vector into a matrix, $\mathrm{Cat}$ is an operator that stacks row vectors into a matrix, $\mathrm{vec}$ is an operator that transforms a matrix into a column vector, and $\mathrm{Conv2}$ is the two-dimensional convolution operator. The element in the $c$-th row and $r$-th column of $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)$ is $\langle\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r]),\mathrm{vec}(\mathbf{I}_2)\rangle=\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}}\mathrm{vec}(\mathbf{I}_2)$, where $\mathbf{I}_1\in\mathbb{R}^{N\times N}$, $\mathbf{I}_2\in\mathbb{R}^{N\times N}$, and $N$ is an integer. $r=0,1,\dots,N-1$ represents the number of row cyclic shifts, and $c=0,1,\dots,N-1$ represents the number of column cyclic shifts. Moreover, $\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r]\in\mathbb{R}^{N\times N}$ denotes the matrix obtained by first cyclically shifting $\mathbf{I}_1$, row-by-row, by $r$ units to obtain an intermediate matrix, and then cyclically shifting the intermediate matrix, column-by-column, by $c$ units. Finally, $\bar{\mathbf{I}}_1$ is the reverse matrix of $\mathbf{I}_1$, computed as follows: the original matrix is first reversed row-by-row to obtain an intermediate matrix, which is then reversed column-by-column to obtain $\bar{\mathbf{I}}_1$. For example, for $\mathbf{I}_1=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$, the row-by-row reverse operation yields the intermediate matrix $\mathbf{I}_t=\begin{bmatrix}1&3&2\\4&6&5\\7&9&8\end{bmatrix}$, and the column-by-column reverse operation on $\mathbf{I}_t$ then yields $\bar{\mathbf{I}}_1=\begin{bmatrix}1&3&2\\7&9&8\\4&6&5\end{bmatrix}$. Here, $\mathrm{mat}(\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2))\in\mathbb{R}^{N\times N}$, $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)\in\mathbb{R}^{N\times N}$ is the two-dimensional convolution of images $\bar{\mathbf{I}}_1$ and $\mathbf{I}_2$, and $\mathrm{ifft}_2$ is the two-dimensional inverse Fourier transform operator.

Comment 7: If $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)$ is calculated in the spatial domain, that is, as $\mathrm{mat}(\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2))$, the occupied memory space is $O(N^2+N)$, and the multiplicative computational complexity is $O(N^2)$. If the convolution theorem is introduced, then the convolution in the spatial domain becomes an entry-wise multiplication operation and an inverse Fourier transform in the frequency domain, that is, $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)=\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$. The memory space occupied by this algorithm is $4N$, and the multiplicative computational complexity is $O(8N\log_2 N+4N)$. A detailed calculation process of the two-dimensional filter $\mathbf{R}_{2\mathrm{D}}=\mathrm{Conv2}(\mathbf{I}_1,\mathbf{I}_2)$, with $\mathbf{I}_1=\begin{bmatrix}1&4&7\\2&5&8\\3&6&9\end{bmatrix}$ and $\mathbf{I}_2=\begin{bmatrix}0.1&1&1\\1&0.1&1\\1&1&0.1\end{bmatrix}$, is listed in Table 2.
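The stage-by-stage procedure of Table 2 can be exercised directly. The sketch below (a minimal NumPy illustration; the function names are ours, not the paper's) implements the reverse operation and the shift-multiply-sum form of Conv2, and checks it against the frequency-domain form of Eq (44) on the example above, for which every entry of $\mathbf{R}_{2\mathrm{D}}$ equals 31.5:

```python
import numpy as np

def reverse2d(A):
    """Circular reversal: keep index 0, reverse the rest (rows first, then columns)."""
    A = np.concatenate([A[:, :1], A[:, :0:-1]], axis=1)     # row-by-row reverse
    return np.concatenate([A[:1, :], A[:0:-1, :]], axis=0)  # column-by-column reverse

def conv2_spatial(A, B):
    """Conv2(A, B): entry (c, r) = sum of (reverse of A, shifted c rows / r cols) ⊙ B."""
    N = A.shape[0]
    Abar = reverse2d(A)
    R = np.empty((N, N))
    for c in range(N):
        for r in range(N):
            R[c, r] = np.sum(np.roll(Abar, (c, r), axis=(0, 1)) * B)
    return R

def conv2_freq(A, B):
    """The same operation via the two-dimensional convolution theorem."""
    return np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)))

I1 = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]], dtype=float)
I2 = np.array([[0.1, 1, 1], [1, 0.1, 1], [1, 1, 0.1]])

R_2d = conv2_spatial(I1, I2)   # shifts the reverse of I1, exactly as in Table 2
print(R_2d)                    # every entry equals 31.5
print(np.allclose(R_2d, conv2_freq(I1, I2)))  # True
```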

Table 2.  Two-dimensional filter calculation process.
Element of $\mathbf{R}_{2\mathrm{D}}$ | Stage 1: Reverse | Stage 2: Cyclic shift | Stage 3: Multiplication | Stage 4: Summation
$\mathbf{R}_{2\mathrm{D}}(1,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_0]=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.1&7&4\\3&0.9&6\\2&8&0.5\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(1,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_1]=\begin{bmatrix}4&1&7\\6&3&9\\5&2&8\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.4&1&7\\6&0.3&9\\5&2&0.8\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(1,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_2]=\begin{bmatrix}7&4&1\\9&6&3\\8&5&2\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.7&4&1\\9&0.6&3\\8&5&0.2\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_0]=\begin{bmatrix}2&8&5\\1&7&4\\3&9&6\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.2&8&5\\1&0.7&4\\3&9&0.6\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_1]=\begin{bmatrix}5&2&8\\4&1&7\\6&3&9\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.5&2&8\\4&0.1&7\\6&3&0.9\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_2]=\begin{bmatrix}8&5&2\\7&4&1\\9&6&3\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.8&5&2\\7&0.4&1\\9&6&0.3\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_0]=\begin{bmatrix}3&9&6\\2&8&5\\1&7&4\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.3&9&6\\2&0.8&5\\1&7&0.4\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_1]=\begin{bmatrix}6&3&9\\5&2&8\\4&1&7\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.6&3&9\\5&0.2&8\\4&1&0.7\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_2]=\begin{bmatrix}9&6&3\\8&5&2\\7&4&1\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.9&6&3\\8&0.5&2\\7&4&0.1\end{bmatrix}$ | 31.5


A detailed calculation process of the one-dimensional filter $\mathbf{r}_{1\mathrm{D}}=\mathrm{vec}(\mathbf{I}_1)\circledast\mathrm{vec}(\mathbf{I}_2)=\mathbf{i}_1\circledast\mathbf{i}_2$, where $\mathbf{i}_1=(1,2,3,4,5,6,7,8,9)^{\mathrm{T}}$, $\mathbf{i}_2=(0.1,1,1,1,0.1,1,1,1,0.1)^{\mathrm{T}}$, and $\bar{\mathbf{i}}_1[\Delta\tau_n]$ denotes $\bar{\mathbf{i}}_1$ cyclically shifted $n$ times, is listed in Table 3.

Table 3.  One-dimensional filter calculation process.
Element of $\mathbf{r}_{1\mathrm{D}}$ | Stage 1: Reverse | Stage 2: Cyclic shift | Stage 3: Multiplication | Stage 4: Summation
$\mathbf{r}_{1\mathrm{D}}(1)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_0]=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_0]\odot\mathbf{i}_2=(0.1,9,8,7,0.6,5,4,3,0.2)$ | 36.9
$\mathbf{r}_{1\mathrm{D}}(2)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_1]=(2,1,9,8,7,6,5,4,3)$ | $\bar{\mathbf{i}}_1[\Delta\tau_1]\odot\mathbf{i}_2=(0.2,1,9,8,0.7,6,5,4,0.3)$ | 34.2
$\mathbf{r}_{1\mathrm{D}}(3)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_2]=(3,2,1,9,8,7,6,5,4)$ | $\bar{\mathbf{i}}_1[\Delta\tau_2]\odot\mathbf{i}_2=(0.3,2,1,9,0.8,7,6,5,0.4)$ | 31.5
$\mathbf{r}_{1\mathrm{D}}(4)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_3]=(4,3,2,1,9,8,7,6,5)$ | $\bar{\mathbf{i}}_1[\Delta\tau_3]\odot\mathbf{i}_2=(0.4,3,2,1,0.9,8,7,6,0.5)$ | 28.8
$\mathbf{r}_{1\mathrm{D}}(5)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_4]=(5,4,3,2,1,9,8,7,6)$ | $\bar{\mathbf{i}}_1[\Delta\tau_4]\odot\mathbf{i}_2=(0.5,4,3,2,0.1,9,8,7,0.6)$ | 34.2
$\mathbf{r}_{1\mathrm{D}}(6)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_5]=(6,5,4,3,2,1,9,8,7)$ | $\bar{\mathbf{i}}_1[\Delta\tau_5]\odot\mathbf{i}_2=(0.6,5,4,3,0.2,1,9,8,0.7)$ | 31.5
$\mathbf{r}_{1\mathrm{D}}(7)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_6]=(7,6,5,4,3,2,1,9,8)$ | $\bar{\mathbf{i}}_1[\Delta\tau_6]\odot\mathbf{i}_2=(0.7,6,5,4,0.3,2,1,9,0.8)$ | 28.8
$\mathbf{r}_{1\mathrm{D}}(8)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_7]=(8,7,6,5,4,3,2,1,9)$ | $\bar{\mathbf{i}}_1[\Delta\tau_7]\odot\mathbf{i}_2=(0.8,7,6,5,0.4,3,2,1,0.9)$ | 26.1
$\mathbf{r}_{1\mathrm{D}}(9)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_8]=(9,8,7,6,5,4,3,2,1)$ | $\bar{\mathbf{i}}_1[\Delta\tau_8]\odot\mathbf{i}_2=(0.9,8,7,6,0.5,4,3,2,0.1)$ | 31.5

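The same check works for the one-dimensional filter of Table 3: the stage-by-stage shift-multiply-sum in the spatial domain against the one-dimensional convolution theorem. A minimal NumPy sketch using the document's circular-reversal convention:

```python
import numpy as np

def reverse1d(v):
    """Circular reversal: keep v[0] and reverse the remaining samples."""
    return np.roll(v[::-1], 1)

i1 = np.arange(1.0, 10.0)
i2 = np.array([0.1, 1, 1, 1, 0.1, 1, 1, 1, 0.1])

i1_bar = reverse1d(i1)
# Spatial computation of r1D (Table 3): shift ī1 by n units, multiply by i2, sum.
r_spatial = np.array([np.sum(np.roll(i1_bar, n) * i2) for n in range(9)])

# The same filter via the 1D convolution theorem.
r_freq = np.real(np.fft.ifft(np.fft.fft(i1) * np.fft.fft(i2)))

print(r_spatial)  # matches Table 3: 36.9, 34.2, 31.5, 28.8, 34.2, 31.5, 28.8, 26.1, 31.5
print(np.allclose(r_spatial, r_freq))  # True
```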

Table 4 presents the occupied memory space and the floating-point multiplication complexity of the $\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ and $\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ operations. Table 4 shows that the memory footprint and computational complexity of the one-dimensional and two-dimensional operations are completely consistent. The computational complexity of the $\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ operation is much smaller than that of the $\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ operation when the image size is large.

Table 4.  Memory footprint and computational complexity analysis of the two-dimensional operation.
Operation (image size $\mathbb{R}^{N\times N}$) | Memory space occupied/floating-point unit | Complexity of floating-point multiplication operation
$\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ | $N^2+N$ | $O(N^2)$
$\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ | $4N$ | $O(8N\log_2 N+4N)$


The one-dimensional and two-dimensional filters are equivalent in estimating the object's position. However, subtle differences exist between the one-dimensional and two-dimensional filter responses owing to the inconsistency in the receptive fields and periodic boundary conditions of the two convolutions. The receptive field of the one-dimensional convolution has only one dimension, whereas that of the two-dimensional convolution has two. Likewise, a two-dimensional signal is padded with a two-dimensional periodic extension, whereas a two-dimensional image columnized into a one-dimensional signal is padded with a one-dimensional periodic extension. Thus, at the same spatial location, the data involved in the one-dimensional and two-dimensional convolutions differ, and consequently subtle differences occur between the two filter responses.
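This difference is easy to expose numerically: columnizing two images and convolving them as one-dimensional signals does not reproduce the two-dimensional circular convolution, because the one-dimensional periodic extension wraps the end of each column into the start of the next. A small NumPy sketch (the random images and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))

# 2D response: two-dimensional circular convolution via the 2D convolution theorem.
r2d = np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)))

# 1D response: columnize both images (vec), apply the 1D convolution theorem,
# and reshape back in column-major order (mat).
a, b = A.flatten(order="F"), B.flatten(order="F")
r1d = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))).reshape((N, N), order="F")

print(np.allclose(r1d, r2d))      # False: the responses differ
print(np.max(np.abs(r1d - r2d)))  # gap caused by the differing boundary wrap-around
```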

    When we model the correlation filter tracking problem in matrix form, the mathematical formula is modeled as follows

$$E(\mathbf{H})=\min_{\mathbf{H}}\frac{1}{2}\left\|\mathbf{Y}-\mathbf{X}\circledast_2\mathbf{H}\right\|_2^2+\frac{\lambda}{2}\left\|\mathbf{H}\right\|_2^2,\tag{45}$$

where $\circledast_2$ represents the two-dimensional correlation operator, $(\mathbf{X}\circledast_2\mathbf{H})(n,k)=\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\mathbf{X}(m,l)\mathbf{H}(n+m,k+l)=\mathrm{Conv2}(\bar{\mathbf{H}},\mathbf{X})$, $\bar{\mathbf{H}}$ is the reversed two-dimensional signal of $\mathbf{H}$ and satisfies the two-dimensional periodic boundary conditions, and $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{H}$ are the matrix forms of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{h}$, respectively.

    Rewrite the above formula into convolution form and we have

$$E(\mathbf{H})=\min_{\mathbf{H}}\frac{1}{2}\left\|\mathbf{Y}-\mathrm{Conv2}(\bar{\mathbf{H}},\mathbf{X})\right\|_2^2+\frac{\lambda}{2}\left\|\mathbf{H}\right\|_2^2,\tag{46}$$

where $\bar{\mathbf{H}}$ satisfies $\mathrm{fft}_2(\bar{\mathbf{H}})=\hat{\mathbf{H}}^{*}$, $\hat{\mathbf{H}}$ denotes the spectrum of $\mathbf{H}$, $\mathrm{fft}_2$ represents the two-dimensional fast Fourier transform operator, and $\hat{\mathbf{H}}^{*}$ is the conjugate matrix of $\hat{\mathbf{H}}$.

    According to the convolution theorem, the above expression can be further arranged into the frequency domain form, namely

$$E(\hat{\mathbf{H}}^{*})=\min_{\hat{\mathbf{H}}^{*}}\frac{1}{2N}\left\|\hat{\mathbf{Y}}-\hat{\mathbf{H}}^{*}\odot\hat{\mathbf{X}}\right\|_2^2+\frac{\lambda}{2N}\left\|\hat{\mathbf{H}}^{*}\right\|_2^2.\tag{47}$$

Setting $\frac{\mathrm{d}E(\hat{\mathbf{H}}^{*})}{\mathrm{d}\hat{\mathbf{H}}^{*}}=0$, we have

$$\frac{\hat{\mathbf{X}}^{*}\odot(\hat{\mathbf{H}}^{*}\odot\hat{\mathbf{X}}-\hat{\mathbf{Y}})+\lambda\hat{\mathbf{H}}^{*}}{N}=0.\tag{48}$$

    Then, the filter in the matrix form is calculated by

$$\hat{\mathbf{H}}^{*}=\frac{\hat{\mathbf{Y}}\odot\hat{\mathbf{X}}^{*}}{\hat{\mathbf{X}}^{*}\odot\hat{\mathbf{X}}+\lambda}.\tag{49}$$
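As a check, the closed form of Eq (49) can be substituted back into the stationarity condition of Eq (48); the residual is zero entry-wise. A minimal NumPy sketch (the random spectra, λ, and size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam = 32, 0.1
X = rng.standard_normal((N, N))
Y = rng.standard_normal((N, N))

Xh, Yh = np.fft.fft2(X), np.fft.fft2(Y)

# Eq (49): entry-wise closed-form solution for the 2D filter spectrum.
Hh_conj = (Yh * np.conj(Xh)) / (np.conj(Xh) * Xh + lam)

# It satisfies the stationarity condition of Eq (48) entry-wise.
residual = np.conj(Xh) * (Hh_conj * Xh - Yh) + lam * Hh_conj
print(np.max(np.abs(residual)))  # numerically zero
```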

Figure 1(a) shows the training sample, and Figure 1(b) illustrates the desired response. Figure 1(c) shows the test sample obtained by the cyclic shift of the training sample. Figure 1(d) presents the response of the two-dimensional filter (calculated by $\hat{\mathbf{H}}^{*}=\frac{\hat{\mathbf{Y}}\odot\hat{\mathbf{X}}^{*}}{\hat{\mathbf{X}}^{*}\odot\hat{\mathbf{X}}+\lambda}$), and Figure 1(e) shows the response of the one-dimensional filter (calculated by $\hat{\mathbf{h}}^{*}=\frac{\hat{\mathbf{y}}\odot\hat{\mathbf{x}}^{*}}{\hat{\mathbf{x}}^{*}\odot\hat{\mathbf{x}}+\lambda}$). Figure 1(f) shows the difference between the one-dimensional and two-dimensional filter responses. The results show that the one-dimensional and two-dimensional filters are equivalent in positioning but are not completely consistent. When the image is cyclically shifted to the edge, a significant difference is observed between the one-dimensional and two-dimensional filter responses.

    Figure 1.  Connections and differences between the one-dimensional and two-dimensional filter responses. (a) Training sample (the size of the sample is 100×100). (b) Desired response. (c) Test sample obtained via the cyclic shift of the training sample. (d) Two-dimensional filter response. (e) One-dimensional filter response. (f) Difference between the two-dimensional and one-dimensional filter responses.

    Experiments were conducted on a computer equipped with an i5-8265 (1.80 GHz) CPU. The proposed viewpoints and the equivalence of the two filter-solving methods were verified through numerical experimentation and by designing response maps, respectively, to ensure the mathematical rigor and scientific validity of the correlation filter object tracking algorithm.

To verify Theorem 1, let $x=\sin(50\pi t)$. The sampling rate was taken as 100 Hz, and the sampling time was 0–0.99 s, that is, $\mathbf{t}=[0,0.01,0.02,\dots,0.99]^{\mathrm{T}}$. The sinusoidal signal was Fourier transformed, and its real and imaginary parts were taken, respectively. Figure 2(a),(c) show the amplitude of the real part of $\hat{\mathbf{x}}$ and the amplitude of the imaginary part of $\hat{\mathbf{x}}$, respectively. The real and imaginary parts were also taken for the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. Figure 2(b) shows the amplitude of the real part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$, and Figure 2(d) presents the amplitude of the imaginary part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. In this experiment, $\|\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}-\mathrm{Diag}(\hat{\mathbf{x}})\|_2$ was $2.83\times 10^{-14}$, where $\|\cdot\|_2$ represents the $\ell_2$ norm. These results show that the real part maps and imaginary part maps are completely consistent, which proves Theorem 1.

Figure 2.  Numerical experimental results for Theorem 1. (a) Amplitude of the real part of $\hat{\mathbf{x}}$. (b) Amplitude of the real part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. (c) Amplitude of the imaginary part of $\hat{\mathbf{x}}$. (d) Amplitude of the imaginary part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$.
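Theorem 1's diagonalization can be reproduced in a few lines of NumPy, using the same 25 Hz sinusoid sampled at 100 Hz; building the DFT matrix by transforming the identity is our implementation choice:

```python
import numpy as np

N = 100
t = np.arange(N) / 100.0            # sampling rate 100 Hz, 0–0.99 s
x = np.sin(50 * np.pi * t)

# Column-vector-based circulant matrix C(x): column j is x cyclically shifted by j units.
C = np.stack([np.roll(x, j) for j in range(N)], axis=1)

F = np.fft.fft(np.eye(N))           # DFT matrix F_N
F_inv = np.conj(F) / N              # F_N^{-1}

err = np.linalg.norm(F @ C @ F_inv - np.diag(np.fft.fft(x)))
print(err)  # close to machine precision: F_N C(x) F_N^{-1} = Diag(x̂)
```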

To verify Theorem 2, let $x=\sin(50\pi t)$. The sampling rate was taken as 100 Hz, and the sampling time was 0–0.99 s, that is, $\mathbf{t}=[0,0.01,0.02,\dots,0.99]^{\mathrm{T}}$. Figure 3(a) shows the pseudo-color map of the row-vector-based circulant matrix $C(\mathbf{x}^{\mathrm{T}})$, and Figure 3(b) presents the pseudo-color map of the real part of $\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}$. The two maps are completely consistent. In this experiment, $\|\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}-C(\mathbf{x}^{\mathrm{T}})\|_2$ was $2.86\times 10^{-12}$, thereby proving Theorem 2.

Figure 3.  Numerical experimental results for Theorem 2. (a) Pseudo-color map of $C(\mathbf{x}^{\mathrm{T}})$. (b) Pseudo-color map of $\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}$.
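Theorem 2 admits the same kind of check with the unitary DFT matrix $\mathbf{F}_n=\mathbf{F}_N/\sqrt{N}$; the NumPy sketch below uses the same test signal (construction details are our assumptions):

```python
import numpy as np

N = 100
t = np.arange(N) / 100.0
x = np.sin(50 * np.pi * t)

F_u = np.fft.fft(np.eye(N)) / np.sqrt(N)              # unitary DFT matrix F_n
C_row = np.stack([np.roll(x, i) for i in range(N)])   # C(x^T): rows are shifts of x^T

err = np.linalg.norm(F_u @ np.diag(np.fft.fft(x)) @ np.conj(F_u).T - C_row)
print(err)  # close to machine precision: F_n Diag(x̂) F_n^H = C(x^T)
```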

    To verify Theorem 3, let and . The sampling rate was taken as 100 Hz, and the sampling time is 0–0.99 s, that is, . Figures 4(a),(b) present the amplitudes of over 1 s and over 1 s, respectively. The results show that the two amplitude maps are completely consistent. The in this experiment was . Since , Theorem 3 is verified.

    Figure 4.  Numerical experimental results of Theorem 3. (a) Amplitude of and (b) amplitude of .

    To verify Theorem 4, let and . The sampling rate was taken as 100 Hz, and the sampling time is 0–0.99 s, that is, . Figure 5(a) shows the amplitude of over 1 s, and Figure 5(b) presents the amplitude of over 1 s. The results show that the two amplitude maps are completely consistent. The in this experiment was . As , Theorem 4 is verified.

    Figure 5.  Numerical experimental results for Theorem 4. (a) Amplitude of and (b) amplitude of .

    To verify Eq (44), let be the two-dimensional image signal, as shown in Figure 6(a), and let be the two-dimensional image signal, as shown in Figure 6(b). Figures 6(c),(d) illustrate the and responses. The results show that the responses were completely consistent as shown in Figures 6(c),(d). The in this experiment was , proving that Eq (44) holds.

    Figure 6.  Numerical experimental results to validate Eq (44). (a) image, (b) image, (c) response, and (d) response.

    Figure 7(a) shows the base sample, and Figure 7(b) shows the predicted sample obtained via the cyclic shift of the base sample. In addition, Figures 7(c),(d) present the desired correlation response and the spatial domain response obtained according to Eq (43), respectively. Figures 7(e),(f) show the spatial domain response obtained using Eqs (38) and (39), respectively. The results show that the spatial responses in Figures 7(d),(f) are completely consistent (i.e., the two filter-solving methods are equivalent).

    Figure 7.  Equivalence experiment of the two filter-solving methods. (a) Base sample. (b) Predicted sample obtained via the cyclic shift of the base sample. (c) Desired correlation response. (d) Spatial domain response based on Eq (43). (e) Spatial domain response based on Eq (38). (f) Spatial domain response based on Eq (39).

Through the diagonalization property of the row-vector-based circulant matrix and the convolution theorem, the two filter-solving methods transform the spatial domain operation into an entry-wise multiplication operation in the frequency domain to circumvent the inverse operations of large matrices. Table 5 lists the running times required to calculate Eqs (35) and (37) for different image sizes, where $\lambda$ was 0.1. The results indicate that solving the filter in the frequency domain can effectively reduce the operation time and improve the computational efficiency of the filter. The larger the signal size, the more apparent the advantage of solving the filter in the frequency domain.

    Table 5.  Time consumed to solve in the spatial and frequency domains.
    Average time consumed (the number of the experiments is 10, and the image size is )
Time consumed in the spatial domain/s (according to Eq (35))
    Time consumed in the frequency domain/s (according to Eq (37))


In this study, we systematically elucidated the theoretical modeling system of the correlation filter. Based on the existing correlation filter literature, four mathematical modeling forms and two fast filter calculation methods were summarized and experimentally verified. The relationships among the four modeling forms of the correlation filter were discussed in detail. Our conclusions are as follows:

1) We elaborated on the definitions of the circulant matrix, convolution, and correlation operations in the correlation filter and their relationships. The viewpoints and mathematical findings provided in this study can provide useful theoretical support for research in the field of correlation filter object tracking.

2) The diagonalization property of the circulant matrix and the convolution theorem were employed to solve the filter by transforming the spatial-domain operation into an entry-wise multiplication operation in the frequency domain. This approach avoids the inverse operation of large spatial-domain matrices and reduces the computational complexity compared with directly solving in the spatial domain. The experiments showed that the results obtained using the two filter-solving methods were consistent. The proposed fast filter calculation method is critical to the efficient implementation of the correlation filter tracking algorithm.

    3) We experimentally proved the existence of slight differences between the one-dimensional and two-dimensional filter methods. The main reasons for these differences were discussed in detail. Subsequently, the equivalence of the two filter methods in object positioning was reflected via experimentation to provide a reliable foundation for the engineering realization of the two theoretical methods.

Traditional correlation filter tracking frameworks utilize handcrafted features to distinguish the object and background. The discrimination ability of these features is limited; thus, the application of the correlation filter tracking method in complex scenes has some limitations. As deep learning technology gradually matures, it will provide the correlation filter theoretical framework with more discriminative visual features. Subsequent work should attempt to combine the correlation filter with deep-learning-based tracking algorithms to improve the overall performance of the tracker. For example, the computational theory proposed in this paper can be introduced into cyclic shifting attention computation [25] to obtain more efficient computation.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work is supported by the Natural Science Foundation Project of Zhangzhou City (ZZ2023J37), the Principal Foundation of Minnan Normal University (KJ19019), the High-level Science Research Project of Minnan Normal University (GJ19019), Research Project on Education and Teaching of Undergraduate Colleges and Universities in Fujian Province (FBJY20230083), and the Education Research Program of Minnan Normal University (202211).

    The authors declare there is no conflict of interest.



    [1] S. Javed, M. Danelljan, F. S. Khan, M. H. Khan, M. Felsberg, J. Matas, Visual object tracking with discriminative filters and siamese networks: A survey and outlook, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 6552–6574. https://doi.org/10.1109/TPAMI.2022.3212594 doi: 10.1109/TPAMI.2022.3212594
    [2] F. Chen, X. Wang, Y. Zhao, S. Lv, X. Niu, Visual object tracking: A survey, Comput. Vision Image Understanding, 222 (2022), 103508. https://doi.org/10.1016/j.cviu.2022.103508 doi: 10.1016/j.cviu.2022.103508
    [3] D. Zhang, Z. Zheng, M. Li, R. Liu, CSART: Channel and spatial attention-guided residual learning for real-time object tracking, Neurocomputing, 436 (2021), 260–272. https://doi.org/10.1016/j.neucom.2020.11.046 doi: 10.1016/j.neucom.2020.11.046
    [4] F. Gu, J. Lu, C. Cai, Q. Zhu, Z. Ju, RTSformer: A robust toroidal transformer with spatiotemporal features for visual tracking, IEEE Trans. Hum.-Mach. Syst., 54 (2024), 214–225. https://doi.org/10.1109/THMS.2024.3370582 doi: 10.1109/THMS.2024.3370582
    [5] Y. Qian, L. Yu, W. Liu, A. G. Hauptmann, Electricity: An efficient multi-camera vehicle tracking system for intelligent city, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 2511–2519. https://doi.org/10.1109/CVPRW50498.2020.00302
    [6] X. Chen, X. Xu, Y. Yang, Y. Huang, J. Chen, Y. Yan, Visual ship tracking via a hybrid kernelized correlation filter and anomaly cleansing framework, Appl. Ocean Res., 106 (2021), 102455. https://doi.org/10.1016/j.apor.2020.102455 doi: 10.1016/j.apor.2020.102455
    [7] H. Zhang, Y. Li, H. Liu, D. Yuan, Y. Yang, Feature block-aware correlation filters for real-time UAV tracking, IEEE Signal Process. Lett., 31 (2024), 840–844. https://doi.org/10.1109/LSP.2024.3373528 doi: 10.1109/LSP.2024.3373528
    [8] X. Wang, D. Zeng, Y. Li, M. Zou, Q. Zhao, S. Li, Enhancing UAV tracking: a focus on discriminative representations using contrastive instances, J. R.-Time Image Process., 21 (2024), 78. https://doi.org/10.1007/s11554-024-01456-2 doi: 10.1007/s11554-024-01456-2
    [9] C. Zhu, J. Yang, Z. Shao, C. Liu, Vision based hand gesture recognition using 3D shape context, IEEE/CAA J. Autom. Sin., 8 (2021), 1600–1613. https://doi.org/10.1109/JAS.2019.1911534 doi: 10.1109/JAS.2019.1911534
    [10] M. N. H. Mohd, M. S. M. Asaari, O. L. Ping, B. A. Rosdi, Vision-based hand detection and tracking using fusion of kernelized correlation filter and single-shot detection, Appl. Sci., 13 (2023), 7433. https://doi.org/10.3390/app13137433 doi: 10.3390/app13137433
    [11] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 583–596. https://doi.org/10.1109/TPAMI.2014.2345390 doi: 10.1109/TPAMI.2014.2345390
    [12] Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in Computer Vision-ECCV 2014 Workshops, 8926 (2014), 254–265. https://doi.org/10.1007/978-3-319-16181-5_18
    [13] M. Danelljan, G. Hager, F. S. Khan, M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 4310–4318. https://doi.org/10.1109/ICCV.2015.490
    [14] C. Ma, X. Yang, C. Zhang, M. Yang, Long-term correlation tracking, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 5388–5396. https://doi.org/10.1109/CVPR.2015.7299177
    [15] M. Danelljan, G. Häger, F. S. Khan, M. Felsberg, Discriminative scale space tracking, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), 1561–1575. https://doi.org/10.1109/TPAMI.2016.2609928 doi: 10.1109/TPAMI.2016.2609928
    [16] M. Danelljan, G. Bhat, F. Shahbaz Khan, M. Felsberg, ECO: Efficient convolution operators for tracking, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 6931–6939. https://doi.org/10.1109/CVPR.2017.733
    [17] A. Lukezic, T. Vojir, L. C. Zajc, J. Matas, M. Kristan, Discriminative correlation filter with channel and spatial reliability, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 4847–4856. https://doi.org/10.1109/CVPR.2017.515
    [18] Z. Huang, C. Fu, Y. Li, F. Lin, P. Lu, Learning aberrance repressed correlation filters for real-time UAV tracking, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 2891–2900. https://doi.org/10.1109/ICCV.2019.00298
    [19] B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, SiamRPN++: Evolution of siamese visual tracking with very deep networks, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 4277–4286. https://doi.org/10.1109/CVPR.2019.00441
    [20] T. Xu, Z. Feng, X. Wu, J. Kittler, Joint group feature selection and discriminative filter learning for robust visual object tracking, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 7949–7959. https://doi.org/10.1109/ICCV.2019.00804
    [21] D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui, Visual object tracking using adaptive correlation filters, in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2010), 2544–2550. https://doi.org/10.1109/CVPR.2010.5539960
    [22] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. Torr, Fully-convolutional siamese networks for object tracking, in Computer Vision-ECCV 2016 Workshops, 9914 (2016), 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
    [23] H. K. Galoogahi, A. Fagg, S. Lucey, Learning background-aware correlation filters for visual tracking, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 1144–1152. https://doi.org/10.1109/ICCV.2017.129
    [24] Y. Li, C. Fu, F. Ding, Z. Huang, G. Lu, Autotrack: Towards high-performance visual tracking for UAV with automatic spatio-temporal regularization, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11920–11929. https://doi.org/10.1109/CVPR42600.2020.01194
    [25] Z. Song, J. Yu, Y. P. Chen, W. Yang, Transformer tracking with cyclic shifting window attention, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 8781–8790. https://doi.org/10.1109/CVPR52688.2022.00859
    [26] Y. Chen, H. Wu, Z. Deng, J. Zhang, H. Wang, L. Wang, et al., Deep-feature-based asymmetrical background-aware correlation filter for object tracking, Digital Signal Process., 148 (2024), 104446. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
    [27] K. Chen, L. Wang, H. Wu, C. Wu, Y. Liao, Y. Chen, et al., Background-aware correlation filter for object tracking with deep CNN features, Eng. Lett., 32 (2024), 1353–1363.
    [28] R. M. Gray, Toeplitz and circulant matrices: A review, Found. Trends Commun. Inf. Theory, 2 (2006), 155–239. http://doi.org/10.1561/0100000006 doi: 10.1561/0100000006
    [29] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in Computer Vision-ECCV 2012, (2012), 702–715. https://doi.org/10.1007/978-3-642-33765-9_50
    [30] M. E. Kilmer, C. D. Martin, Factorization strategies for third-order tensors, Linear Algebra Appl., 435 (2011), 641–658. https://doi.org/10.1016/j.laa.2010.09.020 doi: 10.1016/j.laa.2010.09.020
    [31] N. Hao, M. E. Kilmer, K. Braman, R. C. Hoover, Facial recognition using tensor-tensor decompositions, SIAM J. Imaging Sci., 6 (2013), 437–463. https://doi.org/10.1137/110842570 doi: 10.1137/110842570
    [32] M. E. Kilmer, K. Braman, N. Hao, R. C. Hoover, Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging, SIAM J. Matrix Anal. Appl., 34 (2013), 148–172. https://doi.org/10.1137/110837711 doi: 10.1137/110837711
    [33] B. Hunt, A matrix theory proof of the discrete convolution theorem, IEEE Trans. Audio Electroacoust., 19 (1971), 285–288. https://doi.org/10.1109/TAU.1971.1162202 doi: 10.1109/TAU.1971.1162202
    [34] J. Martinez, R. Heusdens, R. C. Hendriks, A generalized Fourier domain: Signal processing framework and applications, Signal Process., 93 (2013), 1259–1267. https://doi.org/10.1016/j.sigpro.2012.10.015 doi: 10.1016/j.sigpro.2012.10.015
    [35] A. Iwasaki, Deriving the variance of the discrete Fourier transform test using Parseval's theorem, IEEE Trans. Inf. Theory, 66 (2020), 1164–1170. https://doi.org/10.1109/TIT.2019.2947045 doi: 10.1109/TIT.2019.2947045
    [36] Q. Hu, H. Wu, J. Wu, J. Shen, H. Hu, Y. Chen, et al., Spatio-temporal self-learning object tracking model based on anti-occlusion mechanism, Eng. Lett., 31 (2023), 1–10.
    [37] Y. Huang, Y. Chen, C. Lin, Q. Hu, J. Song, Visual attention learning and antiocclusion-based correlation filter for visual object tracking, J. Electron. Imaging, 32 (2023), 13023. https://doi.org/10.1117/1.JEI.32.1.013023 doi: 10.1117/1.JEI.32.1.013023
    [38] J. Cui, J. Wu, L. Zhao, Learning channel-selective and aberrance repressed correlation filter with memory model for unmanned aerial vehicle object tracking, Front. Neurosci., 16 (2023). https://doi.org/10.3389/fnins.2022.1080521 doi: 10.3389/fnins.2022.1080521
    [39] C. Fan, H. Yu, Y. Huang, C. Shan, L. Wang, C. Li, SiamON: Siamese occlusion-aware network for visual tracking, IEEE Trans. Circuits Syst. Video Technol., 33 (2023), 186–199. https://doi.org/10.1109/TCSVT.2021.3102886 doi: 10.1109/TCSVT.2021.3102886
    [40] W. Hu, Q. Wang, L. Zhang, L. Bertinetto, P. H. S. Torr, SiamMask: A framework for fast online object tracking and segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 3072–3089.
    [41] D. Sharma, Z. A. Jaffery, Multiple object tracking through background learning, Comput. Syst. Sci. Eng., 44 (2023), 191–204. https://doi.org/10.32604/csse.2023.023728 doi: 10.32604/csse.2023.023728
    [42] J. Zhang, Y. He, S. Wang, Learning adaptive sparse spatially-regularized correlation filters for visual tracking, IEEE Signal Process. Lett., 30 (2023), 11–15. https://doi.org/10.1109/LSP.2023.3238277 doi: 10.1109/LSP.2023.3238277
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
