Research article

Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter

  • Received: 23 April 2024 Revised: 04 July 2024 Accepted: 10 July 2024 Published: 29 July 2024
  • The correlation filter object tracking algorithm has gained extensive attention from scholars in the field of tracking because of its excellent tracking performance and efficiency. However, the mathematical modeling relationships of correlation filter tracking frameworks are unclear, so the many forms of the correlation filter are susceptible to confusion and misuse. To solve these problems, we review various forms of the correlation filter and discuss their intrinsic connections. First, we review the basic definitions of the circulant matrix, convolution, and correlation operations. Then, the relationships among the three operations are discussed. On this basis, four mathematical modeling forms of correlation filter object tracking from the literature are listed, and the equivalence of the four modeling forms is theoretically proven. Next, the fast solution of the correlation filter is discussed from the perspectives of the diagonalization property of the circulant matrix and the convolution theorem. In addition, we delve into the difference between the one-dimensional and two-dimensional correlation filter responses as well as the reasons for their generation. Numerical experiments were conducted to verify the proposed perspectives. The results showed that the filters calculated based on the diagonalization property and the convolution property of the circulant matrix are completely equivalent. The experimental code of this paper is available at https://github.com/110500617/Correlation-filter/tree/main.

    Citation: Yingpin Chen, Kaiwei Chen. Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter[J]. Electronic Research Archive, 2024, 32(7): 4684-4714. doi: 10.3934/era.2024213




    Object tracking [1,2,3] technology has become a research hotspot in the field of computer vision [4] and it is widely employed in intelligent traffic management [5,6], unmanned aerial vehicle tracking [7,8], and human-computer interactions [9,10]. Correlation filter object tracking algorithms [11,12,13,14,15] have gained increasing attention in the field of tracking, owing to their excellent tracking performance and efficiency. These methods have become mainstream for visual tracking [16,17,18,19,20].

    The correlation operator is a signal processing operator that measures the similarity of signals. Thus, it is widely employed in the field of object tracking. The correlation operator was first introduced into the field of object tracking by Bolme et al. [21] in 2010. In 2015, Henriques et al. [11] proposed a correlation filter model in the form of a circulant matrix to train a classifier through dense sampling by the cyclic shift. In 2016, Bertinetto et al. [22] introduced the correlation operator into a two-branch weight-shared deep learning network and proposed SiamFC, a fully convolutional Siamese network. In 2017, Galoogahi et al. [23] proposed the background-aware correlation filter (BACF) in the form of vector multiplication, which cleverly avoids the boundary effect existing in correlation filter tracking methods. In 2020, Li et al. [24] proposed a correlation filter model in the form of convolution operations, which uses local and global information of the response maps to achieve adaptive spatio-temporal regularization. In 2022, Song et al. [25] proposed a Transformer tracker with cyclic shifting window attention, which is computed via the correlation operator. In 2024, Chen et al. [26] regarded the correlation operator as a convolution operation and proposed an asymmetrical background-aware correlation filter for object tracking by exploring the shape information of the object. Also in 2024, Chen et al. [27] introduced deep-convolutional-neural-network-based features into the correlation filter framework to further improve the tracking performance of BACF.

    The correlation filter object tracking method in the form of a circulant matrix utilizes the cyclic shift matrix [28,29] to generate many virtual samples, thereby expanding the sample richness to improve algorithm performance. Specifically, the algorithm reshapes the training sample into a row vector, and a matrix with a row circulant structure is subsequently formed via continuous cyclic shifts. The filter is then designed using this matrix. There are two drawbacks in directly solving the correlation filter in the spatial domain: 1) The spatial domain operation involves the inversion of a large circulant matrix, resulting in high computational complexity; and 2) the matrix formed by the cyclic shift contains a large amount of redundant information, which occupies a large amount of storage while calculating the filter. Therefore, the property that a circulant matrix can be diagonalized by the Fourier transform matrix is invoked [30,31,32] to transform the correlation operation into an entry-wise multiplication in the frequency domain, avoiding the inversion of the large spatial matrix. Notably, the single sample in the frequency domain replaces the virtual samples generated by the cyclic shift, effectively reducing the complexity and storage requirements of the correlation operation.

    The discrete convolution operation [33] is important in signal processing. In a discrete convolution operation, the signal is reversed and shifted. This moving signal is multiplied entry-wise with another stationary signal and summed to obtain the convolution result. The difference between the correlation and convolution operations is that the correlation operation does not perform the reverse operation on the moving signal. Rather, the correlation operation directly moves the signal. Therefore, the correlation operation is a special type of convolution. Given the convolution operation, the translation, multiplication, and summation calculations of the spatial domain can be transformed into a frequency domain entry-wise multiplication operation based on the convolution theorem [33] and Parseval's theorem [34,35] to avoid the high storage and computation requirements involved with moving the signal in the spatial domain. Researchers have understood the correlation object tracking framework from the perspective of convolution.

    The two approaches previously described (the diagonalization of the circulant matrix [36,37] and the transformation of the correlation operator into a convolution) yield the same form of computation, namely, the calculation of the correlation operation via frequency-domain entry-wise multiplication, albeit from different perspectives. Hence, there must be a close internal relationship among the different mathematical modeling approaches of correlation filters (CFs). With the improvement and perfection of correlation filter tracking theory [38,39,40,41,42], various forms of object tracking algorithms have been proposed. From a mathematical modeling perspective, correlation filter object tracking algorithms can be classified into four forms: correlation operations [21], vector multiplication operations [23], circulant matrix operations [29], and convolution operations [24]. These four modeling methods are expressed differently but are essentially equivalent.

    The motivation of this paper is to sort out the four mathematical modeling methods for the correlation filter object tracking algorithm by exploring the properties of the circulant matrix, convolution, and correlation operations. First, we review the definitions of these four modeling methods. Then, the internal relations of the four modeling methods are discussed in detail. Based on the properties of and relationships among the circulant matrix, convolution, and correlation operations, two fast correlation filter calculation methods are presented. Both theoretical derivation and experimental results prove the equivalence of the two methods, and numerical experiments verify the proposed viewpoints. In addition, most existing studies on the correlation filter [16,23] investigated filter calculation in the form of a one-dimensional filter; only a few studies have presented a solution to the correlation filter in the form of a two-dimensional matrix [26,37]. Thus, we further discuss the relationship and difference between the one-dimensional and two-dimensional filters.

    The main contributions of this study are as follows. 1) We comprehensively describe the definitions of the circulant matrix, convolution, and correlation operations and then theoretically prove the four theorems of the circulant matrix. Based on these theorems, the relationships of four modeling approaches for the correlation filter are further discussed. 2) The fast calculation of the correlation filter is discussed from two perspectives: the diagonalization property of the circulant matrix and the convolution theorem. The multiplication and inversion operations of the large-scale matrix are transformed into entry-wise multiplication and entry-wise division operations of the vector to improve the efficiency of the filter solution. 3) We convert a one-dimensional correlation filter into a two-dimensional correlation filter, present the calculation flow of the two filter methods, analyze the differences and connections between the two filter methods, and discuss the reasoning behind these relationships.

    The rest of this paper is organized as follows. In Section 2, we present the definitions of the three operations of correlation, circulant matrix, and convolution; argue the four theorems of the circulant matrix; and discuss the relationship among the three operations in depth. In Section 3, we enumerate the four forms of correlation filter tracking modeling. In Section 4, we present the solution to the filter from the perspectives of the diagonalization of circulant matrix and the convolution theorem. In Section 5, we discuss the differences and connections between one-dimensional and two-dimensional filters in detail. In Section 6, we present the verification of the viewpoints presented in this study through numerical experimentation and response plots to verify the equivalence of the two methods for solving the filter. Finally, in Section 7, we draw conclusions and present the outlook for future work.

    Suppose the first column vector of the matrix is $\mathbf{x}=(x_0,x_1,x_2,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$, where the superscript $T$ denotes the transpose operation. $\mathbf{x}$ is cyclically shifted by one step to obtain the second column vector $\mathbf{v}=(x_{N-1},x_0,x_1,\dots,x_{N-2})^T\in\mathbb{R}^{N\times 1}$ of the column-vector-based circulant matrix. The $N$ column vectors obtained after $N$ cyclic shifts form the column circulant matrix
    $$C(\mathbf{x})=\begin{pmatrix}x_0&x_{N-1}&x_{N-2}&\cdots&x_1\\x_1&x_0&x_{N-1}&\cdots&x_2\\x_2&x_1&x_0&\cdots&x_3\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&x_{N-3}&\cdots&x_0\end{pmatrix}\in\mathbb{R}^{N\times N}.$$

    Similarly, the vector $\mathbf{x}^T=(x_0,x_1,x_2,\dots,x_{N-1})\in\mathbb{R}^{1\times N}$ is taken as the base vector and cyclically shifted $N$ times to obtain $N$ row vectors. These vectors form the row-vector-based circulant matrix
    $$C(\mathbf{x}^T)=\begin{pmatrix}x_0&x_1&x_2&\cdots&x_{N-1}\\x_{N-1}&x_0&x_1&\cdots&x_{N-2}\\x_{N-2}&x_{N-1}&x_0&\cdots&x_{N-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_1&x_2&x_3&\cdots&x_0\end{pmatrix}\in\mathbb{R}^{N\times N}.$$

    The patches obtained by the traditional correlation filter through N cyclic shifts form a circulant matrix. Among the samples generated by the cyclic shift operation, only the first row represents the real sample.
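    As a concrete illustration (our own sketch, independent of the authors' released code), the two circulant constructions can be written in Python with NumPy; the helper names `col_circulant` and `row_circulant` are our own:

```python
import numpy as np

def col_circulant(x):
    """Column-vector-based circulant matrix C(x): the k-th column is x cyclically shifted down by k steps."""
    N = len(x)
    return np.stack([np.roll(x, k) for k in range(N)], axis=1)

def row_circulant(x):
    """Row-vector-based circulant matrix C(x^T): the k-th row is x cyclically shifted right by k steps."""
    N = len(x)
    return np.stack([np.roll(x, k) for k in range(N)], axis=0)

x = np.array([1.0, 2.0, 3.0, 4.0])
Cx = col_circulant(x)    # first column is x itself
CxT = row_circulant(x)   # first row is x itself (the only real sample)
```

    Note that `row_circulant(x)` is exactly `col_circulant(x).T`, the transpose relationship between $C(\mathbf{x})$ and $C(\mathbf{x}^T)$ used in the following sections.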

    Discrete convolution is given by
    $$(\mathbf{x}*\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(n-m), \tag{1}$$

    where $(\mathbf{x}*\mathbf{h})\in\mathbb{R}^{N\times 1}$, $(\mathbf{x}*\mathbf{h})(n)$ is the $n$th element of the vector $\mathbf{x}*\mathbf{h}$, $*$ is the one-dimensional convolution operator, $n=0,1,\dots,N-1$, and the signals $\mathbf{x}=(x_0,x_1,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$ and $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$ satisfy the periodic boundary conditions. Notably, in the correlation filter tracking framework, $\mathbf{h}$ is the correlation filter, whereas in the Siamese tracking framework, $\mathbf{h}$ is the test sample.

    The correlation operation is defined as
    $$(\mathbf{x}\otimes\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(m-n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n], \tag{2}$$

    where $\otimes$ is the one-dimensional correlation operator and $\mathbf{h}[\Delta\tau_n]=\mathrm{circshift}(\mathbf{h},n)$; $\mathrm{circshift}(\mathbf{h},n)$ denotes the cyclic shift operator that shifts the signal by $n$ $(n=0,1,\dots,N-1)$ steps.
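    The two definitions can be checked numerically. The sketch below (our own illustration, assuming `np.roll` plays the role of the $\mathrm{circshift}$ operator) implements Eq (1) by its defining sum and Eq (2) as inner products with the shifted filter, and also checks that the correlation equals convolution with the reversed filter, a relationship formalized later in Eq (26):

```python
import numpy as np

def circ_conv(x, h):
    """Discrete circular convolution, Eq (1): (x*h)(n) = sum_m x(m) h(n-m)."""
    N = len(x)
    return np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])

def circ_corr(x, h):
    """Correlation operation, Eq (2): (x ⊗ h)(n) = x^T h[Δτ_n]."""
    N = len(x)
    return np.array([x @ np.roll(h, n) for n in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, -1.0, 2.0, 0.0])
hbar = np.roll(h[::-1], 1)   # reversed filter (h0, h_{N-1}, ..., h1)
```

    With these definitions, `circ_corr(x, h)` coincides with `circ_conv(x, hbar)`, and `circ_conv` itself matches the FFT-based evaluation of Eq (19).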

    A circulant-matrix structure can effectively capture the motion characteristics of an object and provide accurate prediction information during tracking. However, there is redundancy in circulant matrix data, resulting in a large number of computations when operating in the spatial domain. To solve this problem, the computational complexity must be reduced using Theorem 1.

    Theorem 1: Let the column-vector-based circulant matrix $C(\mathbf{x})=\begin{pmatrix}x_0&x_{N-1}&\cdots&x_1\\x_1&x_0&\cdots&x_2\\\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&\cdots&x_0\end{pmatrix}$ be known, let the discrete Fourier transform matrix be
    $$F_N=\begin{pmatrix}1&1&\cdots&1&1\\1&e^{-j\frac{2\pi\times 1\times 1}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times 1}{N}}&e^{-j\frac{2\pi\times(N-1)\times 1}{N}}\\\vdots&\vdots&\ddots&\vdots&\vdots\\1&e^{-j\frac{2\pi\times 1\times(N-2)}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times(N-2)}{N}}&e^{-j\frac{2\pi\times(N-1)\times(N-2)}{N}}\\1&e^{-j\frac{2\pi\times 1\times(N-1)}{N}}&\cdots&e^{-j\frac{2\pi\times(N-2)\times(N-1)}{N}}&e^{-j\frac{2\pi\times(N-1)\times(N-1)}{N}}\end{pmatrix},$$
    let the inverse Fourier transform matrix be
    $$F_N^{-1}=\frac{1}{N}\begin{pmatrix}1&1&\cdots&1&1\\1&e^{j\frac{2\pi\times 1\times 1}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times 1}{N}}&e^{j\frac{2\pi\times(N-1)\times 1}{N}}\\\vdots&\vdots&\ddots&\vdots&\vdots\\1&e^{j\frac{2\pi\times 1\times(N-2)}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times(N-2)}{N}}&e^{j\frac{2\pi\times(N-1)\times(N-2)}{N}}\\1&e^{j\frac{2\pi\times 1\times(N-1)}{N}}&\cdots&e^{j\frac{2\pi\times(N-2)\times(N-1)}{N}}&e^{j\frac{2\pi\times(N-1)\times(N-1)}{N}}\end{pmatrix},$$
    and let $\hat{\mathbf{x}}=F_N\mathbf{x}$ be the one-dimensional Fourier transform of the vector $\mathbf{x}=(x_0,x_1,x_2,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$. Then, we obtain
    $$F_N C(\mathbf{x})F_N^{-1}=\mathrm{Diag}(\hat{\mathbf{x}})=\begin{pmatrix}\hat{x}_0&0&\cdots&0\\0&\hat{x}_1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\hat{x}_{N-1}\end{pmatrix},$$
    where $\mathrm{Diag}$ is the operator that places the elements of a column vector on the diagonal of a diagonal matrix.

    Proof: In $C(\mathbf{x})F_N^{-1}$, the product of the first row of $C(\mathbf{x})$ and the $(k+1)$th $(k=0,1,\dots,N-1)$ column of $F_N^{-1}$ is denoted as $f(0,k)$. Then, we have
    $$f(0,k)=\frac{1}{N}\left(x_0+x_{N-1}e^{kj\frac{2\pi\times 1}{N}}+x_{N-2}e^{kj\frac{2\pi\times 2}{N}}+\cdots+x_1 e^{kj\frac{2\pi\times(N-1)}{N}}\right). \tag{3}$$

    Using the Euler relation $e^{2\pi kj}=\cos(2\pi k)+j\sin(2\pi k)=1$, we have
    $$e^{kj\frac{2\pi}{N}}=e^{kj\frac{2\pi}{N}}e^{-2\pi kj}=e^{kj\frac{2\pi}{N}-2\pi kj}=e^{-kj\frac{2\pi(N-1)}{N}}. \tag{4}$$

    Likewise, we have
    $$e^{kj\frac{2\pi(N-1)}{N}}=e^{kj\frac{2\pi(N-1)}{N}}e^{-2\pi kj}=e^{kj\frac{2\pi(N-1)}{N}-2\pi kj}=e^{-kj\frac{2\pi}{N}}. \tag{5}$$

    According to the period invariance of the complex exponential signals, Eq (3) can be rewritten as
    $$f(0,k)=\frac{1}{N}\left(x_0+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}+\cdots+x_1 e^{-kj\frac{2\pi\times 1}{N}}\right)=\frac{1}{N}\left(x_0+x_1 e^{-kj\frac{2\pi\times 1}{N}}+x_2 e^{-kj\frac{2\pi\times 2}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}\right)=\frac{1}{N}\hat{x}_k, \tag{6}$$

    where $\hat{x}_k$ is the $(k+1)$th $(k=0,1,\dots,N-1)$ element of $\hat{\mathbf{x}}=\mathrm{fft}_1(\mathbf{x})=F_N\mathbf{x}$ ($\mathrm{fft}_1$ is the one-dimensional fast Fourier transform operator), that is, the product of the $(k+1)$th row of $F_N$ and the vector $\mathbf{x}$.

    The product of the second row of $C(\mathbf{x})$ and the $(k+1)$th $(k=0,1,\dots,N-1)$ column of $F_N^{-1}$ is denoted as $f(1,k)$. Since the second row of $C(\mathbf{x})$ is the right-shifted version of the first row of $C(\mathbf{x})$, we have
    $$f(1,k)=\frac{1}{N}\left(x_1+x_0 e^{kj\frac{2\pi\times 1}{N}}+\cdots+x_2 e^{kj\frac{2\pi\times(N-1)}{N}}\right)=\frac{1}{N}\left(x_0 e^{kj\frac{2\pi\times 1}{N}}+x_1+x_2 e^{-kj\frac{2\pi\times 1}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-2)}{N}}\right)=\frac{e^{kj\frac{2\pi\times 1}{N}}}{N}\left(x_0+x_1 e^{-kj\frac{2\pi\times 1}{N}}+x_2 e^{-kj\frac{2\pi\times 2}{N}}+\cdots+x_{N-1}e^{-kj\frac{2\pi\times(N-1)}{N}}\right)=e^{kj\frac{2\pi\times 1}{N}}f(0,k). \tag{7}$$

    Thus, generalizing to $f(m,k)=e^{kj\frac{2\pi m}{N}}f(0,k)$, we have
    $$C(\mathbf{x})F_N^{-1}=\begin{pmatrix}f(0,0)&f(0,1)&\cdots&f(0,N-1)\\f(1,0)&f(1,1)&\cdots&f(1,N-1)\\\vdots&\vdots&\ddots&\vdots\\f(N-1,0)&f(N-1,1)&\cdots&f(N-1,N-1)\end{pmatrix}=\frac{1}{N}\begin{pmatrix}\hat{x}_0&\hat{x}_1&\cdots&\hat{x}_{N-1}\\e^{j\frac{2\pi\times 0\times 1}{N}}\hat{x}_0&e^{j\frac{2\pi\times 1\times 1}{N}}\hat{x}_1&\cdots&e^{j\frac{2\pi\times(N-1)\times 1}{N}}\hat{x}_{N-1}\\\vdots&\vdots&\ddots&\vdots\\e^{j\frac{2\pi\times 0\times(N-1)}{N}}\hat{x}_0&e^{j\frac{2\pi\times 1\times(N-1)}{N}}\hat{x}_1&\cdots&e^{j\frac{2\pi\times(N-1)\times(N-1)}{N}}\hat{x}_{N-1}\end{pmatrix}=F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}}). \tag{8}$$

    Therefore, we have
    $$F_N C(\mathbf{x})F_N^{-1}=F_N F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}})=\mathrm{Diag}(\hat{\mathbf{x}}). \tag{9}$$

    Hence, Theorem 1 is proven.
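    Theorem 1 can also be verified numerically. The sketch below (our own illustration) builds $F_N$, $F_N^{-1}$, and $C(\mathbf{x})$ explicitly; NumPy's `np.fft.fft` uses the same unnormalized convention as $F_N$:

```python
import numpy as np

N = 8
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)         # DFT matrix F_N
Finv = np.exp(2j * np.pi * np.outer(idx, idx) / N) / N   # inverse DFT matrix F_N^{-1}

x = np.random.default_rng(0).standard_normal(N)
Cx = np.stack([np.roll(x, k) for k in range(N)], axis=1)  # column circulant C(x)

lhs = F @ Cx @ Finv            # F_N C(x) F_N^{-1}
rhs = np.diag(np.fft.fft(x))   # Diag(x_hat), x_hat = F_N x
```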

    According to the diagonalization theorem of the column circulant matrix, the diagonalization theorem of the row circulant matrix is derived as follows.

    Theorem 2: Let the row-vector-based circulant matrix $C(\mathbf{x}^T)=\begin{pmatrix}x_0&x_1&x_2&\cdots&x_{N-1}\\x_{N-1}&x_0&x_1&\cdots&x_{N-2}\\x_{N-2}&x_{N-1}&x_0&\cdots&x_{N-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_1&x_2&x_3&\cdots&x_0\end{pmatrix}$ be known, and let $F_n=\frac{F_N}{\sqrt{N}}$ denote the normalized discrete Fourier transform (DFT) matrix. Then, the row-vector-based circulant matrix satisfies $C(\mathbf{x}^T)=F_n\,\mathrm{Diag}(\hat{\mathbf{x}})\,F_n^H$ (where $F_n^H$ denotes the conjugate transpose of $F_n$).

    Proof: The diagonalization theorem of the row-vector-based circulant matrix can be proven using that of the column-vector-based circulant matrix. According to Theorem 1, $C(\mathbf{x})=F_N^{-1}\mathrm{Diag}(\hat{\mathbf{x}})F_N$. Transposing both sides of this equation yields
    $$C(\mathbf{x}^T)=C(\mathbf{x})^T=F_N^T(\mathrm{Diag}(\hat{\mathbf{x}}))^T(F_N^{-1})^T=F_N^T(\mathrm{Diag}(\hat{\mathbf{x}}))^T\left(\frac{F_N^*}{N}\right)^T, \tag{10}$$

    where $F_N^{-1}=\frac{1}{N}F_N^*$ and $F_N^T=F_N$. By decomposing $N$ into $\sqrt{N}\sqrt{N}$, we have
    $$C(\mathbf{x}^T)=\left(\frac{F_N}{\sqrt{N}}\right)(\mathrm{Diag}(\hat{\mathbf{x}}))^T\left(\frac{F_N^*}{\sqrt{N}}\right)^T. \tag{11}$$

    As the normalized DFT matrix $F_n=\frac{F_N}{\sqrt{N}}$ satisfies $F_n^T=F_n$ and $F_n^H=F_n^{-1}$, we obtain
    $$C(\mathbf{x}^T)=F_n(\mathrm{Diag}(\hat{\mathbf{x}}))^T F_n^{-1}=F_n\,\mathrm{Diag}(\hat{\mathbf{x}})\,F_n^H. \tag{12}$$

    Hence, Theorem 2 is proven.
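    A numerical check of Theorem 2 (our own sketch) builds the normalized DFT matrix $F_n$ and reconstructs $C(\mathbf{x}^T)$ from the spectrum of $\mathbf{x}$:

```python
import numpy as np

N = 8
idx = np.arange(N)
Fn = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # normalized DFT matrix F_n

x = np.random.default_rng(1).standard_normal(N)
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)       # row circulant C(x^T)

recon = Fn @ np.diag(np.fft.fft(x)) @ Fn.conj().T               # F_n Diag(x_hat) F_n^H
```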

    Directly performing operations with the circulant matrix in the spatial domain leads to high computational complexity. The correlation filter tracking algorithm therefore also utilizes the relationship between the convolution operation and the circulant matrix: the convolution theorem transforms the spatial-domain operation into an entry-wise operation in the frequency domain to circumvent large matrix multiplications and inversions, effectively reducing the number of operations and improving the computational efficiency.

    Theorem 3: Multiplying the column-vector-based circulant matrix $C(\mathbf{x})$ by a signal $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$ yields
    $$C(\mathbf{x})\mathbf{h}=\begin{pmatrix}x_0&x_{N-1}&x_{N-2}&\cdots&x_1\\x_1&x_0&x_{N-1}&\cdots&x_2\\x_2&x_1&x_0&\cdots&x_3\\\vdots&\vdots&\vdots&\ddots&\vdots\\x_{N-1}&x_{N-2}&x_{N-3}&\cdots&x_0\end{pmatrix}\begin{pmatrix}h_0\\h_1\\h_2\\\vdots\\h_{N-1}\end{pmatrix}=\mathbf{x}*\mathbf{h}. \tag{13}$$

    Proof: By multiplying both sides of Eq (9) with $\hat{\mathbf{h}}$ simultaneously, we have
    $$F_N C(\mathbf{x})F_N^{-1}\hat{\mathbf{h}}=\mathrm{Diag}(\hat{\mathbf{x}})\hat{\mathbf{h}}. \tag{14}$$

    Since $F_N^{-1}\hat{\mathbf{h}}=\mathbf{h}$, this gives
    $$F_N C(\mathbf{x})\mathbf{h}=\hat{\mathbf{x}}\odot\hat{\mathbf{h}}, \tag{15}$$

    where $\odot$ denotes the entry-wise multiplication operation.

    According to the convolution theorem, the entry-wise multiplication of the spectra of two signals equals the spectrum of their spatial convolution; thus, we have
    $$\hat{\mathbf{x}}\odot\hat{\mathbf{h}}=F_N(\mathbf{x}*\mathbf{h}). \tag{16}$$

    By combining Eqs (15) and (16), we obtain
    $$F_N C(\mathbf{x})\mathbf{h}=F_N(\mathbf{x}*\mathbf{h}). \tag{17}$$

    Then, it is seen that

    $$C(\mathbf{x})\mathbf{h}=\mathbf{x}*\mathbf{h}. \tag{18}$$

    Hence, Theorem 3 is proven.
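    A minimal numerical check of Theorem 3 (our own sketch) compares the circulant matrix product with the FFT-based circular convolution:

```python
import numpy as np

N = 16
rng = np.random.default_rng(2)
x, h = rng.standard_normal(N), rng.standard_normal(N)

Cx = np.stack([np.roll(x, k) for k in range(N)], axis=1)    # column circulant C(x)
spatial = Cx @ h                                            # C(x) h
freq = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))  # x * h via the convolution theorem
```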

    According to the convolution theorem, spatial convolution can be calculated in the frequency domain. The specific calculation method is

    $$\mathbf{x}*\mathbf{h}=\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}})), \tag{19}$$

    where $\mathrm{real}$ is the real-part-taking operator and $\mathrm{ifft}_1$ is the one-dimensional inverse Fourier transform operator.

    Comment 1: If calculated directly in the spatial domain, $\mathbf{x}*\mathbf{h}$ can be expressed as $C(\mathbf{x})\mathbf{h}$. The memory space occupied by $C(\mathbf{x})\mathbf{h}$ is $N^2+N$ floating-point units, and its multiplication complexity is $O(N^2)$. Notably, $\mathbf{x}*\mathbf{h}$ can instead be calculated in the frequency domain using the convolution theorem, that is, $\mathbf{x}*\mathbf{h}=\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$. The memory space occupied by $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ is $4N$ floating-point units, and its multiplication complexity is $O(8N\log_2 N+4N)$. (This count includes two fast Fourier transforms, one inverse fast Fourier transform, and the entry-wise multiplication of complex numbers in the frequency domain: the two forward FFTs require $2N\log_2 N$ complex-by-real multiplications, involving $4N\log_2 N$ floating-point multiplications; the inverse FFT requires $N\log_2 N$ complex multiplications, involving $4N\log_2 N$ floating-point multiplications; and the entry-wise multiplication of $N$ complex numbers in the frequency domain requires $4N$ floating-point multiplications.)

    Table 1 presents the occupied memory space and the floating-point multiplication complexity of the $C(\mathbf{x})\mathbf{h}$ and $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operations. For example, when $N$ is 4, the $C(\mathbf{x})\mathbf{h}$ operation occupies $N^2+N|_{N=4}=4^2+4=20$ floating-point units, and the number of floating-point multiplications is $N^2|_{N=4}=4^2=16$. The $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operation occupies $4N|_{N=4}=4\times 4=16$ floating-point units, and the number of floating-point multiplications is $8N\log_2 N+4N|_{N=4}=8\times 4\times\log_2 4+4\times 4=80$. When $N$ is 256, the $C(\mathbf{x})\mathbf{h}$ operation occupies $N^2+N|_{N=256}=256^2+256=65792$ floating-point units, and the number of floating-point multiplications is $N^2|_{N=256}=256^2=65536$. By contrast, the $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ operation occupies $4N|_{N=256}=4\times 256=1024$ floating-point units, and the number of floating-point multiplications is $8N\log_2 N+4N|_{N=256}=8\times 256\times\log_2 256+4\times 256=17408$. The results show that when the signal size is large, the multiplication with the large circulant matrix in the spatial domain can be transformed into the entry-wise multiplication in the frequency domain according to Theorem 3 and the convolution theorem, effectively reducing the number of operations.

    Table 1. Memory footprint and computational complexity analysis of the one-dimensional operations.

    Operation (signal in $\mathbb{R}^{N\times 1}$) | Memory space occupied (floating-point units) | Floating-point multiplication complexity
    $C(\mathbf{x})\mathbf{h}$ | $N^2+N$ | $O(N^2)$
    $\mathrm{real}(\mathrm{ifft}_1(\hat{\mathbf{x}}\odot\hat{\mathbf{h}}))$ | $4N$ | $O(8N\log_2 N+4N)$


    Theorem 4: The row-vector-based circulant matrix satisfies $C(\mathbf{x}^T)\mathbf{h}=C(\bar{\mathbf{x}})\mathbf{h}=\bar{\mathbf{x}}*\mathbf{h}$.

    Proof: By observing each column of the row-vector-based circulant matrix $C(\mathbf{x}^T)$, we find that each column is obtained by a cyclic shift of the previous column. The first column $\bar{\mathbf{x}}=(x_0,x_{N-1},\dots,x_1)^T\in\mathbb{R}^{N\times 1}$ of this matrix is the reverse signal of $\mathbf{x}=(x_0,x_1,\dots,x_{N-1})^T\in\mathbb{R}^{N\times 1}$ (e.g., if $\mathbf{x}=[1,2,3,4]^T$, then $\bar{\mathbf{x}}=[1,4,3,2]^T$). Then, we have $C(\mathbf{x}^T)=C(\bar{\mathbf{x}})$. Combining this result with Eq (13), we obtain
    $$C(\mathbf{x}^T)\mathbf{h}=C(\bar{\mathbf{x}})\mathbf{h}=\bar{\mathbf{x}}*\mathbf{h}. \tag{20}$$

    Hence, Theorem 4 is proven.
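    Theorem 4 can be verified numerically as well; the sketch below (our own) uses `np.roll(x[::-1], 1)` to form the reverse signal $\bar{\mathbf{x}}$ that keeps $x_0$ in place:

```python
import numpy as np

N = 4
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, -1.0, 2.0, 0.25])

CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)      # row circulant C(x^T)
xbar = np.roll(x[::-1], 1)                                     # reverse signal of x

lhs = CxT @ h                                                  # C(x^T) h
rhs = np.real(np.fft.ifft(np.fft.fft(xbar) * np.fft.fft(h)))   # xbar * h
```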

    Comment 2: We can regard $\mathbf{h}$ in $\bar{\mathbf{x}}*\mathbf{h}$ as a static signal and $\bar{\mathbf{x}}$ as a dynamic signal. According to the definition of discrete convolution, the dynamic signal should be reversed as $\bar{\bar{\mathbf{x}}}$, which is equal to $\mathbf{x}$. The reversed signal $\bar{\bar{\mathbf{x}}}$ is then cyclically shifted to form the shifted vectors. The vector formed by the inner products of these shifted vectors with the static signal is the result of the discrete convolution. The shifted vectors can be stacked into the row-vector-based circulant matrix $C(\mathbf{x}^T)$. Thus, $\bar{\mathbf{x}}*\mathbf{h}=C(\mathbf{x}^T)\mathbf{h}$.

    Comment 3: $\bar{\mathbf{x}}$ satisfies $\mathrm{fft}_1(\bar{\mathbf{x}})=\hat{\mathbf{x}}^*$ (where $\hat{\mathbf{x}}^*$ is the conjugate signal of $\hat{\mathbf{x}}$).

    Proof: If the spectral signal $\hat{\mathbf{x}}$ is the Fourier transform of the signal $\mathbf{x}$, then the spectral elements of $\mathbf{x}$ are expressed as
    $$\hat{x}(k)=\sum_{n=0}^{N-1}x(n)e^{-kj\frac{2\pi n}{N}}=\sum_{n=1}^{N-1}x(n)e^{-kj\frac{2\pi n}{N}}+x(0). \tag{21}$$

    Similarly, the spectral elements of $\bar{\mathbf{x}}$ are given by
    $$\mathrm{fft}_1(\bar{\mathbf{x}})(k)=\sum_{n=0}^{N-1}\bar{x}(n)e^{-kj\frac{2\pi n}{N}}=\sum_{n=1}^{N-1}x(N-n)e^{-kj\frac{2\pi n}{N}}+x(0), \tag{22}$$

    where $\mathrm{fft}_1(\bar{\mathbf{x}})(k)$ is the $(k+1)$th $(k=0,1,\dots,N-1)$ element of $\mathrm{fft}_1(\bar{\mathbf{x}})$.

    Based on the time-shifting property of the discrete Fourier transform and Euler's formula $e^{2\pi kj}=\cos(2\pi k)+j\sin(2\pi k)=1$, Eq (22) can be rewritten as follows
    $$\mathrm{fft}_1(\bar{\mathbf{x}})(k)=\sum_{n=1}^{N-1}x(N-n)e^{-kj\frac{2\pi n}{N}}e^{2\pi kj}+x(0)=\sum_{n=1}^{N-1}x(N-n)e^{kj\frac{2\pi(N-n)}{N}}+x(0)\overset{t=N-n}{=}\sum_{t=1}^{N-1}x(t)e^{kj\frac{2\pi t}{N}}+x(0)=\sum_{t=0}^{N-1}x(t)e^{kj\frac{2\pi t}{N}}. \tag{23}$$

    Combining Eqs (21) and (23) yields
    $$\mathrm{fft}_1(\bar{\mathbf{x}})=\hat{\mathbf{x}}^*. \tag{24}$$

    Hence, the proof is complete.
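    Comment 3 is easy to confirm numerically for a real-valued signal (our own sketch):

```python
import numpy as np

x = np.random.default_rng(5).standard_normal(8)  # real-valued signal
xbar = np.roll(x[::-1], 1)                       # reverse signal keeping x(0) in place

lhs = np.fft.fft(xbar)                           # fft_1(x_bar)
rhs = np.conj(np.fft.fft(x))                     # conjugate spectrum of x
```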

    According to Eq (20), a second proof of Theorem 3 is provided as follows.

    Proof:

    We observe that $C(\mathbf{x})=C(\bar{\mathbf{x}}^T)$, since the first row of $C(\mathbf{x})$ is exactly $\bar{\mathbf{x}}^T=(x_0,x_{N-1},\dots,x_1)$. Thereby, applying Theorem 4 to $\bar{\mathbf{x}}$, we obtain
    $$C(\mathbf{x})\mathbf{h}=C(\bar{\mathbf{x}}^T)\mathbf{h}=\bar{\bar{\mathbf{x}}}*\mathbf{h}=\mathbf{x}*\mathbf{h}, \tag{25}$$

    where $\bar{\bar{\mathbf{x}}}=\mathbf{x}$.

    A tracking algorithm based on a correlation filter significantly improves the tracking speed by transforming complex correlation operations in the spatial domain into simple entry-wise multiplication operations in the frequency domain. Utilizing the relationship between the correlation and convolution operations, we rewrite the correlation operator in the convolution form
    $$(\mathbf{x}\otimes\mathbf{h})(n)=\sum_{m=0}^{N-1}x(m)h(m-n)=\sum_{m=0}^{N-1}x(m)\bar{h}(n-m)=(\mathbf{x}*\bar{\mathbf{h}})(n), \tag{26}$$

    where $\bar{\mathbf{h}}=(h_0,h_{N-1},\dots,h_1)^T\in\mathbb{R}^{N\times 1}$ is the reverse signal of $\mathbf{h}=(h_0,h_1,\dots,h_{N-1})^T\in\mathbb{R}^{N\times 1}$, and $\bar{\mathbf{h}}$ satisfies the one-dimensional periodic boundary conditions.

    By combining Eqs (20) and (26), the relationship between the correlation operation and the row-vector-based circulant matrix is given by
    $$C(\mathbf{x}^T)\mathbf{h}=\overline{\mathbf{x}\otimes\mathbf{h}}. \tag{27}$$
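    The reverse relationship of Eq (27) can be checked numerically; the sketch below (our own, again using `np.roll` for the cyclic shift operator) computes the correlation response as inner products with the shifted filter and compares it with $C(\mathbf{x}^T)\mathbf{h}$:

```python
import numpy as np

N = 8
rng = np.random.default_rng(6)
x, h = rng.standard_normal(N), rng.standard_normal(N)

corr = np.array([x @ np.roll(h, n) for n in range(N)])     # (x ⊗ h)(n) = x^T h[Δτ_n]
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)  # row circulant C(x^T)
corr_rev = np.roll(corr[::-1], 1)                          # reversed correlation response
```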

    A traditional discriminative tracking algorithm distinguishes between an object and its background by training a classifier. The background information and the object are used as negative and positive samples, respectively, and the candidate sample with the highest response is selected as the prediction result. The correlation filter uses ridge regression to design the filter $\mathbf{h}$, with a regularization term added to prevent overfitting. The correlation operation form of the correlation filter is given by Eq (28):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{28}$$

    where $\mathbf{x}\in\mathbb{R}^{N\times 1}$ is the column-vector form of the object sample after weighting by the cosine window and $N$ is the number of pixels occupied by the object sample. $\mathbf{y}\in\mathbb{R}^{N\times 1}$ is the desired correlation response, $\mathbf{h}\in\mathbb{R}^{N\times 1}$ is the filter, and $\lambda$ is the balancing parameter, which balances the fidelity term $\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2$ and the ridge regression regularization term $\frac{\lambda}{2}\|\mathbf{h}\|_2^2$.

    The vector multiplication form of the correlation filter is given by Eq (29):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(y(n)-\mathbf{x}^T\mathbf{h}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(\bar{y}(n)-\mathbf{h}^T\mathbf{x}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{29}$$

    where, if $\mathbf{r}=C(\mathbf{h}^T)\mathbf{x}$, then $\mathbf{r}\in\mathbb{R}^{N\times 1}$ and $r(n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n]$.

    A key focus of the correlation filter tracking algorithm is improving computational efficiency using the characteristics of the circulant matrix. The circulant matrix operation form of the correlation filter is given by Eq (30):
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-C(\mathbf{h}^T)\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\|\bar{\mathbf{y}}-C(\mathbf{x}^T)\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2, \tag{30}$$

    where $C(\mathbf{h}^T)$ is the row-vector-based circulant matrix. $C(\mathbf{h}^T)$ satisfies $C(\mathbf{h}^T)\mathbf{x}=\bar{\mathbf{h}}*\mathbf{x}=\mathbf{x}*\bar{\mathbf{h}}=\mathbf{x}\otimes\mathbf{h}$ and $C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.

    According to Eq (26), the correlation form in Eq (28) can be rewritten in the convolutional form of Eq (31), that is
    $$E(\mathbf{h})=\min_{\bar{\mathbf{h}}}\frac{1}{2}\|\mathbf{y}-\bar{\mathbf{h}}*\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\bar{\mathbf{h}}\|_2^2, \tag{31}$$

    where $*$ denotes the convolution operator, which satisfies $\mathbf{x}\otimes\mathbf{h}=\mathbf{x}*\bar{\mathbf{h}}$.

    According to the convolution theorem and Parseval's theorem, Eq (31) can be written in the frequency-domain form as follows
    $$E(\mathbf{h})=\min_{\hat{\mathbf{h}}^*}\frac{1}{2N}\|\hat{\mathbf{y}}-\hat{\mathbf{h}}^*\odot\hat{\mathbf{x}}\|_2^2+\frac{\lambda}{2N}\|\hat{\mathbf{h}}\|_2^2, \tag{32}$$

    where $\hat{\mathbf{h}}$ is the Fourier transform of $\mathbf{h}$, and $\hat{\mathbf{h}}^*$ is the conjugate signal of $\hat{\mathbf{h}}$.

    Based on the above discussion, we determine the relationship among the four mathematical modeling forms of correlation filter tracking as follows
    $$E(\mathbf{h})=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\otimes\mathbf{h}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\sum_{n=0}^{N-1}\left(y(n)-\mathbf{x}^T\mathbf{h}[\Delta\tau_n]\right)^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\mathbf{h}}\frac{1}{2}\|\mathbf{y}-C(\mathbf{h}^T)\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\mathbf{h}\|_2^2=\min_{\bar{\mathbf{h}}}\frac{1}{2}\|\mathbf{y}-\bar{\mathbf{h}}*\mathbf{x}\|_2^2+\frac{\lambda}{2}\|\bar{\mathbf{h}}\|_2^2. \tag{33}$$

    Comment 4: $C(\mathbf{h}^T)\mathbf{x}$ and $C(\mathbf{x}^T)\mathbf{h}$ are confused in some studies. There is a reverse relationship between $C(\mathbf{h}^T)\mathbf{x}$ and $C(\mathbf{x}^T)\mathbf{h}$, that is, $C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.

    Comment 5: The definition of the correlation operation $\mathbf{x}\otimes\mathbf{h}$ differs across studies in the literature. If the element of the correlation result is defined as $(\mathbf{x}\otimes\mathbf{h})(n)=\mathbf{x}^T\mathbf{h}[\Delta\tau_n]$, we obtain $\mathbf{x}\otimes\mathbf{h}=C(\mathbf{h}^T)\mathbf{x}=\overline{C(\mathbf{x}^T)\mathbf{h}}$.
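    The reverse relationship in Comments 4 and 5 can be checked directly (our own sketch):

```python
import numpy as np

N = 8
rng = np.random.default_rng(8)
x, h = rng.standard_normal(N), rng.standard_normal(N)

ChT = np.stack([np.roll(h, k) for k in range(N)], axis=0)  # row circulant C(h^T)
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)  # row circulant C(x^T)

a = ChT @ x   # C(h^T) x
b = CxT @ h   # C(x^T) h; a should be the reverse signal of b
```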

    In the spatial domain, by computing the first-order derivative of $\mathbf{h}$ in Eq (30) and setting it equal to zero, we obtain
    $$\frac{dE(\mathbf{h})}{d\mathbf{h}}=C(\mathbf{x}^T)^H\left(C(\mathbf{x}^T)\mathbf{h}-\bar{\mathbf{y}}\right)+\lambda\mathbf{h}=\mathbf{0}. \tag{34}$$

    Then, the spatial-domain optimal solution for $\mathbf{h}$ is given by
    $$\mathbf{h}=\left(C(\mathbf{x}^T)^H C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^H\bar{\mathbf{y}}=\left(C(\mathbf{x}^T)^T C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^T\bar{\mathbf{y}}. \tag{35}$$

    Because the introduction of a circulant matrix generates numerous virtual samples, considerable computation is required. The sample matrix can be transformed into a diagonal matrix for processing based on the diagonalization property of the row-vector-based circulant matrix. This method significantly accelerates the matrix calculations and reduces the computational complexity of directly computing solutions in the spatial domain, that is

    $$\begin{aligned}\mathbf{h}&=\left(C(\mathbf{x}^T)^T C(\mathbf{x}^T)+\lambda I\right)^{-1}C(\mathbf{x}^T)^T\bar{\mathbf{y}}\\&=\left(F_n\,\mathrm{Diag}(\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*)F_n^H+\lambda F_n\,\mathrm{Diag}(\boldsymbol{\delta})F_n^H\right)^{-1}C(\mathbf{x})\bar{\mathbf{y}}\\&=\left(F_n\left(\mathrm{Diag}(\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*)+\lambda\,\mathrm{Diag}(\boldsymbol{\delta})\right)F_n^H\right)^{-1}C(\mathbf{x})\bar{\mathbf{y}}\\&=F_n\,\mathrm{Diag}\!\left(\frac{1}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)F_n^H F_n\,\mathrm{Diag}(\hat{\mathbf{x}}^*)F_n^H\bar{\mathbf{y}}\\&=F_n\,\mathrm{Diag}\!\left(\frac{\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)F_n^H\bar{\mathbf{y}}\\&=C(\mathbf{u}^T)\bar{\mathbf{y}}\Big|_{\mathbf{u}=\mathrm{ifft}_1\left\{\frac{\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right\}}\\&=\bar{\mathbf{u}}*\bar{\mathbf{y}},\end{aligned} \tag{36}$$

    where $\boldsymbol{\delta}$ is the column vector whose elements are all 1, that is, $\boldsymbol{\delta}=(1,\dots,1)^T\in\mathbb{R}^{N\times 1}$.

    According to the convolution theorem, Eq (36) can be transformed into the frequency domain for calculation:
    $$\mathbf{h}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{u}}^*\odot\hat{\mathbf{y}}^*\right)\right)=\mathrm{real}\left(\mathrm{ifft}_1\left(\frac{\hat{\mathbf{x}}\odot\hat{\mathbf{y}}^*}{\hat{\mathbf{x}}\odot\hat{\mathbf{x}}^*+\lambda\boldsymbol{\delta}}\right)\right). \tag{37}$$
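    The equivalence of the spatial-domain solution in Eq (35) and the frequency-domain solution in Eq (37) can be checked numerically. The sketch below (our own, with an arbitrary sample, desired response, and $\lambda=0.1$) solves the ridge regression both ways:

```python
import numpy as np

rng = np.random.default_rng(10)
N = 16
x = rng.standard_normal(N)   # training sample
y = rng.standard_normal(N)   # desired correlation response
lam = 0.1                    # balancing parameter λ

# Spatial-domain solution, Eq (35): h = (C(x^T)^T C(x^T) + λI)^{-1} C(x^T)^T y_bar
CxT = np.stack([np.roll(x, k) for k in range(N)], axis=0)
ybar = np.roll(y[::-1], 1)
h_spatial = np.linalg.solve(CxT.T @ CxT + lam * np.eye(N), CxT.T @ ybar)

# Frequency-domain solution, Eq (37): h = real(ifft(x_hat ⊙ y_hat^* / (x_hat ⊙ x_hat^* + λ)))
xf, yf = np.fft.fft(x), np.fft.fft(y)
h_freq = np.real(np.fft.ifft(xf * np.conj(yf) / (np.abs(xf) ** 2 + lam)))
```

    The frequency-domain route replaces an $N\times N$ matrix inversion with entry-wise division of length-$N$ spectra, which is the efficiency gain discussed above.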

    For a new test sample $\mathbf{z}$, we have
    $$\bar{\mathbf{r}}=C(\mathbf{z}^T)\mathbf{h}, \tag{38}$$

    where $\bar{\mathbf{r}}$ is the reverse signal of the spatial-domain response $\mathbf{r}$.

    By reversing both sides of Eq (38), the spatial-domain response $\mathbf{r}$ is expressed as
    $$\mathbf{r}=\overline{C(\mathbf{z}^T)\mathbf{h}}=C(\mathbf{h}^T)\mathbf{z}. \tag{39}$$

    As $C(\mathbf{z}^T)\mathbf{h}=\bar{\mathbf{z}}*\mathbf{h}$, Eq (39) can be written in the convolutional form, as shown in Eq (40):
    $$\mathbf{r}=\overline{C(\mathbf{z}^T)\mathbf{h}}=\overline{\bar{\mathbf{z}}*\mathbf{h}}=\mathbf{z}*\bar{\mathbf{h}}. \tag{40}$$

    According to Eqs (19) and (24), Eq (40) can be rewritten as
    $$\mathbf{r}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{h}}^*\odot\hat{\mathbf{z}}\right)\right). \tag{41}$$

    In the frequency domain, by computing the first-order derivative with respect to $\hat{\mathbf{h}}^*$ in Eq (32) and setting it to zero, that is, $\frac{dE(\mathbf{h})}{d\hat{\mathbf{h}}^*}=\frac{\hat{\mathbf{x}}\odot(\hat{\mathbf{h}}\odot\hat{\mathbf{x}}^*-\hat{\mathbf{y}}^*)}{2N}+\frac{\lambda\hat{\mathbf{h}}}{2N}=\mathbf{0}$, we obtain
    $$\hat{\mathbf{h}}^*=\frac{\hat{\mathbf{y}}\odot\hat{\mathbf{x}}^*}{\hat{\mathbf{x}}^*\odot\hat{\mathbf{x}}+\lambda\boldsymbol{\delta}}, \tag{42}$$

    where the division in Eq (42) denotes entry-wise division.

    For the new sample $\mathbf{z}$, the corresponding spatial response is
    $$\mathbf{r}=\mathrm{real}\left(\mathrm{ifft}_1\left(\hat{\mathbf{h}}^*\odot\hat{\mathbf{z}}\right)\right). \tag{43}$$

    Comment 6: Eqs (41) and (43) show that the results obtained by solving the filter using the diagonalization property of the row-vector-based circulant matrix and the convolution theorem are completely consistent.
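    The consistency stated in Comment 6 extends to the response computation itself: the spatial-domain product with the circulant matrix in Eq (39) and the frequency-domain form of Eqs (41)/(43) give the same response (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(11)
N = 16
z = rng.standard_normal(N)   # test sample
h = rng.standard_normal(N)   # trained filter

# Spatial response via the row circulant, Eq (39): r = C(h^T) z
ChT = np.stack([np.roll(h, k) for k in range(N)], axis=0)
r_spatial = ChT @ z

# Frequency-domain response, Eqs (41)/(43): r = real(ifft(h_hat^* ⊙ z_hat))
r_freq = np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(z)))
```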

    For object tracking, the image being processed is a two-dimensional signal, whereas all the signals discussed in the previous section are one-dimensional. Hence, we generalize the one-dimensional convolution form in Eq (20) to the two-dimensional convolution form, as shown in Eq (44):

$$\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)=\mathrm{mat}\left(\mathrm{Cat}\left(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}}\right)\mathrm{vec}(\mathbf{I}_2)\right)=\mathrm{real}\left(\mathrm{ifft}_2\left(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2\right)\right),\tag{44}$$

where $\mathrm{mat}$ is an operator that transforms a column vector into a matrix, $\mathrm{Cat}$ is an operator that stacks row vectors into a matrix, $\mathrm{vec}$ is an operator that transforms a matrix into a column vector, and $\mathrm{Conv2}$ is the two-dimensional convolution operator. The element in the $c$-th row and $r$-th column of $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)$ is $\langle\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r]),\mathrm{vec}(\mathbf{I}_2)\rangle=\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}}\mathrm{vec}(\mathbf{I}_2)$, where $\mathbf{I}_1\in\mathbb{R}^{N\times N}$, $\mathbf{I}_2\in\mathbb{R}^{N\times N}$, and $N$ is an integer. $r=0,1,\dots,N-1$ represents the number of row cyclic shifts, and $c=0,1,\dots,N-1$ represents the number of column cyclic shifts. Moreover, $\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r]\in\mathbb{R}^{N\times N}$ denotes the matrix obtained by first cyclically shifting $\mathbf{I}_1$, row-by-row, by $r$ units to obtain an intermediate matrix, and then cyclically shifting the intermediate matrix, column-by-column, by $c$ units. Finally, $\bar{\mathbf{I}}_1$ is the reverse matrix of $\mathbf{I}_1$, computed as follows: the original matrix is first reversed row-by-row to obtain an intermediate matrix, which is then reversed column-by-column to obtain $\bar{\mathbf{I}}_1$. For example, for $\mathbf{I}_1=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$, the row-by-row reverse operation yields the intermediate matrix $\mathbf{I}_t=\begin{bmatrix}1&3&2\\4&6&5\\7&9&8\end{bmatrix}$, and the column-by-column reverse operation on $\mathbf{I}_t$ then yields $\bar{\mathbf{I}}_1=\begin{bmatrix}1&3&2\\7&9&8\\4&6&5\end{bmatrix}$. Here, $\mathrm{mat}(\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2))\in\mathbb{R}^{N\times N}$, $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)\in\mathbb{R}^{N\times N}$ is the two-dimensional convolution of images $\bar{\mathbf{I}}_1$ and $\mathbf{I}_2$, and $\mathrm{ifft}_2$ is the two-dimensional inverse Fourier transform operator.

Comment 7: If $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)$ is calculated in the spatial domain, that is, as $\mathrm{mat}(\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2))$, the occupied memory space is $O(N^2+N)$, and the multiplicative computational complexity is $O(N^2)$. If the convolution theorem is introduced, then the convolution in the spatial domain becomes an entry-wise multiplication operation and an inverse Fourier transform in the frequency domain, that is, $\mathrm{Conv2}(\bar{\mathbf{I}}_1,\mathbf{I}_2)=\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$. The memory space occupied by this algorithm is $4N$, and the multiplicative computational complexity is $O(8N\log_2 N+4N)$. A detailed calculation process of the two-dimensional filter $\mathbf{R}_{2\mathrm{D}}=\mathrm{Conv2}(\mathbf{I}_1,\mathbf{I}_2)$, with $\mathbf{I}_1=\begin{bmatrix}1&4&7\\2&5&8\\3&6&9\end{bmatrix}$ and $\mathbf{I}_2=\begin{bmatrix}0.1&1&1\\1&0.1&1\\1&1&0.1\end{bmatrix}$, is listed in Table 2.
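The stage-by-stage procedure of Table 2 can be exercised directly. The sketch below (a minimal NumPy illustration; the function names are ours, not the paper's) implements the reverse operation and the shift-multiply-sum form of Conv2, and checks it against the frequency-domain form of Eq (44) on the example above, for which every entry of $\mathbf{R}_{2\mathrm{D}}$ equals 31.5:

```python
import numpy as np

def reverse2d(A):
    """Circular reversal: keep index 0, reverse the rest (rows first, then columns)."""
    A = np.concatenate([A[:, :1], A[:, :0:-1]], axis=1)     # row-by-row reverse
    return np.concatenate([A[:1, :], A[:0:-1, :]], axis=0)  # column-by-column reverse

def conv2_spatial(A, B):
    """Conv2(A, B): entry (c, r) = sum of (reverse of A, shifted c rows / r cols) ⊙ B."""
    N = A.shape[0]
    Abar = reverse2d(A)
    R = np.empty((N, N))
    for c in range(N):
        for r in range(N):
            R[c, r] = np.sum(np.roll(Abar, (c, r), axis=(0, 1)) * B)
    return R

def conv2_freq(A, B):
    """The same operation via the two-dimensional convolution theorem."""
    return np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)))

I1 = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]], dtype=float)
I2 = np.array([[0.1, 1, 1], [1, 0.1, 1], [1, 1, 0.1]])

R_2d = conv2_spatial(I1, I2)   # shifts the reverse of I1, exactly as in Table 2
print(R_2d)                    # every entry equals 31.5
print(np.allclose(R_2d, conv2_freq(I1, I2)))  # True
```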

Table 2.  Two-dimensional filter calculation process.
Element of $\mathbf{R}_{2\mathrm{D}}$ | Stage 1: Reverse | Stage 2: Cyclic shift | Stage 3: Multiplication | Stage 4: Summation
$\mathbf{R}_{2\mathrm{D}}(1,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_0]=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.1&7&4\\3&0.9&6\\2&8&0.5\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(1,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_1]=\begin{bmatrix}4&1&7\\6&3&9\\5&2&8\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.4&1&7\\6&0.3&9\\5&2&0.8\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(1,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_2]=\begin{bmatrix}7&4&1\\9&6&3\\8&5&2\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_0,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.7&4&1\\9&0.6&3\\8&5&0.2\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_0]=\begin{bmatrix}2&8&5\\1&7&4\\3&9&6\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.2&8&5\\1&0.7&4\\3&9&0.6\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_1]=\begin{bmatrix}5&2&8\\4&1&7\\6&3&9\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.5&2&8\\4&0.1&7\\6&3&0.9\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(2,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_2]=\begin{bmatrix}8&5&2\\7&4&1\\9&6&3\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_1,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.8&5&2\\7&0.4&1\\9&6&0.3\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,1)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_0]=\begin{bmatrix}3&9&6\\2&8&5\\1&7&4\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_0]\odot\mathbf{I}_2=\begin{bmatrix}0.3&9&6\\2&0.8&5\\1&7&0.4\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,2)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_1]=\begin{bmatrix}6&3&9\\5&2&8\\4&1&7\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_1]\odot\mathbf{I}_2=\begin{bmatrix}0.6&3&9\\5&0.2&8\\4&1&0.7\end{bmatrix}$ | 31.5
$\mathbf{R}_{2\mathrm{D}}(3,3)$ | $\bar{\mathbf{I}}_1=\begin{bmatrix}1&7&4\\3&9&6\\2&8&5\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_2]=\begin{bmatrix}9&6&3\\8&5&2\\7&4&1\end{bmatrix}$ | $\bar{\mathbf{I}}_1[\Delta\tau_2,\Delta\tau_2]\odot\mathbf{I}_2=\begin{bmatrix}0.9&6&3\\8&0.5&2\\7&4&0.1\end{bmatrix}$ | 31.5


A detailed calculation process of the one-dimensional filter $\mathbf{r}_{1\mathrm{D}}=\mathrm{vec}(\mathbf{I}_1)\circledast\mathrm{vec}(\mathbf{I}_2)=\mathbf{i}_1\circledast\mathbf{i}_2$, where $\mathbf{i}_1=(1,2,3,4,5,6,7,8,9)^{\mathrm{T}}$, $\mathbf{i}_2=(0.1,1,1,1,0.1,1,1,1,0.1)^{\mathrm{T}}$, and $\bar{\mathbf{i}}_1[\Delta\tau_n]$ denotes $\bar{\mathbf{i}}_1$ cyclically shifted $n$ times, is listed in Table 3.

Table 3.  One-dimensional filter calculation process.
Element of $\mathbf{r}_{1\mathrm{D}}$ | Stage 1: Reverse | Stage 2: Cyclic shift | Stage 3: Multiplication | Stage 4: Summation
$\mathbf{r}_{1\mathrm{D}}(1)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_0]=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_0]\odot\mathbf{i}_2=(0.1,9,8,7,0.6,5,4,3,0.2)$ | 36.9
$\mathbf{r}_{1\mathrm{D}}(2)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_1]=(2,1,9,8,7,6,5,4,3)$ | $\bar{\mathbf{i}}_1[\Delta\tau_1]\odot\mathbf{i}_2=(0.2,1,9,8,0.7,6,5,4,0.3)$ | 34.2
$\mathbf{r}_{1\mathrm{D}}(3)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_2]=(3,2,1,9,8,7,6,5,4)$ | $\bar{\mathbf{i}}_1[\Delta\tau_2]\odot\mathbf{i}_2=(0.3,2,1,9,0.8,7,6,5,0.4)$ | 31.5
$\mathbf{r}_{1\mathrm{D}}(4)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_3]=(4,3,2,1,9,8,7,6,5)$ | $\bar{\mathbf{i}}_1[\Delta\tau_3]\odot\mathbf{i}_2=(0.4,3,2,1,0.9,8,7,6,0.5)$ | 28.8
$\mathbf{r}_{1\mathrm{D}}(5)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_4]=(5,4,3,2,1,9,8,7,6)$ | $\bar{\mathbf{i}}_1[\Delta\tau_4]\odot\mathbf{i}_2=(0.5,4,3,2,0.1,9,8,7,0.6)$ | 34.2
$\mathbf{r}_{1\mathrm{D}}(6)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_5]=(6,5,4,3,2,1,9,8,7)$ | $\bar{\mathbf{i}}_1[\Delta\tau_5]\odot\mathbf{i}_2=(0.6,5,4,3,0.2,1,9,8,0.7)$ | 31.5
$\mathbf{r}_{1\mathrm{D}}(7)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_6]=(7,6,5,4,3,2,1,9,8)$ | $\bar{\mathbf{i}}_1[\Delta\tau_6]\odot\mathbf{i}_2=(0.7,6,5,4,0.3,2,1,9,0.8)$ | 28.8
$\mathbf{r}_{1\mathrm{D}}(8)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_7]=(8,7,6,5,4,3,2,1,9)$ | $\bar{\mathbf{i}}_1[\Delta\tau_7]\odot\mathbf{i}_2=(0.8,7,6,5,0.4,3,2,1,0.9)$ | 26.1
$\mathbf{r}_{1\mathrm{D}}(9)$ | $\bar{\mathbf{i}}_1=(1,9,8,7,6,5,4,3,2)$ | $\bar{\mathbf{i}}_1[\Delta\tau_8]=(9,8,7,6,5,4,3,2,1)$ | $\bar{\mathbf{i}}_1[\Delta\tau_8]\odot\mathbf{i}_2=(0.9,8,7,6,0.5,4,3,2,0.1)$ | 31.5

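The same check works for the one-dimensional filter of Table 3: the stage-by-stage shift-multiply-sum in the spatial domain against the one-dimensional convolution theorem. A minimal NumPy sketch using the document's circular-reversal convention:

```python
import numpy as np

def reverse1d(v):
    """Circular reversal: keep v[0] and reverse the remaining samples."""
    return np.roll(v[::-1], 1)

i1 = np.arange(1.0, 10.0)
i2 = np.array([0.1, 1, 1, 1, 0.1, 1, 1, 1, 0.1])

i1_bar = reverse1d(i1)
# Spatial computation of r1D (Table 3): shift ī1 by n units, multiply by i2, sum.
r_spatial = np.array([np.sum(np.roll(i1_bar, n) * i2) for n in range(9)])

# The same filter via the 1D convolution theorem.
r_freq = np.real(np.fft.ifft(np.fft.fft(i1) * np.fft.fft(i2)))

print(r_spatial)  # matches Table 3: 36.9, 34.2, 31.5, 28.8, 34.2, 31.5, 28.8, 26.1, 31.5
print(np.allclose(r_spatial, r_freq))  # True
```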

Table 4 presents the occupied memory space and the floating-point multiplication complexity of the $\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ and $\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ operations. Table 4 shows that the memory footprint and computational complexity of the one-dimensional and two-dimensional operations are completely consistent. The computational complexity of the $\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ operation is much smaller than that of the $\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ operation when the image size is large.

Table 4.  Memory footprint and computational complexity analysis of the two-dimensional operation.
Operation (image size $\mathbb{R}^{N\times N}$) | Memory space occupied/floating-point unit | Complexity of floating-point multiplication operation
$\mathrm{Cat}(\mathrm{vec}(\mathbf{I}_1[\Delta\tau_c,\Delta\tau_r])^{\mathrm{T}})\mathrm{vec}(\mathbf{I}_2)$ | $N^2+N$ | $O(N^2)$
$\mathrm{real}(\mathrm{ifft}_2(\hat{\mathbf{I}}_1^{*}\odot\hat{\mathbf{I}}_2))$ | $4N$ | $O(8N\log_2 N+4N)$


The one-dimensional and two-dimensional filters are equivalent in estimating the object's position. However, subtle differences exist between the one-dimensional and two-dimensional filter responses owing to the inconsistency in the receptive fields and periodic boundary conditions of the two convolutions. The receptive field of the one-dimensional convolution has only one dimension, whereas that of the two-dimensional convolution has two. Likewise, a two-dimensional signal is padded with a two-dimensional periodic extension, whereas a two-dimensional image columnized into a one-dimensional signal is padded with a one-dimensional periodic extension. Thus, at the same spatial location, the data involved in the one-dimensional and two-dimensional convolutions differ, and consequently subtle differences occur between the two filter responses.
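This difference is easy to expose numerically: columnizing two images and convolving them as one-dimensional signals does not reproduce the two-dimensional circular convolution, because the one-dimensional periodic extension wraps the end of each column into the start of the next. A small NumPy sketch (the random images and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))

# 2D response: two-dimensional circular convolution via the 2D convolution theorem.
r2d = np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)))

# 1D response: columnize both images (vec), apply the 1D convolution theorem,
# and reshape back in column-major order (mat).
a, b = A.flatten(order="F"), B.flatten(order="F")
r1d = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))).reshape((N, N), order="F")

print(np.allclose(r1d, r2d))      # False: the responses differ
print(np.max(np.abs(r1d - r2d)))  # gap caused by the differing boundary wrap-around
```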

    When we model the correlation filter tracking problem in matrix form, the mathematical formula is modeled as follows

$$E(\mathbf{H})=\min_{\mathbf{H}}\frac{1}{2}\left\|\mathbf{Y}-\mathbf{X}\circledast_2\mathbf{H}\right\|_2^2+\frac{\lambda}{2}\left\|\mathbf{H}\right\|_2^2,\tag{45}$$

where $\circledast_2$ represents the two-dimensional correlation operator, $(\mathbf{X}\circledast_2\mathbf{H})(n,k)=\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\mathbf{X}(m,l)\mathbf{H}(n+m,k+l)=\mathrm{Conv2}(\bar{\mathbf{H}},\mathbf{X})$, $\bar{\mathbf{H}}$ is the reversed two-dimensional signal of $\mathbf{H}$ and satisfies the two-dimensional periodic boundary conditions, and $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{H}$ are the matrix forms of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{h}$, respectively.

    Rewrite the above formula into convolution form and we have

$$E(\mathbf{H})=\min_{\mathbf{H}}\frac{1}{2}\left\|\mathbf{Y}-\mathrm{Conv2}(\bar{\mathbf{H}},\mathbf{X})\right\|_2^2+\frac{\lambda}{2}\left\|\mathbf{H}\right\|_2^2,\tag{46}$$

where $\bar{\mathbf{H}}$ satisfies $\mathrm{fft}_2(\bar{\mathbf{H}})=\hat{\mathbf{H}}^{*}$, $\hat{\mathbf{H}}$ denotes the spectrum of $\mathbf{H}$, $\mathrm{fft}_2$ represents the two-dimensional fast Fourier transform operator, and $\hat{\mathbf{H}}^{*}$ is the conjugate matrix of $\hat{\mathbf{H}}$.

    According to the convolution theorem, the above expression can be further arranged into the frequency domain form, namely

$$E(\hat{\mathbf{H}}^{*})=\min_{\hat{\mathbf{H}}^{*}}\frac{1}{2N}\left\|\hat{\mathbf{Y}}-\hat{\mathbf{H}}^{*}\odot\hat{\mathbf{X}}\right\|_2^2+\frac{\lambda}{2N}\left\|\hat{\mathbf{H}}^{*}\right\|_2^2.\tag{47}$$

Setting $\frac{\mathrm{d}E(\hat{\mathbf{H}}^{*})}{\mathrm{d}\hat{\mathbf{H}}^{*}}=0$, we have

$$\frac{\hat{\mathbf{X}}^{*}\odot(\hat{\mathbf{H}}^{*}\odot\hat{\mathbf{X}}-\hat{\mathbf{Y}})+\lambda\hat{\mathbf{H}}^{*}}{N}=0.\tag{48}$$

    Then, the filter in the matrix form is calculated by

$$\hat{\mathbf{H}}^{*}=\frac{\hat{\mathbf{Y}}\odot\hat{\mathbf{X}}^{*}}{\hat{\mathbf{X}}^{*}\odot\hat{\mathbf{X}}+\lambda}.\tag{49}$$
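As a check, the closed form of Eq (49) can be substituted back into the stationarity condition of Eq (48); the residual is zero entry-wise. A minimal NumPy sketch (the random spectra, λ, and size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam = 32, 0.1
X = rng.standard_normal((N, N))
Y = rng.standard_normal((N, N))

Xh, Yh = np.fft.fft2(X), np.fft.fft2(Y)

# Eq (49): entry-wise closed-form solution for the 2D filter spectrum.
Hh_conj = (Yh * np.conj(Xh)) / (np.conj(Xh) * Xh + lam)

# It satisfies the stationarity condition of Eq (48) entry-wise.
residual = np.conj(Xh) * (Hh_conj * Xh - Yh) + lam * Hh_conj
print(np.max(np.abs(residual)))  # numerically zero
```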

Figure 1(a) shows the training sample, and Figure 1(b) illustrates the desired response. Figure 1(c) shows the test sample obtained by the cyclic shift of the training sample. Figure 1(d) presents the response of the two-dimensional filter (calculated by $\hat{\mathbf{H}}^{*}=\frac{\hat{\mathbf{Y}}\odot\hat{\mathbf{X}}^{*}}{\hat{\mathbf{X}}^{*}\odot\hat{\mathbf{X}}+\lambda}$), and Figure 1(e) shows the response of the one-dimensional filter (calculated by $\hat{\mathbf{h}}^{*}=\frac{\hat{\mathbf{y}}\odot\hat{\mathbf{x}}^{*}}{\hat{\mathbf{x}}^{*}\odot\hat{\mathbf{x}}+\lambda}$). Figure 1(f) shows the difference between the one-dimensional and two-dimensional filter responses. The results show that the one-dimensional and two-dimensional filters are equivalent in positioning but are not completely consistent. When the image is cyclically shifted to the edge, a significant difference is observed between the one-dimensional and two-dimensional filter responses.

    Figure 1.  Connections and differences between the one-dimensional and two-dimensional filter responses. (a) Training sample (the size of the sample is 100×100). (b) Desired response. (c) Test sample obtained via the cyclic shift of the training sample. (d) Two-dimensional filter response. (e) One-dimensional filter response. (f) Difference between the two-dimensional and one-dimensional filter responses.

    Experiments were conducted on a computer equipped with an i5-8265 (1.80 GHz) CPU. The proposed viewpoints and the equivalence of the two filter-solving methods were verified through numerical experimentation and by designing response maps, respectively, to ensure the mathematical rigor and scientific validity of the correlation filter object tracking algorithm.

To verify Theorem 1, let $x=\sin(50\pi t)$. The sampling rate was taken as 100 Hz, and the sampling time was 0–0.99 s, that is, $\mathbf{t}=[0,0.01,0.02,\dots,0.99]^{\mathrm{T}}$. The sinusoidal signal was Fourier transformed, and its real and imaginary parts were taken, respectively. Figure 2(a),(c) show the amplitude of the real part of $\hat{\mathbf{x}}$ and the amplitude of the imaginary part of $\hat{\mathbf{x}}$, respectively. The real and imaginary parts were also taken for the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. Figure 2(b) shows the amplitude of the real part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$, and Figure 2(d) presents the amplitude of the imaginary part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. In this experiment, $\|\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}-\mathrm{Diag}(\hat{\mathbf{x}})\|_2$ was $2.83\times 10^{-14}$, where $\|\cdot\|_2$ represents the $\ell_2$ norm. These results show that the real part maps and imaginary part maps are completely consistent, which proves Theorem 1.

Figure 2.  Numerical experimental results for Theorem 1. (a) Amplitude of the real part of $\hat{\mathbf{x}}$. (b) Amplitude of the real part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$. (c) Amplitude of the imaginary part of $\hat{\mathbf{x}}$. (d) Amplitude of the imaginary part of the diagonal elements of $\mathbf{F}_N C(\mathbf{x})\mathbf{F}_N^{-1}$.
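Theorem 1's diagonalization can be reproduced in a few lines of NumPy, using the same 25 Hz sinusoid sampled at 100 Hz; building the DFT matrix by transforming the identity is our implementation choice:

```python
import numpy as np

N = 100
t = np.arange(N) / 100.0            # sampling rate 100 Hz, 0–0.99 s
x = np.sin(50 * np.pi * t)

# Column-vector-based circulant matrix C(x): column j is x cyclically shifted by j units.
C = np.stack([np.roll(x, j) for j in range(N)], axis=1)

F = np.fft.fft(np.eye(N))           # DFT matrix F_N
F_inv = np.conj(F) / N              # F_N^{-1}

err = np.linalg.norm(F @ C @ F_inv - np.diag(np.fft.fft(x)))
print(err)  # close to machine precision: F_N C(x) F_N^{-1} = Diag(x̂)
```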

To verify Theorem 2, let $x=\sin(50\pi t)$. The sampling rate was taken as 100 Hz, and the sampling time was 0–0.99 s, that is, $\mathbf{t}=[0,0.01,0.02,\dots,0.99]^{\mathrm{T}}$. Figure 3(a) shows the pseudo-color map of the row-vector-based circulant matrix $C(\mathbf{x}^{\mathrm{T}})$, and Figure 3(b) presents the pseudo-color map of the real part of $\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}$. The two maps are completely consistent. In this experiment, $\|\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}-C(\mathbf{x}^{\mathrm{T}})\|_2$ was $2.86\times 10^{-12}$, thereby proving Theorem 2.

Figure 3.  Numerical experimental results for Theorem 2. (a) Pseudo-color map of $C(\mathbf{x}^{\mathrm{T}})$. (b) Pseudo-color map of $\mathbf{F}_n\mathrm{Diag}(\hat{\mathbf{x}})\mathbf{F}_n^{\mathrm{H}}$.
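Theorem 2 admits the same kind of check with the unitary DFT matrix $\mathbf{F}_n=\mathbf{F}_N/\sqrt{N}$; the NumPy sketch below uses the same test signal (construction details are our assumptions):

```python
import numpy as np

N = 100
t = np.arange(N) / 100.0
x = np.sin(50 * np.pi * t)

F_u = np.fft.fft(np.eye(N)) / np.sqrt(N)              # unitary DFT matrix F_n
C_row = np.stack([np.roll(x, i) for i in range(N)])   # C(x^T): rows are shifts of x^T

err = np.linalg.norm(F_u @ np.diag(np.fft.fft(x)) @ np.conj(F_u).T - C_row)
print(err)  # close to machine precision: F_n Diag(x̂) F_n^H = C(x^T)
```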

    To verify Theorem 3, let and . The sampling rate was taken as 100 Hz, and the sampling time is 0–0.99 s, that is, . Figures 4(a),(b) present the amplitudes of over 1 s and over 1 s, respectively. The results show that the two amplitude maps are completely consistent. The in this experiment was . Since , Theorem 3 is verified.

    Figure 4.  Numerical experimental results of Theorem 3. (a) Amplitude of and (b) amplitude of .

    To verify Theorem 4, let and . The sampling rate was taken as 100 Hz, and the sampling time is 0–0.99 s, that is, . Figure 5(a) shows the amplitude of over 1 s, and Figure 5(b) presents the amplitude of over 1 s. The results show that the two amplitude maps are completely consistent. The in this experiment was . As , Theorem 4 is verified.

    Figure 5.  Numerical experimental results for Theorem 4. (a) Amplitude of and (b) amplitude of .

    To verify Eq (44), let be the two-dimensional image signal, as shown in Figure 6(a), and let be the two-dimensional image signal, as shown in Figure 6(b). Figures 6(c),(d) illustrate the and responses. The results show that the responses were completely consistent as shown in Figures 6(c),(d). The in this experiment was , proving that Eq (44) holds.

    Figure 6.  Numerical experimental results to validate Eq (44). (a) image, (b) image, (c) response, and (d) response.

    Figure 7(a) shows the base sample, and Figure 7(b) shows the predicted sample obtained via the cyclic shift of the base sample. In addition, Figures 7(c),(d) present the desired correlation response and the spatial domain response obtained according to Eq (43), respectively. Figures 7(e),(f) show the spatial domain response obtained using Eqs (38) and (39), respectively. The results show that the spatial responses in Figures 7(d),(f) are completely consistent (i.e., the two filter-solving methods are equivalent).

    Figure 7.  Equivalence experiment of the two filter-solving methods. (a) Base sample. (b) Predicted sample obtained via the cyclic shift of the base sample. (c) Desired correlation response. (d) Spatial domain response based on Eq (43). (e) Spatial domain response based on Eq (38). (f) Spatial domain response based on Eq (39).

Through the diagonalization property of the row-vector-based circulant matrix and the convolution theorem, the two filter-solving methods transform the spatial domain operation into an entry-wise multiplication operation in the frequency domain to circumvent the inverse operations of large matrices. Table 5 lists the running times required to calculate Eqs (35) and (37) for different image sizes, where $\lambda$ was 0.1. The results indicate that solving the filter in the frequency domain can effectively reduce the operation time and improve the computational efficiency of the filter. The larger the signal size, the more apparent the advantage of solving the filter in the frequency domain.

    Table 5.  Time consumed to solve in the spatial and frequency domains.
    Average time consumed (the number of the experiments is 10, and the image size is )
Time consumed in the spatial domain/s (according to Eq (35))
    Time consumed in the frequency domain/s (according to Eq (37))


In this study, we systematically elucidated the theoretical modeling system of the correlation filter. Based on the existing correlation filter literature, four mathematical modeling forms and two fast filter calculation methods were summarized and experimentally verified. The relationships among the four modeling forms of the correlation filter were discussed in detail. Our conclusions are as follows:

1) We elaborated on the definitions of the circulant matrix, convolution, and correlation operations in the correlation filter and their relationships. The viewpoints and mathematical findings provided in this study can provide useful theoretical support for research in the field of correlation filter object tracking.

2) The diagonalization property of the circulant matrix and the convolution theorem were employed to solve the filter by transforming the spatial-domain operation into an entry-wise multiplication operation in the frequency domain. This approach avoids the inverse operation of large spatial-domain matrices and reduces the computational complexity compared with directly solving in the spatial domain. The experiments showed that the results obtained using the two filter-solving methods were consistent. The proposed fast filter calculation method is critical to the efficient implementation of the correlation filter tracking algorithm.

    3) We experimentally proved the existence of slight differences between the one-dimensional and two-dimensional filter methods. The main reasons for these differences were discussed in detail. Subsequently, the equivalence of the two filter methods in object positioning was reflected via experimentation to provide a reliable foundation for the engineering realization of the two theoretical methods.

Traditional correlation filter tracking frameworks utilize handcrafted features to distinguish the object and background. The discrimination ability of these features is limited; thus, the application of the correlation filter tracking method in complex scenes has some limitations. As deep learning technology gradually matures, it will provide the correlation filter theoretical framework with more discriminative visual features. Subsequent work should attempt to combine the correlation filter with deep-learning-based tracking algorithms to improve the overall performance of the tracker. For example, the computational theory proposed in this paper can be introduced into cyclic shifting attention computation [25] to obtain more efficient computation.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work is supported by the Natural Science Foundation Project of Zhangzhou City (ZZ2023J37), the Principal Foundation of Minnan Normal University (KJ19019), the High-level Science Research Project of Minnan Normal University (GJ19019), Research Project on Education and Teaching of Undergraduate Colleges and Universities in Fujian Province (FBJY20230083), and the Education Research Program of Minnan Normal University (202211).

    The authors declare there is no conflict of interest.



    [1] S. Javed, M. Danelljan, F. S. Khan, M. H. Khan, M. Felsberg, J. Matas, Visual object tracking with discriminative filters and siamese networks: A survey and outlook, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 6552–6574. https://doi.org/10.1109/TPAMI.2022.3212594 doi: 10.1109/TPAMI.2022.3212594
    [2] F. Chen, X. Wang, Y. Zhao, S. Lv, X. Niu, Visual object tracking: A survey, Comput. Vision Image Understanding, 222 (2022), 103508. https://doi.org/10.1016/j.cviu.2022.103508 doi: 10.1016/j.cviu.2022.103508
    [3] D. Zhang, Z. Zheng, M. Li, R. Liu, CSART: Channel and spatial attention-guided residual learning for real-time object tracking, Neurocomputing, 436 (2021), 260–272. https://doi.org/10.1016/j.neucom.2020.11.046 doi: 10.1016/j.neucom.2020.11.046
    [4] F. Gu, J. Lu, C. Cai, Q. Zhu, Z. Ju, RTSformer: A robust toroidal transformer with spatiotemporal features for visual tracking, IEEE Trans. Hum.-Mach. Syst., 54 (2024), 214–225. https://doi.org/10.1109/THMS.2024.3370582 doi: 10.1109/THMS.2024.3370582
    [5] Y. Qian, L. Yu, W. Liu, A. G. Hauptmann, Electricity: An efficient multi-camera vehicle tracking system for intelligent city, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 2511–2519. https://doi.org/10.1109/CVPRW50498.2020.00302
    [6] X. Chen, X. Xu, Y. Yang, Y. Huang, J. Chen, Y. Yan, Visual ship tracking via a hybrid kernelized correlation filter and anomaly cleansing framework, Appl. Ocean Res., 106 (2021), 102455. https://doi.org/10.1016/j.apor.2020.102455 doi: 10.1016/j.apor.2020.102455
    [7] H. Zhang, Y. Li, H. Liu, D. Yuan, Y. Yang, Feature block-aware correlation filters for real-time UAV tracking, IEEE Signal Process. Lett., 31 (2024), 840–844. https://doi.org/10.1109/LSP.2024.3373528 doi: 10.1109/LSP.2024.3373528
    [8] X. Wang, D. Zeng, Y. Li, M. Zou, Q. Zhao, S. Li, Enhancing UAV tracking: a focus on discriminative representations using contrastive instances, J. R.-Time Image Process., 21 (2024), 78. https://doi.org/10.1007/s11554-024-01456-2 doi: 10.1007/s11554-024-01456-2
    [9] C. Zhu, J. Yang, Z. Shao, C. Liu, Vision based hand gesture recognition using 3D shape context, IEEE/CAA J. Autom. Sin., 8 (2021), 1600–1613. https://doi.org/10.1109/JAS.2019.1911534 doi: 10.1109/JAS.2019.1911534
    [10] M. N. H. Mohd, M. S. M. Asaari, O. L. Ping, B. A. Rosdi, Vision-based hand detection and tracking using fusion of kernelized correlation filter and single-shot detection, Appl. Sci., 13 (2023), 7433. https://doi.org/10.3390/app13137433 doi: 10.3390/app13137433
    [11] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 583–596. https://doi.org/10.1109/TPAMI.2014.2345390 doi: 10.1109/TPAMI.2014.2345390
    [12] Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in Computer Vision-ECCV 2014 Workshops, 8926 (2014), 254–265. https://doi.org/10.1007/978-3-319-16181-5_18
    [13] M. Danelljan, G. Hager, F. S. Khan, M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 4310–4318. https://doi.org/10.1109/ICCV.2015.490
    [14] C. Ma, X. Yang, C. Zhang, M. Yang, Long-term correlation tracking, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 5388–5396. https://doi.org/10.1109/CVPR.2015.7299177
    [15] M. Danelljan, G. Häger, F. S. Khan, M. Felsberg, Discriminative scale space tracking, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), 1561–1575. https://doi.org/10.1109/TPAMI.2016.2609928 doi: 10.1109/TPAMI.2016.2609928
    [16] M. Danelljan, G. Bhat, F. Shahbaz Khan, M. Felsberg, ECO: Efficient convolution operators for tracking, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 6931–6939. https://doi.org/10.1109/CVPR.2017.733
    [17] A. Lukezic, T. Vojir, L. C. Zajc, J. Matas, M. Kristan, Discriminative correlation filter with channel and spatial reliability, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 4847–4856. https://doi.org/10.1109/CVPR.2017.515
    [18] Z. Huang, C. Fu, Y. Li, F. Lin, P. Lu, Learning aberrance repressed correlation filters for real-time UAV tracking, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 2891–2900. https://doi.org/10.1109/ICCV.2019.00298
    [19] B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, SiamRPN++: Evolution of siamese visual tracking with very deep networks, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 4277–4286. https://doi.org/10.1109/CVPR.2019.00441
    [20] T. Xu, Z. Feng, X. Wu, J. Kittler, Joint group feature selection and discriminative filter learning for robust visual object tracking, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 7949–7959. https://doi.org/10.1109/ICCV.2019.00804
    [21] D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui, Visual object tracking using adaptive correlation filters, in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2010), 2544–2550. https://doi.org/10.1109/CVPR.2010.5539960
    [22] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. Torr, Fully-convolutional siamese networks for object tracking, in Computer Vision-ECCV 2016 Workshops, 9914 (2016), 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
    [23] H. K. Galoogahi, A. Fagg, S. Lucey, Learning background-aware correlation filters for visual tracking, in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), 1144–1152. https://doi.org/10.1109/ICCV.2017.129
    [24] Y. Li, C. Fu, F. Ding, Z. Huang, G. Lu, Autotrack: Towards high-performance visual tracking for UAV with automatic spatio-temporal regularization, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11920–11929. https://doi.org/10.1109/CVPR42600.2020.01194
    [25] Z. Song, J. Yu, Y. P. Chen, W. Yang, Transformer tracking with cyclic shifting window attention, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 8781–8790. https://doi.org/10.1109/CVPR52688.2022.00859
    [26] Y. Chen, H. Wu, Z. Deng, J. Zhang, H. Wang, L. Wang, et al., Deep-feature-based asymmetrical background-aware correlation filter for object tracking, Digital Signal Process., 148 (2024), 104446. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
    [27] K. Chen, L. Wang, H. Wu, C. Wu, Y. Liao, Y. Chen, et al., Background-aware correlation filter for object tracking with deep CNN features, Eng. Lett., 32 (2024), 1353–1363.
    [28] R. M. Gray, Toeplitz and circulant matrices: A review, Found. Trends Commun. Inf. Theory, 2 (2006), 155–239. http://doi.org/10.1561/0100000006 doi: 10.1561/0100000006
    [29] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in Computer Vision-ECCV 2012, (2012), 702–715. https://doi.org/10.1007/978-3-642-33765-9_50
    [30] M. E. Kilmer, C. D. Martin, Factorization strategies for third-order tensors, Linear Algebra Appl., 435 (2011), 641–658. https://doi.org/10.1016/j.laa.2010.09.020 doi: 10.1016/j.laa.2010.09.020
    [31] N. Hao, M. E. Kilmer, K. Braman, R. C. Hoover, Facial recognition using tensor-tensor decompositions, SIAM J. Imaging Sci., 6 (2013), 437–463. https://doi.org/10.1137/110842570 doi: 10.1137/110842570
    [32] M. E. Kilmer, K. Braman, N. Hao, R. C. Hoover, Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging, SIAM J. Matrix Anal. Appl., 34 (2013), 148–172. https://doi.org/10.1137/110837711 doi: 10.1137/110837711
    [33] B. Hunt, A matrix theory proof of the discrete convolution theorem, IEEE Trans. Audio Electroacoust., 19 (1971), 285–288. https://doi.org/10.1109/TAU.1971.1162202 doi: 10.1109/TAU.1971.1162202
    [34] J. Martinez, R. Heusdens, R. C. Hendriks, A generalized Fourier domain: Signal processing framework and applications, Signal Process., 93 (2013), 1259–1267. https://doi.org/10.1016/j.sigpro.2012.10.015 doi: 10.1016/j.sigpro.2012.10.015
    [35] A. Iwasaki, Deriving the variance of the discrete Fourier transform test using Parseval's theorem, IEEE Trans. Inf. Theory, 66 (2020), 1164–1170. https://doi.org/10.1109/TIT.2019.2947045 doi: 10.1109/TIT.2019.2947045
    [36] Q. Hu, H. Wu, J. Wu, J. Shen, H. Hu, Y. Chen, et al., Spatio-temporal self-learning object tracking model based on anti-occlusion mechanism, Eng. Lett., 31 (2023), 1–10.
    [37] Y. Huang, Y. Chen, C. Lin, Q. Hu, J. Song, Visual attention learning and antiocclusion-based correlation filter for visual object tracking, J. Electron. Imaging, 32 (2023), 13023. https://doi.org/10.1117/1.JEI.32.1.013023 doi: 10.1117/1.JEI.32.1.013023
    [38] J. Cui, J. Wu, L. Zhao, Learning channel-selective and aberrance repressed correlation filter with memory model for unmanned aerial vehicle object tracking, Front. Neurosci., 16 (2023). https://doi.org/10.3389/fnins.2022.1080521 doi: 10.3389/fnins.2022.1080521
    [39] C. Fan, H. Yu, Y. Huang, C. Shan, L. Wang, C. Li, SiamON: Siamese occlusion-aware network for visual tracking, IEEE Trans. Circuits Syst. Video Technol., 33 (2023), 186–199. https://doi.org/10.1109/TCSVT.2021.3102886 doi: 10.1109/TCSVT.2021.3102886
    [40] W. Hu, Q. Wang, L. Zhang, L. Bertinetto, P. H. S. Torr, SiamMask: A framework for fast online object tracking and segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 3072–3089.
    [41] D. Sharma, Z. A. Jaffery, Multiple object tracking through background learning, Comput. Syst. Sci. Eng., 44 (2023), 191–204. https://doi.org/10.32604/csse.2023.023728 doi: 10.32604/csse.2023.023728
    [42] J. Zhang, Y. He, S. Wang, Learning adaptive sparse spatially-regularized correlation filters for visual tracking, IEEE Signal Process. Lett., 30 (2023), 11–15. https://doi.org/10.1109/LSP.2023.3238277 doi: 10.1109/LSP.2023.3238277
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
