1. Introduction
Structural covariance/correlation matrices appear in diverse fields. For example, in array signal processing, the covariance matrix of the received array signal is the noise covariance plus a weighted sum of rank-one matrices [1,2,3]; in time-series analysis, a common form is the Toeplitz structural autocovariance/autocorrelation matrix [4]; and in certain high-dimensional problems, the covariance matrix is often sparse or banded [5]. Covariance estimation under structure constraints is common in the literature, e.g., [5,6,7,8]. Structural estimates are generally more accurate than estimates that ignore structural information, because exploiting structure usually reduces the number of parameters to be estimated. Structural covariance/correlation estimation plays an important role in several applications, such as low-rank signal detection [9,10], direction-of-arrival (DOA) estimation [2,3], and spectral analysis of time-series data [4].
Weighted sums of known Rank-one matrices with unknown Weights (W-Rank1-W) are often used to describe the covariance of the received array signal [2]. In [11], a weighted covariance fitting criterion was proposed, providing a W-Rank1-W structural covariance estimate of the received array signals. Based on the maximum likelihood principle, [12] estimated the W-Rank1-W structural covariance of received array signals assumed to follow a circular Gaussian distribution. The weight estimates in [11,12] are widely known to play an important role in DOA estimation. Subsequently, many researchers noted that the weights of W-Rank1-W are sparse in DOA estimation, and similar DOA estimation ideas were implemented in [2,13], where the W-Rank1-W structural covariance estimation was addressed under sparse weight constraints. We estimate the W-Rank1-W structural covariance by minimizing the relative entropy (also known as the Kullback-Leibler divergence) between two circular Gaussian distributions [14], yielding a low-complexity algorithm based on the Majorization-Minimization (MM) algorithm framework [15]. The entropy loss measures the difference between a pair of distributions with two different covariance matrices. The proposed MM algorithm is based on the result that the entropy loss function is bounded above by a sequence of separable functions. Our numerical experiments reveal that weight estimates based on minimizing the entropy loss are more accurate for DOA estimation than the estimates reported in [11,12] using the weighted covariance fitting criterion and the maximum-likelihood principle.
An auto-covariance matrix in a stationary time series always exhibits a Toeplitz structure [4]. Moreover, in certain cases, the autocorrelation coefficient decreases in magnitude as the time lag increases, e.g., in auto-regressive time series, some moving-average time series, and some auto-regressive moving-average time series. [16] approached Toeplitz covariance estimation by minimizing the Frobenius-norm loss to the sample covariance matrix, and [6,17] proposed maximum-likelihood Toeplitz estimates under circularly-symmetric complex normal and complex elliptical distribution assumptions, respectively. [18] provided a banded Toeplitz estimator by projecting a given positive definite matrix onto the convex cone of positive semi-definite banded Toeplitz matrices. [19,20] proposed Toeplitz estimates by minimizing the entropy loss and a modified entropy loss, respectively. [21] proposed a Toeplitz estimate based on the entropy loss that accounts for possible sparsity of the covariance matrix. This study considers the inverse relationship between the autocorrelation coefficient and the time lag in time-series analysis and proposes a Toeplitz autocorrelation matrix estimate obtained by minimizing the entropy loss between two normal distributions. As Toeplitz matrices can be expressed in the W-Rank1-W structure, an MM algorithm similar to that for the W-Rank1-W structural estimation is proposed; however, each iteration of the algorithm requires solving a second-order cone programming (SOCP) problem. Numerical experiments reveal that our estimates are more accurate than those that ignore the inverse relationship between the autocorrelation coefficient and the time lag.
The remainder of this paper is organized as follows: In Section 2, we formulate the estimation problem of interest. Section 3 presents two different algorithms for the W-Rank1-W structural covariance estimation, and Section 4 presents an MM algorithm for the autocorrelation matrix estimation involving the inverse relationship between the autocorrelation coefficient and the time lag. Finally, Section 5 describes the results of our numerical experiments, revealing the characteristics and applications of the proposed algorithms.
Notations: $\mathbb{C}^v$ ($\mathbb{R}^v$) denotes the set of $v$-dimensional vectors over the complex (real) field. $\mathbb{S}^v_{+}$ ($\mathbb{S}^v_{++}$) denotes the set of $v\times v$ symmetric (real field) or Hermitian (complex field) positive semi-definite (definite) matrices. $\mathbf{0}$ denotes the zero matrix and $I_v$ denotes the $v\times v$ identity matrix. $X_1-X_2\succeq 0$ ($\succ 0$) indicates that the matrix $X_1-X_2$ is positive semi-definite (definite). $x\succeq 0$ ($\succ 0$) indicates that all elements of the vector $x$ are non-negative (positive). $X_{i,j}$ denotes the $(i,j)$-th entry of the matrix $X$, where the row and column indices begin from 0. $\mathrm{diag}(x)$ denotes a diagonal matrix whose diagonal elements are given by the vector $x$. The superscripts $(\cdot)^T$ and $(\cdot)^H$ represent the transpose and the conjugate transpose, respectively. $\|\cdot\|$ denotes the $l_2$-norm of a vector, and $|\cdot|$ denotes the determinant of a square matrix.
2. Problem formulation
Let $CN_1$ and $CN_2$ denote circularly-symmetric complex normal distributions with zero mean and covariance matrices $\Sigma_1\in\mathbb{S}^v_{++}$ and $\Sigma_2\in\mathbb{S}^v_{++}$, respectively. The probability densities of $CN_1$ and $CN_2$ are
\[
f_i(z)=\frac{1}{\pi^v|\Sigma_i|}\exp\left(-z^H\Sigma_i^{-1}z\right),\qquad z\in\mathbb{C}^v,\ i=1,2. \tag{2.1}
\]
The relative entropy from $CN_1$ to $CN_2$ is defined [14] as follows:
\[
D(CN_2\Vert CN_1)=\mathrm{E}_{z\sim CN_1}\left[\log f_1(z)-\log f_2(z)\right], \tag{2.2}
\]
where $\mathrm{E}_{z\sim CN_1}(\cdot)$ denotes the expectation operator, given that $z$ follows the distribution $CN_1$. By a brief proof (Appendix A), (2.2) can be rewritten as
\[
D(CN_2\Vert CN_1)=\log\frac{|\Sigma_2|}{|\Sigma_1|}+\operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right)-v. \tag{2.3}
\]
This relative entropy measures the difference between the distributions $CN_1$ and $CN_2$.
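For concreteness, (2.3) is straightforward to evaluate numerically; the following Python sketch (the function name and test covariance are illustrative, not part of any toolbox) computes the relative entropy from the two covariance matrices:

```python
import numpy as np

def relative_entropy(sigma1, sigma2):
    """Evaluate (2.3): log(|S2|/|S1|) + tr(S2^{-1} S1) - v."""
    v = sigma1.shape[0]
    # slogdet is numerically safer than det for log-determinants
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    return (logdet2 - logdet1
            + np.trace(np.linalg.solve(sigma2, sigma1)).real - v)

# sanity check: the divergence vanishes when the two covariances coincide
sigma = np.eye(4) + 0.3 * np.ones((4, 4))
assert abs(relative_entropy(sigma, sigma)) < 1e-10
```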
In applications such as array signal processing and time-series analysis, the covariance is structural, and structural covariance estimation is a common problem. Let $\Sigma_1\triangleq\Sigma$ be an unknown structural covariance matrix and $\Sigma_2\triangleq S$ be a given covariance matrix. By minimizing the entropy loss $D(CN_2\Vert CN_1)$,
\[
\min_{\Sigma\in\Omega}\ L_0(\Sigma)\triangleq\log\frac{|S|}{|\Sigma|}+\operatorname{tr}\left(S^{-1}\Sigma\right)-v, \tag{2.4}
\]
where $\Omega$ denotes a nonempty set that is the intersection of the closed set characterizing the covariance structure and the positive semi-definite cone $\mathbb{S}^v_+$, we achieve structural covariance estimation.
Throughout this study, we adopt the following assumption: the loss function $L_0(\Sigma_k)\to+\infty$ whenever the sequence $\{\Sigma_k\}$ tends towards the boundary of the positive semi-definite cone $\mathbb{S}^v_+$. Under this assumption, the case in which $\Sigma$ is singular can be excluded from the algorithm analysis.
3. W-Rank1-W structural covariance matrix estimation
Estimation of the W-Rank1-W structural covariance matrix is described in this section, i.e., the estimation of $\Sigma$ in the set $\Omega_1$:
\[
\Omega_1\triangleq\left\{\Sigma\ \middle|\ \Sigma=\sum_{i=0}^{m}p_ia_ia_i^H,\ p_i\geq0,\ i=0,1,\cdots,m\right\}, \tag{3.1}
\]
where $a_0,\cdots,a_m$ denote vectors that are known in advance. The W-Rank1-W structural covariance commonly occurs in array signal processing and can often be applied to DOA estimation.
Given a covariance matrix $S$, which is often the sample covariance matrix, we solve problem (2.4) with $\Omega=\Omega_1$:
\[
\min_{\Sigma\in\Omega_1}\ L_0(\Sigma), \tag{3.2}
\]
yielding a W-Rank1-W structural covariance estimate. Any $\Sigma\in\Omega_1$ can be rewritten as
\[
\Sigma=A\,\mathrm{diag}(p)\,A^H, \tag{3.3}
\]
where $A=(a_0,a_1,\cdots,a_m)$ and $p=(p_0,p_1,\cdots,p_m)^T$. Because $\Sigma$ is a function of $p$ ($A$ is known in advance), the objective function in (3.2) can be rewritten as the following function of $p$:
\[
L(p)\triangleq L_0\left(A\,\mathrm{diag}(p)\,A^H\right)=\log\frac{|S|}{\left|A\,\mathrm{diag}(p)\,A^H\right|}+\operatorname{tr}\left(S^{-1}A\,\mathrm{diag}(p)\,A^H\right)-v. \tag{3.4}
\]
Using the objective function expression $L(p)$, the optimization problem (3.2) can be rewritten as follows:
\[
\min_{p\succeq 0}\ L(p). \tag{3.5}
\]
The objective function in (3.2) is a convex function of $\Sigma$ [22], and $\Sigma$ is an affine map of $p$; therefore, it is convex in $p$, i.e., $L(p)$ is convex. Moreover, the constraint set $\{p\mid p\succeq0\}$ is convex. Thus, (3.5) is a convex optimization problem.
The following algorithms for W-Rank1-W covariance estimation solve problem (3.5). When $p^*$ denotes the minimum point of problem (3.5), i.e.,
\[
p^*=\arg\min_{p\succeq 0}\ L(p), \tag{3.6}
\]
the W-Rank1-W structural covariance estimate is given by
\[
\Sigma^*=A\,\mathrm{diag}(p^*)\,A^H. \tag{3.7}
\]
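Problem (3.5) can also be prototyped with a general-purpose solver, which provides a reference solution for the two algorithms below. A minimal sketch based on the loss (3.4), with randomly generated real test data (all names and sizes are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
v, m1 = 8, 12                               # dimension v, number of rank-one terms m+1
A = rng.standard_normal((v, m1))            # known vectors a_0, ..., a_m as columns
S = np.cov(rng.standard_normal((v, 200)))   # a sample covariance playing the role of S
S_inv = np.linalg.inv(S)

def loss(p):
    """L(p) in (3.4), up to an additive constant."""
    sigma = (A * p) @ A.T                   # Sigma = A diag(p) A^T
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return np.inf                       # outside the positive definite cone
    return np.trace(S_inv @ sigma) - logdet

res = minimize(loss, np.ones(m1), bounds=[(1e-9, None)] * m1, method="L-BFGS-B")
p_star = res.x                              # numerical minimizer of (3.5)
```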
We present two algorithms in the following subsections: the primal-dual interior-point method and an algorithm designed within the MM framework. These two algorithms have different computational complexities, which we briefly discuss and compare.
3.1. Primal-dual interior-point algorithm
Problem (3.5) is a convex optimization problem with linear inequality constraints that can be solved effectively using the primal-dual interior-point method [23].
Algorithm 1 outlines the primal-dual interior-point method in the context of (3.5). In the algorithm, the primal-dual search direction $(\Delta p,\Delta\lambda)$ at the point $(p,\lambda)$ is defined as the solution of the linear equation:
\[
\begin{pmatrix} H & -I\\ \mathrm{diag}(\lambda) & \mathrm{diag}(p)\end{pmatrix}\begin{pmatrix}\Delta p\\ \Delta\lambda\end{pmatrix}=-\begin{pmatrix} h-\lambda\\ \mathrm{diag}(\lambda)\,p-\frac{1}{\tau}\mathbf{1}\end{pmatrix}, \tag{3.8}
\]
where $\tau>0$ denotes a preset parameter, and $h=(h_i)$ and $H=(H_{i,j})$ with $i,j=0,1,\cdots,m$ denote the gradient and the Hessian matrix of $L(p)$ at the point $p$, respectively. Specifically, with $\Sigma=A\,\mathrm{diag}(p)\,A^H$,
\[
h_i=a_i^HS^{-1}a_i-a_i^H\Sigma^{-1}a_i, \tag{3.9}
\]
and
\[
H_{i,j}=\left|a_i^H\Sigma^{-1}a_j\right|^2. \tag{3.10}
\]
3.2. MM algorithm
In each iteration of the primal-dual interior-point method given by Algorithm 1, computing the Hessian matrix has a computational complexity of $O(m^5)$. This section presents a method that decreases the per-iteration computational complexity by using the MM algorithm framework [15].
In the MM algorithm framework, if a continuously differentiable function $g(p\mid p_k)$ satisfies the following condition:
\[
g(p\mid p_k)\geq L(p)\quad\text{for all }p\succeq 0, \tag{3.11}
\]
where equality is achieved at $p=p_k$, then the sequence $\{p_k\}$ generated by
\[
p_{k+1}=\arg\min_{p\succeq 0}\ g(p\mid p_k) \tag{3.12}
\]
converges to a stationary point of problem (3.5). Because the optimization problem (3.5) is convex, its stationary point is actually the optimal point $p^*$.
$\log|X|$ is a concave function that is bounded above by its first-order Taylor expansion at any $X_k$ [15]:
\[
\log|X|\leq \log|X_k|+\operatorname{tr}\left(X_k^{-1}(X-X_k)\right), \tag{3.13}
\]
with equality achieved at $X=X_k$. Let $X\triangleq(A\,\mathrm{diag}(p)\,A^H)^{-1}$ and $X_k\triangleq(A\,\mathrm{diag}(p_k)\,A^H)^{-1}$; inserting them into (3.13) yields
\[
\log\left|\left(A\,\mathrm{diag}(p)\,A^H\right)^{-1}\right|\leq s_0+\operatorname{tr}\left(A\,\mathrm{diag}(p_k)\,A^H\left(A\,\mathrm{diag}(p)\,A^H\right)^{-1}\right), \tag{3.14}
\]
where the equality in (3.14) is achieved at $p=p_k$ and $s_0\triangleq-\log\left|A\,\mathrm{diag}(p_k)\,A^H\right|-v$ is a constant. Applying the Schur complement (a brief proof is provided in Appendix B), for any $p_k\succ 0$, we obtain
\[
\operatorname{tr}\left(A\,\mathrm{diag}(p_k)\,A^H\left(A\,\mathrm{diag}(p)\,A^H\right)^{-1}\right)\leq\sum_{i=0}^{m}\frac{(W_k)_{i,i}}{p_i}, \tag{3.15}
\]
where
\[
W_k\triangleq \mathrm{diag}(p_k)\,A^H\left(A\,\mathrm{diag}(p_k)\,A^H\right)^{-1}A\,\mathrm{diag}(p_k), \tag{3.16}
\]
and equality is achieved at $p=p_k$. By substituting (3.15) into (3.14), we obtain
\[
\log\left|\left(A\,\mathrm{diag}(p)\,A^H\right)^{-1}\right|\leq s_0+\sum_{i=0}^{m}\frac{(W_k)_{i,i}}{p_i}, \tag{3.17}
\]
where equality is achieved at $p=p_k$. Because $\log|(A\,\mathrm{diag}(p)\,A^H)^{-1}|=-\log|A\,\mathrm{diag}(p)\,A^H|$, (3.17) gives
\[
-\log\left|A\,\mathrm{diag}(p)\,A^H\right|\leq s_0+\sum_{i=0}^{m}\frac{(W_k)_{i,i}}{p_i}, \tag{3.18}
\]
where equality is achieved at $p=p_k$. Let us denote
\[
A^HS^{-1}A \tag{3.19}
\]
by
\[
M, \tag{3.20}
\]
so that $\operatorname{tr}(S^{-1}A\,\mathrm{diag}(p)\,A^H)=\sum_{i=0}^{m}M_{i,i}\,p_i$. By substituting (3.18) into (3.4), for any $p_k\succ 0$, we obtain
\[
L(p)\leq g(p\mid p_k)\triangleq\sum_{i=0}^{m}\left(\frac{(W_k)_{i,i}}{p_i}+M_{i,i}\,p_i\right)+s_1, \tag{3.21}
\]
where $s_1\triangleq s_0+\log|S|-v$ is a constant, with equality achieved at $p=p_k$.
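The majorization (3.21) can be verified numerically using $W_k$ in (3.16) and $M$ in (3.20); the following sketch (random real test data; names are our own) checks that the separable surrogate dominates the loss and touches it at $p=p_k$:

```python
import numpy as np

rng = np.random.default_rng(1)
v, m1 = 6, 10
A = rng.standard_normal((v, m1))
S = np.cov(rng.standard_normal((v, 100)))
S_inv = np.linalg.inv(S)
M_diag = np.einsum('ji,jk,ki->i', A, S_inv, A)        # diagonal of M = A^T S^{-1} A

def loss(p):                                          # L(p), up to a constant
    sigma = (A * p) @ A.T
    return np.trace(S_inv @ sigma) - np.linalg.slogdet(sigma)[1]

def surrogate(p, pk):                                 # g(p | p_k) in (3.21)
    sigma_k = (A * pk) @ A.T
    Wk_diag = pk**2 * np.einsum('ji,jk,ki->i', A, np.linalg.inv(sigma_k), A)
    s0 = -np.linalg.slogdet(sigma_k)[1] - v           # constant term
    return np.sum(Wk_diag / p + M_diag * p) + s0

pk = rng.uniform(0.5, 1.5, m1)
for _ in range(5):
    p = rng.uniform(0.1, 2.0, m1)
    assert surrogate(p, pk) >= loss(p) - 1e-8         # g majorizes L
assert abs(surrogate(pk, pk) - loss(pk)) < 1e-8       # equality at p = p_k
```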
By applying (3.21) under the MM algorithm framework, we derive the following proposition:
Proposition 3.1. For any $p_0\succ0$, sequentially solving
\[
p_{k+1}=\arg\min_{p\succeq0}\ \sum_{i=0}^{m}\left(\frac{(W_k)_{i,i}}{p_i}+M_{i,i}\,p_i\right) \tag{3.22}
\]
generates a sequence $\{p_k\}$ that converges to the global optimal point $p^*$ of problem (3.5).
Proof. Because (3.21) holds only for $p_k\succ0$, and not for all $p_k\succeq0$ as required in (3.11), the general convergence result for MM algorithms cannot be applied directly. Therefore, we analyze the convergence further.
Consider the following $\epsilon$-approximation of problem (3.5):
\[
\min_{p\succeq 0}\ L_\epsilon(p)\triangleq L(p+\epsilon\mathbf{1}), \tag{3.23}
\]
with $\epsilon>0$.
Now, applying (3.22) to $\tilde{p}\triangleq p+\epsilon\mathbf{1}\succ 0$, we conclude that the sequence $\{p^{\epsilon}_k\}$ converges to the optimal point $(p^{\epsilon})^*$ of (3.23), and
\[
\nabla L_\epsilon\left((p^{\epsilon})^*\right)^Td\geq 0 \tag{3.24}
\]
for any feasible direction $d$, where $\nabla L_\epsilon((p^{\epsilon})^*)$ denotes the gradient of the objective function $L_\epsilon(p)$ in (3.23) at $(p^{\epsilon})^*$.
Let $\{\epsilon_{k'}\}$ be a positive sequence with $\lim_{k'\to+\infty}\epsilon_{k'}=0$. Then, as $\nabla L_\epsilon((p^{\epsilon})^*)$ is continuous around $(p^{\epsilon})^*$ and $\epsilon$, the limit point $p^*$ of the sequence $\{(p^{\epsilon_{k'}})^*\}$ is the optimal point of problem (3.5).
In practice, $\epsilon$ can be chosen to be an arbitrarily small number; directly applying (3.22) and adapting it to solve the $\epsilon$-approximation problem are then essentially identical. Therefore, the sequence $\{p_k\}$ in (3.22) converges to the optimal point of problem (3.5). □
Problem (3.22), which generates $\{p_k\}$, is a simple separable optimization problem; thus, an analytical solution is easy to obtain. Denoting the $i$-th element of $p_{k+1}$ by $(p_{k+1})_i$, we have
\[
(p_{k+1})_i=\sqrt{\frac{(W_k)_{i,i}}{M_{i,i}}},\qquad i=0,1,\cdots,m, \tag{3.25}
\]
where $(W_k)_{i,i}$ and $M_{i,i}$ denote the $i$-th diagonal elements of $W_k$ in (3.16) and $M$ in (3.20), respectively.
We outline the MM algorithm for (3.5) in Algorithm 2.
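In code form, each iteration of Algorithm 2 amounts to one matrix inversion followed by the elementwise update (3.25). A compact sketch (the stopping rule and helper name are our own, not part of the original algorithm description):

```python
import numpy as np

def mm_wrank1w(A, S, n_iter=500, tol=1e-9):
    """MM iteration for (3.5): (p_{k+1})_i = sqrt((W_k)_{i,i} / M_{i,i})."""
    S_inv = np.linalg.inv(S)
    M_diag = np.einsum('ji,jk,ki->i', A.conj(), S_inv, A).real   # M = A^H S^{-1} A
    p = np.ones(A.shape[1])                                      # p_0 > 0
    for _ in range(n_iter):
        sigma_inv = np.linalg.inv((A * p) @ A.conj().T)          # Sigma_k^{-1}
        Wk_diag = p**2 * np.einsum('ji,jk,ki->i',
                                   A.conj(), sigma_inv, A).real  # diagonal of (3.16)
        p_new = np.sqrt(Wk_diag / M_diag)                        # update (3.25)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

Per iteration, the dominant costs are the matrix inversion and the matrix products, in line with the complexity count below.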
The computational complexity of each iteration is $O(m^3)$, arising mainly from matrix inversion and multiplication. To demonstrate the difference in computational complexity between Algorithms 1 and 2, we compare their iteration numbers and computation times in the numerical experiments presented in Section 5.
4. Real Toeplitz correlation matrix estimation
This section describes the estimation of Toeplitz matrices. Complex Toeplitz covariance matrices appear commonly in linear array signal processing [17,24]. When sparse signal models are applied, complex Toeplitz covariance matrices can always be formulated as the W-Rank1-W structural matrices discussed in Section 3 [2]. Thus, in this section, we consider only real Toeplitz matrices, specifically the Toeplitz auto-correlation matrix, arising in time-series analysis, whose correlation coefficients decrease as the time lag increases. Note that this auto-correlation matrix is composed of the auto-correlation coefficients at different lags.
Let us denote
\[
c\triangleq\left(c_0,c_1,\cdots,c_{v-1}\right)^T \tag{4.1}
\]
and define $\mathrm{toep}(c)$ to be the symmetric Toeplitz matrix with $c$ as its first column:
\[
\mathrm{toep}(c)\triangleq\begin{pmatrix}
c_0 & c_1 & \cdots & c_{v-1}\\
c_1 & c_0 & \cdots & c_{v-2}\\
\vdots & \vdots & \ddots & \vdots\\
c_{v-1} & c_{v-2} & \cdots & c_0
\end{pmatrix}. \tag{4.2}
\]
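In code, toep(c) coincides with the standard symmetric Toeplitz constructor; a brief illustration with assumed coefficients:

```python
import numpy as np
from scipy.linalg import toeplitz

c = np.array([1.0, 0.6, 0.3, 0.1])     # illustrative correlation coefficients
C = toeplitz(c)                        # toep(c): C[i, j] = c[|i - j|]
assert np.allclose(C, C.T) and np.all(np.diag(C) == 1.0)
```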
This section estimates the Toeplitz correlation matrix $C\in\Omega_2$, where
\[
\Omega_2\triangleq\left\{C\ \middle|\ C=\mathrm{toep}(c),\ c_0=1,\ |c_0|\geq|c_1|\geq\cdots\geq|c_{v-1}|,\ \mathrm{toep}(c)\succ0\right\}. \tag{4.3}
\]
Note that the constraint $c_0=1$ can be rewritten as follows:
\[
ec=1, \tag{4.4}
\]
where $e$ denotes a row vector with 1 as its first element and 0 as its other elements. The constraint $|c_0|\geq|c_1|\geq\cdots\geq|c_{v-1}|$ can be rewritten as the linear constraints:
\[
E_1c\preceq0,\qquad E_2c\preceq0, \tag{4.5}
\]
where $E_1$ denotes a $(v-1)\times v$ matrix with diagonal elements $-1$ and first superdiagonal elements $1$, and $E_2$ denotes a $(v-1)\times v$ matrix with diagonal elements $-1$ and first superdiagonal elements $-1$:
\[
E_1=\begin{pmatrix}
-1 & 1 & & \\
 & \ddots & \ddots & \\
 & & -1 & 1
\end{pmatrix},\qquad
E_2=\begin{pmatrix}
-1 & -1 & & \\
 & \ddots & \ddots & \\
 & & -1 & -1
\end{pmatrix}. \tag{4.6}
\]
We denote a normal distribution with correlation matrix $R\in\mathbb{S}^v_{++}$ by $N_0$ and a normal distribution with Toeplitz correlation matrix $C\in\Omega_2$ by $N$. Assuming the variances are equal, the relative entropy from $N$ to $N_0$ is:
\[
D(N_0\Vert N)=\frac{1}{2}\left(\log\frac{|R|}{|C|}+\operatorname{tr}\left(R^{-1}C\right)-v\right). \tag{4.7}
\]
Given a correlation matrix estimate $R$, such as the sample correlation matrix, by minimizing $D(N_0\Vert N)$ in terms of $C$:
\[
\min_{C\in\Omega_2}\ D(N_0\Vert N), \tag{4.8}
\]
we deduce the Toeplitz correlation estimate $C^*\in\Omega_2$. The optimization problem (4.8) is convex and can be solved using the primal-dual interior-point algorithm with the SDPT3 toolbox in Matlab. This problem involves a nonlinear objective function and a positive-definiteness constraint on $C$. Applying the primal-dual interior-point algorithm requires computing the Hessian matrix and ensuring positive-definiteness in each iteration. In the following subsection, we present an estimation algorithm that avoids these time-consuming computations.
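Problem (4.8) can also be prototyped directly in a modeling toolbox before resorting to the specialized algorithm of Subsection 4.1. A minimal CVXPY sketch, assuming the objective reduces (up to constants) to $\operatorname{tr}(R^{-1}C)-\log|C|$ and using the constraints (4.4)–(4.6); the data are illustrative:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
v = 6
R = np.corrcoef(rng.standard_normal((v, 200)))   # given correlation matrix estimate
R_inv = np.linalg.inv(R)

c = cp.Variable(v)
# C = toep(c): assemble the symmetric Toeplitz matrix as an affine map of c
C = cp.vstack([cp.hstack([c[abs(i - j)] for j in range(v)]) for i in range(v)])
constraints = [c[0] == 1,
               c[1:] <= c[:-1],                  # E_1 c <= 0
               -c[:-1] <= c[1:]]                 # E_2 c <= 0
prob = cp.Problem(cp.Minimize(cp.trace(R_inv @ C) - cp.log_det(C)), constraints)
prob.solve()                                     # log_det keeps C positive definite
print(np.round(c.value, 3))
```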
4.1. Estimation of the real Toeplitz correlation matrix in Ω2
In this subsection, we first transform the estimation of the real Toeplitz correlation matrix into the estimation of a W-Rank1-W structural matrix via a special variable substitution, and then deduce an estimation algorithm for it within the MM algorithm framework.
Given a circulant matrix $T=\mathrm{toep}(t)$ with
\[
t\triangleq\left(c_0,c_1,\cdots,c_{v-1},d_1,\cdots,d_l,c_{v-1},c_{v-2},\cdots,c_1\right)^T \tag{4.9}
\]
and $d_i=d_{l+1-i}$, $i=1,\cdots,l$ (specifically, $t=(c_0,c_1,\cdots,c_{v-1},c_{v-1},c_{v-2},\cdots,c_1)^T$ if $l=0$), i.e.,
where
Then, the upper-left $v$-order block of $T$ is the Toeplitz matrix:
\[
\mathrm{toep}(c)=\left(I_v,\ \mathbf{0}\right)T\left(I_v,\ \mathbf{0}\right)^T. \tag{4.13}
\]
The circulant matrix $T$ has the Fourier eigendecomposition [25]:
\[
T=F^H\,\mathrm{diag}\left(p_{(l)}\right)F, \tag{4.14}
\]
where $F$ is the $m_l$-order Fourier matrix with $m_l=l+2v-1$, and
\[
p_{(l)}\triangleq\sqrt{m_l}\,F\,t. \tag{4.15}
\]
Specifically, $F_{i,j}=\frac{1}{\sqrt{m_l}}e^{\frac{ij2\pi\iota}{m_l}}$ with $\iota$ as the imaginary unit and $i,j=0,1,\cdots,m_l-1$, and $T\succeq0$ if and only if $p_{(l)}\succeq0$.
By substituting (4.14) into (4.13), we derive
\[
\mathrm{toep}(c)=A_l\,\mathrm{diag}\left(p_{(l)}\right)A_l^H, \tag{4.16}
\]
where $A_l=(I_v,\ \mathbf{0})F^H$.
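The circulant embedding can be sanity-checked numerically: the eigenvalues of the symmetric circulant toep(t) are the Fourier coefficients of t, and its upper-left v×v block recovers toep(c). A sketch with illustrative numbers (the padding d is an assumption satisfying d_i = d_{l+1-i}):

```python
import numpy as np
from scipy.linalg import circulant, toeplitz

v, l = 4, 3
c = np.array([1.0, 0.5, 0.2, 0.1])
d = np.array([0.05, 0.02, 0.05])            # palindromic padding, d_1 = d_3
t = np.concatenate([c, d, c[:0:-1]])        # first column of toep(t), length m_l = l + 2v - 1
T = circulant(t)                            # symmetric circulant since t is palindromic
p_l = np.fft.fft(t).real                    # eigenvalues of T, cf. (4.14)-(4.15)

assert np.allclose(T, T.T)
assert np.allclose(T[:v, :v], toeplitz(c))  # upper-left block is toep(c), cf. (4.13)
print("p_(l) >= 0 (i.e., T is PSD):", bool(np.all(p_l >= -1e-12)))
```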
Corollary C3 in [25] relates the positive semi-definiteness of $\mathrm{toep}(c)$ to that of the weight vector $p_{(l)}$, where $p_{(l)}$ is a function of $t$ through (4.15) and $t$ is a function of $c$ through (4.9). Moreover, item 2 of the Notes in [25] states that $\mathrm{toep}(c)\succ0$ implies $\mathrm{toep}(c)\in\{C\mid C=\mathrm{toep}(c),\ p_{(l)}\succeq0\}$ for a sufficiently large $l$, meaning that if $l$ is sufficiently large,
\[
\left\{C\mid C=\mathrm{toep}(c),\ \mathrm{toep}(c)\succ0\right\}\subseteq\left\{C\mid C=\mathrm{toep}(c),\ p_{(l)}\succeq0\right\}. \tag{4.18}
\]
In the following, we relax the constraint $\mathrm{toep}(c)\succ0$ in $\Omega_2$ to $p_{(l)}\succeq0$ for a certain $l$. If $l$ is sufficiently large, then compared with $\{C\mid C=\mathrm{toep}(c),\ \mathrm{toep}(c)\succ0\}$, which contains only positive definite matrices, the set $\{C\mid C=\mathrm{toep}(c),\ p_{(l)}\succeq0\}$ contains more positive semi-definite matrices by (4.18). Moreover, as $C$ tends toward the boundary of the positive semi-definite cone, the value of the objective function in (4.8) tends to $+\infty$. Thus, although we change the constraint set, the modification leaves the optimal point unchanged if $l$ is sufficiently large. The numerical experiments reported in Section 5 demonstrate that the estimation accuracy remains good for different $l$.
Now, we present an algorithm for the problem with the constraint $\mathrm{toep}(c)\succ0$ modified to $p_{(l)}\succeq0$ for a certain $l$:
\[
\min_{C\in\widetilde{\Omega}_2}\ \operatorname{tr}\left(R^{-1}C\right)-\log|C|, \tag{4.19}
\]
where
\[
\widetilde{\Omega}_2\triangleq\left\{C\ \middle|\ C=\mathrm{toep}(c),\ ec=1,\ E_1c\preceq0,\ E_2c\preceq0,\ p_{(l)}\succeq0\right\}. \tag{4.20}
\]
In particular, by applying the expressions (4.4), (4.5), (4.10), (4.11), (4.15) and (4.16), the problem to be solved, i.e., problem (4.19), becomes
It is easy to see that $F^HF=I$, and thus (4.21c) can be rewritten as follows:
\[
t=\frac{1}{\sqrt{m_l}}F^Hp_{(l)}. \tag{4.22}
\]
By substituting (4.22) into (4.21e), we rewrite (4.21e) as follows:
Now, we apply the equality constraints (4.21c) and (4.21e), i.e., (4.22) and (4.23), to transform the optimization problem (4.21) into a new optimization problem involving only the variable $p_{(l)}$.
It is easy to see that $I^{(-1)}F^H=F^HI^{(-1)}$. Then, by substituting (4.22) into (4.21d), we rewrite the constraint (4.21d) as follows:
By substituting (4.23) into (4.21f)–(4.21h), we obtain the constraints (4.21f)–(4.21h) as follows:
Most importantly, by applying (4.24)–(4.27), the optimization problem can be rewritten as a problem involving only the variable $p_{(l)}$, i.e., as an estimation problem for W-Rank1-W structural matrices with an unknown weight vector $p_{(l)}$:
Denoting the optimal point of (4.28) by $p^*_{(l)}$, we have $C^*=A_l\,\mathrm{diag}(p^*_{(l)})\,A_l^H$ as an estimate of the correlation matrix.
In the following subsection, we present an MM algorithm for problem (4.28) by applying the design concept of Algorithm 2.
4.1.1. MM algorithm for the real Toeplitz correlation matrix estimation
By substituting (4.28b) into the objective function in (4.28a) and applying (3.18), we obtain the following result. For any $(p_{(l)})_k\succ0$, we have
where s1 is a constant and
and
and
and the equality in (4.29) is achieved at p(l)=(p(l))k.
Applying (4.30) and convergence analysis similar to that for Proposition 3.1, we have the following proposition under the MM algorithm framework.
Proposition 4.1. Given (p(l))0≻0, sequentially solving the optimization problem
generates a sequence {(p(l))k} that converges to the optimal point p∗(l) of the problem (4.28).
Denoting $r_i=1/q_i$ and using the epigraph form, we can rewrite problem (4.35) as an SOCP problem:
This SOCP problem can be solved using the Matlab toolbox, SDPT3.
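The key modeling step is that each reciprocal term $r_i=1/q_i$ corresponds to the hyperbolic constraint $r_iq_i\geq1$ (with $r_i,q_i\geq0$), which is representable as the second-order cone $\|(2,\ r_i-q_i)\|\leq r_i+q_i$. A minimal CVXPY sketch of this reformulation pattern (the weights $w$ are illustrative, and this is not the full subproblem (4.35), whose linear constraints on $p_{(l)}$ carry over unchanged):

```python
import numpy as np
import cvxpy as cp

n = 5
w = np.arange(1.0, n + 1.0)          # illustrative positive weights
q = cp.Variable(n, nonneg=True)
r = cp.Variable(n, nonneg=True)      # epigraph variables standing in for 1/q_i

# ||(2, r_i - q_i)||_2 <= r_i + q_i  <=>  r_i q_i >= 1 with r_i, q_i >= 0
socs = [cp.SOC(r[i] + q[i], cp.hstack([2.0, r[i] - q[i]])) for i in range(n)]
prob = cp.Problem(cp.Minimize(w @ r + cp.sum(q)), socs)
prob.solve()
assert np.allclose(r.value * q.value, 1.0, atol=1e-4)   # r_i = 1/q_i at the optimum
```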
In conclusion, the MM algorithm for problem (4.28) is presented as Algorithm 3 and is referred to as the MM-T algorithm in the following. In each iteration, Algorithm 3 involves an SOCP problem that can be solved using the Matlab toolbox SDPT3.
5. Numerical experiments
In this section, we evaluate the performance of Algorithms 1–3 numerically. First, Algorithms 1 and 2 are utilized for DOA estimation. Then, Algorithm 3 is used to estimate Toeplitz autocorrelation matrices in time-series analysis, and its performance is compared with those of other estimation methods. All computations were performed using Matlab 2018a on a system equipped with an Intel(R) Xeon(R) Platinum 8372HC CPU at 3.40 GHz.
5.1. DOA estimation
The observation $x(t)\in\mathbb{C}^v$ received by a uniform linear array at time $t$ is commonly modeled as [2]:
\[
x(t)=\sum_{i=0}^{360}s_i(t)\,a(\theta_i)+n(t)=A_d\,s(t)+n(t), \tag{5.1}
\]
where $\theta_i=(i/2)^\circ$ denotes the grid point in the 180-degree coverage $0^\circ$–$180^\circ$, $s_i(t)\in\mathbb{C}$ denotes the source signal from direction $\theta_i$, $a(\theta_i)=(1,e^{\iota\pi\cos(\theta_i)},\cdots,e^{(v-1)\iota\pi\cos(\theta_i)})^T$ denotes the steering vector, and $n(t)\in\mathbb{C}^v$ denotes the noise vector. In addition, let us assume that $s(t)=(s_i(t))$ is CN-distributed with zero mean and covariance $\mathrm{diag}(p_d)$, $n(t)$ is CN-distributed with zero mean and covariance $\mathrm{diag}(\sigma)$, and $s(t)$ and $n(t)$ are both temporally and mutually independent. In particular, the noise energy vector is $\sigma=\sigma^2\mathbf{1}$ with $\mathbf{1}$ as the all-ones vector, and $\sigma^2$ can be designed to vary the signal-to-noise ratio (SNR) as follows:
where $A_d=(a(\theta_0),a(\theta_1),\cdots,a(\theta_{360}))$. The covariance matrix of $x(t)$ has the W-Rank1-W structure:
\[
\Sigma=A_d\,\mathrm{diag}(p_d)\,A_d^H+\mathrm{diag}(\sigma)=A\,\mathrm{diag}(p)\,A^H, \tag{5.3}
\]
where $A=[A_d,\ I_v]$ and $p=((p_d)^T,\sigma^T)^T$. We denote $p_d=(p'_i)$, where $p'_i$ is the signal energy in the direction $\theta_i$. Peak detection is performed on the energy vector $p_d$; the peaks occur at the source-signal directions, enabling DOA estimation.
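Given a weight estimate on the angular grid, the DOA post-processing reduces to peak picking; a sketch (the spectrum below is synthetic, for illustration only):

```python
import numpy as np
from scipy.signal import find_peaks

theta = np.arange(361) / 2.0                       # grid theta_i = (i/2) degrees
p_d = (np.exp(-0.5 * ((theta - 60.0) / 0.8) ** 2)  # illustrative spectrum with
       + np.exp(-0.5 * ((theta - 120.0) / 0.8) ** 2))  # peaks at 60 and 120 degrees

peaks, props = find_peaks(p_d, height=0.0)
strongest = peaks[np.argsort(props["peak_heights"])[::-1]]
n_sources = 2                                      # number of true DOAs, assumed known
print(np.sort(theta[strongest[:n_sources]]))       # -> [ 60. 120.]
```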
In this experimental scenario, 4 source signals arrive from the directions $\xi_1=30^\circ$, $\xi_2=60^\circ$, $\xi_3=120^\circ$, and $\xi_4=160^\circ$, with corresponding signal energies of 15, 5, 5, and 15, respectively. We consider the case in which the linear array consists of $v=15$ sensors, and $n$ independent array observations are simulated using model (5.1) to obtain the sample covariance matrix. By applying Algorithms 1 and 2 to estimate $\Sigma$ in (5.3), we obtain an estimate of $p_d$, denoted by $p^*_d$. By performing peak detection on $p^*_d$, we obtain the DOA estimates $\xi^*_1,\cdots,\xi^*_4$, assuming that the number of true DOAs is known in advance.
Table 1 presents a performance comparison of Algorithms 1 and 2 in terms of computational complexity and estimation accuracy, where
\[
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\|p^{*(i)}-p\right\|^2},
\]
where $p^{*(i)}$ denotes the estimate of $p$ in (5.3) in the $i$-th Monte Carlo simulation and $N$ denotes the number of Monte Carlo simulations. These results demonstrate that the MM algorithm (Algorithm 2) exhibits better computational efficiency than the PDIP (primal-dual interior-point) algorithm (Algorithm 1), and that their estimation accuracies are comparable. In Figure 1, the DOA estimation performance of the MM method (Algorithm 2) is compared with that of the classical DOA estimation methods, likelihood-based estimation of sparse parameters (LIKES) in [12] and sparse iterative covariance-based estimation (SPICE) in [11], where
\[
\mathrm{RMSE}_{\mathrm{DOA}}=\sqrt{\frac{1}{4N}\sum_{i=1}^{N}\sum_{j=1}^{4}\left(\left(\xi^*_j\right)^{(i)}-\xi_j\right)^2},
\]
and $(\xi^*_j)^{(i)}$ is the estimate of $\xi_j$ in the $i$-th Monte Carlo simulation. It is observed that the MM method offers higher DOA estimation accuracy. In Figure 2, the normalized spectrum is defined as $p^*/\max(p^*)$, where $\max(p^*)$ denotes the maximal element of the vector $p^*$. The black dashed lines represent the true DOAs, and the normalized spectrum curves show the results of a randomly selected realization. The locations of the spectral peaks lie very close to the true DOAs, and a higher SNR corresponds to a higher angular resolution.
5.2. Toeplitz correlation matrix estimation
In a $\hat{q}$-order moving average time series (MA($\hat{q}$)),
\[
y_t=\varepsilon_t+\phi_1\varepsilon_{t-1}+\cdots+\phi_{\hat{q}}\,\varepsilon_{t-\hat{q}},
\]
where $\phi_i=\phi^i$, $i=1,\cdots,\hat{q}$, and $\varepsilon_t,\cdots,\varepsilon_{t-\hat{q}}$ are independently and identically distributed with zero mean and variance 1. A simple calculation shows that the $v\times v$ autocorrelation matrix $C$ exhibits a Toeplitz structure with autocorrelation coefficients that decrease as the time lag increases. Denote the first column of $C$ by $c$, and set $\hat{q}=5$. This experiment assesses the performance of the MM-T algorithm in estimating this autocorrelation matrix, focusing on its estimation accuracy and computation time, and compares them with those of the following methods:
● PDIP-T: the primal-dual interior-point algorithm solving (4.8);
● Samp: the estimator of Toeplitz covariance matrices mentioned in [16] that minimizes the Frobenius-norm loss to the sample covariance matrix;
● HY: the estimation of Toeplitz covariances via the entropy loss function using the alternating direction method of multipliers algorithm given in [21].
Note that the sample autocorrelation matrix is used as the given correlation matrix $R$ in the MM-T and PDIP-T algorithms, the sample autocovariance matrix is used as the given covariance matrix in the HY method, and the sample size is $n=1000$.
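For reproducibility, the MA($\hat q$) data and the sample autocorrelation input can be generated as follows (a sketch assuming $\phi_i=\phi^i$; the lag-product sample estimator and all names are our own):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n, v, q_hat, phi = 1000, 10, 5, 0.5
coeffs = np.concatenate([[1.0], phi ** np.arange(1, q_hat + 1)])   # 1, phi, ..., phi^5

eps = rng.standard_normal(n + q_hat)                # i.i.d. innovations, variance 1
y = np.convolve(eps, coeffs, mode="valid")          # MA(q_hat) series of length n

y0 = y - y.mean()                                   # sample autocovariances, lags 0..v-1
acov = np.array([y0[: len(y0) - k] @ y0[k:] for k in range(v)]) / len(y0)
R = toeplitz(acov / acov[0])                        # sample autocorrelation matrix
```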
Table 2 lists the $\mathrm{RMSE}_{\mathrm{toep}}$ values corresponding to different $l$ under different $v$ and $\phi$. The $\mathrm{RMSE}_{\mathrm{toep}}$ is defined as
\[
\mathrm{RMSE}_{\mathrm{toep}}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\|c^{*(i)}-c\right\|^2},
\]
where $c^{*(i)}$ denotes the estimate of $c$ achieved during the $i$-th Monte Carlo repetition and $N$ denotes the number of repetitions. For different $l$, the $\mathrm{RMSE}_{\mathrm{toep}}$ values remain stable. Therefore, we set $l=10$ for the remainder of this experiment.
Figure 3(a) depicts the computation times of the MM-T and PDIP-T algorithms. The results indicate that as the order of the autocorrelation matrix increases (i.e., as the considered time lag $v-1$ increases), the computation time of MM-T grows more slowly than that of PDIP-T, and when $v\geq50$, MM-T requires less time.
Figure 3(b) compares the estimation accuracies of the different methods. The MM-T and PDIP-T algorithms exhibit lower RMSEtoep values, which is reasonable as they utilize a decreasing correlation structure, unlike the Samp and HY methods. The estimation accuracies of the MM-T and PDIP-T algorithms are essentially equal; however, when v is moderate to large, the computation time required by MM-T is shorter. Therefore, the MM-T algorithm is recommended over PDIP-T.
5.3. Real data analysis
We collected the closing stock prices of 511 companies in the technology field on Fridays during the period between 2023-09-15 and 2023-11-17 (8 weeks). We compute the volatility series as follows:
\[
\mathrm{Volatility}_{\text{current Friday}}=\frac{\mathrm{Close}_{\text{current Friday}}-\mathrm{Close}_{\text{previous Friday}}}{\mathrm{Close}_{\text{previous Friday}}},
\]
where $\mathrm{Close}_{\text{current Friday}}$ and $\mathrm{Close}_{\text{previous Friday}}$ denote the closing stock prices on the current and previous Fridays, respectively, and $\mathrm{Volatility}_{\text{current Friday}}$ denotes the volatility of the closing stock price on the current Friday.
Subsequently, to investigate the temporal dependencies in the volatility series, we assume that the autocorrelation matrix of the volatility series exhibits a Toeplitz structure, and we apply the MM-T and HY algorithms to estimate this autocorrelation matrix. Table 3 presents the first column of the estimated autocorrelation matrix, i.e., the autocorrelation coefficients corresponding to different time lags.
Table 3 shows that the results of the MM-T and HY methods are similar. Moreover, the autocorrelation coefficients corresponding to different lags are nearly identical, lying around the small value of 0.22. The small autocorrelation coefficients indicate no obvious dependency between the volatilities on different Fridays, which is consistent with the unpredictable nature of stock price fluctuations.
6. Conclusions
We focus on estimating the W-Rank1-W structural covariance matrix, which appears commonly in signal processing, and the auto-correlation matrix characterized by decreasing correlation coefficients, which appears commonly in time-series analysis. To this end, novel algorithms are proposed within the MM algorithm framework. Numerical experiments reveal that the proposed MM algorithms exhibit superior computational efficiency. The proposed strategy for constructing the surrogate function under the MM framework can be applied to other similar convex optimization problems.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 62203451, the Sichuan Province Science and Technology Support Program under Grants 2022JDRC0068 and 2021JDRC0080, and the Fundamental Research Funds for the Central Universities under Grant 24CAFUC03055.
Conflict of interest
The authors declare that they have no competing interests.