
In this paper, we considered the condition number theory of a new generalized ridge regression model. Explicit expressions for different types of condition numbers were derived to measure the ill-conditioning of the generalized ridge regression problem under different perturbation settings. To overcome the difficulty of computing the exact value of a condition number, we employed statistical condition estimation theory to design efficient condition number estimators, and numerical examples were also given to illustrate their efficiency.
Citation: Jing Kong, Shaoxin Wang. Condition numbers of the generalized ridge regression and its statistical estimation[J]. AIMS Mathematics, 2024, 9(2): 4178-4193. doi: 10.3934/math.2024205
Ridge regression [1] is a classical and powerful statistical method for mitigating multicollinearity among covariates in the following linear regression model:
$$ y = X\beta + \epsilon, \tag{1.1} $$
where $y \in \mathbb{R}^n$ is the vector of observed responses, $X \in \mathbb{R}^{n\times p}$ is the design matrix of covariates, $\beta \in \mathbb{R}^p$ is the unknown parameter vector, and $\epsilon \sim N(0, \sigma^2 I_n)$ with $I_n$ being the identity matrix of order $n$. The ridge estimator of model (1.1) is given by
$$ \hat{\beta}_r = (X^TX + \lambda I_p)^{-1}X^Ty, \tag{1.2} $$
where $\lambda > 0$ is the ridge parameter and $I_p$ is the identity matrix of order $p$. Computing the ridge estimator (1.2) is equivalent to solving the following optimization problem:
$$ \hat{\beta}_r = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2, \tag{1.3} $$
where $\|\cdot\|_2$ is the Euclidean vector norm. Equation (1.3) is also known as Tikhonov regularization, a standard technique for dealing with discrete ill-posed problems [2]. Incorporating weights for the observations and generalizing the ridge penalty leads to the following generalized ridge regression (GRR) model:
$$ \hat{\beta}_{gr} = \arg\min_{\beta\in\mathbb{R}^p} (y - X\beta)^T\Sigma(y - X\beta) + (\beta - \beta_0)^T\Theta(\beta - \beta_0), \tag{1.4} $$
where $\Sigma \in \mathbb{R}^{n\times n}$ is a diagonal matrix with nonnegative diagonal elements, $\Theta \in \mathbb{R}^{p\times p}$ is a symmetric positive definite matrix that determines the speed and direction of shrinkage, and $\beta_0$ is a user-specified, non-random target vector [3]. By the Karush-Kuhn-Tucker (KKT) optimality condition, setting the derivative of the objective function in (1.4) with respect to $\beta$ to zero shows that the generalized ridge estimator (GRE), denoted by $\hat{\beta}_{gr}$, satisfies the following normal equation:
$$ (X^T\Sigma X + \Theta)\beta = X^T\Sigma y + \Theta\beta_0, \tag{1.5} $$
and $\hat{\beta}_{gr}$ is given by
$$ \hat{\beta}_{gr} = (X^T\Sigma X + \Theta)^{-1}(X^T\Sigma y + \Theta\beta_0). \tag{1.6} $$
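For concreteness, the following minimal MATLAB sketch computes the GRE (1.6) by solving the normal equation (1.5); the data and the Cholesky-based solve are our illustrative choices, not the authors' code.

```matlab
% Minimal sketch: compute the GRE (1.6) by solving the normal equation (1.5).
% All data below are illustrative; Sigma and Theta only need to be such that
% W = X'*Sigma*X + Theta is symmetric positive definite (SPD).
n = 50; p = 10;
X = randn(n, p); y = randn(n, 1);
Sigma = eye(n);                 % weight matrix (diagonal here, SPD in general)
Theta = 0.1*eye(p);             % generalized ridge penalty matrix
beta0 = zeros(p, 1);            % target vector
W = X'*Sigma*X + Theta;         % coefficient matrix of (1.5)
R = chol(W);                    % W = R'*R, valid since W is SPD
beta_gr = R \ (R' \ (X'*Sigma*y + Theta*beta0));   % two triangular solves
```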
Though the diagonal matrix $\Sigma$ can assign different weights to the observations, it cannot capture the correlation structure between observations. For this reason, we relax this assumption and allow $\Sigma$ to be a symmetric positive definite matrix, which makes (1.4) the exact combination of a weighted least squares loss and the generalized ridge penalty. By varying the weighting matrices $\Sigma$ and $\Theta$, the generalized Tikhonov regularization [2] can also be derived from (1.4).
Multicollinearity among covariates often leads to a large condition number of $X^TX$, which plays a central role in the sensitivity analysis of the least squares estimate of model (1.1) [4, Ch. 5]. The condition number is a powerful tool for measuring the sensitivity of a problem: it gives the maximum amplification of the change in the solution with respect to a small perturbation in the input data, and it has been studied for numerous numerical linear algebra problems. For the development of the definition of the condition number and comparisons among the variants, we refer to [5,6,7,8,9]. In this paper, we employ the following very general definition of condition number given in [6], which covers most of the popular condition numbers in the current literature as special cases.
Definition 1. (projected condition number) [6] Let $F:\mathbb{R}^p\to\mathbb{R}^q$ be a continuous map defined on an open set $\mathrm{Dom}(F)$, the domain of definition of $F$, and let $L\in\mathbb{R}^{q\times k}$. The projected condition number of $F$ at $x\in\mathrm{Dom}(F)$ with respect to $L$ is defined by
$$ \kappa_F^L(x) = \lim_{\delta\to 0}\ \sup_{\left\|\frac{\Delta x}{\alpha}\right\|_\mu \le \delta} \frac{\left\|\frac{L^T\left(F(x+\Delta x)-F(x)\right)}{\xi_L}\right\|_\nu}{\left\|\frac{\Delta x}{\alpha}\right\|_\mu}, \tag{1.7} $$
where $\cdot/\cdot$ denotes componentwise division, with the convention that a nonzero numerator divided by a zero denominator remains unchanged; $\xi_L\in\mathbb{R}^k$ and $\alpha\in\mathbb{R}^p$ are parameters, with the requirement that if some element of $\alpha$ is zero, then the corresponding element of $\Delta x$ must also be zero; and $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ are vector norms defined on $\mathbb{R}^p$ and $\mathbb{R}^k$, respectively.
In Definition 1, the matrix $L$ can be treated as an operator that transforms the image vector of $F$; for example, setting $L=e_i$, the $i$-th column of the canonical basis of $\mathbb{R}^q$, gives the condition number of the $i$-th element of the image vector. The two vector norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ also enable us to measure the perturbations in the input and output spaces differently. When the map $F$ is Fréchet differentiable, we have the following Theorem 1, which largely reduces the difficulty of finding the supremum in (1.7) to establishing the Fréchet derivative of $F$. The rationale for Definition 1 and its relationship with other popular condition numbers have been extensively discussed; the interested reader is referred to [6,7,8] and the references therein.
Theorem 1. [6] When the map $F$ in Definition 1 is Fréchet differentiable at $x$, the explicit expression of the projected condition number $\kappa_F^L(x)$ is given by
$$ \kappa_F^L(x) = \left\| \mathrm{Diag}(\xi_L^\ddagger)\, L^T DF(x)\, \mathrm{Diag}(\alpha) \right\|_{\mu\nu}, $$
where $DF(x)$ is the Fréchet derivative of $F$ at $x$; $\xi_L^\ddagger$ is the vector with $\xi_{L,i}^\ddagger = 1/\xi_{L,i}$ if $\xi_{L,i}\neq 0$ and $\xi_{L,i}^\ddagger = 1$ if $\xi_{L,i}=0$; $\mathrm{Diag}(\xi_L^\ddagger)$ and $\mathrm{Diag}(\alpha)$ are the diagonal matrices with the elements of these vectors on the diagonal; and $\|\cdot\|_{\mu\nu}$ is the matrix norm induced by the vector norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$.
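As a small illustration (our example, not taken from [6]), apply Theorem 1 to the linear map $F(x)=Ax$ with $A\in\mathbb{R}^{q\times p}$, for which $DF(x)=A$. Choosing $L=e_i$, $\xi_L=e_i^TAx\neq 0$, $\alpha=x$, and $\mu=\nu=\infty$ yields
$$ \kappa_F^{e_i}(x) = \left\| \frac{1}{e_i^TAx}\, a_i^T\, \mathrm{Diag}(x) \right\|_\infty = \frac{\sum_{j=1}^p |a_{ij}||x_j|}{|e_i^TAx|}, $$
where $a_i^T$ is the $i$-th row of $A$; this is the classical componentwise condition number of the inner product $a_i^Tx$.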
The condition number theory of some special cases of model (1.4) has been well studied in the literature, such as the (weighted) linear least squares problem [6,7,10,11] and (generalized) Tikhonov regularization [12,13,14]. In this paper, we extend the current results and consider the condition number theory of the GRR model (1.4). Since computing the exact value of a condition number is usually expensive, we also propose efficient statistical condition estimation methods that reduce the computational burden while still giving reasonable estimates. The rest of the paper is organized as follows. In Section 2, we present the explicit expressions of condition numbers for the GRR model (1.4). Section 3 contains the statistical condition estimation method for estimating these condition numbers. Numerical experiments and a conclusion are given in Sections 4 and 5, respectively.
In order to present the condition number of the GRR model (1.4), we first explicitly define the map F in (1.7) as follows:
$$ F:\ \mathbb{R}^n\times\mathbb{R}^{n\times p}\times\mathbb{R}^{n\times n}\times\mathbb{R}^{p\times p}\times\mathbb{R}^p \to \mathbb{R}^p,\qquad (y,X,\Sigma,\Theta,\beta_0)\ \mapsto\ \hat{\beta}_{gr}=(X^T\Sigma X+\Theta)^{-1}(X^T\Sigma y+\Theta\beta_0). \tag{2.1} $$
Let $\tilde{y}=y+\Delta y$, $\tilde{X}=X+\Delta X$, $\tilde{\Sigma}=\Sigma+\Delta\Sigma$, $\tilde{\Theta}=\Theta+\Delta\Theta$, and $\tilde{\beta}_0=\beta_0+\Delta\beta_0$; then the perturbed GRR model is given by
$$ \hat{\beta}_{\widetilde{gr}} = \arg\min_{\beta\in\mathbb{R}^p} (\tilde{y}-\tilde{X}\beta)^T\tilde{\Sigma}(\tilde{y}-\tilde{X}\beta) + (\beta-\tilde{\beta}_0)^T\tilde{\Theta}(\beta-\tilde{\beta}_0), \tag{2.2} $$
where $\hat{\beta}_{\widetilde{gr}}$ is the GRE of the perturbed GRR model (2.2) and satisfies the following normal equation:
$$ (\tilde{X}^T\tilde{\Sigma}\tilde{X} + \tilde{\Theta})\hat{\beta}_{\widetilde{gr}} = \tilde{X}^T\tilde{\Sigma}\tilde{y} + \tilde{\Theta}\tilde{\beta}_0. \tag{2.3} $$
For simplicity of presentation, and with a slight abuse of notation, we set
$$ \mathrm{vec}(y,X,\Sigma,\Theta,\beta_0) := \left[y^T,\ \mathrm{vec}(X)^T,\ \mathrm{vec}(\Sigma)^T,\ \mathrm{vec}(\Theta)^T,\ \beta_0^T\right]^T; $$
then the projected condition number of the GRR model (1.4) can be defined as follows.
Definition 2. Considering the perturbed GRR model (2.2), the projected condition number of the GRR model (1.4) with respect to $L\in\mathbb{R}^{p\times k}$ is defined by
$$ \kappa_F^L = \lim_{\delta\to 0}\ \sup_{\left\|\mathrm{vec}\left(\frac{\Delta y}{\gamma},\frac{\Delta X}{\Psi},\frac{\Delta\Sigma}{\Phi},\frac{\Delta\Theta}{\Upsilon},\frac{\Delta\beta_0}{\varpi}\right)\right\|_\mu \le \delta} \frac{\left\|\frac{L^T\left(F(\tilde{y},\tilde{X},\tilde{\Sigma},\tilde{\Theta},\tilde{\beta}_0)-F(y,X,\Sigma,\Theta,\beta_0)\right)}{\xi_L}\right\|_\nu}{\left\|\mathrm{vec}\left(\frac{\Delta y}{\gamma},\frac{\Delta X}{\Psi},\frac{\Delta\Sigma}{\Phi},\frac{\Delta\Theta}{\Upsilon},\frac{\Delta\beta_0}{\varpi}\right)\right\|_\mu}, $$
where $\xi_L\in\mathbb{R}^k$, $\gamma\in\mathbb{R}^n$, $\Psi\in\mathbb{R}^{n\times p}$, $\Phi\in\mathbb{R}^{n\times n}$, $\Upsilon\in\mathbb{R}^{p\times p}$, and $\varpi\in\mathbb{R}^p$ are parameters, with the requirement that if some element of them is zero, then the corresponding element in the numerator must also be zero, and $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ are vector norms defined on $\mathbb{R}^{n+np+n^2+p^2+p}$ and $\mathbb{R}^k$, respectively.
If we set $\hat{\beta}_{\widetilde{gr}} = \hat{\beta}_{gr} + \Delta\beta$ and further assume that
$$ \max\{\|\Delta y\|, \|\Delta\Sigma\|, \|\Delta X\|, \|\Delta\Theta\|, \|\Delta\beta_0\|\} \le \epsilon $$
with $\|\cdot\|$ denoting an appropriate norm and $\epsilon$ being a sufficiently small positive real number, then, after some algebra and omitting the higher order terms, subtracting (1.5) from (2.3) gives
$$ W\Delta\beta = X^T\Sigma(\Delta y - \Delta X\hat{\beta}_{gr}) + X^T\Delta\Sigma\, r + \Delta X^T\Sigma\, r + \Theta\Delta\beta_0 + \Delta\Theta(\beta_0 - \hat{\beta}_{gr}) + O(\epsilon^2), \tag{2.4} $$
where $W = X^T\Sigma X + \Theta$ and $r = y - X\hat{\beta}_{gr}$. Returning to Definition 2, we have
$$ F(\tilde{y},\tilde{X},\tilde{\Sigma},\tilde{\Theta},\tilde{\beta}_0) - F(y,X,\Sigma,\Theta,\beta_0) = \Delta\beta $$
and
$$ \Delta\beta = W^{-1}X^T\Sigma(\Delta y - \Delta X\hat{\beta}_{gr}) + W^{-1}X^T\Delta\Sigma\, r + W^{-1}\Delta X^T\Sigma\, r + W^{-1}\Theta\Delta\beta_0 + W^{-1}\Delta\Theta(\beta_0 - \hat{\beta}_{gr}) + O(\epsilon^2). \tag{2.5} $$
With (2.5), the first order expansion of Δβ, we can establish the explicit expression of the projected condition number for the GRR model (1.4).
Theorem 2. For the GRR model (1.4), when $\Sigma$ and $\Theta$ are symmetric positive definite matrices, the explicit expression of its projected condition number is given by
$$ \kappa_F^L = \left\| \mathrm{Diag}(\xi_L^\ddagger)\, L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \mathrm{Diag}(\mathrm{vec}(\gamma,\Psi,\Phi,\Upsilon,\varpi)) \right\|_{\mu\nu}, $$
where $W = X^T\Sigma X+\Theta$, $r = y - X\hat{\beta}_{gr}$, $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$, and $N = (\beta_0-\hat{\beta}_{gr})^T\otimes W^{-1}$.
Proof. Since $\mathrm{vec}(ABC) = (C^T\otimes A)\mathrm{vec}(B)$, applying the $\mathrm{vec}(\cdot)$ operator to $\Delta\beta$ gives
$$ \Delta\beta = W^{-1}X^T\Sigma\Delta y + \left(W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)\right)\mathrm{vec}(\Delta X) + \left(r^T\otimes(W^{-1}X^T)\right)\mathrm{vec}(\Delta\Sigma) + \left((\beta_0-\hat{\beta}_{gr})^T\otimes W^{-1}\right)\mathrm{vec}(\Delta\Theta) + W^{-1}\Theta\Delta\beta_0 + O(\epsilon^2). \tag{2.6} $$
Let $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$ and $N = (\beta_0-\hat{\beta}_{gr})^T\otimes W^{-1}$. We can rewrite (2.6) as
$$ \Delta\beta = \left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \begin{bmatrix} \Delta y \\ \mathrm{vec}(\Delta X) \\ \mathrm{vec}(\Delta\Sigma) \\ \mathrm{vec}(\Delta\Theta) \\ \Delta\beta_0 \end{bmatrix} + O(\epsilon^2). \tag{2.7} $$
With (2.7), we can get the following Fréchet derivative of F
$$ DF = \left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \tag{2.8} $$
and complete the proof by Theorem 1.
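As a sanity check (our sketch, with randomly generated data), the derivative (2.8) can be assembled explicitly and compared with a finite-difference directional derivative of the map $F$ in (2.1):

```matlab
% Verify the Frechet derivative (2.8) against a finite difference (our sketch).
n = 8; p = 3;
X = randn(n, p);
A = randn(n); Sigma = A*A' + n*eye(n);       % SPD weight matrix
B = randn(p); Theta = B*B' + p*eye(p);       % SPD penalty matrix
y = randn(n, 1); beta0 = randn(p, 1);
F = @(y,X,S,T,b0) (X'*S*X + T) \ (X'*S*y + T*b0);   % the map (2.1)
beta_gr = F(y, X, Sigma, Theta, beta0);
W = X'*Sigma*X + Theta; r = y - X*beta_gr;
Wi = inv(W);                                 % explicit inverse only for this tiny example
M  = kron(Wi, (Sigma*r)') - kron(beta_gr', Wi*X'*Sigma);
N  = kron((beta0 - beta_gr)', Wi);
DF = [Wi*X'*Sigma, M, kron(r', Wi*X'), N, Wi*Theta];   % (2.8)
% Directional derivative along a random perturbation of all the data:
dy = randn(n,1); dX = randn(n,p);
dS = randn(n); dS = (dS+dS')/2;              % keep perturbations symmetric like Sigma
dT = randn(p); dT = (dT+dT')/2;
db0 = randn(p,1);
t = 1e-6;
fd = (F(y+t*dy, X+t*dX, Sigma+t*dS, Theta+t*dT, beta0+t*db0) - beta_gr)/t;
jv = DF*[dy; dX(:); dS(:); dT(:); db0];
fprintf('relative agreement: %.2e\n', norm(fd - jv)/norm(jv));  % small, around 1e-6
```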
Theorem 2 presents the generic form of the condition number of the GRR problem (1.4). For practical applications, we need to specify the norms and parameters according to the concrete setting. We also note that the explicit expression of the condition number contains specific Kronecker-product structure, which enlarges the matrix and increases the computational burden. In the following, we discuss some specific forms of the condition number and the related computational issues.
When $\mu=\nu=2$, the parameters $\gamma$, $\Psi$, $\Phi$, $\Upsilon$, and $\varpi$ reduce to real numbers, all equal to $\|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L = \|L^T\hat{\beta}_{gr}\|_2 \neq 0$; in this case $\kappa_F^L$ gives an overall treatment of the perturbations, and from Theorem 2 we obtain the projected relative normwise condition number of the GRR model (1.4), given as follows.
Theorem 3. When $\mu=\nu=2$, $\gamma=\Psi=\Phi=\Upsilon=\varpi=\|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L=\|L^T\hat{\beta}_{gr}\|_2\neq 0$, the projected relative normwise condition number of the GRR model (1.4) is given by
$$ \kappa_{2,F}^{L} = \frac{\left\| L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \right\|_2 \left\| \mathrm{vec}(y,X,\Sigma,\Theta,\beta_0) \right\|_2}{\left\| L^T\hat{\beta}_{gr} \right\|_2}, $$
where $\|\cdot\|_2$ denotes the spectral norm of a matrix or the Euclidean norm of a vector, $W = X^T\Sigma X+\Theta$, $r = y - X\hat{\beta}_{gr}$, $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$, and $N = (\beta_0-\hat{\beta}_{gr})^T\otimes W^{-1}$.
Note that the main difficulty in explicitly computing the value of $\kappa_{2,F}^{L}$ lies in the following term:
$$ \kappa_{2a,F}^{L} := \left\| L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \right\|_2, \tag{2.9} $$
which contains the Kronecker products and is also called the projected absolute normwise condition number. To remove the Kronecker products in $\kappa_{2a,F}^{L}$, matrix cross-product and Cholesky factorization techniques may be used to establish compact but equivalent forms of the normwise condition number [6,7]. However, for large problems, there is no need to form the condition number explicitly, and a suitable estimate is enough [15, Ch. 15]. Here, we present a compact form of $\kappa_{2a,F}^{L}$ that not only removes the Kronecker products but also provides important support for the estimation procedure described in Section 3.
Theorem 4. When $\mu=\nu=2$, $\gamma=\Psi=\Phi=\Upsilon=\varpi=\|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L=\|L^T\hat{\beta}_{gr}\|_2\neq 0$, the projected absolute normwise condition number $\kappa_{2a,F}^{L}$ of the GRR model (1.4) can also be written as
$$ \kappa_{2a,F_1}^{L} = \left\| L^TW^{-1}KW^{-1}L - L^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}L \right\|_2^{1/2}. \tag{2.10} $$
In particular, when $L=e_i$, we get the projected absolute normwise condition number of the $i$-th element of the solution:
$$ \kappa_{2a,F_1}^{e_i} = \left\| e_i^TW^{-1}KW^{-1}e_i - e_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}e_i \right\|_2^{1/2}, \tag{2.11} $$
where $W=X^T\Sigma X+\Theta$, $r=y-X\hat{\beta}_{gr}$, and $K = (1+\|\hat{\beta}_{gr}\|_2^2)X^T\Sigma^2X + \|r\|_2^2\,X^TX + (\|\beta_0-\hat{\beta}_{gr}\|_2^2 + r^T\Sigma^2r)I_p + \Theta^2$.
Proof. For the spectral norm, we have $\|A\|_2 = \|AA^T\|_2^{1/2}$ for $A\in\mathbb{R}^{m\times n}$. Applying this equality to $\kappa_{2a,F}^{L}$ gives the equivalent form
$$ \kappa_{2a,F_1}^{L} := \left\| L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \begin{bmatrix} \Sigma XW^{-1} \\ M^T \\ r\otimes(XW^{-1}) \\ N^T \\ \Theta W^{-1} \end{bmatrix} L \right\|_2^{1/2}. $$
Since
$$ \begin{aligned} MM^T &= \left(W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)\right)\left(W^{-1}\otimes(\Sigma r) - \hat{\beta}_{gr}\otimes(\Sigma XW^{-1})\right) \\ &= r^T\Sigma^2r\,W^{-2} - W^{-1}\hat{\beta}_{gr}r^T\Sigma^2XW^{-1} - W^{-1}X^T\Sigma^2r\hat{\beta}_{gr}^TW^{-1} + \|\hat{\beta}_{gr}\|_2^2\,W^{-1}X^T\Sigma^2XW^{-1} \end{aligned} $$
and
$$ NN^T = \left((\beta_0-\hat{\beta}_{gr})^T\otimes W^{-1}\right)\left((\beta_0-\hat{\beta}_{gr})\otimes W^{-1}\right) = \|\beta_0-\hat{\beta}_{gr}\|_2^2\,W^{-2}, $$
we can easily get
$$ \kappa_{2a,F_1}^{L} = \left\| L^TW^{-1}KW^{-1}L - L^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}L \right\|_2^{1/2} $$
with $K = (1+\|\hat{\beta}_{gr}\|_2^2)X^T\Sigma^2X + \|r\|_2^2\,X^TX + (\|\beta_0-\hat{\beta}_{gr}\|_2^2 + r^T\Sigma^2r)I_p + \Theta^2$.
Remark 1. Theorem 4 presents a compact but equivalent form $\kappa_{2a,F_1}^{L}$ of the absolute condition number $\kappa_{2a,F}^{L}$ that does not contain Kronecker products. Comparing the matrix sizes, we find that $\kappa_{2a,F_1}^{L}$ requires much less storage than $\kappa_{2a,F}^{L}$; when the exact value of the normwise condition number is computed, $\kappa_{2a,F_1}^{L}$ also needs much less Central Processing Unit (CPU) time, as will be illustrated through numerical experiments in Section 4. We also point out that the Cholesky factorization technique was not employed to derive the compact form of $\kappa_{2a,F}^{L}$: the Cholesky-based compact form only yields a matrix of moderate size, still larger than that of $\kappa_{2a,F_1}^{L}$, although much smaller than that of $\kappa_{2a,F}^{L}$. Thus, considering storage economy, we only derived $\kappa_{2a,F_1}^{L}$.
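Continuing the sketch given after Theorem 2 (same variables, with $L=I_p$), the two forms (2.9) and (2.10) can be compared directly; they agree up to rounding, while the compact form never builds the wide Kronecker-structured matrix:

```matlab
% Compact form (2.10) versus Kronecker form (2.9), with L = I_p (our sketch).
K = (1 + norm(beta_gr)^2)*(X'*Sigma^2*X) + norm(r)^2*(X'*X) ...
    + (norm(beta0 - beta_gr)^2 + r'*Sigma^2*r)*eye(p) + Theta^2;
C = Wi*K*Wi - Wi*(beta_gr*(r'*Sigma^2*X) + (X'*Sigma^2*r)*beta_gr')*Wi;
kappa_compact = sqrt(norm(C, 2));   % (2.10): only p-by-p matrices involved
kappa_kron    = norm(DF, 2);        % (2.9): norm of the p-by-(n+np+n^2+p^2+p) matrix
fprintf('compact %.6e vs Kronecker %.6e\n', kappa_compact, kappa_kron);
```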
Since the normwise condition number treats all the parameters together and ignores the structure and scaling of the input data, mixed and componentwise condition numbers were proposed [8,9]. By varying the norms and parameters in Theorem 2, we can also obtain the mixed and componentwise condition numbers of the GRR model (1.4).
Theorem 5. When $\mu=\nu=\infty$, $\gamma=y$, $\Psi=X$, $\Phi=\Sigma$, $\Upsilon=\Theta$, $\varpi=\beta_0$, and $\xi_L=\|L^T\hat{\beta}_{gr}\|_\infty$ or $\xi_L=L^T\hat{\beta}_{gr}$, respectively, the projected mixed and componentwise condition numbers of the GRR model (1.4) are given by
$$ \kappa_{\infty m,F}^{L} = \frac{\left\| \left| L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \right| \left| \mathrm{vec}(y,X,\Sigma,\Theta,\beta_0) \right| \right\|_\infty}{\left\| L^T\hat{\beta}_{gr} \right\|_\infty} $$
and
$$ \kappa_{\infty c,F}^{L} = \left\| \frac{\left| L^T\left[ W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta \right] \right| \left| \mathrm{vec}(y,X,\Sigma,\Theta,\beta_0) \right|}{\left| L^T\hat{\beta}_{gr} \right|} \right\|_\infty, $$
respectively, where $\|\cdot\|_\infty$ is the infinity norm, which gives the largest magnitude among the elements of a vector.
Proof. The proof follows easily from the fact that, for a matrix $A$ and a vector $d$, the following equalities hold:
$$ \|A\,\mathrm{Diag}(d)\|_\infty = \left\| |A||\mathrm{Diag}(d)| \right\|_\infty = \left\| |A||d| \right\|_\infty. $$
Applying the above equalities to Theorem 2 gives the desired results.
For the mixed and componentwise condition numbers, the aforementioned matrix cross-product technique cannot be used to remove the Kronecker products. Here, we present some upper bounds that require little storage and can be computed efficiently.
Theorem 6. The mixed and componentwise condition numbers of the GRR model (1.4) can be bounded as follows:
$$ \kappa_{\infty m,F}^{L,\mathrm{ubd}} = \frac{\|G_{\mathrm{ubd}}\|_\infty}{\|L^T\hat{\beta}_{gr}\|_\infty} \quad\text{and}\quad \kappa_{\infty c,F}^{L,\mathrm{ubd}} = \left\| \frac{G_{\mathrm{ubd}}}{L^T\hat{\beta}_{gr}} \right\|_\infty, $$
where
$$ G_{\mathrm{ubd}} = \left|L^TW^{-1}X^T\Sigma\right|\left(|y| + |X||\hat{\beta}_{gr}|\right) + \left|L^TW^{-1}X^T\right||\Sigma||r| + \left|L^TW^{-1}\right|\left|X^T\right||\Sigma r| + \left|L^TW^{-1}\Theta\right||\beta_0| + \left|L^TW^{-1}\right||\Theta|\left|\beta_0 - \hat{\beta}_{gr}\right|. $$
Proof. The explicit expression of $G_{\mathrm{ubd}}$ is derived using Lemma 5 in [10]; the derivation is straightforward and omitted here.
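A short MATLAB sketch of these bounds (again our illustration, reusing the variables of the earlier sketches with $L=I_p$) shows that only absolute values of small matrices are needed:

```matlab
% Upper bound G_ubd of Theorem 6 with L = I_p: no Kronecker products appear.
Gubd = abs(Wi*X'*Sigma)*(abs(y) + abs(X)*abs(beta_gr)) ...
     + abs(Wi*X')*abs(Sigma)*abs(r) ...
     + abs(Wi)*abs(X')*abs(Sigma*r) ...
     + abs(Wi*Theta)*abs(beta0) ...
     + abs(Wi)*abs(Theta)*abs(beta0 - beta_gr);
kappa_mix_ubd  = norm(Gubd, inf) / norm(beta_gr, inf);   % bound on the mixed cond. number
kappa_comp_ubd = norm(Gubd ./ abs(beta_gr), inf);        % bound on the componentwise one
```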
The condition number not only measures the sensitivity of a problem, but can also be used to give a first order estimate of the forward error. Thus, in many practical applications, the exact value of the condition number is not needed, and an estimate within a factor of 10 is usually enough. Many techniques have been proposed to estimate the normwise, mixed, and componentwise condition numbers, such as the power method [15] and probabilistic and statistical methods [16]. Considering its computational efficiency and adaptability, we employ the statistical condition estimation method to estimate the condition numbers of the GRR model.
The small-sample statistical condition estimation (SSCE) theory has been widely applied to estimate the normwise, mixed, and componentwise condition numbers of many numerical linear algebra problems, and examples include the linear system [17], least squares problem [18,19], matrix factorization [20,21], eigenvalue problem [22], matrix function [16], Sylvester equation [23], and so on.
To introduce the framework of SSCE, consider a differentiable function $f:\mathbb{R}^p\to\mathbb{R}$, whose Taylor expansion is given by
$$ f(x+\delta z) = f(x) + \delta\,\nabla f(x)^Tz + O(\delta^2), $$
where $\delta$ is a small positive real number, $\nabla f(x)$ is the gradient of $f$ at $x$, and $z$ is a unit 2-norm vector. It is well known that $\|\nabla f(x)\|_2$ is the absolute condition number and gives an appropriate measure of the local sensitivity of $f$ at $x$. According to [16], if we choose $z$ uniformly at random from the unit sphere $\mathcal{S}_{p-1}$, then the expected value of $|\nabla f(x)^Tz|$ is
$$ E\left(|\nabla f(x)^Tz|\right) = \|\nabla f(x)\|_2\, E_p, $$
where $E_1 = 1$, $E_2 = \frac{2}{\pi}$, and for $p>2$,
$$ E_p = \begin{cases} \dfrac{1\cdot3\cdot5\cdots(p-2)}{2\cdot4\cdot6\cdots(p-1)}, & \text{for } p \text{ odd},\\[2mm] \dfrac{2}{\pi}\cdot\dfrac{2\cdot4\cdot6\cdots(p-2)}{3\cdot5\cdot7\cdots(p-1)}, & \text{for } p \text{ even}. \end{cases} $$
$E_p$ is the Wallis factor and can be approximated by $E_p \approx \sqrt{\frac{2}{\pi(p-\frac{1}{2})}}$ with high accuracy [18]; then we can define
$$ \eta \equiv \frac{|\nabla f(x)^Tz|}{E_p} $$
as the condition estimator, which gives a very reliable estimate of $\|\nabla f(x)\|_2$; specifically, the following probability inequality holds:
$$ P\left( \frac{\|\nabla f(x)\|_2}{\tau} \le \eta \le \tau\|\nabla f(x)\|_2 \right) \ge 1 - \frac{2}{\pi\tau} + O\left(\frac{1}{\tau^2}\right). $$
The accuracy of the condition estimator can be further improved by using $k$ samples:
$$ \eta(k) \equiv \frac{E_k}{E_p}\sqrt{|\nabla f(x)^Tz_1|^2 + |\nabla f(x)^Tz_2|^2 + \cdots + |\nabla f(x)^Tz_k|^2}, $$
where $[z_1, z_2, \cdots, z_k]$ is the orthonormalization of $z_1, z_2, \cdots, z_k$ selected uniformly at random from $\mathcal{S}_{p-1}$. The $k$-sample condition estimator $\eta(k)$ can achieve very high accuracy with a small number of samples; for example, when $k=3$ and $\tau=10$, we have
$$ P\left( \frac{\|\nabla f(x)\|_2}{10} \le \eta(3) \le 10\|\nabla f(x)\|_2 \right) \approx 0.9988389, $$
which means that the reliability of a condition estimate within a factor of 10 is improved from 0.936338 ($k=1$) to 0.9988389 by adding just two extra samples.
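The following MATLAB sketch illustrates this basic SSCE framework on a toy function (our example, with $f(x)=\sum_j x_j^3$ so that the exact gradient is known); the Wallis-factor approximation is the one quoted above:

```matlab
% Basic SSCE illustration: estimate the gradient norm of f(x) = sum(x.^3).
m = 1000; x0 = randn(m, 1);
g = 3*x0.^2;                                % exact gradient; norm(g) is the target
wallis = @(j) sqrt(2/(pi*(j - 0.5)));       % approximation of the Wallis factor E_j
k = 3;
[Z, ~] = qr(randn(m, k), 0);                % Gaussian columns + QR: orthonormal, uniform directions
eta_k = (wallis(k)/wallis(m)) * norm(Z'*g); % k-sample estimator eta(k)
fprintf('estimate %.4e, true norm %.4e\n', eta_k, norm(g));
```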
Note that the SSCE method is well suited to estimating the condition number of individual elements of the solution. However, for the normwise condition number derived in Section 2.1, what we need to estimate is the spectral norm of a large matrix. This means we need to modify the SSCE method to estimate the normwise condition number of the GRR model (1.4). Here, we employ the strategy proposed in [19] for estimating the normwise condition number of the linear least squares problem.
According to [19], to estimate the normwise condition number of the GRR model (1.4), we first estimate the condition number of $z_i^T\hat{\beta}_{gr}$ with
$$ \kappa_i = \left\| z_i^TW^{-1}KW^{-1}z_i - z_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}z_i \right\|_2^{1/2}, $$
and then the normwise condition number can be estimated by
$$ \kappa_N = \frac{E_k}{E_p}\left(\sum_{i=1}^{k}\kappa_i^2\right)^{1/2}, \tag{3.1} $$
where $[z_1,\cdots,z_k]$ can be obtained via the QR factorization [4, Ch. 5] of a random matrix $Z\in\mathbb{R}^{p\times k}$. Note that when $W$ and $K$ are available, the main computational cost of computing $\kappa_i$ is $O(\frac{1}{3}p^3)$ when a Cholesky factorization is used to compute $W^{-1}z_i$. If we further take the QR factorization into account, the total cost of estimating the normwise condition number is $O(\frac{k}{3}p^3 + pk^2)$. We summarize the above procedure as the following Algorithm 1.
Algorithm 1 Absolute normwise condition number estimator.
(1) Generate $k$ vectors $z_1,\cdots,z_k\in\mathbb{R}^p$ with entries from the uniform distribution $U(0,1)$.
(2) Orthonormalize the vectors $z_1,\cdots,z_k$ with the QR factorization.
(3) For $i=1,\cdots,k$, compute $\kappa_i = \left\| z_i^TW^{-1}KW^{-1}z_i - z_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}z_i \right\|_2^{1/2}$, obtaining $\kappa_1,\cdots,\kappa_k$.
(4) Compute the absolute normwise condition number estimator $\kappa_N = \frac{E_k}{E_p}\left(\sum_{i=1}^{k}\kappa_i^2\right)^{1/2}$, with $E_p \approx \sqrt{\frac{2}{\pi(p-\frac{1}{2})}}$.
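A possible MATLAB realization of Algorithm 1 (our sketch, reusing $X$, $\Sigma$, $K$, $W$, $r$, and $\hat\beta_{gr}$ from the earlier sketches) is:

```matlab
% Algorithm 1: SSCE of the absolute normwise condition number (our sketch).
k = 3;
[Z, ~] = qr(rand(p, k), 0);               % steps (1)-(2): random vectors, orthonormalized
Rw = chol(W);                             % factor W once; each W\z costs two triangular solves
kap = zeros(k, 1);
for i = 1:k
    wz = Rw \ (Rw' \ Z(:, i));            % wz = W^{-1} z_i
    % step (3): compact form of Theorem 4 with L = z_i (a scalar inside the norm)
    kap(i) = sqrt(abs(wz'*K*wz - 2*(wz'*beta_gr)*(r'*Sigma^2*X*wz)));
end
wallis = @(j) sqrt(2/(pi*(j - 0.5)));
kappaN = (wallis(k)/wallis(p)) * norm(kap);   % step (4): (E_k/E_p)*sqrt(sum kap_i^2)
```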
From Theorem 5, we note that computing the mixed and componentwise condition numbers amounts to finding the largest elements of a vector. The SSCE method can easily be applied to estimate them by extending the aforementioned SSCE procedure from scalar-valued to vector-valued functions; we present the result as Algorithm 2 below. Note that once the orthonormal vectors are obtained, the main computational cost of the SSCE for the mixed and componentwise condition numbers also lies in solving a positive definite linear system, so its complexity is similar to that of Algorithm 1.
To illustrate the theoretical results, we use a randomly generated GRR model with a known solution, constructed as follows. The coefficient matrices are given by
$$ \Sigma = U_1\Lambda_1U_1^T, \quad \Theta = V_1\Lambda_2V_1^T, \quad\text{and}\quad X = U_2\begin{bmatrix}\Lambda_3\\0\end{bmatrix}V_2^T, \tag{4.1} $$
where $U_i$ and $V_i$ ($i=1,2$) are random orthogonal matrices and $\Lambda_i$ ($i=1,2,3$) are diagonal matrices with entries $\lambda_{ij}$ arranged in descending order on their diagonals. In fact, (4.1) gives exactly the eigenvalue decompositions of $\Sigma$ and $\Theta$ and the singular value decomposition of $X$ [4]: $\Lambda_1$ and $\Lambda_2$ contain the eigenvalues of $\Sigma$ and $\Theta$, and $\Lambda_3$ the singular values of $X$. The vector $\beta$ is generated from the standard normal distribution, and $r$ is a random vector with specified magnitude, that is, $\|r\|_2$ is given. Then, based on the normal equation (1.5), we set
$$ y = r + X\beta \quad\text{and}\quad \beta_0 = \beta - \Theta^{-1}X^T\Sigma r, $$
and obtain a random GRR model with a specified solution. To compute $\hat{\beta}_{gr}$, the preconditioned conjugate gradient (PCG) method is employed to solve the normal equation (1.5). With the above setting, we can control the condition numbers of the coefficient matrices and easily get
$$ \kappa(\Sigma) = \|\Sigma\|_2\|\Sigma^{-1}\|_2 = \frac{\lambda_{11}}{\lambda_{1n}}, \quad \kappa(\Theta) = \frac{\lambda_{21}}{\lambda_{2p}}, \quad\text{and}\quad \kappa(X) = \frac{\lambda_{31}}{\lambda_{3p}}. \tag{4.2} $$
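A sketch of this test-problem generator in MATLAB might read as follows (our illustration; the equally spaced spectra and the specific sizes are just one choice consistent with the setup above):

```matlab
% Random GRR test problem with known solution beta and prescribed conditioning.
n = 100; p = 60; normr = 1e-2;
condS = 1e2; condT = 1e2; condX = 1e2;           % target condition numbers in (4.2)
[U1, ~] = qr(randn(n)); [U2, ~] = qr(randn(n));  % random orthogonal factors
[V1, ~] = qr(randn(p)); [V2, ~] = qr(randn(p));
Sigma = U1*diag(linspace(condS, 1, n))*U1';      % eigenvalues condS, ..., 1
Theta = V1*diag(linspace(condT, 1, p))*V1';
X = U2*[diag(linspace(condX, 1, p)); zeros(n-p, p)]*V2';  % singular values condX, ..., 1
beta = randn(p, 1);                              % prescribed solution
r = randn(n, 1); r = normr*r/norm(r);            % residual with given norm
y = r + X*beta;                                  % so that r = y - X*beta
beta0 = beta - Theta \ (X'*Sigma*r);             % makes beta satisfy (1.5) exactly
```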
All the experiments are performed in Matlab R2018a on a PC with Intel i7-10700 CPU @ 2.90 GHz and 16.00 GB random access memory.
Algorithm 2 Mixed and componentwise condition number estimator.
(1) Generate $k$ groups of matrices and vectors $(\Delta y_1,\Delta X_1,\Delta\Sigma_1,\Delta\Theta_1,\Delta\beta_{01}),\cdots,(\Delta y_k,\Delta X_k,\Delta\Sigma_k,\Delta\Theta_k,\Delta\beta_{0k})$ with elements from the standard normal distribution $N(0,1)$.
(2) Obtain the orthonormal vectors $[q_1,\cdots,q_k]$ by orthonormalizing the matrix
$$ \begin{bmatrix} \Delta y_1 & \cdots & \Delta y_k \\ \mathrm{vec}(\Delta X_1) & \cdots & \mathrm{vec}(\Delta X_k) \\ \mathrm{vec}(\Delta\Sigma_1) & \cdots & \mathrm{vec}(\Delta\Sigma_k) \\ \mathrm{vec}(\Delta\Theta_1) & \cdots & \mathrm{vec}(\Delta\Theta_k) \\ \Delta\beta_{01} & \cdots & \Delta\beta_{0k} \end{bmatrix}, $$
and reconstruct $(\Delta y_1,\Delta X_1,\Delta\Sigma_1,\Delta\Theta_1,\Delta\beta_{01}),\cdots,(\Delta y_k,\Delta X_k,\Delta\Sigma_k,\Delta\Theta_k,\Delta\beta_{0k})$ from the corresponding orthonormal vectors $[q_1,\cdots,q_k]$, respectively.
(3) For $i=1,\cdots,k$, compute $$ x_i = W^{-1}X^T\Sigma(\Delta y_i - \Delta X_i\hat{\beta}_{gr}) + W^{-1}X^T\Delta\Sigma_i\, r + W^{-1}\Delta X_i^T\Sigma\, r + W^{-1}\Theta\Delta\beta_{0i} + W^{-1}\Delta\Theta_i(\beta_0 - \hat{\beta}_{gr}). $$
(4) Estimate the absolute condition vector with $k$ samples by $C_{\mathrm{abs}}^{\mathrm{GRR}}(k) = \frac{E_k}{E_p}\sqrt{\sum_{i=1}^{k}|x_i|^2}$, where the power and absolute value are applied to the elements of the vectors.
(5) The mixed and componentwise condition number estimators are given by $\kappa_M = \frac{\|C_{\mathrm{abs}}^{\mathrm{GRR}}(k)\|_\infty}{\|\hat{\beta}_{gr}\|_\infty}$ and $\kappa_C = \left\|\frac{C_{\mathrm{abs}}^{\mathrm{GRR}}(k)}{\hat{\beta}_{gr}}\right\|_\infty$, respectively.
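A possible MATLAB realization of Algorithm 2 (our sketch; it recomputes $\hat\beta_{gr}$, $W$, and $r$ from whatever data $X$, $\Sigma$, $\Theta$, $y$, $\beta_0$ are in scope, e.g., from the generator above, and our reading takes the Wallis factor over the stacked perturbation dimension):

```matlab
% Algorithm 2: SSCE of the mixed and componentwise condition numbers (our sketch).
[n, p] = size(X);
W = X'*Sigma*X + Theta;
beta_gr = W \ (X'*Sigma*y + Theta*beta0);
r = y - X*beta_gr;
k = 3;
d = n + n*p + n^2 + p^2 + p;                 % dimension of the stacked perturbation
[Q, ~] = qr(randn(d, k), 0);                 % steps (1)-(2): N(0,1) samples, orthonormalized
Xs = zeros(p, k);
for i = 1:k
    q = Q(:, i);                             % unstack the i-th perturbation group
    dy  = q(1:n);
    dX  = reshape(q(n+1 : n+n*p), n, p);
    dS  = reshape(q(n+n*p+1 : n+n*p+n^2), n, n);
    dT  = reshape(q(n+n*p+n^2+1 : n+n*p+n^2+p^2), p, p);
    db0 = q(end-p+1 : end);
    Xs(:, i) = W \ (X'*Sigma*(dy - dX*beta_gr) + X'*(dS*r) ...   % step (3)
             + dX'*(Sigma*r) + Theta*db0 + dT*(beta0 - beta_gr));
end
wallis = @(j) sqrt(2/(pi*(j - 0.5)));
Cabs = (wallis(k)/wallis(d)) * sqrt(sum(Xs.^2, 2));   % step (4), elementwise
kappaM = norm(Cabs, inf) / norm(beta_gr, inf);        % step (5)
kappaC = norm(Cabs ./ abs(beta_gr), inf);
```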
Example 1. In this example, we test the tightness and computational efficiency of the upper bounds on the mixed and componentwise condition numbers derived in Theorem 6. To this end, we define the following ratios:
$$ R_{\mathrm{mix}} = \frac{\kappa_{\infty m,F}^{L,\mathrm{ubd}}}{\kappa_{\infty m,F}^{L}}, \quad R_{\mathrm{comp}} = \frac{\kappa_{\infty c,F}^{L,\mathrm{ubd}}}{\kappa_{\infty c,F}^{L}}, \quad RT_{\mathrm{mix}} = \frac{\mathrm{CPU}\left(\kappa_{\infty m,F}^{L}\right)}{\mathrm{CPU}\left(\kappa_{\infty m,F}^{L,\mathrm{ubd}}\right)}, \quad\text{and}\quad RT_{\mathrm{comp}} = \frac{\mathrm{CPU}\left(\kappa_{\infty c,F}^{L}\right)}{\mathrm{CPU}\left(\kappa_{\infty c,F}^{L,\mathrm{ubd}}\right)}, $$
where $\mathrm{CPU}(\cdot)$ denotes the CPU time needed to compute the corresponding quantity.
For $R_{\mathrm{mix}}$ and $R_{\mathrm{comp}}$, the closer the ratio is to 1, the tighter the upper bound. $RT_{\mathrm{mix}}$ and $RT_{\mathrm{comp}}$ measure the computational efficiency of the upper bounds in terms of CPU time, with larger values indicating better efficiency. For a clear presentation, we report the numerical results in Figure 1, using a red asterisk and a blue plus sign to denote $R_{\mathrm{mix}}$ and $R_{\mathrm{comp}}$, respectively. Since there is very little difference in computational complexity between the mixed and componentwise condition numbers and their upper bounds, we only present the CPU time comparison for the mixed condition number and its upper bound, denoted by a green square.
In our computation, we set $L=I_p$, $n=100$, $p=60$, and $\|r\|_2=10^{-2}$; the $\lambda_{ij}$ in the denominators of (4.2) are set to 1, the $\lambda_{i1}$ in the numerators are chosen to attain the given condition numbers, and the remaining $\lambda_{ij}$ are equally spaced. The normal equation (1.5) is solved with the Matlab command pcg with relative residual smaller than $10^{-10}$ and at most 100 iterations. Varying the condition numbers of the coefficient matrices in (4.2), we repeat each setting 1000 times and present the numerical results in Figure 1. From the first row of Figure 1, we observe that the majority of the ratios are close to 1, indicating that the derived upper bounds on the mixed and componentwise condition numbers of the GRR model (1.4) are quite tight. On the other hand, from the second row of Figure 1, we notice that most of the ratios exceed 20, which implies that the upper bounds can be computed efficiently, improving the CPU time by a factor of at least 20. Thus, in practical applications, we can use the upper bounds to measure the ill-conditioning of the GRR model rather than the exact mixed and componentwise condition numbers.
Example 2. In this example, we check the efficiency of the SSCE of the condition numbers of the GRR model (1.4) with respect to accuracy and CPU time. As in the previous example, we define the ratios
$$ R_{N1} = \frac{\kappa_N}{\kappa_{2a,F}^{L}}, \quad R_M = \frac{\kappa_M}{\kappa_{\infty m,F}^{L}}, \quad R_C = \frac{\kappa_C}{\kappa_{\infty c,F}^{L}}, $$
$$ RT_{N1} = \frac{\mathrm{CPU}(\kappa_N)}{\mathrm{CPU}\left(\kappa_{2a,F}^{L}\right)}, \quad RT_{N2} = \frac{\mathrm{CPU}(\kappa_N)}{\mathrm{CPU}\left(\kappa_{2a,F_1}^{L}\right)}, \quad RT_C = \frac{\mathrm{CPU}(\kappa_C)}{\mathrm{CPU}\left(\kappa_{\infty c,F}^{L}\right)}. $$
The first row of the above ratios measures the accuracy of the SSCE method, and the second row measures its efficiency. Although $\kappa_{2a,F}^{L}$ in (2.9) and $\kappa_{2a,F_1}^{L}$ in (2.10) give the same value, they have very different storage requirements and computational costs; thus, we only use $R_{N1}$ to measure the accuracy of the normwise SSCE estimator, whereas $RT_{N1}$ and $RT_{N2}$ are employed to compare computational efficiency. Moreover, since the costs of computing or estimating the mixed and componentwise condition numbers differ very little, we only present the CPU time comparison via $RT_C$. For clarity, we again present the results in figures.
In our experiment, we set $L=I_p$, $n=200$, $p=100$, $\|r\|_2=10^{-1}$, $\kappa(\Sigma)=10$, $\kappa(\Theta)=10$, and $\kappa(X)=10$; the other settings are the same as in Example 1. The numerical results are reported in Figure 2. From the first row of Figure 2, we find that the SSCE method gives reliable estimates of the mixed and componentwise condition numbers, with ratios within a factor of 10. For the normwise condition number, the SSCE method may give slight overestimates, which coincides with the phenomenon observed in [19]. From the second row of Figure 2, we note that the ratios are much smaller than 1, which means that, in general, the SSCE estimates require much less CPU time than the explicit computation of the condition numbers. Comparing the first and second panels of the second row, we also see that the compact form of the normwise condition number yields a substantial gain in computational efficiency over its original form.
In this paper, we extended the GRR model given in [3] by allowing serial dependence among the observations and investigated the condition number theory of the new model. We first established the generic expression of the condition number of the GRR model (1.4). By varying the norms and parameters in the generic expression, the popular normwise, mixed, and componentwise condition numbers were obtained as special cases. Considering the computational difficulty of calculating the exact value of the condition number, we provided a compact form of the normwise condition number and upper bounds for the mixed and componentwise condition numbers. We also proposed SSCE estimators for the normwise, mixed, and componentwise condition numbers. Numerical experiments were given to show the tightness and efficiency of the upper bounds and the proposed condition estimators.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The work was supported by Shandong Provincial Natural Science Foundation (Grant No. ZR2020QA034). WSX was also supported by Shandong Province Higher Education Youth Innovation and Technology Support Program (Grant No. 2023KJ199).
The authors would like to give their sincere thanks to the five anonymous reviewers for their detailed and helpful comments, which led to a much better presentation of this work.
The authors affirm that they have no conflicts of interest to disclose.
[1] A. Hoerl, R. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12 (1970), 55–67. https://doi.org/10.1080/00401706.1970.10488634
[2] P. Hansen, Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion, Philadelphia: Society for Industrial and Applied Mathematics, 1998. https://doi.org/10.1137/1.9780898719697
[3] W. van Wieringen, Lecture notes on ridge regression, arXiv: 1509.09169.
[4] G. Golub, C. van Loan, Matrix computations, 4 Eds., Baltimore: Johns Hopkins University Press, 2013.
[5] J. Rice, A theory of condition, SIAM J. Numer. Anal., 3 (1966), 287–310. https://doi.org/10.1137/0703023
[6] S. Wang, L. Meng, A contribution to the conditioning theory of the indefinite least squares problems, Appl. Numer. Math., 177 (2022), 137–159. https://doi.org/10.1016/j.apnum.2022.03.012
[7] S. Wang, H. Yang, Conditioning theory of the equality constrained quadratic programming and its applications, Linear Multilinear A., 69 (2021), 1161–1183. https://doi.org/10.1080/03081087.2019.1623858
[8] Z. Xie, W. Li, X. Jin, On condition numbers for the canonical generalized polar decomposition of real matrices, Electron. J. Linear Al., 26 (2013), 842–857. https://doi.org/10.13001/1081-3810.1691
[9] I. Gohberg, I. Koltracht, Mixed, componentwise, and structured condition numbers, SIAM J. Matrix Anal. Appl., 14 (1993), 688–704. https://doi.org/10.1137/0614049
[10] F. Cucker, H. Diao, Y. Wei, On mixed and componentwise condition numbers for Moore-Penrose inverse and linear least squares problems, Math. Comp., 76 (2007), 947–963. https://doi.org/10.1090/S0025-5718-06-01913-2
[11] Y. Wei, D. Wang, Condition numbers and perturbation of the weighted Moore-Penrose inverse and weighted linear least squares problem, Appl. Math. Comput., 145 (2003), 45–58. https://doi.org/10.1016/S0096-3003(02)00437-X
[12] D. Chu, L. Lin, R. Tan, Y. Wei, Condition numbers and perturbation analysis for the Tikhonov regularization of discrete ill-posed problems, Numer. Linear Algebra, 18 (2011), 87–103. https://doi.org/10.1002/nla.702
[13] H. Diao, Y. Wei, S. Qiao, Structured condition numbers of structured Tikhonov regularization problem and their estimations, J. Comput. Appl. Math., 308 (2016), 276–300. https://doi.org/10.1016/j.cam.2016.05.023
[14] L. Meng, B. Zheng, Structured condition numbers for the Tikhonov regularization of discrete ill-posed problems, J. Comput. Math., 35 (2017), 169–186. https://doi.org/10.4208/jcm.1608-m2015-0279
[15] N. Higham, Accuracy and stability of numerical algorithms, 2 Eds., Philadelphia: Society for Industrial and Applied Mathematics, 2002. https://doi.org/10.1137/1.9780898718027
[16] C. Kenney, A. Laub, Small-sample statistical condition estimates for general matrix functions, SIAM J. Sci. Comput., 15 (1994), 36–61. https://doi.org/10.1137/0915003
[17] A. Laub, J. Xia, Applications of statistical condition estimation to the solution of linear systems, Numer. Linear Algebra, 15 (2008), 489–513. https://doi.org/10.1002/nla.570
[18] C. Kenney, A. Laub, M. Reese, Statistical condition estimation for linear least squares, SIAM J. Matrix Anal. Appl., 19 (1998), 906–923. https://doi.org/10.1137/S0895479895291935
[19] M. Baboulin, S. Gratton, R. Lacroix, A. Laub, Statistical estimates for the conditioning of linear least squares problems, In: Parallel processing and applied mathematics, Berlin: Springer, 2014, 124–133. https://doi.org/10.1007/978-3-642-55224-3_13
[20] A. Farooq, M. Samar, Sensitivity analysis for the generalized Cholesky block downdating problem, Linear Multilinear A., 70 (2022), 997–1022. https://doi.org/10.1080/03081087.2020.1751033
[21] A. Farooq, M. Samar, H. Li, C. Mu, Sensitivity analysis for the block Cholesky downdating problem, Int. J. Comput. Math., 97 (2020), 1234–1253. https://doi.org/10.1080/00207160.2019.1613528
[22] A. Laub, J. Xia, Fast condition estimation for a class of structured eigenvalue problems, SIAM J. Matrix Anal. Appl., 30 (2009), 1658–1676. https://doi.org/10.1137/070707713
[23] H. Diao, H. Xiang, Y. Wei, Mixed, componentwise condition numbers and small sample statistical condition estimation of Sylvester equations, Numer. Linear Algebra, 19 (2012), 639–654. https://doi.org/10.1002/nla.790