
Condition numbers of the generalized ridge regression and its statistical estimation

  • In this paper, we considered the condition number theory of a new generalized ridge regression model. Explicit expressions for different types of condition numbers were derived to measure the ill-conditioning of the generalized ridge regression problem under different circumstances. To overcome the difficulty of computing the exact value of these condition numbers, we employed statistical condition estimation theory to design efficient condition number estimators, and numerical examples were also given to illustrate their efficiency.

    Citation: Jing Kong, Shaoxin Wang. Condition numbers of the generalized ridge regression and its statistical estimation[J]. AIMS Mathematics, 2024, 9(2): 4178-4193. doi: 10.3934/math.2024205




    Ridge regression [1] is a classical and powerful statistical method to inhibit the multicollinearity among covariates in the following linear regression model:

    $$y = X\beta + \epsilon, \quad (1.1)$$

    where $y \in \mathbb{R}^n$ is the vector of observed responses, $X \in \mathbb{R}^{n \times p}$ is the design matrix of covariates, $\beta \in \mathbb{R}^p$ is the unknown parameter vector, and $\epsilon \sim N(0, \sigma^2 I_n)$ with $I_n$ being the identity matrix of order $n$. The ridge estimator of model (1.1) is given by

    $$\hat{\beta}_r = (X^T X + \lambda I_p)^{-1} X^T y, \quad (1.2)$$

    where λ>0 is the ridge parameter. Finding the ridge estimator (1.2) is also equivalent to solving the following optimization problem:

    $$\hat{\beta}_r = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2, \quad (1.3)$$

    where $\|\cdot\|_2$ is the Euclidean vector norm. Equation (1.3) is also known as the famous Tikhonov regularization technique for dealing with discrete ill-posed problems [2]. Incorporating weights into the observations and generalizing the ridge penalty leads to the following generalized ridge regression (GRR) model:

    $$\hat{\beta}_{gr} = \arg\min_{\beta \in \mathbb{R}^p}\ (y - X\beta)^T \Sigma (y - X\beta) + (\beta - \beta_0)^T \Theta (\beta - \beta_0), \quad (1.4)$$

    where $\Sigma \in \mathbb{R}^{n \times n}$ is a diagonal matrix with nonnegative elements on its diagonal, $\Theta \in \mathbb{R}^{p \times p}$ is a symmetric positive definite matrix that determines the speed and direction of shrinkage, and $\beta_0$ is a user-specified, non-random target vector [3]. By the Karush-Kuhn-Tucker (KKT) optimality condition, taking the derivative of the objective function in (1.4) with respect to $\beta$ shows that the generalized ridge estimator (GRE), denoted by $\hat{\beta}_{gr}$, satisfies the following normal equation:

    $$(X^T \Sigma X + \Theta)\beta = X^T \Sigma y + \Theta \beta_0 \quad (1.5)$$

    and $\hat{\beta}_{gr}$ is given by

    $$\hat{\beta}_{gr} = (X^T \Sigma X + \Theta)^{-1}(X^T \Sigma y + \Theta \beta_0). \quad (1.6)$$
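    To make (1.5) and (1.6) concrete, a minimal MATLAB sketch of computing the GRE is given below; the variable names are our own illustration, not code from the paper.

        % Minimal sketch (our illustration): computing the GRE by solving
        % the normal Eq (1.5). X is n-by-p, y is n-by-1, beta0 is p-by-1;
        % Sigma (n-by-n) and Theta (p-by-p) are assumed symmetric positive
        % definite.
        W = X' * Sigma * X + Theta;           % coefficient matrix of (1.5)
        b = X' * Sigma * y + Theta * beta0;   % right-hand side of (1.5)
        beta_gr = W \ b;                      % W is SPD, so backslash
                                              % solves the system via Cholesky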

    Though the diagonal matrix Σ can give different weights to the observations, the correlation structure between observations cannot be captured. For this reason, we relax the assumption and allow Σ to be a symmetric positive definite matrix, which makes (1.4) the exact combination of weighted least squares loss and the generalized ridge penalty. By varying the weighting matrices Σ and Θ, the generalized Tikhonov regularization [2] can also be derived from (1.4).

    The multicollinearity in covariates often leads to a large condition number of $X^TX$, which plays a central role in the sensitivity analysis of the least squares estimate of model (1.1) [4, Ch. 5]. The condition number is a powerful tool for measuring the sensitivity of a problem: it gives the maximum amplification of the change in the solution resulting from a small perturbation in the input data, and it has been studied for a great many numerical linear algebra problems. For the development of the different definitions of condition numbers and their comparison, we refer to [5,6,7,8,9]. In this paper, we employ the following very general definition of condition number given by [6], which covers most of the popular condition numbers in the current literature as special cases.

    Definition 1. (projected condition number) [6] Let $F: \mathbb{R}^p \to \mathbb{R}^q$ be a continuous map defined on an open set $\mathrm{Dom}(F)$, the domain of definition of $F$, and let $L \in \mathbb{R}^{q \times k}$. Then the projected condition number of $F$ at $x \in \mathrm{Dom}(F)$ with respect to $L$ is defined by

    $$\kappa_F^L(x) = \lim_{\delta \to 0}\ \sup_{\|\Delta x/\alpha\|_\mu \le \delta} \frac{\left\|L^T\left(F(x+\Delta x) - F(x)\right)/\xi_L\right\|_\nu}{\|\Delta x/\alpha\|_\mu}, \quad (1.7)$$

    where $/$ is the componentwise division, with the convention that a nonzero numerator divided by a zero denominator remains unchanged, $\xi_L \in \mathbb{R}^k$ and $\alpha \in \mathbb{R}^p$ are parameters with the requirement that if some element of $\alpha$ is zero, then the corresponding element of $\Delta x$ must also be zero, and $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ are two vector norms defined on $\mathbb{R}^p$ and $\mathbb{R}^k$, respectively.

    In Definition 1, the matrix $L$ can be treated as an operator that transforms the image vector of $F$. For example, when we set $L = e_i$, the $i$-th vector of the canonical basis of $\mathbb{R}^q$, Definition 1 gives the condition number of the $i$-th element of the image vector. The two vector norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ also enable us to measure the perturbations in the input and output spaces differently. When the map $F$ is Fréchet differentiable, we can get the following Theorem 1, which largely reduces the difficulty of finding the supremum in (1.7) to establishing the Fréchet derivative of $F$. The rationality of Definition 1 and its relationship with other popular condition numbers have been extensively discussed; the interested reader is referred to [6,7,8] and the references therein.

    Theorem 1. [6] When the map $F$ in Definition 1 is Fréchet differentiable at $x$, the explicit expression of the projected condition number $\kappa_F^L(x)$ is given by

    $$\kappa_F^L(x) = \left\|\mathrm{Diag}(\xi_L^\dagger)\,L^T\,DF(x)\,\mathrm{Diag}(\alpha)\right\|_{\mu\nu},$$

    where $DF(x)$ is the Fréchet derivative of $F$ at $x$, $\xi_L^\dagger$ is the vector with entries
    $$(\xi_L^\dagger)_i = \begin{cases} 1/(\xi_L)_i, & (\xi_L)_i \neq 0,\\ 1, & (\xi_L)_i = 0, \end{cases}$$
    $\mathrm{Diag}(\xi_L^\dagger)$ and $\mathrm{Diag}(\alpha)$ are diagonal matrices with the elements of the corresponding vector on the diagonal, and $\|\cdot\|_{\mu\nu}$ is the matrix norm induced by the vector norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$.

    The condition number theory of some special cases of model (1.4) has been well studied in the literature, for example, the (weighted) linear least squares problem [6,7,10,11] and (generalized) Tikhonov regularization [12,13,14]. In this paper, we will extend the current results and consider the condition number theory of the GRR model (1.4). Since computing the exact value of the condition number is usually expensive, we also propose some efficient statistical condition estimation methods that reduce the computational burden while still giving reasonable estimates. The rest of the paper is organized as follows. In Section 2, we present the explicit expressions of condition numbers for the GRR model (1.4). Section 3 contains the statistical condition estimation method for estimating the condition numbers of the GRR model (1.4). Numerical experiments and a conclusion of the whole paper are given in Sections 4 and 5, respectively.

    In order to present the condition number of the GRR model (1.4), we first explicitly define the map F in (1.7) as follows:

    $$F: \mathbb{R}^n \times \mathbb{R}^{n \times p} \times \mathbb{R}^{n \times n} \times \mathbb{R}^{p \times p} \times \mathbb{R}^p \to \mathbb{R}^p, \quad F(y, X, \Sigma, \Theta, \beta_0) \mapsto \hat{\beta}_{gr} = (X^T\Sigma X + \Theta)^{-1}(X^T\Sigma y + \Theta\beta_0). \quad (2.1)$$

    Let $\tilde{y} = y + \Delta y$, $\tilde{X} = X + \Delta X$, $\tilde{\Sigma} = \Sigma + \Delta\Sigma$, $\tilde{\Theta} = \Theta + \Delta\Theta$, and $\tilde{\beta}_0 = \beta_0 + \Delta\beta_0$. Then the perturbed GRR model is given by

    $$\widetilde{\hat{\beta}}_{gr} = \arg\min_{\beta \in \mathbb{R}^p}\ (\tilde{y} - \tilde{X}\beta)^T\tilde{\Sigma}(\tilde{y} - \tilde{X}\beta) + (\beta - \tilde{\beta}_0)^T\tilde{\Theta}(\beta - \tilde{\beta}_0), \quad (2.2)$$

    where $\widetilde{\hat{\beta}}_{gr}$ is the GRE of the perturbed GRR model (2.2) and satisfies the following normal equation:

    $$(\tilde{X}^T\tilde{\Sigma}\tilde{X} + \tilde{\Theta})\,\widetilde{\hat{\beta}}_{gr} = \tilde{X}^T\tilde{\Sigma}\tilde{y} + \tilde{\Theta}\tilde{\beta}_0. \quad (2.3)$$

    For simplicity of presentation and with a little abuse of notation, we set

    $$\mathrm{vec}(y, X, \Sigma, \Theta, \beta_0) := \left[y^T,\ \mathrm{vec}(X)^T,\ \mathrm{vec}(\Sigma)^T,\ \mathrm{vec}(\Theta)^T,\ \beta_0^T\right]^T,$$

    then the projected condition number of the GRR model (1.4) can be defined as follows.

    Definition 2. Considering the perturbed GRR model (2.2), the projected condition number of the GRR model (1.4) with respect to $L \in \mathbb{R}^{p \times k}$ is defined by

    $$\kappa_F^L = \lim_{\delta \to 0}\ \sup_{\left\|\mathrm{vec}\left(\Delta y/\gamma,\ \Delta X/\Psi,\ \Delta\Sigma/\Phi,\ \Delta\Theta/\Upsilon,\ \Delta\beta_0/\varpi\right)\right\|_\mu \le \delta} \frac{\left\|L^T\left(F(\tilde{y},\tilde{X},\tilde{\Sigma},\tilde{\Theta},\tilde{\beta}_0) - F(y,X,\Sigma,\Theta,\beta_0)\right)/\xi_L\right\|_\nu}{\left\|\mathrm{vec}\left(\Delta y/\gamma,\ \Delta X/\Psi,\ \Delta\Sigma/\Phi,\ \Delta\Theta/\Upsilon,\ \Delta\beta_0/\varpi\right)\right\|_\mu},$$

    where $\xi_L \in \mathbb{R}^k$, $\gamma \in \mathbb{R}^n$, $\Psi \in \mathbb{R}^{n \times p}$, $\Phi \in \mathbb{R}^{n \times n}$, $\Upsilon \in \mathbb{R}^{p \times p}$, and $\varpi \in \mathbb{R}^p$ are parameters with the requirement that if some element is zero, then the corresponding element in the numerator must also be zero, and $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ are two vector norms defined on $\mathbb{R}^{n+np+n^2+p^2+p}$ and $\mathbb{R}^k$, respectively.

    If we set $\widetilde{\hat{\beta}}_{gr} = \hat{\beta}_{gr} + \Delta\beta$ and further assume that

    $$\max\{\|\Delta y\|,\ \|\Delta\Sigma\|,\ \|\Delta X\|,\ \|\Delta\Theta\|,\ \|\Delta\beta_0\|\} \le \epsilon$$

    with $\|\cdot\|$ denoting an appropriate norm and $\epsilon$ being a sufficiently small positive real number, then with some algebra and omitting the higher order terms, we can get the following equality by subtracting (1.5) from (2.3):

    $$W\Delta\beta = X^T\Sigma(\Delta y - \Delta X\hat{\beta}_{gr}) + X^T\Delta\Sigma\,r + \Delta X^T\Sigma\,r + \Theta\Delta\beta_0 + \Delta\Theta(\beta_0 - \hat{\beta}_{gr}) + O(\epsilon^2), \quad (2.4)$$

    where $W = X^T\Sigma X + \Theta$ and $r = y - X\hat{\beta}_{gr}$. Returning to Definition 2, we have

    $$F(\tilde{y},\tilde{X},\tilde{\Sigma},\tilde{\Theta},\tilde{\beta}_0) - F(y,X,\Sigma,\Theta,\beta_0) = \Delta\beta$$

    and

    $$\Delta\beta = W^{-1}X^T\Sigma(\Delta y - \Delta X\hat{\beta}_{gr}) + W^{-1}X^T\Delta\Sigma\,r + W^{-1}\Delta X^T\Sigma\,r + W^{-1}\Theta\Delta\beta_0 + W^{-1}\Delta\Theta(\beta_0 - \hat{\beta}_{gr}) + O(\epsilon^2). \quad (2.5)$$

    With (2.5), the first order expansion of Δβ, we can establish the explicit expression of the projected condition number for the GRR model (1.4).

    Theorem 2. For the GRR model (1.4), when $\Sigma$ and $\Theta$ are symmetric positive definite matrices, the explicit expression of its projected condition number is given by

    $$\kappa_F^L = \left\|\mathrm{Diag}(\xi_L^\dagger)\,L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\mathrm{Diag}(\mathrm{vec}(\gamma,\Psi,\Phi,\Upsilon,\varpi))\right\|_{\mu\nu},$$

    where $W = X^T\Sigma X + \Theta$, $r = y - X\hat{\beta}_{gr}$, $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$, and $N = (\beta_0 - \hat{\beta}_{gr})^T\otimes W^{-1}$.

    Proof. Since $\mathrm{vec}(ABC) = (C^T\otimes A)\mathrm{vec}(B)$, and in particular $W^{-1}\Delta X^T a = (W^{-1}\otimes a^T)\mathrm{vec}(\Delta X)$ for any $a \in \mathbb{R}^n$, applying the $\mathrm{vec}(\cdot)$ operator to the terms of (2.5) gives

    $$\Delta\beta = W^{-1}X^T\Sigma\,\Delta y + \left(W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)\right)\mathrm{vec}(\Delta X) + \left(r^T\otimes(W^{-1}X^T)\right)\mathrm{vec}(\Delta\Sigma) + \left((\beta_0 - \hat{\beta}_{gr})^T\otimes W^{-1}\right)\mathrm{vec}(\Delta\Theta) + W^{-1}\Theta\,\Delta\beta_0 + O(\epsilon^2). \quad (2.6)$$

    Let $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$ and $N = (\beta_0 - \hat{\beta}_{gr})^T\otimes W^{-1}$. We can rewrite (2.6) as

    $$\Delta\beta = \left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\begin{bmatrix}\Delta y\\ \mathrm{vec}(\Delta X)\\ \mathrm{vec}(\Delta\Sigma)\\ \mathrm{vec}(\Delta\Theta)\\ \Delta\beta_0\end{bmatrix} + O(\epsilon^2). \quad (2.7)$$

    With (2.7), we can get the following Fréchet derivative of F

    $$DF = \left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right] \quad (2.8)$$

    and complete the proof by Theorem 1.

    Theorem 2 presents the generic form of the condition number for the GRR problem (1.4). For practical applications, we need to specify the norms and the parameters with respect to concrete backgrounds. We also note that the explicit expression of the condition number contains Kronecker product structures, which enlarge the size of the matrix and increase the computational burden. In the following, we discuss some specific forms of the condition number and their computational issues.
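    To make the size issue concrete, the following MATLAB sketch (our illustration, not code from the paper) assembles the Fréchet derivative of Theorem 2 explicitly with kron; the resulting matrix has $p$ rows and $n + np + n^2 + p^2 + p$ columns, which quickly becomes prohibitive as $n$ and $p$ grow.

        % Sketch (our illustration): explicit assembly of the derivative
        % matrix in (2.8). Assumes W = X'*Sigma*X + Theta, r = y - X*beta_gr,
        % and beta_gr have been computed.
        Winv = inv(W);                        % p-by-p; an explicit inverse
                                              % is acceptable for a sketch
        B = Winv * (X' * Sigma);              % block for the y-perturbation
        M = kron(Winv, (Sigma * r)') - kron(beta_gr', B);
        N = kron((beta0 - beta_gr)', Winv);
        DF = [B, M, kron(r', Winv * X'), N, Winv * Theta];
        % size(DF) = [p, n + n*p + n^2 + p^2 + p]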

    When $\mu = \nu = 2$, the parameter vectors and matrices $\gamma$, $\Psi$, $\Phi$, $\Upsilon$, and $\varpi$ reduce to real numbers, all equal to $\|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L = \|L^T\hat{\beta}_{gr}\|_2 \neq 0$. In this case, $\kappa_F^L$ gives an overall treatment of the perturbations, and from Theorem 2 we obtain the projected relative normwise condition number of the GRR model (1.4), as follows.

    Theorem 3. When $\mu = \nu = 2$, $\gamma = \Psi = \Phi = \Upsilon = \varpi = \|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L = \|L^T\hat{\beta}_{gr}\|_2 \neq 0$, the projected relative normwise condition number of the GRR model (1.4) is given by

    $$\kappa_{F,2}^L = \frac{\left\|L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\right\|_2\,\|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2}{\|L^T\hat{\beta}_{gr}\|_2},$$

    where $\|\cdot\|_2$ denotes the spectral norm of a matrix or the Euclidean norm of a vector, $W = X^T\Sigma X + \Theta$, $r = y - X\hat{\beta}_{gr}$, $M = W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)$, and $N = (\beta_0 - \hat{\beta}_{gr})^T\otimes W^{-1}$.

    Note that the main difficulty in explicitly computing the value of $\kappa_{F,2}^L$ lies in the following term:

    $$\kappa_{F,2a}^L := \left\|L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\right\|_2, \quad (2.9)$$

    which contains Kronecker products and is also called the projected absolute normwise condition number. To remove the Kronecker products in $\kappa_{F,2a}^L$, the matrix cross-product and Cholesky factorization techniques may be used to establish compact but equivalent forms of the normwise condition number [6,7]. However, for large problems, there is no need to form the condition number explicitly, and a suitable estimate is enough [15, Ch. 15]. Here, we present a compact form of $\kappa_{F,2a}^L$ that not only removes the Kronecker products, but also provides important support for the estimation procedure described in Section 3.

    Theorem 4. When $\mu = \nu = 2$, $\gamma = \Psi = \Phi = \Upsilon = \varpi = \|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\|_2$, and $\xi_L = \|L^T\hat{\beta}_{gr}\|_2 \neq 0$, the projected absolute normwise condition number $\kappa_{F,2a}^L$ of the GRR model (1.4) can also be written as

    $$\kappa_{F,2a}^{L,1} = \left\|L^TW^{-1}KW^{-1}L - L^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}L\right\|_2^{1/2}. \quad (2.10)$$

    In particular, when $L = e_i$, we get the projected absolute normwise condition number of the $i$-th element of the solution:

    $$\kappa_{F,2a}^{e_i,1} = \left\|e_i^TW^{-1}KW^{-1}e_i - e_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}e_i\right\|_2^{1/2}, \quad (2.11)$$

    where $W = X^T\Sigma X + \Theta$, $r = y - X\hat{\beta}_{gr}$, and $K = (1 + \|\hat{\beta}_{gr}\|_2^2)X^T\Sigma^2X + \|r\|_2^2X^TX + (\|\beta_0 - \hat{\beta}_{gr}\|_2^2 + r^T\Sigma^2r)I_p + \Theta^2$.

    Proof. For the spectral norm, we have $\|A\|_2 = \|AA^T\|_2^{1/2}$ for $A \in \mathbb{R}^{m \times n}$. Applying this equality to $\kappa_{F,2a}^L$ gives the equivalent form

    $$\kappa_{F,2a}^{L,1} := \left\|L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\begin{bmatrix}\Sigma XW^{-1}\\ M^T\\ r\otimes(XW^{-1})\\ N^T\\ \Theta W^{-1}\end{bmatrix}L\right\|_2^{1/2}.$$

    Since

    $$MM^T = \left(W^{-1}\otimes(\Sigma r)^T - \hat{\beta}_{gr}^T\otimes(W^{-1}X^T\Sigma)\right)\left(W^{-1}\otimes(\Sigma r) - \hat{\beta}_{gr}\otimes(\Sigma XW^{-1})\right) = r^T\Sigma^2r\,W^{-2} - W^{-1}\hat{\beta}_{gr}r^T\Sigma^2XW^{-1} - W^{-1}X^T\Sigma^2r\hat{\beta}_{gr}^TW^{-1} + \|\hat{\beta}_{gr}\|_2^2\,W^{-1}X^T\Sigma^2XW^{-1}$$

    and

    $$NN^T = \left((\beta_0 - \hat{\beta}_{gr})^T\otimes W^{-1}\right)\left((\beta_0 - \hat{\beta}_{gr})\otimes W^{-1}\right) = \|\beta_0 - \hat{\beta}_{gr}\|_2^2\,W^{-2},$$

    we can easily get

    $$\kappa_{F,2a}^{L,1} = \left\|L^TW^{-1}KW^{-1}L - L^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}L\right\|_2^{1/2}$$

    with $K = (1 + \|\hat{\beta}_{gr}\|_2^2)X^T\Sigma^2X + \|r\|_2^2X^TX + (\|\beta_0 - \hat{\beta}_{gr}\|_2^2 + r^T\Sigma^2r)I_p + \Theta^2$.

    Remark 1. Theorem 4 presents a compact but equivalent form $\kappa_{F,2a}^{L,1}$ of the absolute condition number $\kappa_{F,2a}^L$ that does not contain Kronecker products. Comparing the sizes of the matrices involved, we find that $\kappa_{F,2a}^{L,1}$ requires much less storage than $\kappa_{F,2a}^L$. When the exact value of the normwise condition number is computed, $\kappa_{F,2a}^{L,1}$ also needs much less Central Processing Unit (CPU) time; this will be illustrated through numerical experiments in Section 4. We also point out that the Cholesky factorization technique was not employed to derive the compact form of $\kappa_{F,2a}^L$. The main reason is that the Cholesky-based compact form only gives a matrix of moderate size, which is still larger than that of $\kappa_{F,2a}^{L,1}$, though much smaller than that of $\kappa_{F,2a}^L$. Thus, considering the economics of storage space, we only derived $\kappa_{F,2a}^{L,1}$.
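    Under the assumptions of Theorem 4, a MATLAB sketch of evaluating the compact form (2.10) is as follows (our illustration; we group $L^TW^{-1}KW^{-1}L - L^TW^{-1}CW^{-1}L$ as $L^TW^{-1}(K - C)W^{-1}L$).

        % Sketch (our illustration): the compact form (2.10); L is p-by-k,
        % and W, r, beta_gr are as in Theorem 4.
        p = size(X, 2);
        Winv = inv(W);
        K = (1 + norm(beta_gr)^2) * (X' * Sigma^2 * X) ...
            + norm(r)^2 * (X' * X) ...
            + (norm(beta0 - beta_gr)^2 + r' * Sigma^2 * r) * eye(p) ...
            + Theta^2;
        C = beta_gr * (r' * Sigma^2 * X) + (X' * Sigma^2 * r) * beta_gr';
        kappa_2a = sqrt(norm(L' * Winv * (K - C) * Winv * L, 2));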

    Since the normwise condition number gives an overall treatment of all the parameters and ignores the data structure and scaling of the input data, the mixed and componentwise condition numbers were proposed [8,9]. By varying the norms and parameters in Theorem 2, we can also obtain the mixed and componentwise condition numbers of the GRR model (1.4).

    Theorem 5. When $\mu = \nu = \infty$, $\gamma = y$, $\Psi = X$, $\Phi = \Sigma$, $\Upsilon = \Theta$, $\varpi = \beta_0$, and $\xi_L = L^T\hat{\beta}_{gr}$, which is assumed to have no zero entries in the sequel, the projected mixed and componentwise condition numbers of the GRR model (1.4) are given by

    $$\kappa_{F,m}^L = \frac{\left\|\,\left|L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\right|\,\left|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\right|\,\right\|_\infty}{\|L^T\hat{\beta}_{gr}\|_\infty}$$

    and

    $$\kappa_{F,c}^L = \left\|\frac{\left|L^T\left[W^{-1}X^T\Sigma,\ M,\ r^T\otimes(W^{-1}X^T),\ N,\ W^{-1}\Theta\right]\right|\,\left|\mathrm{vec}(y,X,\Sigma,\Theta,\beta_0)\right|}{\left|L^T\hat{\beta}_{gr}\right|}\right\|_\infty,$$

    respectively, where $\|\cdot\|_\infty$ is the infinity norm, which gives the largest magnitude among the elements of a vector, and the division is componentwise.

    Proof. The proof follows from the fact that, for a matrix $A$ and a vector $d$, the following equalities hold:

    $$\left\|A\,\mathrm{Diag}(d)\right\|_\infty = \left\|\,|A|\,|\mathrm{Diag}(d)|\,\right\|_\infty = \left\|\,|A|\,|d|\,\right\|_\infty.$$

    Applying the above equalities to Theorem 2 gives the desired results.

    For the mixed and componentwise condition numbers, the aforementioned matrix cross-product technique cannot be used to remove the Kronecker products. Here, we present some upper bounds that require little storage space and can be computed efficiently.

    Theorem 6. The mixed and componentwise condition numbers of the GRR model (1.4) can be correspondingly bounded as follows:

    $$\kappa_{F,m}^{L,\mathrm{ubd}} = \frac{\|G_{\mathrm{ubd}}\|_\infty}{\|L^T\hat{\beta}_{gr}\|_\infty} \quad\text{and}\quad \kappa_{F,c}^{L,\mathrm{ubd}} = \left\|\frac{G_{\mathrm{ubd}}}{|L^T\hat{\beta}_{gr}|}\right\|_\infty,$$

    where

    $$G_{\mathrm{ubd}} = \left|L^TW^{-1}X^T\Sigma\right|\left(|y| + |X||\hat{\beta}_{gr}|\right) + \left|L^TW^{-1}X^T\right||\Sigma||r| + \left|L^TW^{-1}\right||X^T||\Sigma r| + \left|L^TW^{-1}\Theta\right||\beta_0| + \left|L^TW^{-1}\right||\Theta|\,|\beta_0 - \hat{\beta}_{gr}|.$$

    Proof. The explicit expression of $G_{\mathrm{ubd}}$ can be derived with Lemma 5 in [10]; the derivation is straightforward and omitted here.
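    A MATLAB sketch of evaluating the upper bounds of Theorem 6 is given below (our illustration); every term is built from matrix-vector products, so no Kronecker product is ever formed.

        % Sketch (our illustration): the upper bounds of Theorem 6.
        % L is p-by-k; W, r, beta_gr as before.
        Winv = inv(W);
        G = abs(L' * Winv * X' * Sigma) * (abs(y) + abs(X) * abs(beta_gr)) ...
          + abs(L' * Winv * X') * abs(Sigma) * abs(r) ...
          + abs(L' * Winv) * abs(X') * abs(Sigma * r) ...
          + abs(L' * Winv * Theta) * abs(beta0) ...
          + abs(L' * Winv) * abs(Theta) * abs(beta0 - beta_gr);
        kappa_m_ubd = norm(G, inf) / norm(L' * beta_gr, inf);
        kappa_c_ubd = norm(G ./ abs(L' * beta_gr), inf);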

    The condition number not only gives a measure of the sensitivity of a problem, but can also be used to give a first order estimate of the forward error. Thus, in many practical applications, the exact value of the condition number is usually not needed, and an estimate within a factor of 10 is enough. Many techniques have been proposed to estimate the normwise, mixed, and componentwise condition numbers, such as the power method [15] and probabilistic and statistics-based methods [16]. Considering its computational efficiency and powerful adaptability, we employ the statistical condition estimation method to estimate the condition numbers of the GRR model.

    The small-sample statistical condition estimation (SSCE) theory has been widely applied to estimate the normwise, mixed, and componentwise condition numbers of many numerical linear algebra problems, and examples include the linear system [17], least squares problem [18,19], matrix factorization [20,21], eigenvalue problem [22], matrix function [16], Sylvester equation [23], and so on.

    To introduce the framework of SSCE, we consider a differentiable function $f: \mathbb{R}^p \to \mathbb{R}$ whose Taylor expansion is given by

    $$f(x + \delta z) = f(x) + \delta\,\nabla f(x)^Tz + O(\delta^2),$$

    where $\delta$ is a small positive real number, $\nabla f(x)$ is the gradient vector of $f$ at $x$, and $z$ is a unit 2-norm vector. It is well known that $\|\nabla f(x)\|_2$ is the absolute condition number and gives an appropriate measure of the local sensitivity of $f$ at $x$. According to [16], if we choose $z$ uniformly at random from the unit sphere $S^{p-1}$, then the expected value of $|\nabla f(x)^Tz|$ is

    $$\mathbb{E}\left(|\nabla f(x)^Tz|\right) = E_p\,\|\nabla f(x)\|_2,$$

    where $E_1 = 1$, $E_2 = \frac{2}{\pi}$, and for $p > 2$,

    $$E_p = \begin{cases} \dfrac{1\cdot3\cdot5\cdots(p-2)}{2\cdot4\cdot6\cdots(p-1)}, & \text{for } p \text{ odd},\\[2mm] \dfrac{2}{\pi}\cdot\dfrac{2\cdot4\cdot6\cdots(p-2)}{3\cdot5\cdot7\cdots(p-1)}, & \text{for } p \text{ even}. \end{cases}$$

    $E_p$ is the Wallis factor and can be approximated with high accuracy by $E_p \approx \sqrt{\frac{2}{\pi(p - \frac{1}{2})}}$ [18]. We can then define

    $$\eta = \frac{|\nabla f(x)^Tz|}{E_p}$$

    as the condition estimator, which can give a very reliable estimate of $\|\nabla f(x)\|_2$; specifically, the following probability inequality holds:

    $$\mathbb{P}\left(\frac{\|\nabla f(x)\|_2}{\tau} \le \eta \le \tau\,\|\nabla f(x)\|_2\right) \ge 1 - \frac{2}{\pi\tau} + O\left(\frac{1}{\tau^2}\right).$$

    The accuracy of the condition estimator can be further improved by using $k$ samples:

    $$\eta(k) = \frac{E_k}{E_p}\sqrt{|\nabla f(x)^Tz_1|^2 + |\nabla f(x)^Tz_2|^2 + \cdots + |\nabla f(x)^Tz_k|^2},$$

    where $[z_1, z_2, \ldots, z_k]$ is the orthonormalization of $z_1, z_2, \ldots, z_k$, which are selected uniformly at random from $S^{p-1}$. The $k$-sample condition estimator $\eta(k)$ can achieve very high accuracy with a small number of samples; for example, when $k = 3$ and $\tau = 10$, we have

    $$\mathbb{P}\left(\frac{\|\nabla f(x)\|_2}{10} \le \eta(3) \le 10\,\|\nabla f(x)\|_2\right) \ge 0.9988389,$$

    which means the reliability of a condition estimate within a factor of 10 can be improved from 0.936338 ($k = 1$) to 0.9988389 by adding just two extra samples.
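    As a toy illustration of this framework (our own example, not from the paper), the following MATLAB sketch estimates $\|\nabla f(x)\|_2$ for $f(x) = \frac{1}{2}x^TAx$, whose exact gradient $Ax$ is available for comparison.

        % Toy SSCE sketch (our illustration): estimate the gradient norm
        % of f(x) = 0.5*x'*A*x, whose exact gradient is g = A*x.
        p = 50; k = 3;
        A = randn(p); A = (A + A') / 2;       % symmetric test matrix
        x = randn(p, 1); g = A * x;           % exact gradient
        [Z, ~] = qr(randn(p, k), 0);          % k orthonormal directions z_i
        Ew = @(d) sqrt(2 / (pi * (d - 0.5))); % Wallis factor approximation
        eta = (Ew(k) / Ew(p)) * norm(g' * Z); % k-sample estimator eta(k)
        fprintf('estimate %.3e vs. true %.3e\n', eta, norm(g));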

    Note that the SSCE method is very suitable for estimating the condition numbers of individual elements of the solution. However, for the normwise condition number derived in Section 2.1, what we need to estimate is the spectral norm of a large matrix. This means the SSCE method must be modified to estimate the normwise condition number of the GRR model (1.4). Here, we employ the strategy proposed in [19] for estimating the normwise condition number of the linear least squares problem.

    According to [19], to estimate the normwise condition number of the GRR model (1.4), we first need to estimate the condition number $\kappa_i$ of $z_i^T\hat{\beta}_{gr}$, which, by (2.11), is given by

    $$\kappa_i = \left\|z_i^TW^{-1}KW^{-1}z_i - z_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}z_i\right\|_2^{1/2},$$

    and then the normwise condition number can be estimated by

    $$\kappa_N = \frac{E_k}{E_p}\sqrt{\sum_{i=1}^k \kappa_i^2}, \quad (3.1)$$

    where $[z_1, \ldots, z_k]$ can be obtained via the QR factorization [4, Ch. 5] of a random matrix $Z \in \mathbb{R}^{p \times k}$. Note that when $W$ and $K$ are available, the main computational cost of computing $\kappa_i$ is $O(\frac{1}{3}p^3)$ when the Cholesky factorization is used to compute $W^{-1}z_i$. If we further take the QR factorization into account, the total computational cost of estimating the normwise condition number is $O(\frac{k}{3}p^3 + pk^2)$. We summarize the above procedure as the following Algorithm 1.

    Algorithm 1 Absolute normwise condition number estimator.
    (1) Generate $k$ vectors $z_1, \ldots, z_k \in \mathbb{R}^p$ with entries from the uniform distribution $U(0,1)$.
    (2) Orthonormalize the vectors $z_1, \ldots, z_k$ with the QR factorization.
    (3) For $i = 1, \ldots, k$, compute
        $$\kappa_i = \left\|z_i^TW^{-1}KW^{-1}z_i - z_i^TW^{-1}\left(\hat{\beta}_{gr}r^T\Sigma^2X + X^T\Sigma^2r\hat{\beta}_{gr}^T\right)W^{-1}z_i\right\|_2^{1/2},$$
    and get $\kappa_1, \ldots, \kappa_k$.
    (4) Compute the absolute normwise condition number estimator by
        $$\kappa_N = \frac{E_k}{E_p}\sqrt{\sum_{i=1}^k \kappa_i^2}, \quad\text{with } E_p \approx \sqrt{\frac{2}{\pi(p - \frac{1}{2})}}.$$

    From Theorem 5, we note that computing the mixed and componentwise condition numbers is equivalent to finding the largest elements of a vector. The SSCE method can be easily applied to estimate the mixed and componentwise condition numbers by extending the aforementioned SSCE procedure from a scalar-valued function to a vector-valued function. We present it in the following Algorithm 2. Note that once the orthonormal vectors are obtained, the computational complexity of SSCE for the mixed and componentwise condition numbers also lies in solving the symmetric positive definite linear system, which makes it similar to that of Algorithm 1.

    To illustrate our theoretical results, we will use a randomly generated GRR model with a known solution. The desired GRR model is constructed as follows. The coefficient matrices are given by

    $$\Sigma = U_1\Lambda_1U_1^T, \quad \Theta = V_1\Lambda_2V_1^T, \quad\text{and}\quad X = U_2\begin{bmatrix}\Lambda_3\\ 0\end{bmatrix}V_2^T, \quad (4.1)$$

    where $U_i$ and $V_i$ ($i = 1, 2$) are random orthogonal matrices and $\Lambda_i$ ($i = 1, 2, 3$) are diagonal matrices with entries $\lambda_{ij}$ arranged in descending order on their diagonals. Actually, (4.1) exactly gives the eigenvalue decompositions of $\Sigma$ and $\Theta$ and the singular value decomposition of $X$ [4]. $\Lambda_1$ and $\Lambda_2$ contain the eigenvalues of $\Sigma$ and $\Theta$, and $\Lambda_3$ the singular values of $X$. $\beta$ is a random vector generated from the standard normal distribution, and $r$ is a random vector with specified magnitude, that is, $\|r\|_2$ is given. Then, based on the normal Eq (1.5), we set

    $$y = r + X\beta \quad\text{and}\quad \beta_0 = \beta - \Theta^{-1}X^T\Sigma r,$$

    and get a random GRR model with a specified solution. To compute $\hat{\beta}_{gr}$, the preconditioned conjugate gradients (PCG) method will be employed to solve the normal Eq (1.5). With the above setting, we can control the condition numbers of the coefficient matrices and easily get

    $$\kappa(\Sigma) = \|\Sigma\|_2\|\Sigma^{-1}\|_2 = \frac{\lambda_{11}}{\lambda_{1n}}, \quad \kappa(\Theta) = \frac{\lambda_{21}}{\lambda_{2p}}, \quad\text{and}\quad \kappa(X) = \frac{\lambda_{31}}{\lambda_{3p}}. \quad (4.2)$$
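    The construction above can be sketched in MATLAB as follows (our illustration; for brevity $\Sigma$ and $\Theta$ are taken as identity matrices here, while in the experiments they are built from (4.1) in the same way as $X$).

        % Sketch (our illustration): random GRR test problem with a known
        % solution beta, following (4.1) and the normal Eq (1.5).
        n = 100; p = 60; condX = 1e2; normr = 1e-2;
        [U2, ~] = qr(randn(n)); [V2, ~] = qr(randn(p));
        lam3 = linspace(condX, 1, p)';        % singular values, descending
        X = U2(:, 1:p) * diag(lam3) * V2';    % X = U2 * [Lambda3; 0] * V2'
        Sigma = eye(n); Theta = eye(p);       % identity weights for brevity
        beta = randn(p, 1);                   % prescribed solution
        r = randn(n, 1); r = normr * r / norm(r);  % residual, ||r||_2 given
        y = r + X * beta;                     % so that r = y - X*beta
        beta0 = beta - Theta \ (X' * Sigma * r);   % makes (1.5) hold at beta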

    All the experiments are performed in Matlab R2018a on a PC with Intel i7-10700 CPU @ 2.90 GHz and 16.00 GB random access memory.

    Algorithm 2 Mixed and componentwise condition numbers estimator.
    (1) Generate $k$ groups of matrices and vectors
        $$(\Delta y_1, \Delta X_1, \Delta\Sigma_1, \Delta\Theta_1, \Delta\beta_{01}),\ \ldots,\ (\Delta y_k, \Delta X_k, \Delta\Sigma_k, \Delta\Theta_k, \Delta\beta_{0k})$$
    with elements from the standard normal distribution $N(0,1)$.
    (2) Obtain the orthonormal vectors $[q_1, \ldots, q_k]$ by orthonormalizing the matrix
        $$\begin{bmatrix}\Delta y_1 & \cdots & \Delta y_k\\ \mathrm{vec}(\Delta X_1) & \cdots & \mathrm{vec}(\Delta X_k)\\ \mathrm{vec}(\Delta\Sigma_1) & \cdots & \mathrm{vec}(\Delta\Sigma_k)\\ \mathrm{vec}(\Delta\Theta_1) & \cdots & \mathrm{vec}(\Delta\Theta_k)\\ \Delta\beta_{01} & \cdots & \Delta\beta_{0k}\end{bmatrix},$$
    and reconstruct $(\Delta y_1, \Delta X_1, \Delta\Sigma_1, \Delta\Theta_1, \Delta\beta_{01}),\ \ldots,\ (\Delta y_k, \Delta X_k, \Delta\Sigma_k, \Delta\Theta_k, \Delta\beta_{0k})$ from the corresponding orthonormal vectors $[q_1, \ldots, q_k]$, respectively.
    (3) For $i = 1, \ldots, k$, compute
        $$x_i = W^{-1}X^T\Sigma(\Delta y_i - \Delta X_i\hat{\beta}_{gr}) + W^{-1}X^T\Delta\Sigma_i\,r + W^{-1}\Delta X_i^T\Sigma\,r + W^{-1}\Theta\Delta\beta_{0i} + W^{-1}\Delta\Theta_i(\beta_0 - \hat{\beta}_{gr}).$$
    (4) Estimate the absolute condition vector with $k$ samples by
        $$C_{abs}^{GRR}(k) = \frac{E_k}{E_p}\sqrt{\sum_{i=1}^k |x_i|^2},$$
    where the power and absolute value operations are performed elementwise.
    (5) The mixed and componentwise condition number estimators are given by
        $$\kappa_M = \frac{\|C_{abs}^{GRR}(k)\|_\infty}{\|\hat{\beta}_{gr}\|_\infty} \quad\text{and}\quad \kappa_C = \left\|\frac{C_{abs}^{GRR}(k)}{|\hat{\beta}_{gr}|}\right\|_\infty,$$
    respectively.
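    A condensed MATLAB sketch of Algorithm 2 is given below (our illustration). As an assumption on our part, we read the $E_p$ of step (4) as the Wallis factor for the dimension $n + np + n^2 + p^2 + p$ of the stacked perturbation vector, since that is the space from which the samples are drawn.

        % Sketch (our illustration) of Algorithm 2; W, Sigma, Theta, X, r,
        % beta0, beta_gr as before, n = size(X, 1), p = size(X, 2).
        k = 3; m = n + n*p + n^2 + p^2 + p;   % stacked perturbation length
        [Q, ~] = qr(randn(m, k), 0);          % steps (1)-(2)
        xs = zeros(p, k);
        for i = 1:k
            q = Q(:, i);                      % unstack one perturbation group
            dy = q(1:n);
            dX = reshape(q(n+1 : n+n*p), n, p);
            dS = reshape(q(n+n*p+1 : n+n*p+n^2), n, n);
            dT = reshape(q(n+n*p+n^2+1 : n+n*p+n^2+p^2), p, p);
            db0 = q(end-p+1 : end);
            xs(:, i) = W \ (X'*Sigma*(dy - dX*beta_gr) + X'*dS*r ...
                       + dX'*Sigma*r + Theta*db0 + dT*(beta0 - beta_gr));
        end
        Ew = @(d) sqrt(2 / (pi * (d - 0.5))); % Wallis factor approximation
        Cabs = (Ew(k) / Ew(m)) * sqrt(sum(xs.^2, 2));  % step (4)
        kappaM = norm(Cabs, inf) / norm(beta_gr, inf); % step (5)
        kappaC = norm(Cabs ./ abs(beta_gr), inf);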

    Example 1. In this example, we will test the tightness and computational efficiency of the upper bounds of the mixed and componentwise condition numbers derived in Theorem 6. For this, we define the following ratios:

    $$R_{mix} = \frac{\kappa_{F,m}^{L,\mathrm{ubd}}}{\kappa_{F,m}^L}, \quad R_{comp} = \frac{\kappa_{F,c}^{L,\mathrm{ubd}}}{\kappa_{F,c}^L}, \quad RT_{mix} = \frac{\mathrm{CPU}(\kappa_{F,m}^L)}{\mathrm{CPU}(\kappa_{F,m}^{L,\mathrm{ubd}})}, \quad\text{and}\quad RT_{comp} = \frac{\mathrm{CPU}(\kappa_{F,c}^L)}{\mathrm{CPU}(\kappa_{F,c}^{L,\mathrm{ubd}})},$$

    where $\mathrm{CPU}(\cdot)$ denotes the CPU time needed to compute the corresponding quantity.

    For $R_{mix}$ and $R_{comp}$, the closer the ratios are to 1, the tighter the upper bounds. $RT_{mix}$ and $RT_{comp}$ measure the computational efficiency of the upper bounds with respect to CPU time; larger values indicate better computational efficiency. For a clear presentation, we give the numerical results in Figure 1, using a red asterisk and a blue plus sign to denote $R_{mix}$ and $R_{comp}$, respectively. Note that there is very little difference in computational complexity between the mixed and componentwise condition numbers and their upper bounds, so we only present the CPU time comparison for the mixed condition number and its upper bound, denoted by a green square.

    Figure 1.  Tightness and computational efficiency of the upper bounds for the mixed and componentwise condition numbers.

    In our computation, we set $L = I_p$, $n = 100$, $p = 60$, and $\|r\|_2 = 10^{-2}$; the $\lambda_{ij}$'s in the denominators of (4.2) are set to 1, the $\lambda_{i1}$'s in the numerators are set to match the given condition numbers, and the remaining $\lambda_{ij}$'s are equally spaced. The normal Eq (1.5) is solved with the Matlab command pcg with relative residual smaller than $10^{-10}$ and at most 100 iterations. By varying the condition numbers of the coefficient matrices in (4.2), we repeat each setting 1000 times and present the numerical results in Figure 1. From the first row of Figure 1, we observe that the majority of the ratios are close to 1. This indicates that the derived upper bounds for the mixed and componentwise condition numbers of the GRR model (1.4) are quite tight. On the other hand, from the second row of Figure 1, we notice that most of the ratios exceed 20. This implies that the upper bounds can be computed efficiently, yielding an improvement in CPU time by a factor of more than 20. Thus, in practical applications, we can use the upper bounds to measure the ill-conditioning of the GRR model rather than the exact mixed and componentwise condition numbers.

    Example 2. In this example, we check the efficiency of the SSCE of the condition numbers of the GRR model (1.4) with respect to accuracy and CPU time. Similar to the previous example, we define the ratios

    $$R_{N1} = \frac{\kappa_N}{\kappa_{F,2a}^L}, \quad R_M = \frac{\kappa_M}{\kappa_{F,m}^L}, \quad R_C = \frac{\kappa_C}{\kappa_{F,c}^L},$$
    $$RT_{N1} = \frac{\mathrm{CPU}(\kappa_N)}{\mathrm{CPU}(\kappa_{F,2a}^L)}, \quad RT_{N2} = \frac{\mathrm{CPU}(\kappa_N)}{\mathrm{CPU}(\kappa_{F,2a}^{L,1})}, \quad RT_C = \frac{\mathrm{CPU}(\kappa_C)}{\mathrm{CPU}(\kappa_{F,c}^L)}.$$

    The first row of the above ratios measures the accuracy of the SSCE method, and the second row measures its efficiency. Specifically, although $\kappa_{F,2a}^L$ in (2.9) and $\kappa_{F,2a}^{L,1}$ in (2.10) give the same value, they have very different storage requirements and computational efficiency. Thus, we only give $R_{N1}$ to measure the accuracy of the normwise SSCE condition estimator, whereas $RT_{N1}$ and $RT_{N2}$ are employed to compare the computational efficiency. Moreover, since the computational complexities of computing or estimating the mixed and componentwise condition numbers differ very little, we only present the CPU time comparison via $RT_C$. For clarity, we again use figures to present our results.

    In our experiment, we set $L = I_p$, $n = 200$, $p = 100$, $\|r\|_2 = 10^{-1}$, $\kappa(\Sigma) = 10$, $\kappa(\Theta) = 10$, and $\kappa(X) = 10$, and the other settings are the same as in Example 1. The numerical results are reported in Figure 2. From the first row of Figure 2, we find that the SSCE method gives reliable estimates of the mixed and componentwise condition numbers, with ratios within a factor of 10. For the normwise condition number, the SSCE method may give slight overestimates, which coincides with the phenomenon observed in [19]. From the second row of Figure 2, we note that the ratios are much smaller than 1, which means that, in general, the SSCE condition estimates require much less CPU time than the explicit computation of the condition numbers. Comparing the first and second panels of the second row, we also find that the compact form of the normwise condition number yields a substantial gain in computational efficiency compared with its original form.

    Figure 2.  Efficiency and accuracy of the SSCE of condition numbers.

    In this paper, we extended the GRR model given in [3] by allowing the serial dependence of the observations and investigated the condition number theory of the new model. We first established the generic expression of the condition number of the GRR model (1.4). By varying the norms and parameters in the generic expression, the popular normwise, mixed, and componentwise condition numbers can be obtained as its special cases. Considering the computational difficulty in calculating the exact value of the condition number, we provided the compact form of the normwise condition number and the upper bounds of the mixed and componentwise condition numbers. We also proposed the SSCE condition number estimators for the normwise, mixed, and componentwise condition numbers. Numerical experiments were given to show the tightness and efficiency of the upper bounds and the proposed condition estimators.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The work was supported by the Shandong Provincial Natural Science Foundation (Grant No. ZR2020QA034). Shaoxin Wang was also supported by the Shandong Province Higher Education Youth Innovation and Technology Support Program (Grant No. 2023KJ199).

    The authors would like to give their sincere thanks to the five anonymous reviewers for their detailed and helpful comments, which led to a much better presentation of this work.

    The authors affirm that they have no conflicts of interest to disclose.



    [1] A. Hoerl, R. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12 (1970), 55–67. https://doi.org/10.1080/00401706.1970.10488634 doi: 10.1080/00401706.1970.10488634
    [2] P. Hansen, Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion, Philadelphia: Society for Industrial and Applied Mathematics, 1998. https://doi.org/10.1137/1.9780898719697
    [3] W. van Wieringen, Lecture notes on ridge regression, arXiv: 1509.09169.
    [4] G. Golub, C. van Loan, Matrix computations, 4 Eds., Baltimore: Johns Hopkins University Press, 2013.
    [5] J. Rice, A theory of condition, SIAM J. Numer. Anal., 3 (1966), 287–310. https://doi.org/10.1137/0703023
    [6] S. Wang, L. Meng, A contribution to the conditioning theory of the indefinite least squares problems, Appl. Numer. Math., 177 (2022), 137–159. https://doi.org/10.1016/j.apnum.2022.03.012 doi: 10.1016/j.apnum.2022.03.012
    [7] S. Wang, H. Yang, Conditioning theory of the equality constrained quadratic programming and its applications, Linear Multilinear A., 69 (2021), 1161–1183. https://doi.org/10.1080/03081087.2019.1623858 doi: 10.1080/03081087.2019.1623858
    [8] Z. Xie, W. Li, X. Jin, On condition numbers for the canonical generalized polar decomposition of real matrices, Electron. J. Linear Al., 26 (2013), 842–857. https://doi.org/10.13001/1081-3810.1691 doi: 10.13001/1081-3810.1691
    [9] I. Gohberg, I. Koltracht, Mixed, componentwise, and structured condition numbers, SIAM J. Matrix Anal. Appl., 14 (1993), 688–704. https://doi.org/10.1137/0614049 doi: 10.1137/0614049
    [10] F. Cucker, H. Diao, Y. Wei, On mixed and componentwise condition numbers for Moore-Penrose inverse and linear least squares problems, Math. Comp., 76 (2007), 947–963. https://doi.org/10.1090/S0025-5718-06-01913-2 doi: 10.1090/S0025-5718-06-01913-2
    [11] Y. Wei, D. Wang, Condition numbers and perturbation of the weighted Moore-Penrose inverse and weighted linear least squares problem, Appl. Math. Comput., 145 (2003), 45–58. https://doi.org/10.1016/S0096-3003(02)00437-X doi: 10.1016/S0096-3003(02)00437-X
    [12] D. Chu, L. Lin, R. Tan, Y. Wei, Condition numbers and perturbation analysis for the Tikhonov regularization of discrete ill-posed problems, Numer. Linear Algebra, 18 (2011), 87–103. https://doi.org/10.1002/nla.702 doi: 10.1002/nla.702
    [13] H. Diao, Y. Wei, S. Qiao, Structured condition numbers of structured Tikhonov regularization problem and their estimations, J. Comput. Appl. Math., 308 (2016), 276–300. https://doi.org/10.1016/j.cam.2016.05.023 doi: 10.1016/j.cam.2016.05.023
    [14] L. Meng, B. Zheng, Structured condition numbers for the Tikhonov regularization of discrete ill-posed problems, J. Comput. Math., 35 (2017), 169–186. https://doi.org/10.4208/jcm.1608-m2015-0279 doi: 10.4208/jcm.1608-m2015-0279
    [15] N. Higham, Accuracy and stability of numerical algorithms, 2 Eds., Philadelphia: Society for Industrial and Applied Mathematics, 2002. https://doi.org/10.1137/1.9780898718027
    [16] C. Kenney, A. Laub, Small-sample statistical condition estimates for general matrix functions, SIAM J. Sci. Comput., 15 (1994), 36–61. https://doi.org/10.1137/0915003 doi: 10.1137/0915003
    [17] A. Laub, J. Xia, Applications of statistical condition estimation to the solution of linear systems, Numer. Linear Algebra, 15 (2008), 489–513. https://doi.org/10.1002/nla.570 doi: 10.1002/nla.570
    [18] C. Kenney, A. Laub, M. Reese, Statistical condition estimation for linear least squares, SIAM J. Matrix Anal. Appl., 19 (1998), 906–923. https://doi.org/10.1137/S0895479895291935 doi: 10.1137/S0895479895291935
    [19] M. Baboulin, S. Gratton, R. Lacroix, A. Laub, Statistical estimates for the conditioning of linear least squares problems, In: Parallel processing and applied mathematics, Berlin: Springer, 2014,124–133. https://doi.org/10.1007/978-3-642-55224_13
    [20] A. Farooq, M. Samar, Sensitivity analysis for the generalized Cholesky block downdating problem, Linear Multilinear A., 70 (2022), 997–1022. https://doi.org/10.1080/03081087.2020.1751033 doi: 10.1080/03081087.2020.1751033
    [21] A. Farooq, M. Samar, H. Li, C. Mu, Sensitivity analysis for the block Cholesky downdating problem, Int. J. Comput. Math., 97 (2020), 1234–1253. https://doi.org/10.1080/00207160.2019.1613528 doi: 10.1080/00207160.2019.1613528
    [22] A. Laub, J. Xia, Fast condition estimation for a class of structured eigenvalue problems, SIAM J. Matrix Anal. Appl., 30 (2009), 1658–1676. https://doi.org/10.1137/070707713 doi: 10.1137/070707713
    [23] H. Diao, H. Xiang, Y. Wei, Mixed, componentwise condition numbers and small sample statistical condition estimation of Sylvester equations, Numer. Linear Algebra, 19 (2012), 639–654. https://doi.org/10.1002/nla.790 doi: 10.1002/nla.790
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
