1. Introduction
Linear models, as one of the core methods in classical statistics and machine learning, hold significant theoretical and practical importance [1]. Theoretical research on linear models highlights their interpretability, solvability, and a solid mathematical foundation, enabling a deeper understanding of the patterns underlying model predictions and providing foundations for the development of more advanced models as well as algorithms [2]. In practical applications, linear models are intuitive, easily comprehensible, and applicable to various tasks. They have achieved significant outcomes in domains like financial risk control and medical diagnosis [3]. Additionally, linear models bring the advantages of low computational complexity, suitability for large-scale datasets and even online learning tasks, regularization techniques to improve generalizability, and inherent feature selection capabilities. Hence, linear models possess high practical value in real-world applications.
Consider a linear regression model

$$ y = \beta_0 \mathbf{1} + X\beta + e, \tag{1.1} $$
where y=(y1,⋯,yn)′ is a random vector of responses, e=(e1,⋯,en)′ is the vector of errors with mean E(e)=0 and covariance matrix D(e)=σ2In, X=(x1,⋯,xn)′ with xi=(xi1,⋯,xip)′ for i=1,⋯,n is the regressor matrix of full column rank, the constant β0, the vector of regression parameters β=(β1,⋯,βp)′, and the error variance σ2 are assumed to be unknown, 1 is a vector of ones with suitable orders, 0 is a vector or matrix of zeros with suitable orders, and In denotes the identity matrix of order n. In addition, assume 1∉R(X), in which R(X) denotes the (column) range space of X.
It is well known that the ordinary LS estimators for β0 and β (denoted by ˆβ0 and ˆβ, respectively) play an important role in parametric estimation theory. They can be expressed as the solution of the following regular (normal) equation

$$ \begin{bmatrix} n & \mathbf{1}'X \\ X'\mathbf{1} & X'X \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta \end{bmatrix} = \begin{bmatrix} \mathbf{1}'y \\ X'y \end{bmatrix}. \tag{1.2} $$

However, when severe multicollinearity is present in the model (1.1), the LS estimator usually performs poorly under the mean squared error (MSE) criterion. Multicollinearity usually occurs when there is high approximate correlation among the regressors, which can lead to unstable parameter estimation, inflated variances of the estimated coefficients, and decreased reliability and interpretability of the model.
To overcome the problem of multicollinearity, various biased estimators for different models were put forward in the literature, such as the ordinary and generalized ridge regression estimators [4,5,6,7,8,9,10,11,12], and very recently, the PCR estimator [13], the Liu and Liu-type estimators [14,15] and their improved versions [16,17], and the double-k class estimators [18]. These biased estimators can locally improve the LS estimator by appropriately choosing the biasing parameters involved. Among them, the PCR estimator is of particular interest to us because of its geometric meaning and interpretation in trying to capture the essence of the model and its effectiveness in addressing multicollinearity and enhancing model stability. However, it involves dimensionality reduction, which may lead to information loss. While the amount of information loss can be customized by the user, it can also give rise to subsequent issues and challenges. In this paper, we analyze a shortcoming of the PCR estimator in detail and then put forward an improvement from the perspective of overcoming the model instability and inaccurate estimation caused by multicollinearity, while minimizing or even avoiding the loss of information carried by the data as much as possible.
The remainder of the paper is organized as follows. Section 2 briefly analyzes the classical PCR estimator. In Section 3, we discuss the motivation by exemplifying the advantages and disadvantages of PCR. We then propose three versions of hybrid PCR estimators and provide the corresponding optimal solutions under the PRESS criterion. In Section 4, we apply the theoretical results to two real examples and conduct a simulation study. Section 5 provides concluding remarks and two suggestions for the estimators' use.
2. Classical PCR estimation
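In this section, we concisely describe classical PCR estimation and discuss a potential flaw when it is used in practice. Centralize X as Xc=X−(1/n)11′X such that 1′Xc=0. Pre-multiplying both sides of (1.2) by the nonsingular partitioned matrix

$$ \begin{bmatrix} 1 & 0 \\ -\frac{1}{n}X'\mathbf{1} & I_p \end{bmatrix}, $$

we have the following equivalent regular equation

$$ \begin{bmatrix} n & \mathbf{1}'X \\ 0 & X_c'X_c \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta \end{bmatrix} = \begin{bmatrix} \mathbf{1}'y \\ X_c'y \end{bmatrix}. $$

By direct operations, the LS estimators are given as

$$ \hat\beta = (X_c'X_c)^{-1}X_c'y, \qquad \hat\beta_0 = \bar y - \bar x'\hat\beta, \tag{2.1} $$

in which ¯y=(1/n)y′1=(1/n)∑_{i=1}^{n} yi and ¯x=(1/n)X′1=(1/n)∑_{i=1}^{n} xi denote the sample mean of the responses and that of the regressors, respectively. Also, (2.1) can be derived from the centralized model of (1.1), y=α01+Xcβ+e with α0=β0+¯x′β.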
When multicollinearity is present in the centralized model, Xc is ill-conditioned. For this case, ˆβ0 and ˆβ can be improved by PCR estimators [13]. Let λ1⩾⋯⩾λp (>0) be the eigenvalues of X′cXc, and q1,⋯,qp be the corresponding standardized eigenvectors.
We set Λ=diag(λ1,⋯,λp), Q=(q1,⋯,qp), Z=XcQ≜(z1,⋯,zp), and γ=Q′β≜(γ1,⋯,γp)′. Since Q is an orthogonal matrix, the centralized model can be written as

$$ y = \alpha_0 \mathbf{1} + Z\gamma + e. \tag{2.2} $$

If z′r+1zr+1=λr+1≈0 holds for some r (1⩽r<p), the value of ∑_{j=r+1}^{p} γjzj is close to 0 and can therefore be omitted approximately or merged into the intercept term, α01. The number r is commonly determined by requiring the cumulative percent (λ1+⋯+λr)/(λ1+⋯+λp) to be sufficiently large, typically not less than 85%. In this sense, the canonical model (2.2) reduces to

$$ y \approx \alpha_0 \mathbf{1} + Z_1\gamma_1 + e, \tag{2.3} $$

with Z1=(z1,⋯,zr) and γ1=(γ1,⋯,γr)′. That is, γr+1,⋯,γp are regarded (or estimated) as zeros. This means β=Qγ≈Q1γ1, with Q1=(q1,⋯,qr). Set Λ1=diag(λ1,⋯,λr). Imposing the LS principle on the reduced model (2.3) gives the PCR estimators as

$$ \tilde\beta = Q_1\Lambda_1^{-1}Z_1'y = Q_1\Lambda_1^{-1}Q_1'X_c'y, \qquad \tilde\beta_0 = \bar y - \bar x'\tilde\beta. \tag{2.4} $$
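The construction above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `pcr_estimate`, the default threshold argument, and the synthetic data are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def pcr_estimate(X, y, cum_percent=0.85):
    """Classical PCR using the fewest leading components that reach cum_percent."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc = X - x_bar                              # centered regressors, 1'Xc = 0
    lam, Q = np.linalg.eigh(Xc.T @ Xc)          # eigh returns ascending eigenvalues
    lam, Q = lam[::-1], Q[:, ::-1]              # reorder so lam[0] >= ... >= lam[p-1]
    ratio = np.cumsum(lam) / lam.sum()          # cumulative percent for r = 1, ..., p
    r = min(int(np.searchsorted(ratio, cum_percent)) + 1, len(lam))
    Q1, lam1 = Q[:, :r], lam[:r]
    beta = Q1 @ ((Q1.T @ (Xc.T @ y)) / lam1)    # Q1 Lambda1^{-1} Q1' Xc' y
    beta0 = y_bar - x_bar @ beta                # intercept from the sample means
    return beta0, beta, r

# Hypothetical data: three nearly identical columns, hence severe multicollinearity.
rng = np.random.default_rng(0)
z = rng.standard_normal((50, 1))
X = np.hstack([z + 0.01 * rng.standard_normal((50, 1)) for _ in range(3)])
y = 1.0 + X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal(50)
beta0, beta, r = pcr_estimate(X, y)
```

Setting `cum_percent=1.0` keeps all p components, in which case the function returns the LS estimators.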
3. Hybrid PCR estimation
In this section, we briefly discuss the limitations of the classical PCR estimator, which motivates us to define three hybrid PCR estimators. We then employ the PRESS criterion to obtain the optimal hybrid PCR estimators.
3.1. Motivation and definition
The classical PCR estimators given in (2.4) can improve the LS estimator by discarding the redundant part of the centralized regressor matrix. However, as we can see from the previous procedure, there may be some potential problems for the PCR estimators: ⅰ) The cumulative percent (85% or other values) is subjective; ⅱ) A small cumulative percent can lead to too much loss of useful information; and ⅲ) A large cumulative percent will produce estimators performing badly.
This can also be illustrated by the following two toy examples. One is:
and the other is with λ1=30, λ2=13.3, λ3=6.6, λ4=0.1. Both of them suffer from multicollinearity, since they have the same large condition number, 300. Clearly, for the former, choosing the first two principal components to estimate the regression parameters is reasonable because λ1+λ2⩾0.85(λ1+⋯+λ9) and all of λ3,⋯,λ8 are very small relative to λ1, while for the latter, it is undesirable to discard the third principal component although λ1+λ2⩾0.85(λ1+⋯+λ4).
To overcome the problems (ⅱ) and (ⅲ), one can use different cumulative percents (⩾85%, or ⩾90%, or ⩾95%, etc.) in different problems. However, this may lead to much more subjectivity and thus intensify (ⅰ). An alternative way is to combine all possible PCR estimators in a suitable way such that the contribution of each PCR estimator can be automatically computed. This will be studied in the next section.
As illustrated by the second example in Section 2, the third principal component with contribution percent λ3/(λ1+⋯+λ4)=6.6/(30+13.3+6.6+0.1)=13.2% should not be discarded directly, but should be used with an appropriate proportion. This can be done by first weighting each principal component and then estimating the parameters. This will yield a nonlinear estimator with respect to the weights, and thus lead to new difficulties in determining the values of the weights.
An alternative method is to linearly weight all of the PCR estimators. This leads to the following concept of hybrid PCR (HPCR) estimation:
Definition 1. Denote the PCR estimator of β based on the first k principal components by ˜β(k). For any p constants w1,⋯,wp∈R, we call β∗w≜∑_{k=1}^{p} wk˜β(k) and β∗0,w≜¯y−¯x′β∗w the HPCR estimators for β and β0, respectively, with respect to w=(w1,⋯,wp)′.
Clearly, β∗w and β∗0,w reduce to the classical PCR estimators presented in (2.4) if taking wr=1 and wi=0 for any i≠r. When taking w1=⋯=wp−1=0 and wp=1, the LS estimators given in (2.1) are derived. Hence, Definition 1 gives a set of estimators including classical PCR and LS estimations.
For Definition 1, the problem is to determine the values of w=(w1,⋯,wp)′. A feasible method is to simply take wi as the contribution percent of the ith principal component. This means that the first PCR estimator gets the largest w1, the second PCR estimator gets the second largest w2, and so on. However, this may not be suitable in some situations.
For example, consider the model (1.1) with λ1=23.3, λ2=20, λ3=6.6, and λ4=0.1. In this example, the first PCR estimator uses only 23.3/(23.3+20+6.6+0.1)=46.6% of the information about the regressors, while the second PCR estimator uses (23.3+20)/(23.3+20+6.6+0.1)=86.6% of all information. Therefore, the first PCR estimator is quite bad relative to the second PCR estimator, and thus it should not be given the largest weight in the HPCR estimator. In this sense, the selection of w is a key step in getting a fine HPCR estimator. In what follows, we provide a selection under the PRESS criterion.
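The percentages quoted in the toy examples of this subsection can be checked with a short calculation. The helper name `contribution` is an assumption for illustration, and the condition number is taken here as λ1/λp, which is how the value 300 arises in the text.

```python
import numpy as np

def contribution(lam):
    """Individual shares, cumulative shares, and lambda_max / lambda_min."""
    lam = np.asarray(lam, dtype=float)
    share = lam / lam.sum()
    return share, np.cumsum(share), lam.max() / lam.min()

# Eigenvalues of the second toy example and of the example just above.
share_a, cum_a, cond_a = contribution([30.0, 13.3, 6.6, 0.1])
share_b, cum_b, cond_b = contribution([23.3, 20.0, 6.6, 0.1])

print(cond_a)        # ratio of the largest to the smallest eigenvalue
print(share_a[2])    # share of the third principal component
print(cum_b[:2])     # information used by the first and second PCR estimators
```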
3.2. PRESS criterion
To find an optimal HPCR estimator, we use the PRESS put forward by [19,20] to measure how w influences the predictive performance of β∗w and β∗0,w. We do not consider how β∗w and β∗0,w differ from β and β0, because under multicollinearity the difference between the true and estimated values is no longer a meaningful measure of quality. For example, for a model y=β0+β1x1+β2x2+β3x3+e with x3≈2x1−3x2, it follows that y≈β0+(β1+2β3)x1+(β2−3β3)x2+0⋅x3+e. This means that estimators of β1+2β3 and β2−3β3 can serve well as the coefficients of x1 and x2 for prediction, even though they differ from β1 and β2.
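Consider now the PRESS criterion. Let ˆα−i denote an estimator of α based on all data points except the ith one. With this notation (and some other similar ones), the PRESS statistic of the LS estimators, that of classical and hybrid PCR estimators, can be expressed as follows:

$$ \mathrm{PRESS}(\hat\beta_0,\hat\beta;\beta_0,\beta) = \sum_{i=1}^{n}\big[y_i - (\hat\beta_{0,-i} + x_i'\hat\beta_{-i})\big]^2, \tag{3.1} $$

$$ \mathrm{PRESS}(\tilde\beta_0,\tilde\beta;\beta_0,\beta) = \sum_{i=1}^{n}\big[y_i - (\tilde\beta_{0,-i} + x_i'\tilde\beta_{-i})\big]^2, \tag{3.2} $$

$$ \mathrm{PRESS}(\beta^*_{0,w},\beta^*_w;\beta_0,\beta) = \sum_{i=1}^{n}\big[y_i - (\beta^*_{0,w,-i} + x_i'\beta^*_{w,-i})\big]^2. \tag{3.3} $$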
Note that the expression of PRESS(β∗0,w,β∗w;β0,β) contains w, so the PRESS criterion imposed on hybrid PCR estimators is to find w such that PRESS(β∗0,w,β∗w;β0,β) is minimized.
The PRESS statistic is seemingly similar to the residual sum of squares, ∑_{i=1}^{n}[yi−(ˆβ0+x′iˆβ)]², of the original LS principle. However, PRESS is essentially different from LS, because it avoids letting an observation (data point) play a dual role in simultaneously fitting old observations and predicting new observations, and it facilitates assessing the predictive performance of an estimator. This is why we consider using the PRESS criterion.
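To make the distinction concrete, the following sketch computes PRESS for the LS estimators by explicitly refitting the model n times, once without each observation, and contrasts it with the ordinary residual sum of squares. The function names are assumptions for illustration; the brute-force loop is chosen for clarity rather than speed.

```python
import numpy as np

def press_ls(X, y):
    """PRESS of LS: predict each y_i from a fit that excludes observation i."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])        # design matrix with intercept
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                # drop the i-th data point
        coef, *_ = np.linalg.lstsq(D[keep], y[keep], rcond=None)
        total += (y[i] - D[i] @ coef) ** 2      # squared prediction error at point i
    return total

def rss_ls(X, y):
    """Residual sum of squares of the LS fit on all observations."""
    D = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return np.sum((y - D @ coef) ** 2)

# Hypothetical data for a quick comparison.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5]) + 0.5 * rng.standard_normal(30)
print(press_ls(X, y), rss_ls(X, y))
```

For LS, PRESS is never smaller than the residual sum of squares, since each held-out point is predicted without having been used in the fit.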
3.3. Optimal HPCR estimators under the PRESS criterion
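To find the PRESS-optimal HPCR estimators, we rewrite (3.3) as follows:

$$ \mathrm{PRESS}(\beta^*_{0,w},\beta^*_w;\beta_0,\beta) = \sum_{i=1}^{n}\Big[(y_i - \bar y_{-i}) - (x_i - \bar x_{-i})'\sum_{k=1}^{p} w_k\tilde\beta^{(k)}_{-i}\Big]^2 = \Big(\frac{n}{n-1}\Big)^2 \sum_{i=1}^{n}\Big[(y_i - \bar y) - \sum_{k=1}^{p} w_k (x_i - \bar x)'\tilde\beta^{(k)}_{-i}\Big]^2, $$

in view of the algebraic facts that yi−¯y−i=(n/(n−1))(yi−¯y) and xi−¯x−i=(n/(n−1))(xi−¯x).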
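For completeness, the first of these two identities follows in one line from the definition of the leave-one-out mean; the second is obtained in the same way with x in place of y:

```latex
\bar y_{-i} = \frac{n\bar y - y_i}{n-1}
\quad\Longrightarrow\quad
y_i - \bar y_{-i}
  = \frac{(n-1)\,y_i - n\bar y + y_i}{n-1}
  = \frac{n}{n-1}\,(y_i - \bar y).
```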
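Denote yc=(y1−¯y,⋯,yn−¯y)′ and A=(aik)n×p, with aik≜(xi−¯x)′˜β(k)−i for i=1,⋯,n and k=1,⋯,p. With these notations, we have

$$ \mathrm{PRESS}(\beta^*_{0,w},\beta^*_w;\beta_0,\beta) = \Big(\frac{n}{n-1}\Big)^2 (y_c - Aw)'(y_c - Aw). $$

If no constraints are imposed on w1,⋯,wp, it is clear that minimizing PRESS(β∗0,w,β∗w;β0,β) gives

$$ w^* = (A'A)^{-}A'y_c. \tag{3.5} $$

This further implies

$$ \min_{w}\,\mathrm{PRESS}(\beta^*_{0,w},\beta^*_w;\beta_0,\beta) = \Big(\frac{n}{n-1}\Big)^2 y_c'(I_n - P_A)\,y_c, \tag{3.6} $$

with PA=A(A′A)−A′=AA+ being the orthogonal projection matrix [1, p. 24] onto the (column) range space, R(A), where A− is any 1-inverse, and A+ is the unique Moore-Penrose inverse (Definition 2.2 of [1]) of A.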
According to the above derivations, we can present the following theorem:
Theorem 1. Let w∗=(w∗1,⋯,w∗p)′ be defined in (3.5). Then, β∗w∗=∑_{k=1}^{p} w∗k˜β(k) and β∗0,w∗=¯y−¯x′β∗w∗ have the minimal PRESS value presented in (3.6) among all HPCR estimators.
This theorem shows how to choose w under the PRESS criterion to get a fine HPCR estimator. As seen, if the matrix A is of full column rank, w∗ is unique; otherwise, w∗ changes along with different selections of the generalized inverse of A. For convenience, we will always use the Moore-Penrose inverse, A+, in the simulation study.
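A minimal sketch of this computation is given below, assuming X has full column rank. The matrix A is built by brute-force leave-one-out refits, and the weights are taken as A+yc. The helper names `pcr_path` and `hpcr_weights` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def pcr_path(X, y):
    """p x p matrix whose k-th column is the PCR slope estimate from the first k components."""
    Xc = X - X.mean(axis=0)
    lam, Q = np.linalg.eigh(Xc.T @ Xc)
    lam, Q = lam[::-1], Q[:, ::-1]              # descending eigenvalues
    gamma = (Q.T @ (Xc.T @ y)) / lam            # LS coefficient of each component
    return np.cumsum(Q * gamma, axis=1)         # column k sums q_j * gamma_j over j <= k

def hpcr_weights(X, y):
    """PRESS-optimal HPCR weights w* = A^+ yc, with A built from leave-one-out fits."""
    n, p = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    A = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        A[i] = Xc[i] @ pcr_path(X[keep], y[keep])   # a_ik = (x_i - x_bar)' beta^(k)_{-i}
    w = np.linalg.pinv(A) @ yc

    def press(v):
        return (n / (n - 1)) ** 2 * np.sum((yc - A @ v) ** 2)

    return w, A, yc, press

# Hypothetical, mildly collinear data.
rng = np.random.default_rng(0)
X = rng.standard_normal((25, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.standard_normal(25)
y = 1.0 + X @ np.array([1.0, 2.0, -1.0]) + 0.3 * rng.standard_normal(25)
w, A, yc, press = hpcr_weights(X, y)
print(w, press(w))
```

Evaluating `press` at the k-th unit vector gives the PRESS value of the PCR estimator with k components, and at the last unit vector that of LS, so the optimal value can be compared directly with these special cases.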
Computationally, in the case that both of the matrices X and A are of full column rank, A is usually more ill-conditioned than X. Although we cannot prove this result theoretically, the simulation study will show this to us. The major reason may be that A derives from some PCR estimators consisting of too many minor principal components. A potential solution is to discard the last several PCR estimators, which contain one or more principal components with a too-small individual percentage (such as 5% and even smaller) of variance, when using HPCR estimators. Specifically, letting K∈{1,⋯,p−1} satisfy
the HPCR estimators that Definition 1 presents can be modified as β∗∗w≜∑Kk=1wk˜β(k) and β∗∗0,w≜¯y−¯x′β∗∗w, with w=(w1,⋯,wK)′. That is, we use only the first K PCR estimators to get the hybrid version. Under this modification, the PRESS-optimal selection for w1,⋯,wK can be obtained in a similar fashion. The details are omitted here.
3.4. Optimal WPCR estimators
Now, we assume w1,⋯,wp are weights, satisfying ∑_{k=1}^{p} wk=1′w=1. In this case, we call β∗w and β∗0,w the WPCR estimators. Here, we note that, similar to the ordinary HPCR estimators, WPCR estimators also do not require w1,⋯,wp to take nonnegative values, because a negative wi implies the ith PCR estimator may produce some opposite estimates for the corresponding parameters to other PCR estimators, and the negativity of wi can offset such effects in a way. Then, the problem of finding optimal WPCR estimators under the PRESS criterion is equivalent to solving the optimization problem

$$ \min_{w}\ (y_c - Aw)'(y_c - Aw) \quad \text{subject to} \quad \mathbf{1}'w = 1. \tag{3.7} $$

To solve (3.7), we denote the Lagrange function by L(w,ℓ)=y′cyc−2y′cAw+w′A′Aw+2ℓ(1′w−1), in which ℓ is the Lagrange multiplier. By the formulas for partial derivatives of matrix functions [1, pp. 38–47], we obtain the following matrix equations:

$$ \frac{\partial L}{\partial w} = -2A'y_c + 2A'Aw + 2\ell\mathbf{1} = 0, \qquad \frac{\partial L}{\partial \ell} = 2(\mathbf{1}'w - 1) = 0. $$

Equivalently, we have the following constrained regular equation:

$$ \begin{bmatrix} A'A & \mathbf{1} \\ \mathbf{1}' & 0 \end{bmatrix} \begin{bmatrix} w \\ \ell \end{bmatrix} = \begin{bmatrix} A'y_c \\ 1 \end{bmatrix}. \tag{3.8} $$
Note here that A′A is symmetric and nonnegative definite. In what follows, we show the Eq (3.8) is consistent. In fact, as proven by [21], it can be shown that
Hence, we get
This shows the consistency of Eq (3.8). Using the formula for the generalized inverse (see Theorem 2.6 of [1]) of a partitioned matrix that
in which T=S+LL′ and Q=L′T−L with S being symmetric and nonnegative definite, we have
with T=A′A+11′. Here, 1′T−1≠0, and this is an algebraic fact explained in what follows: First of all, we note 1∈R(1)⊆R([A′,1])=R([A′,1][A′,1]′)=R(A′A+11′)=R(T), in which R(⋅) denotes the range space. This implies: ⅰ) the value of 1′T−1 is independent of the selection of T−, and therefore 1′T−1=1′T+1; and ⅱ) PT1=1.
Now, we prove 1′T−1≠0 holds. Suppose 1′T−1=0. Combined with the fact that T is symmetric and nonnegative definite, we obtain 1′T+1=1′T−1=0⇒T+1=0⇒1=PT1=TT+1=0. This contradicts 1≠0, so we must have 1′T−1≠0. Therefore,
Further, the w-solution of (3.8) can be expressed as
Note that both 1′T−1 and 1′T−A′ are invariant with respect to all generalized inverses of T, since 1∈R(T) and R(A′)⊆R(T). Clearly, ∑pk=1w∗∗k=1′w∗∗=1. Recalling that the objective function of (3.7) is quadratic with respect to w, this gives the globally optimal WPCR estimators under the PRESS criterion. The result is summarized in the following theorem:
Theorem 2. Let w∗∗=(w∗∗1,⋯,w∗∗p)′ be defined in (3.9). Then, β∗w∗∗=∑_{k=1}^{p} w∗∗k˜β(k) and β∗0,w∗∗=¯y−¯x′β∗w∗∗ have the minimal PRESS value among all of the WPCR estimators.
As Theorem 1 does, Theorem 2 also provides us with the method of choosing the weights to get the optimal WPCR estimators, β∗w∗∗ and β∗0,w∗∗, under the PRESS criterion. Further, if the matrix T is nonsingular, w∗∗ is unique; otherwise, w∗∗ changes along with the choice of the generalized inverse of T. In the simulation study, we will always use the Moore-Penrose inverse, T+, when considering β∗w∗∗ and β∗0,w∗∗. Note that, in any case, the minimal PRESS value remains unchanged.
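The sum-to-one weights can be computed along the following lines. The closed form used here is derived directly from the stationarity conditions A′Aw+ℓ1=A′yc and 1′w=1 of the Lagrange function, rewritten with T=A′A+11′; it is an assumption that this coincides in form with (3.9). The function name and the random test matrix are hypothetical.

```python
import numpy as np

def wpcr_weights(A, yc):
    """Minimize ||yc - A w||^2 subject to 1'w = 1, using T = A'A + 11'."""
    p = A.shape[1]
    one = np.ones(p)
    T_pinv = np.linalg.pinv(A.T @ A + np.outer(one, one))   # Moore-Penrose inverse of T
    u = T_pinv @ (A.T @ yc)                                 # T^+ A' yc
    v = T_pinv @ one                                        # T^+ 1
    # On the constraint set, T w = A'yc + (1 - l) 1; the scalar (1 - l) is
    # fixed by requiring 1'w = 1.
    return u + v * (1.0 - one @ u) / (one @ v)

# Hypothetical A and yc, standing in for the leave-one-out quantities.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
yc = rng.standard_normal(30)
w = wpcr_weights(A, yc)
print(w, w.sum())
```

When A has full column rank, the same weights can be obtained by solving the bordered linear system formed by A′A, the vector of ones, and a zero corner block, which offers a simple cross-check.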
3.5. Optimal WPCR estimators with nonnegative weights
The above two subsections obtain optimal HPCR and WPCR estimators, respectively. Finally, we assume constants w1,⋯,wp are weights (that is, ∑_{k=1}^{p} wk=1) and each weighting constant is nonnegative. In this case, we call β∗w and β∗0,w the WPCR estimators with nonnegative weights (WnnPCR estimators). That is, we need to solve the following quadratic programming (QP) problem

$$ \min_{w}\ (y_c - Aw)'(y_c - Aw) \quad \text{subject to} \quad \mathbf{1}'w = 1,\ \ w_k \geqslant 0,\ k=1,\cdots,p. \tag{3.10} $$

Problem (3.10) can be solved by the standard quadratic programming routines available in various mathematical software packages. To improve the performance, we take w∗∗+≜(w∗∗1;+,⋯,w∗∗p;+)′ as the initial value of the search, in which
with u∗∗i=max{w∗∗i,0}, for i=1,⋯,p. Here, w∗∗1,⋯,w∗∗p are defined in (3.9).
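In what follows, we give a procedure for getting an approximate solution of the QP problem (3.10). Let I be a subset of {1,⋯,p}, and we denote the following QP problem as QP(I):

$$ \min_{w}\ (y_c - Aw)'(y_c - Aw) \quad \text{subject to} \quad \mathbf{1}'w = 1,\ \ w_i = 0 \ \text{for all } i\in I. \tag{3.11} $$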
We note here that this problem has the same structure as (3.7), because the constraint wi=0 with i∈I renders the reduction of matrix A in (3.11) to a sub-matrix consisting of the columns except those in I. Then, the approximate solution of (3.10) can be obtained by the following steps:
Step 1: Initialize k=0 and I(0)=Ø.
Step 2: Use (3.9) to get a solution of QP(I(k)), namely w(k)≜(w(k)1,⋯,w(k)p)′. We set

$$ J^{(k)} = \{\, j : w^{(k)}_j < 0 \,\}. $$

Step 3: If J(k)≠Ø, solve QP(I(k)∪{j}) for every j∈J(k), find jmin which minimizes the QP objectives, set

$$ I^{(k+1)} = I^{(k)} \cup \{ j_{\min} \}, $$

and k←k+1, and then go to Step 2. Otherwise, return the approximate solution of the QP problem (3.10), w∗∗∗≜w(k).
This procedure sets negative weights to 0 stepwise. In the whole process, every calculation can be carried out analytically. Therefore, it is essentially different from the numerical solution returned by mathematical software when nonnegativity is required for the weights.
We mention here that, as explained after Definition 1, LS and PCR are two special cases of HPCR (as well as WPCR and WnnPCR), so the optimal HPCR/WPCR/WnnPCR estimators theoretically perform at least as well as LS/PCR in the PRESS sense.
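The stepwise procedure can be sketched as follows. The sketch reads the index set built in Step 2 as the set of negative weights, which is an interpretation of the text, and it reuses a closed-form solver for the equality-constrained subproblems that is derived from the Lagrange conditions rather than copied from (3.9). All function names and the test data are hypothetical.

```python
import numpy as np

def eq_constrained(A, yc):
    """Minimize ||yc - A w||^2 subject to 1'w = 1 (closed form via T = A'A + 11')."""
    one = np.ones(A.shape[1])
    T_pinv = np.linalg.pinv(A.T @ A + np.outer(one, one))
    u, v = T_pinv @ (A.T @ yc), T_pinv @ one
    return u + v * (1.0 - one @ u) / (one @ v)

def wnn_stepwise(A, yc):
    """Approximate nonnegative weights: zero out one negative weight per pass."""
    p = A.shape[1]
    zeroed = set()                               # the index set I of the text

    def solve(excluded):
        cols = [j for j in range(p) if j not in excluded]
        w = np.zeros(p)
        w[cols] = eq_constrained(A[:, cols], yc) # QP(I) on the remaining columns
        return w, np.sum((yc - A @ w) ** 2)

    while True:
        w, _ = solve(zeroed)
        negative = [j for j in range(p) if w[j] < 0]   # the index set J of the text
        if not negative:
            return w
        # Try forcing each negative weight to zero; keep the cheapest choice.
        j_min = min(negative, key=lambda j: solve(zeroed | {j})[1])
        zeroed.add(j_min)

# Hypothetical A and yc, standing in for the leave-one-out quantities.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
yc = rng.standard_normal(30)
w_nn = wnn_stepwise(A, yc)
print(w_nn)
```

The loop terminates after at most p−1 passes, because a subproblem with a single remaining column has the unique feasible weight 1.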
4. Numerical study
In this section, we first apply the theoretical results to two real examples, namely Hald data [22] and Acetylene data [23], to preliminarily observe their performance. To investigate the numerical performance of classical and hybrid PCR in detail, we then conduct a simulation study to observe the changes in the PRESS values of the estimators under various degrees of multicollinearity, and analyze the potential reasons behind the observations.
4.1. Real examples
The Hald dataset [22] uses the heat of hardening after 180 days as the response and four ingredients as regressors, while the Acetylene dataset [23] uses the reactor temperature, ratio of H2 to n-heptane, and contact time as regressors and conversion of n-heptane to acetylene (%) as the response. Under the model (1.1) with p=4 for Hald and p=3 for Acetylene, the condition numbers for the regressor matrices are 20.5846 and 36935.9119, so these two datasets represent moderate multicollinearity and severe multicollinearity, respectively.
By direct computations, we obtain the PRESS values and then list them into Tables 1 and 2. The results reveal that:
● Regardless of the severity of multicollinearity, the estimators (HPCR/WPCR/WnnPCR) consistently yield lower PRESS compared to LS and PCR. As a result, the new estimators can be considered as competitive biased estimators in practical applications.
● When the condition number of the regressor matrix is not excessively high, PCR tends to eliminate valuable information due to its inherent construction features, leading to a higher value of PRESS. However, when the condition number of the regression matrix is extremely high, PCR usually performs relatively well.
● For the Hald data, the PRESS value of the classic PCR estimator unexpectedly exceeds that of the LS estimator and the three hybrid PCR estimators. After careful checking, we find that PCR uses only the first principal component (contributing 86.60%) to estimate parameters, while the information carried by the second and third principal components (contributing 11.29% and 2.07%, respectively) is directly discarded. Furthermore, we find that, as one of the hybrid PCR estimators, WnnPCR gives a weight of nearly 100% (more specifically, 99.9999999995337%) to the estimator based on the first three principal components. This means that the WnnPCR estimator for Hald data is very close to the one constructed from the first three principal components, retaining 86.60+11.29+2.07=99.96% of all information, rather than just retaining 85% of the information in the traditional sense.
● For the Acetylene data, the situation is different. As seen, the WnnPCR estimator is composed of the first two PCR estimators, with weights 51.45% and 48.55%, respectively. Note that the contribution rate of the first principal component is 99.53%, while that of the second is only 0.47%. Nevertheless, WnnPCR assigns a weight only slightly lower than 50% to the second PCR estimator. This may be an attempt to extract as much useful information as possible from the second principal component, which contributes only 0.47%.
4.2. Simulation
This subsection conducts a short simulation study to examine the numerical performance of LS and classical/hybrid PCR estimators for the model (1.1). In this study, we take n=30, 70, 100, 200 and p=3, 6, 9. For each case of p, take σ from {0.75, 0.25}. The explanatory variables are generated by using the simulation procedure suggested by McDonald and Galarneau [24]:

$$ x_{ij} = (1-\rho^2)^{1/2}\,\zeta_{ij} + \rho\,\zeta_{i,p+1}, \qquad i=1,\cdots,n,\ \ j=1,\cdots,p, $$
where ζij's are independent standard normal pseudo-random numbers, and ρ2 is the correlation between any two explanatory variables. To see how multicollinearity influences the performance, we take ρ as 0.5, 0.9, 0.999, and 0.99999, respectively, to get regressor matrices with different condition numbers, from small to large. In addition, for each case, we randomly generate β0 and β from the interval [−5,5]. After that, we create a pseudo observation to compute the PRESS values of the five estimates (including LS, PCR with cumulative percent not less than 85%, HPCR, WPCR, and weighted PCR with nonnegative weights (WnnPCR)). 100 runs are then performed and averaged for each case.
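One possible way to generate a single simulated dataset is sketched below. It uses the standard form of the McDonald and Galarneau generator, in which each regressor mixes its own standard normal draw with one shared draw so that the pairwise correlation equals ρ². The function names, the uniform draws for the parameters, and the normal error distribution are assumptions, since the text does not spell out these details.

```python
import numpy as np

def mg_regressors(n, p, rho, rng):
    """Regressors x_ij = sqrt(1 - rho^2) * zeta_ij + rho * zeta_{i,p+1}."""
    zeta = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * zeta[:, :p] + rho * zeta[:, [p]]

def simulate(n, p, rho, sigma, rng):
    """One pseudo dataset with beta0 and beta drawn uniformly from [-5, 5]."""
    X = mg_regressors(n, p, rho, rng)
    beta0 = rng.uniform(-5.0, 5.0)
    beta = rng.uniform(-5.0, 5.0, size=p)
    y = beta0 + X @ beta + sigma * rng.standard_normal(n)   # assumed normal errors
    return X, y, beta0, beta

rng = np.random.default_rng(0)
X, y, beta0, beta = simulate(n=30, p=3, rho=0.999, sigma=0.75, rng=rng)
print(np.linalg.cond(X - X.mean(axis=0)))   # grows as rho approaches 1
```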
The results are computed and presented in Tables A1–A4 in the Appendix. By the tables, it follows that: (ⅰ) PCR can improve LS only when explanatory variables are highly correlated, and the degree of improvement depends on the error variance σ². In particular, when σ²=0.75², PCR has smaller PRESS values than LS if ρ takes either 0.999 or 0.99999; when σ²=0.25², PCR cannot improve LS unless ρ=0.99999. Especially, in the case of ρ=0.5 or 0.9, PCR performs very badly. (ⅱ) Each of HPCR, WPCR, and WnnPCR improves LS and PCR substantially because, in any case, these three estimators have far smaller PRESS values than LS/PCR estimators. Naturally, HPCR performs the best, since the values of w can be selected from a wider range. (ⅲ) The degree of the improvement of HPCR/WPCR/WnnPCR over LS/PCR depends on p and ρ. Specifically, the larger p is, the higher the degree is; and the larger ρ is, the higher the degree is. (ⅳ) All estimators can be computed efficiently.
In view of the fact that LS can be regarded as a special PCR with all principal components, PCR can perform the same theoretically as LS if taking the cumulative percent as 100%. However, in this case, PCR fails to deal with multicollinearity. Therefore, HPCR/WPCR/WnnPCR can be a desirable remedying procedure, because these estimators collect information carried by all possible PCR estimators in an efficient way.
4.3. Discussion
It is well known that when multicollinearity is severe, the LS estimator performs poorly under the MSE criterion. However, as shown by the two real examples and the simulation study, the LS estimator seems to be relatively robust under the PRESS criterion. Although it is only slightly worse than the three newly proposed HPCR estimators, it is not to the extent of being surprising.
Why is this?
In fact, this is directly related to the nature of the MSE and PRESS criteria. MSE measures the difference between the regression parameters and their estimates, while PRESS considers the contribution of each observation point, rather than the direct difference between parameters and estimates. Although the LS estimator appears to be only slightly worse than HPCR estimators in the sense of PRESS, this slight difference has indicated a substantial improvement of HPCR over LS.
On the other hand, we also note that under severe multicollinearity, the PRESS value of a classical PCR estimator is very large. The reason for the poor performance of classical PCR is different from the aforementioned reasons. Instead, this is mainly because the contribution rate, namely 85%, is chosen subjectively rather than being data-driven, which leads to the results of a PCR falling short of theoretical expectations.
To check how contribution rates influence the corresponding PRESS values, we reevaluate the classical PCR estimators in simulation studies, with the contribution rates taking 75, 85, and 95%, respectively. All of the results are presented in corresponding tables. The results indicate that:
● In any case of n, p, σ, and ρ, the value of PRESS of PCR decreases as the cumulative contribution rate of the principal components increases, although the decrease in PRESS may not be strict. For example, in the case of n=30, p=6, and σ=0.75, the PRESS values of the three PCR estimators are 263.6573, 234.4363, and 108.8466 when ρ takes 0.9, while the values are equal to each other when ρ takes 0.999.
● For any case of n, p, σ, and a fixed cumulative contribution rate of the principal components, the value of PRESS of PCR strictly decreases as ρ increases. Taking also the case of n=30, p=6, and σ=0.75, the PRESS values of the 95% PCR estimators are 108.8466, 13.1987, and 9.8573 for ρ taking 0.9, 0.999, and 0.99999.
● In any case of p, σ, ρ, and a fixed cumulative contribution rate of the principal components, the averaged value of PRESS of PCR with respect to n, namely (1/n)PRESS, strictly decreases as n increases. For example, in the case of p=6, σ=0.75, and ρ=0.9, the averaged PRESS values of the 95% PCR estimators are
respectively, for n=30,70,100, and 200.
For the Hald and Acetylene data, we consider the performance of the ordinary ridge regression (ORR) [4] and the Liu estimator (LE) [14], since each of these two estimators involves only one biasing parameter, which can be easily adjusted by linearly changing the values from 0 to 1 or to a smaller/larger scalar when computing the PRESS values for the associated estimates. By direct computations, the results are derived and presented in Figures 1 and 2. By the figures, it follows that
● For the Hald data, LE and ORR have the minimal PRESS values 97.6613 and 96.8488, respectively, when the Liu parameter d takes 0.1954 and the ridge parameter k takes 0.002153.
● For the Acetylene data, LE and ORR get the minimal PRESS values 330.8642 and 311.2461, respectively, when d=0.9345 and the ridge parameter k takes 0.005355.
By Tables 1 and 2, all of the three new estimators (HPCR, WPCR, and WnnPCR) have much smaller PRESS values and therefore outperform LE and ORR under the PRESS criterion.
Additionally, note here that the smaller the PRESS value, the better the model's predictive ability. We employ a predicted version of R2 to measure the predictive ability of the model. The predicted R2 of an estimator, (→β0,→β), is defined as follows:

$$ R^2_{\mathrm{PRESS}} = 1 - \frac{\mathrm{PRESS}(\vec\beta_0,\vec\beta;\beta_0,\beta)}{\sum_{i=1}^{n}(y_i-\bar y)^2}. $$
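As a small illustration, the predicted R2 can be computed from a PRESS value and the responses as below. The sketch assumes the usual definition, one minus PRESS divided by the total sum of squares about the mean; the function name and the toy numbers are hypothetical and unrelated to the Hald or Acetylene data.

```python
import numpy as np

def predicted_r2(press, y):
    """Predicted R^2 = 1 - PRESS / SST, with SST the total sum of squares about the mean."""
    y = np.asarray(y, dtype=float)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / sst

# Toy responses with SST = 5; a PRESS of 1 then leaves a predicted R^2 of 0.8.
y_toy = [1.0, 2.0, 3.0, 4.0]
print(predicted_r2(1.0, y_toy))
```

Unlike the ordinary R2, this quantity can be negative when an estimator predicts held-out points worse than the sample mean does.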
By direct computations, the R2PRESS values of the seven estimators (LS, PCR, HPCR, WPCR, WnnPCR, LE, and ORR) in the Hald and Acetylene data are
The results lead to conclusions similar to those of Subsection 4.1.
5. Conclusions and suggestion
This paper addresses the issues existing in the classic PCR estimation and proposes three hybrid PCR estimators. The two real examples and the simulation study demonstrate the desirable performance of the new methods. The three hybrid PCR estimators could also be studied under the MSE criterion. However, since they are biased estimators, the weights in the MSE sense can only be determined iteratively by numerical methods. This also implies that the estimators will no longer be linear estimators after the first iteration, making it difficult to represent the value of MSE exactly, so that only approximate results can be obtained. In short, the study of hybrid PCR estimation under the MSE criterion is challenging. In what follows, we give two suggestions for the use of the new estimators.
Suggestion 1: Despite the issue of selecting the contribution rate, classic PCR estimation still yields decent estimators by automatically determining which cumulative contribution rate to use (in essence, this is equivalent to how many principal components to use). Therefore, in cases where data size is large and there are numerous regression variables, users can still employ the classic PCR method to estimate parameters. This can be seen from the aforementioned fact that the averaged PRESS value decreases as the data size increases.
Suggestion 2: We can determine which estimator to use by considering the degree of multicollinearity. If multicollinearity is absent or weak, we can use the LS estimator directly. If multicollinearity is moderate, we can combine the above Suggestion 1 to choose a classical PCR estimator with an appropriate cumulative contribution rate. If multicollinearity is severe, it is recommended to use the hybrid PCR estimator.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The authors are very grateful to the four anonymous reviewers for their valuable comments and constructive suggestions, which were helpful in improving the paper. They would also like to thank Miss Bing-Jie Li for her constructive comments during the drafting of this paper.
Conflict of interest
The authors declare that there are no conflicts of interest.
Appendix