
Treatment effects with heterogeneity and heteroskedasticity are widely studied and applied in many fields, such as statistics and econometrics. The conditional average treatment effect provides an excellent measure of the heterogeneous treatment effect. In this paper, we propose a model averaging estimator for the conditional average treatment effect in partially linear models, based on a jackknife-type criterion under heteroscedastic errors. Within this context, we provide theoretical justification for our model averaging approach, establishing asymptotic optimality and weight convergence under certain conditions. The performance of the proposed estimator is compared with that of classical estimators in a Monte Carlo study and an empirical analysis.
Citation: Xiaowei Zhang, Junliang Li. Model averaging with causal effects for partially linear models[J]. AIMS Mathematics, 2024, 9(6): 16392-16421. doi: 10.3934/math.2024794
Causal effects are fundamental to project evaluation across various fields, such as economics, finance, and biomedicine. It is well known that the central issue in project evaluation in these fields is to identify the "causal relationships" that exist between project treatments and project outcomes and to quantify the "causal effects". For example, pharmaceutical companies are interested in the effect of a new drug or device developed to treat a disease, and investment banks are concerned with the profitability of companies that have received significant capital investment. These research questions rely on estimating treatment effects; thus, studying the identification, estimation, and empirical application of treatment effects is crucial and meaningful. Causal effects are also referred to as treatment effects in the literature.
It is essential, in some cases, to accurately capture heterogeneous treatment effects. For example, a new medicine could be more beneficial for children than for adults, and an advertising strategy for sanitary napkins may be more persuasive for women than for men. Such examples indicate that knowledge of heterogeneous treatment effects can be used to maximize the value and effectiveness of treatment programs [15]. A commonly used measure of heterogeneous treatment effects is the conditional average treatment effect (CATE), which has received much recent attention.
In several studies, CATE has been estimated by fitting a parametric model to the relationship between the observations and the baseline covariates, in conjunction with the treatment assignment. Alternatively, CATE can be estimated by modeling the conditional mean of the potential outcomes with a parametric model in the baseline covariates for each treatment group; examples include [2,9,19,22]. In practice, however, not all covariates in a dataset are linearly related to the response, and some relationships may be nonlinear in an unknown way. In this context, partially linear models (PLMs), which combine the interpretability of linear models with the flexibility of nonparametric models, are more suitable for such datasets.
Therefore, we consider a partially linear regression framework [3] in which the distribution of the response $Y_i$ may depend on a binary treatment assignment indicator $\delta_i \in \{0,1\}$ and the baseline covariates $(X_i, U_i)$. To isolate the treatment differences of primary interest, we assume for the observations that
$$Y_{t,i} - Y_{c,i} = \mu_i + e_i = \sum_{j=1}^{\infty} x_{ij}\beta_j + g(U_i) + e_i, \quad i = 1, \ldots, n, \tag{1.1}$$
where $\{Y_{t,i}, Y_{c,i}\}$ denote the pair of potential outcomes associated with $Y_i$, and $\delta_i$ represents the treatment indicator variable, taking $\delta_i = 1$ if the $i$th individual belongs to the treatment group and $\delta_i = 0$ otherwise. $X_i = (x_{i1}, x_{i2}, \ldots)^{\mathrm{T}}$ is a covariate vector and $U_i$ is a univariate covariate. $\beta = (\beta_1, \beta_2, \ldots)^{\mathrm{T}}$ is an unknown coefficient vector associated with $X_i$, $g(\cdot)$ is an unknown smooth nonlinear function, and $e_1, \ldots, e_n$ are unobservable heteroscedastic random errors, independent across $i$, with conditional mean $E(e_i \mid X_i, U_i) = 0$ and conditional variance $E(e_i^2 \mid X_i, U_i) = \sigma_i^2$. A fundamental problem in statistics and causal inference is to improve predictive accuracy based on the observable dataset $\{Y_i, X_i, U_i, \delta_i\}_{i=1}^n$.
Researchers, however, usually entertain multiple candidate models based on various understandings of the observed data. Thus, Rolling and Yang [15] proposed the treatment effect cross-validation (TECV) method, a model selection method for studying CATE that identifies the model most suitable for estimating CATE among multiple candidate models. However, model selection carries the risk of model uncertainty, and it is impossible to know whether the selected model is the most suitable for the dataset. If researchers cannot select the optimal model, they risk discarding useful information, which leads to unstable estimation. The model averaging method can reduce the risk of regression estimation and the bias introduced by selecting a single model, avoid ignoring the useful information contained in the remaining candidate models, and thereby improve prediction accuracy.
Model averaging methods have been applied to causal inference problems. For example, Gao et al. [5] developed a model averaging method based on the JMA of [7] to estimate the average treatment effect (ATE). Kitagawa and Muris [11] proposed a data-driven approach that averages estimators over candidate specifications to address specification uncertainty in the weighted estimation of propensity scores for the average treatment effect on the treated (ATT). Rolling et al. [16] introduced a model combination technique, treatment effect estimation by mixing (TEEM), designed to amalgamate estimators from various procedures and thereby generate more accurate CATE estimates. Although TEEM is compatible with parametric, nonparametric, or semiparametric statistical models, as well as nonstatistical machine learning procedures and even subjective expert judgment, the approach of [16] may encounter difficulties in finding treatment-control pairs in each cell after partitioning, especially when the number of covariates is large or even moderate. Thus, we develop a model averaging estimation for the CATE with multiple candidate partially linear regression models.
To the best of our knowledge, no optimal model averaging estimation has been developed for PLMs based on a jackknife-type criterion to address causal inference problems. Motivated by this, under heteroscedastic errors, the primary goal of the current article is to develop a model averaging estimation for the conditional average treatment effect with PLMs based on a jackknife-type criterion, called the CPLJMA method, where C, PL, and JMA represent causal effects, partially linear, and jackknife model averaging, respectively. It is difficult to directly extend available results to our setting. Among existing optimal model averaging estimations for PLMs, important examples include [27], who proposed an optimal model averaging estimation for PLMs with a Mallows-type criterion based on kernel estimation (MAPLM), and Zeng et al. [25], who proposed focused information criteria and frequentist model averaging estimators for semiparametric partially linear models with missing responses and established their theoretical properties. We utilize a jackknife-type weight choice criterion different from theirs and, additionally, establish an extra theorem on weight convergence. Our main contributions are as follows: (i) On the theoretical side, the proposed estimator is asymptotically optimal in terms of minimizing the squared error loss. (ii) The convergence property of the weights is investigated, and we prove that the sum of the weights assigned to the correct candidate models converges to one as the sample size increases to infinity, provided that there is at least one correctly specified candidate model. In the simulation section, numerical simulations and an empirical analysis are used to verify the validity of the proposed model averaging approach and the analytical framework; this provides further support for the wide application of such model averaging methods.
The remainder of this paper is organized as follows. In Section 2, we describe the estimation procedure of the jackknife criterion for the CATE based on PLMs, and we study its theoretical properties in Section 3. Section 4 illustrates the performance of our proposal via simulations and data examples. Concluding remarks are made in Section 5. Technical proofs are deferred to the Supplementary Material.
First, we use the potential outcomes framework [8,18] to define the CATE as the expectation of the individual treatment effect conditional on the observed values of the baseline covariates $(X_i, U_i)$, that is,
$$\mu_i = E(Y_{t,i} - Y_{c,i} \mid X_i, U_i),$$
where $Y_{t,i} - Y_{c,i}$ is the treatment effect for the $i$th individual. Because $Y_{t,i}$ and $Y_{c,i}$ are infeasible to observe simultaneously, this is also labeled the "fundamental problem of causal inference". The main goal of this study is to estimate $\mu_i$ based on the model averaging method.
Under the causal inference framework, we make the following identifiability assumptions [1]:
Assumption 1. Consistency: $Y_i = \delta_i Y_{t,i} + (1 - \delta_i)Y_{c,i}$;

Assumption 2. Unconfoundedness: $\{Y_{t,i}, Y_{c,i}\} \perp \delta_i \mid \{X_i, U_i\}$;

Assumption 3. Positivity: $0 < c_\pi \le \pi(X_i, U_i) \le 1 - c_\pi < 1$ almost surely, where $\pi(X_i, U_i) = P(\delta_i = 1 \mid Y_{t,i}, Y_{c,i}, X_i, U_i) = P(\delta_i = 1 \mid X_i, U_i)$ denotes the propensity score and $c_\pi$ is a positive constant.
Assumption 1 links the potential outcomes to the observed outcomes and requires the potential outcomes to be well defined. Assumption 2 is a conditional independence assumption: the treatment assignment indicator $\delta_i$ is independent of the potential outcomes $\{Y_{t,i}, Y_{c,i}\}$ given the covariates $X_i$ and $U_i$. It requires that all potential confounding information on the relationship between treatments and potential outcomes be captured by the observed covariates, thereby precluding unmeasured confounding between treatment assignments and outcomes. Assumption 3 implies that treatment assignments are not deterministic, which is crucial for controlling confounding bias and systematic differences between the treatment and control groups; it also guarantees that $\pi(X_i, U_i)$ and $1 - \pi(X_i, U_i)$ are bounded away from zero, so that their inverses are well defined with probability one. [17] referred to the combination of Assumptions 2 and 3 as "strongly ignorable treatment assignment".
Inverse probability weighting (IPW) based on the potential outcomes framework is a powerful tool for correcting confounding bias. The IPW approach uses the inverse of the propensity score to construct weights for the observed outcomes that balance the baseline covariates between groups [10]. Under the strongly ignorable treatment assignment assumption, define
$$Z_{\pi,i} = \frac{\delta_i Y_i}{\pi(X_i, U_i)} - \frac{(1-\delta_i) Y_i}{1 - \pi(X_i, U_i)},$$
which is a conditionally unbiased estimator of $\mu_i$ given $(X_i, U_i)$. Indeed, a direct calculation gives
$$\begin{aligned} E(Z_{\pi,i} \mid X_i, U_i) &= E\left[\frac{\delta_i Y_i}{\pi(X_i, U_i)} - \frac{(1-\delta_i) Y_i}{1 - \pi(X_i, U_i)} \,\middle|\, X_i, U_i\right] \\ &= \frac{E(\delta_i \mid X_i, U_i)}{\pi(X_i, U_i)} E(Y_{t,i} \mid X_i, U_i) - \frac{E(1-\delta_i \mid X_i, U_i)}{1 - \pi(X_i, U_i)} E(Y_{c,i} \mid X_i, U_i) \\ &= E(Y_{t,i} \mid X_i, U_i) - E(Y_{c,i} \mid X_i, U_i) = \mu_i. \end{aligned} \tag{2.1}$$
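The construction of $Z_{\pi,i}$ is straightforward to implement. Below is a minimal R sketch, assuming a spline-expanded logistic regression as a simple stand-in for the partially linear propensity model used later in this section; the function names and the trimming threshold are illustrative choices of ours, not part of the paper's procedure.

```r
library(splines)

# Stand-in propensity model: logistic regression that is linear in the
# columns of X and uses a cubic B-spline expansion of the scalar U.
fit_propensity <- function(delta, X, U) {
  dat <- data.frame(delta = delta, X = X, U = U)
  glm(delta ~ . - U + bs(U, df = 3), data = dat, family = binomial())
}

# IPW pseudo-outcome Z_i = delta_i Y_i / pi_i - (1 - delta_i) Y_i / (1 - pi_i),
# conditionally unbiased for the CATE mu_i under Assumptions 1-3.
ipw_pseudo_outcome <- function(Y, delta, pi_hat, trim = 0.01) {
  pi_hat <- pmin(pmax(pi_hat, trim), 1 - trim)  # guard against extreme scores
  delta * Y / pi_hat - (1 - delta) * Y / (1 - pi_hat)
}
```

Trimming the estimated scores away from 0 and 1 is a common practical safeguard corresponding to the positivity constant $c_\pi$ in Assumption 3.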
Selecting an unreasonable model to estimate the CATE may yield unreliable estimates. To provide a more reasonable and robust CATE estimator, we develop a model averaging estimation for the CATE with multiple candidate PLMs.
As discussed earlier, we can obtain computationally feasible PLMs of heterogeneous causal effects based on treatment-control pairs. In accordance with Eq (2.1) and model (1.1), we let $e_{\pi,i} = Z_{\pi,i} - \mu_i$. Then, we have
$$Z_{\pi,i} = \mu_i + e_{\pi,i} = \sum_{j=1}^{\infty} x_{ij}\beta_j + g(U_i) + e_{\pi,i}, \tag{2.2}$$
where $\mu_i$ denotes the CATE for the $i$th individual, $E(e_{\pi,i} \mid X_i, U_i) = 0$, and $E(e_{\pi,i}^2 \mid X_i, U_i) = \sigma_{\pi,i}^2$. Let us note that $\{Z_{\pi,i}, X_i, U_i\}_{i=1}^n$ is fully observed when $\pi(X_i, U_i)$ is known.
Specifically, we consider multiple candidate models for model (2.2) of the form
$$\mu_i^{(m)} = \sum_{j=1}^{k_m} x_{ij}^{(m)}\beta_j^{(m)} + g(U_i), \quad m = 1, \ldots, M_n, \tag{2.3}$$
used for evaluating $\mu_i$, where $x_{ij}^{(m)}$ is the $j$th entry of $X_i^{(m)}$, $X_i^{(m)}$ is a $k_m$-dimensional subvector of $X_i$, $\beta^{(m)} = (\beta_1^{(m)}, \ldots, \beta_{k_m}^{(m)})^{\mathrm{T}}$ is the corresponding regression coefficient vector, $g(\cdot)$ is the unknown function in the nonparametric part, and $M_n$ denotes the total number of candidate models, which is allowed to diverge to infinity.

Define $Z_\pi = (Z_{\pi,1}, \ldots, Z_{\pi,n})^{\mathrm{T}} \in \mathbb{R}^n$ and $X^{(m)} = (X_1^{(m)}, \ldots, X_n^{(m)})^{\mathrm{T}} \in \mathbb{R}^{n \times k_m}$, where each $X_i^{(m)\mathrm{T}}$ is a $1 \times k_m$ row vector, $g(U) = (g(U_1), \ldots, g(U_n))^{\mathrm{T}} \in \mathbb{R}^n$, and $e_\pi = (e_{\pi,1}, \ldots, e_{\pi,n})^{\mathrm{T}} \in \mathbb{R}^n$. Then, the $m$th candidate model in matrix form is
$$Z_\pi = X^{(m)}\beta^{(m)} + g(U) + e_\pi.$$
To estimate the nonparametric function, we use the B-spline regression method. Let $S_n$ be the space of polynomial splines of degree $l \ge 1$, and let $\{\psi_k, k = 1, \ldots, d_n\}$ denote a normalized B-spline basis. For any $g_n \in S_n$, we have
$$g_n(U) = \sum_{k=1}^{d_n} \psi_k(U)\alpha_k = \Psi^{\mathrm{T}}(U)\alpha,$$
for some coefficients $\{\alpha_k\}_{k=1}^{d_n}$, where $\Psi(U) = (\psi_1(U), \ldots, \psi_{d_n}(U))^{\mathrm{T}}$ and $\alpha = (\alpha_1, \ldots, \alpha_{d_n})^{\mathrm{T}}$. Here, $d_n$ increases with $n$. We define the $n \times d_n$ matrix $K = (\Psi(U_1), \ldots, \Psi(U_n))^{\mathrm{T}}$. Then, we assume that the $n \times (k_m + d_n)$ matrix $X^{(m)*} = (X^{(m)}, K)$ has full column rank and is associated with the unknown $(k_m + d_n)$-dimensional parameter vector $\gamma^{(m)} = (\beta^{(m)\mathrm{T}}, \alpha^{\mathrm{T}})^{\mathrm{T}}$. Thus, we have
$$\mu_{\pi,n}^{(m)} = X^{(m)*}\gamma^{(m)} = X^{(m)}\beta^{(m)} + K\alpha.$$
By regressing $Z_\pi$ on $X^{(m)*}$, the least squares estimators of $\beta^{(m)}$ and $\alpha$ can be obtained as
$$\hat\beta^{(m)} = \{X^{(m)\mathrm{T}}(I - Q)X^{(m)}\}^{-1} X^{(m)\mathrm{T}}(I - Q)Z_\pi,$$
and
$$\hat\alpha = (K^{\mathrm{T}}K)^{-1}K^{\mathrm{T}}(Z_\pi - X^{(m)}\hat\beta^{(m)}),$$
where $Q = K(K^{\mathrm{T}}K)^{-1}K^{\mathrm{T}}$ is a symmetric idempotent matrix. Then
$$\hat\mu_\pi^{(m)} = X^{(m)}\hat\beta^{(m)} + K\hat\alpha = \{Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm{T}}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm{T}}\}Z_\pi = P^{(m)}Z_\pi,$$
where $\tilde X^{(m)} = (I - Q)X^{(m)}$ and $P^{(m)} = Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm{T}}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm{T}}$ is a symmetric idempotent matrix. Then the corresponding model averaging estimator of $\mu = (\mu_1, \ldots, \mu_n)^{\mathrm{T}}$ can be formulated as
$$\hat\mu_\pi(\omega) = \sum_{m=1}^{M_n}\omega_m\hat\mu_\pi^{(m)} = P(\omega)Z_\pi, \tag{2.4}$$
where $P(\omega) = \sum_{m=1}^{M_n}\omega_m P^{(m)}$ and $\omega = (\omega_1, \ldots, \omega_{M_n})^{\mathrm{T}}$ is a weight vector belonging to the set $\mathcal{H}_n = \{\omega \in [0,1]^{M_n} : \sum_{m=1}^{M_n}\omega_m = 1\}$.
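To make the construction concrete, the following R sketch computes the hat matrix $P^{(m)}$ of one candidate PLM via the B-spline projection $Q$ and forms the averaged fit in (2.4); the helper names and the default df are our illustrative assumptions.

```r
library(splines)

# Hat matrix P = Q + Xt (Xt' Xt)^{-1} Xt', where Q projects onto the
# B-spline basis of U and Xt = (I - Q) Xm is the partialled-out linear part.
plm_hat_matrix <- function(Xm, U, df = 3) {
  n <- nrow(Xm)
  K <- bs(U, df = df)                      # n x d_n spline basis matrix
  Q <- K %*% solve(crossprod(K), t(K))     # projection onto span(K)
  Xt <- (diag(n) - Q) %*% Xm
  Q + Xt %*% solve(crossprod(Xt), t(Xt))
}

# Model averaging fit (2.4): mu_hat(w) = P(w) Z with P(w) = sum_m w_m P^(m).
fit_average <- function(P_list, w, Z) {
  P_w <- Reduce(`+`, Map(`*`, w, P_list))
  drop(P_w %*% Z)
}
```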
Notably, the choice of the weight vector is crucial in the model averaging method. Thus, we consider a jackknife-type criterion to choose the weight vector $\omega$ for (2.4) in the PLM framework. Specifically, leave-one-out cross-validation (LOO-CV) is used to estimate $\mu$, and the resulting estimator in the $m$th candidate model is given by
$$\tilde\mu_\pi^{(m)} = \tilde P^{(m)}Z_\pi \quad\text{and}\quad \tilde P^{(m)} = P^{(m)} - D^{(m)}A^{(m)},$$
where $D^{(m)} = \mathrm{diag}(D_{11}^{(m)}, \ldots, D_{nn}^{(m)}) \in \mathbb{R}^{n\times n}$ with $i$th diagonal element $D_{ii}^{(m)} = h_{m,ii}/(1 - h_{m,ii})$, $A^{(m)} = I_n - P^{(m)}$, and $h_{m,ii}$ is the $i$th diagonal entry of $P^{(m)}$. Thus, the jackknife-type model averaging estimator is
$$\tilde\mu_\pi(\omega) = \sum_{m=1}^{M_n}\omega_m\tilde\mu_\pi^{(m)} = \tilde P(\omega)Z_\pi,$$
where $\tilde P(\omega) = \sum_{m=1}^{M_n}\omega_m\tilde P^{(m)}$. Then, the weight choice criterion is
$$\mathrm{CV}_\pi(\omega) = \|Z_\pi - \tilde\mu_\pi(\omega)\|^2. \tag{2.5}$$
The optimal weight vector is obtained by minimizing the criterion in (2.5) over the set $\mathcal{H}_n$. However, such a minimization is computationally infeasible in real-world data analysis because $\pi(X_i, U_i)$ is generally unknown. In our modeling framework, we estimate it by adopting the logistic partially linear models (LPLMs) in [24],
$$\hat\pi(X_i, U_i) = \frac{e^{X_i^{\mathrm{T}}\theta + \kappa(U_i)}}{1 + e^{X_i^{\mathrm{T}}\theta + \kappa(U_i)}}, \quad i = 1, \ldots, n, \tag{2.6}$$
which rely on the generalized partially linear models (GPLMs) in [14], where the coefficient vector $\theta$ of the linear part and the nonparametric part $\kappa(\cdot)$ are estimated using a B-spline basis. This method is facilitated by the "sgplm1" function within the R package "gplm", with the degrees of freedom set to "df=3". Further elucidation of this choice is provided in the subsequent numerical simulations. Then, $\hat\pi(X_i, U_i)$ is substituted for $\pi(X_i, U_i)$ in $Z_\pi$ to obtain $Z_{\hat\pi}$, and a feasible counterpart of $\mathrm{CV}_\pi(\omega)$ in (2.5) becomes
$$\mathrm{CV}_{\hat\pi}(\omega) = \|Z_{\hat\pi} - \tilde\mu_{\hat\pi}(\omega)\|^2.$$
The optimal weights $\hat\omega_{cv}$ are obtained by selecting $\omega \in \mathcal{H}_n$ to minimize this jackknife-type criterion:
$$\hat\omega_{cv} = \mathop{\arg\min}_{\omega \in \mathcal{H}_n} \mathrm{CV}_{\hat\pi}(\omega).$$
Given $\hat\omega_{cv}$, substituting it into (2.4) yields the optimal model averaging estimator of $\mu$ as $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ in the case where $\pi(X_i, U_i)$ is unknown. Note that $\mathrm{CV}_{\hat\pi}(\omega)$ is a quadratic programming problem with respect to $\omega$; that is, $\mathrm{CV}_{\hat\pi}(\omega) = \omega^{\mathrm{T}}H_{\hat\pi}^{\mathrm{T}}H_{\hat\pi}\omega$, where $H_{\hat\pi} = (Z_{\hat\pi} - \tilde P^{(1)}Z_{\hat\pi}, \ldots, Z_{\hat\pi} - \tilde P^{(M_n)}Z_{\hat\pi})$.
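Because $\mathrm{CV}_{\hat\pi}(\omega) = \omega^{\mathrm{T}}H_{\hat\pi}^{\mathrm{T}}H_{\hat\pi}\omega$ is a quadratic program over the weight simplex, it can be solved with a standard QP routine. A minimal sketch using the R package quadprog is given below; the small ridge term (added only to keep the QP strictly convex) and the helper names are our assumptions, not part of the original procedure.

```r
library(quadprog)

# Leave-one-out smoother P_tilde = P - D A, with D_ii = h_ii / (1 - h_ii)
# and A = I - P, as defined above.
loo_smoother <- function(P) {
  D <- diag(diag(P) / (1 - diag(P)))
  P - D %*% (diag(nrow(P)) - P)
}

# Jackknife weights: minimize || Z - sum_m w_m P_tilde_m Z ||^2
# subject to w >= 0 and sum(w) = 1.
jma_weights <- function(P_list, Z, ridge = 1e-8) {
  H <- sapply(P_list, function(P) Z - loo_smoother(P) %*% Z)  # n x M_n matrix
  M <- ncol(H)
  Dmat <- 2 * crossprod(H) + ridge * diag(M)
  Amat <- cbind(rep(1, M), diag(M))   # first column enforces sum(w) = 1
  solve.QP(Dmat, dvec = rep(0, M), Amat = Amat,
           bvec = c(1, rep(0, M)), meq = 1)$solution
}
```

Replacing $Z_\pi$ by $Z_{\hat\pi}$ in the call gives the feasible weights $\hat\omega_{cv}$.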
In this section, we focus on the theoretical properties of our proposed model averaging method. In Subsection 3.1, we prove the asymptotic optimality of the model averaging estimator $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ by showing that the selected weight vector $\hat\omega_{cv}$ yields a squared error that is asymptotically identical to that of the infeasible optimal weight vector. Subsection 3.2 concerns the convergence property of the optimal weight vector $\hat\omega_{cv}$: as the sample size tends to infinity, the sum of the weights assigned to the correct models by the proposed method converges to one in probability.

Before introducing the theoretical properties, we first define some notation. Let $\mu_{t,i} = E(Y_{t,i} \mid X_i, U_i)$ and $e_{t,i} = Y_{t,i} - \mu_{t,i}$ denote the conditional expectation and the random error of the treatment group, respectively, and let $\sigma_{t,i}^2 = E(e_{t,i}^2 \mid X_i, U_i)$. The loss function and the corresponding risk function of $\hat\mu_\pi(\omega)$ are defined as
$$L_\pi(\omega) = \|\mu - \hat\mu_\pi(\omega)\|^2 \quad\text{and}\quad R_\pi(\omega) = E\{L_\pi(\omega) \mid X, U\},$$
respectively, where $\|\cdot\|$ is the Euclidean norm. Let $\xi_\pi = \inf_{\omega \in \mathcal{H}_n} R_\pi(\omega)$, $\bar k = \max_{1 \le m \le M_n} k_m$, and $\bar h = \max_{1 \le m \le M_n}\max_{1 \le i \le n} h_{m,ii}$. The following conditions are assumed as $n \to \infty$.
(C1) $\sqrt n\|\hat\theta_n - \theta_0\| = O_p(1)$ and $\|\hat\kappa_{\hat\theta_n} - \kappa_0\| = o_p(n^{-1/4})$, where $\theta_0$ and $\kappa_0$ are the true values of $\theta$ and $\kappa$, respectively, and the first derivatives of $\hat\pi(X_i, U_i; \theta, \kappa)$ with respect to $\theta$ and $\kappa$ are continuous and bounded.

(C2) For some integer $G \ge 1$,
$$\max_i\big\{E(e_i^{4G} \mid X_i, U_i),\; E(e_{t,i}^{4G} \mid X_i, U_i),\; |\mu_i|,\; |\mu_{t,i}|\big\} \le \bar C < \infty \quad \text{a.s.},$$
where $i = 1, \ldots, n$, and $\bar C$ is a positive constant.

(C3) For some integer $1 \le G \le \infty$,
$$M_n \xi_\pi^{-2G} \sum_{m=1}^{M_n}\{R_\pi(\omega_m^o)\}^G \xrightarrow{\text{a.s.}} 0,$$
where $\omega_m^o$ denotes the weight vector whose $m$th element is one and whose remaining elements are zeros.

(C4) $\bar h \xrightarrow{\text{a.s.}} 0$ and $(\bar d + \bar k)\,\xi_\pi^{-2} \xrightarrow{\text{a.s.}} 0$.

(C5) The functions $\{g_j(U)\}_{j=1}^n$ belong to a class of functions $\mathcal{F}$ whose $r$th derivative $g_j^{(r)}$ exists and is Lipschitz of order $\eta$,
$$\mathcal{F} = \big\{g_j(\cdot) : |g_j^{(r)}(s) - g_j^{(r)}(t)| \le G|s - t|^{\eta} \ \text{for} \ s, t \in [a, b]\big\},$$
for some positive constant $G$, where $r$ is a nonnegative integer and $\eta \in (0, 1]$, such that $\upsilon = r + \eta > 0.5$.
Condition (C1), a commonly used restriction for GPLMs, requires $\sqrt n$-consistent estimation of the parametric component $\hat\theta_n$; the nonparametric component $\hat\kappa_{\hat\theta_n}$ is viewed as a function of the parametric component to achieve consistency. This restriction is reasonable [20]. Condition (C2) is a moment condition on the random errors and additionally requires $\{\mu_i, \mu_{t,i}\}_{i=1}^n$ to be bounded. Condition (C3) is a convergence condition that restricts the circumstances in which our asymptotic results apply. A prerequisite for Condition (C3) to hold is $\xi_\pi \to \infty$, which requires that no finite-dimensional correct model exist in the class of candidate models [6]. It also requires that $M_n$ and $\max_{1\le m\le M_n}R_\pi(\omega_m^o)$ go to infinity slowly enough. Condition (C4), an assumption that excludes extremely unbalanced design matrices from the candidate models, is widely imposed in studies of optimal model averaging based on cross-validation, such as [7,12], among others. The B-spline approximation in PLMs requires Condition (C5), following [4,21]; this is a regularity condition that requires the nonparametric function to be sufficiently smooth.
Theorem 1. If Conditions (C1)–(C5) are satisfied, then
$$\frac{L_{\hat\pi}(\hat\omega_{cv})}{\inf_{\omega \in \mathcal{H}_n} L_{\hat\pi}(\omega)} \xrightarrow{P} 1.$$
Theorem 1 establishes the asymptotic optimality of the proposed estimator in the sense of minimizing the squared error loss: the squared error achieved by the weight vector $\hat\omega_{cv}$ selected by the LOO-CV criterion is asymptotically equal to that of the infeasible optimal weight vector.
In this subsection, we concentrate on the convergence properties of the optimal weights in model averaging. It should be noted that, in this article, the $m$th candidate model in (2.3) is deemed correctly specified, or a correct model, if there exists $\beta^{(m)*}$ such that $\mu = X^{(m)}\beta^{(m)*} + g(U)$; otherwise, model (2.3) is said to be misspecified, or an incorrect model.

We first introduce some notation. Let $\hat s_{cv}$ denote the sum of the weights assigned to the correct candidate models by our proposed method, that is, $\hat s_{cv} = \sum_{m=1}^{m_0}\hat\omega_{cv,m}$, where $m_0$ indicates that the first $m_0$ candidate models are all correctly specified. Let $\mathcal{H}_F = \{\omega \in [0,1]^{M_n} : \sum_{m=m_0+1}^{M_n}\omega_m = 1\}$ be the set of weight vectors supported on the incorrect candidate models, and let $\xi_{\pi,F} = \inf_{\omega\in\mathcal{H}_F}R_\pi(\omega)$ be the optimal risk when all weight is assigned to the misspecified candidate models. We specify some additional conditions for further analysis as $n \to \infty$.
(C6) For some integer $1 \le G \le \infty$,
$$\xi_{\pi,F}^{-2G}\max\Big\{m_0(\bar d + \bar k)^{2G},\ (M_n - m_0)\sum_{m=m_0+1}^{M_n}\{R_\pi(\omega_m^o)\}^G\Big\} \xrightarrow{\text{a.s.}} 0.$$

(C7) $\bar h = O(n^{-1/2})$.

Condition (C6) requires $\xi_{\pi,F}^{2G}$ to grow faster than both $m_0(\bar d + \bar k)^{2G}$ and $(M_n - m_0)\sum_{m=m_0+1}^{M_n}\{R_\pi(\omega_m^o)\}^G$. It is worth noting that if $(M_n - m_0)\sum_{m=m_0+1}^{M_n}\{R_\pi(\omega_m^o)\}^G$ is larger than $m_0(\bar d + \bar k)^{2G}$, then Condition (C6) reduces to $(M_n - m_0)\xi_{\pi,F}^{-2G}\sum_{m=m_0+1}^{M_n}\{R_\pi(\omega_m^o)\}^G \xrightarrow{\text{a.s.}} 0$ and is thus analogous to Condition (C3). Condition (C7), which excludes peculiar models from the class of candidate models, is adopted from [13].
Theorem 2. If Conditions (C1)–(C7) are satisfied, then
$$\hat s_{cv} \xrightarrow{p} 1.$$
Theorem 2 indicates that, as the sample size goes to infinity, the sum of the weights that our proposed model averaging estimator $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ assigns to the correct models converges to one in probability, so that the incorrect models are automatically excluded.
To demonstrate the theoretical properties of Section 3, in this section we conduct two Monte Carlo experiments on the finite-sample performance, where Case 1 verifies the asymptotic optimality of Theorem 1 with all candidate models misspecified, and Case 2 justifies the weight convergence property of Theorem 2 with at least one correctly specified model. In addition, the superiority of our method is illustrated by applying it to the Diabetes dataset. For comparison, we also consider several relevant existing methods as competitors to our CPLJMA approach, including the model selection methods AIC, BIC, and treatment effect cross-validation (TECV) proposed by [15]; the information criterion-based model averaging methods SAIC and SBIC; the equal weight method (EW); treatment effect estimation by mixing (TEEM); and the Mallows averaging of partially linear models (MAPLM).

We calculate the mean squared error (MSE) to assess the performance of the estimators, defined as $\mathrm{MSE} = \frac{1}{nD}\sum_{d=1}^{D}\|\{\hat\mu(\omega)\}^{(d)} - \mu^{(d)}\|^2$, where $\mu^{(d)}$ and $\{\hat\mu(\omega)\}^{(d)}$ are the CATE and the model averaging estimator in the $d$th replicate, respectively, and $D$ denotes the number of replicates of the simulation. Additionally, in the empirical analysis we calculate $\mathrm{MSE}_{\mathrm{median}} = \mathrm{median}_{d=1,\ldots,D}\,\mathrm{MSE}^{(d)}$ and the optimal rate, i.e., the proportion of replicates in which a method attains the smallest MSE. To complement these numerical investigations, we must also determine the number of (interior) knots: analogous to the pivotal role of the bandwidth in kernel smoothing, the knots act as tuning parameters and have a marked influence on the smoothness and adaptability of the spline fit. The details of the numerical study and its results are described in the following subsections.
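These evaluation metrics are simple to compute from stored replicates; a brief R sketch follows, in which mu_hat_reps and mu_reps are hypothetical containers for the fitted and true CATEs across replicates and methods.

```r
# mu_hat_reps: D x n x M array (replicate, unit, method); mu_reps: D x n matrix.
evaluate_methods <- function(mu_hat_reps, mu_reps) {
  D <- dim(mu_hat_reps)[1]; n <- dim(mu_hat_reps)[2]; M <- dim(mu_hat_reps)[3]
  mse_d <- sapply(1:M, function(m)            # per-replicate MSE, a D x M matrix
    sapply(1:D, function(d) sum((mu_hat_reps[d, , m] - mu_reps[d, ])^2) / n))
  list(MSE        = colMeans(mse_d),          # (1 / (nD)) * sum of squared errors
       MSE_median = apply(mse_d, 2, median),
       opt_rate   = colMeans(mse_d == apply(mse_d, 1, min)))  # share of wins
}
```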
Case 1: Without correct candidate models
The data-generating process (DGP) is as follows:
$$Y_{t,i} - Y_{c,i} = \mu_i + e_i = \sum_{j=1}^{500} X_{ij}\beta_j + g(U_i) + 0.5X_{i2}^2 + e_{t,i} - e_{c,i},$$
where $X_{i1} = 1$, and the covariates of the linear part, $\{X_{ij}\}_{j=2}^{500}$, are generated from a multivariate normal distribution with mean $0$ and covariance $0.5^{|j_1 - j_2|}$ between $X_{ij_1}$ and $X_{ij_2}$. The associated coefficients in the linear component are taken as $\beta_j = 1/j$. The nonparametric function is $g(U_i) = \sin(2\pi U_i)$, where $U_i \sim \mathrm{Uniform}[0, 1]$. $\{e_{t,i}, e_{c,i}\}$ are independent random errors distributed as $N(0, \sigma^2 X_{i2}^2)$, where the parameter $\sigma^2$ is chosen so that $R^2 = \mathrm{var}(\mu_i)/\mathrm{var}(Y_{t,i} - Y_{c,i})$ varies on a grid between $0.1$ and $0.9$. Then
$$\mu_i = \sum_{j=1}^{500} j^{-1}X_{ij} + \sin(2\pi U_i) + 0.5X_{i2}^2 \quad\text{and}\quad e_i = e_{t,i} - e_{c,i}. \tag{4.1}$$
We rescaled $\mu_i$ to have unit variance so that the expected $R^2$ equals $\frac{1}{1+\sigma^2}$ for the unknown model. It is clear from (4.1) that the class of candidate models considered in this case is misspecified. In addition, to obtain $\{\delta_i\}_{i=1}^n$, the propensity score is taken as
$$\pi(X_i, U_i) = \frac{\exp(0.75X_{i2} + \sin(2\pi U_i))}{1 + \exp(0.75X_{i2} + \sin(2\pi U_i))}.$$
Thus, we obtain $\{Y_i\}_{i=1}^n$. For $\hat\pi(X_i, U_i)$, we use the LPLMs in (2.6) to approximate the coefficients of the linear part and the form of the nonparametric part.
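For concreteness, a condensed R sketch of this DGP is given below, using $J = 50$ rather than $500$ linear covariates to keep the example light; the helper name and this truncation are our choices, and the rescaling of $\mu_i$ to unit variance is omitted.

```r
gen_case1 <- function(n, J = 50, sigma = 1) {
  # AR(1)-type covariance 0.5^|j1 - j2| for the non-constant covariates
  S <- 0.5 ^ abs(outer(1:(J - 1), 1:(J - 1), "-"))
  X <- cbind(1, matrix(rnorm(n * (J - 1)), n) %*% chol(S))  # X_{i1} = 1
  U <- runif(n)
  mu <- drop(X %*% (1 / (1:J))) + sin(2 * pi * U) + 0.5 * X[, 2]^2
  # heteroscedastic potential-outcome errors with variance sigma^2 * X_{i2}^2
  e_t <- rnorm(n, sd = sigma * abs(X[, 2]))
  e_c <- rnorm(n, sd = sigma * abs(X[, 2]))
  # propensity score and treatment assignment
  p <- plogis(0.75 * X[, 2] + sin(2 * pi * U))
  delta <- rbinom(n, 1, p)
  list(X = X, U = U, mu = mu, delta = delta, e_t = e_t, e_c = e_c, pi = p)
}
```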
First, we delve into the influence of the interior knots of the B-spline basis on the performance of our proposed CPLJMA approach. By employing the "bs(⋅, df)" function from the R package "splines", we generate an appropriate B-spline basis matrix. Here, the degrees of freedom parameter df is a crucial factor, determined by df = 3 + the number of interior knots.
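With a cubic basis and no intercept column, splines::bs places df − 3 interior knots at quantiles of U, which is exactly the "df = 3 + the number of knots" convention; for example:

```r
library(splines)
U <- runif(300)
ncol(bs(U, df = 3))            # 3 columns: cubic basis with 0 interior knots
ncol(bs(U, df = 5))            # 5 columns: cubic basis with 2 interior knots
attr(bs(U, df = 5), "knots")   # the 2 interior knots, placed at quantiles of U
```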
Figure 1 shows the variation in MSE as the number of interior knots varies, considering a sample size of $n = 300$ and $R^2 = 0.5$, for both nested and nonnested settings. In the nested setting, the $m$th candidate model comprises the first $m$ linear variables in $\{X_{ij}\}_{j=1}^{500}$, and the number of candidate models $M_n$ is determined as the nearest integer to $3n^{1/3}$, resulting in $M_n = 20$. In the nonnested setting, the linear components of all candidate models are subsets of $\{X_{i1}, \ldots, X_{i5}\}$, disregarding the remaining covariates, thereby yielding a total of $2^5 - 1 = 31$ candidate models. As depicted in the figure, the MSE increases as the number of knots grows, potentially exacerbating overfitting. Consequently, we opt for df = 3 as the degrees of freedom for the CPLJMA method, ensuring a balance between model flexibility and susceptibility to overfitting.
The greater the number of covariates, the heavier the computation. Therefore, in substantiating the theoretical properties outlined in Theorem 1 through numerical simulations, we adopt the nested setting. Accordingly, for sample sizes $n = 75, 150, 300$, and $600$, we take $M_n = 12, 15, 20$, and $25$, respectively.
Figure 2 provides a numerical inspection of the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ in Theorem 1 by showing the mean of $\mathrm{LR} = L_{\hat\pi}(\hat\omega_{cv})/\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)$ for different sample sizes and various $R^2$. In a simulation based on 100 replications, we observe that the mean curve of LR decreases and converges to one as $n$ increases. This intuitively demonstrates the asymptotic optimality of CPLJMA.
Figure 3 shows the MSE ratio curves for the estimators of the CATE $\mu$ that we considered, where the AIC is used as the denominator so that its MSE ratio equals 1.00. Generally, our proposed CPLJMA outperforms its competitors in terms of MSE ratios when $R^2$ or $n$ is small or moderate, particularly because it is difficult to identify the optimal model when there is considerable noise in the data; the advantage of model averaging is that, by not relying on a single model, it provides protection against poor model selection. As expected, SAIC and SBIC invariably yield more accurate results than their respective model selection rivals. In short, to some extent, CPLJMA is superior to its competitors.
Case 2: With correct candidate models
The DGP is generated from
$$Y_{t,i} - Y_{c,i} = \mu_i + e_i = \sum_{j=1}^{5} X_{ij}\beta_j + g(U_i) + e_{t,i} - e_{c,i},$$
where the vector of covariates $X_i = (X_{i1}, \ldots, X_{i5})^{\mathrm{T}}$ consists of independent standard normal $N(0, 1)$ entries, $U_i$ is distributed as $U[-1, 1]$, and the corresponding coefficients and nonparametric function are $\beta_j = 1/j$ and $g(U_i) = 1.2U_i$, respectively. The settings for $\{e_{t,i}, e_{c,i}\}$, $\sigma$, $R^2$, and df are the same as those in Case 1. Thus, we can obtain
$$\mu_i = \sum_{j=1}^{5} j^{-1}X_{ij} + 1.2U_i.$$
The propensity score is taken as
$$\pi(X_i, U_i) = \frac{\exp(0.75X_{i3} + 1.2U_i)}{1 + \exp(0.75X_{i3} + 1.2U_i)}.$$
Based on the LPLMs in (2.6), we can obtain $\{\hat\pi(X_i, U_i)\}_{i=1}^n$ and thus $\{Z_{\hat\pi,i}\}_{i=1}^n$. In this case, we consider nonnested models: the linear parts of all candidate models are constructed from all possible combinations of $\{X_{i1}, X_{i2}, X_{i3}, X_{i4}, X_{i5}\}$; thus, $M_n = 2^5 - 1 = 31$. The sample size is taken as $n = 75, 150, 300, 600$. The results for the convergence of the model weights $\hat\omega_{cv}$ and the MSE ratios of the above methods are given in Figures 4 and 5, respectively, based on 100 replications.
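The nonnested candidate set can be enumerated mechanically; a short R sketch of building all $2^5 - 1$ linear parts (variable names illustrative):

```r
# All non-empty subsets of the 5 linear covariates: 2^5 - 1 = 31 candidates.
vars <- paste0("X", 1:5)
subsets <- unlist(lapply(1:5, function(k) combn(vars, k, simplify = FALSE)),
                  recursive = FALSE)
length(subsets)   # 31
subsets[[7]]      # e.g., the columns of X entering the 7th candidate model
```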
Figure 4 clearly shows that the sum of the weights corresponding to the correct candidate models tends to one as the sample size $n$ and $R^2$ increase. This intuitively confirms the convergence of $\hat\omega_{cv}$ presented in Theorem 2 via numerical inspection.
As shown in Figure 5, we again compute the MSE ratio using AIC as the denominator. In most cases, the MSE ratios demonstrate the merits of our approach over its competitors. As $R^2$ increases progressively, the MSE ratio in all scenarios tends to decrease, as expected, and increasing the sample size also improves the performance of all approaches. Overall, CPLJMA still outperforms the several existing methods.
Our proposed method is applied to the Diabetes dataset from Dr. John Schorling, Department of Medicine, University of Virginia School of Medicine. The original data consist of 19 covariates on 403 subjects, drawn from 1046 subjects interviewed in a study. However, due to missing data, we selected 16 covariates and 366 respondents as the dataset for the current case study, 175 of whom resided in Buckingham and 191 of whom did not.
Our analysis considers the outcome variable $Y$ to be stabilized glucose. The treatment indicator variable $\delta$ takes the value 1 if a person resides in Buckingham, and 0 otherwise. We calculated the Pearson correlation coefficients of the 14 baseline covariates ($X$ and $U$) with $Y$ and ranked them in descending order of correlation strength. Let us note that all continuous covariates $X$ and $Y$ are standardized to have a mean of zero and a variance of one, and $U$ is scaled to $[0, 1]$. See Table 1 for details.
| Symbol | Description | Correlation with $Y$ |
| --- | --- | --- |
| $X_1$ | glycosolated hemoglobin | 0.7409 |
| $X_2$ | cholesterol/HDL ratio | 0.2989 |
| $X_3$ | waist | 0.2337 |
| $X_4$ | weight | 0.1888 |
| $X_5$ | high density lipoprotein | -0.1801 |
| $X_6$ | frame (0 if large, 1 if medium, 2 otherwise) | -0.1726 |
| $X_7$ | first systolic blood pressure | 0.1654 |
| $X_8$ | total cholesterol | 0.1514 |
| $X_9$ | hip | 0.1448 |
| $X_{10}$ | gender (0 if male, 1 otherwise) | -0.0861 |
| $X_{11}$ | height | 0.0825 |
| $X_{12}$ | postprandial time when labs were drawn | -0.0485 |
| $X_{13}$ | first diastolic blood pressure | 0.0257 |
We assume that the candidate models are nested models constructed from the covariates in $\{X_1, \ldots, X_{13}, U\}$ with intercept terms, where the baseline covariates $X$ form the linear part, and age serves as the covariate $U$ of the nonparametric part. Accordingly, there are 13 nested candidate models. To implement our proposal, the propensity score $\pi(X_i, U_i)$ is again estimated via the LPLMs.
We conduct a "guided simulation experiment" to evaluate the performance of our proposal and that of its competitors. In particular, we use the largest candidate model containing all covariates as a guided model, and m∗ denotes the index of that model in the class of candidate models. Based on the m∗th candidate model and the original dataset {Yi,Xi,Ui,δi}ni=1, we can obtain a simulation dataset {Y(m∗)i,Xi,δi,Ui}ni=1. Thus,
Y(m∗)i=δiY(m∗)t,i+(1−δi)Y(m∗)c,i,Y(m∗)t,i=X(m∗)Tiˆρ(m∗)+f(m∗)(U(m∗)i)+e(m∗)t,i,Y(m∗)c,i=X(m∗)Tiˆη(m∗)+h(m∗)(U(m∗)i)+e(m∗)c,i, | (4.2) |
where ˆρ(m∗) and ˆη(m∗) are the regression coefficient estimators for the linear part, f(m∗)(U(m∗)i) and h(m∗)(U(m∗)i) are the estimators for the nonparametric part, and {e(m∗)t,i,e(m∗)c,i}ni=1 is from N(0,1). Therefore, the "true" μi, CATE, is known in this analysis dataset, namely,
μi=X(m∗)Tiˆρ(m∗)+f(m∗)(U(m∗)i)−[X(m∗)Tiˆη(m∗)+h(m∗)(U(m∗)i)]. |
We randomly selected subsamples comprising 20%, 40%, 60%, 80%, and 100% of the dataset and describe the performance of the proposed CPLJMA and its competitors through MSE, $\mathrm{MSE}_{\mathrm{median}}$, and the optimal rate, based on 100 replications.
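Schematically, each replication of this guided simulation draws a subsample, regenerates the outcomes from (4.2), and scores every method against the known $\mu$. A hedged outline follows, where fit_all() is a hypothetical wrapper that runs all competing methods and returns one column of CATE estimates per method.

```r
# One replication of the guided simulation at a given sampling fraction.
run_replicate <- function(data, mu_true, frac) {
  idx <- sample(nrow(data$X), size = round(frac * nrow(data$X)))
  est <- fit_all(data$X[idx, , drop = FALSE], data$U[idx],
                 data$Y[idx], data$delta[idx])   # n_sub x (number of methods)
  colMeans((est - mu_true[idx])^2)               # per-method MSE on this draw
}
```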
The results are displayed in Table 2. Our approach produces a lower MSE and median and a higher optimal rate than its competitors across all sample sizes considered. As expected, the averaging methods based on information criteria perform better than the corresponding model selection methods. To some extent, our proposal has a clear advantage over its competitors in solving practical problems.
| n | Metric | AIC | BIC | SAIC | SBIC | EW | TECV | TEEM | MAPLM | CPLJMA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20% | MSE | 2.0044 | 1.9531 | 1.8427 | 1.8182 | 1.9216 | 1.9145 | 1.8024 | 1.8539 | 1.5563 |
| 20% | Median | 0.7534 | 0.7321 | 0.6935 | 0.6824 | 0.7322 | 0.7035 | 0.5806 | 0.6822 | 0.5255 |
| 20% | Optimal rate | 0.13 | 0.04 | 0.01 | 0.11 | 0.05 | 0.02 | 0.11 | 0.13 | 0.40 |
| 40% | MSE | 1.1867 | 1.6152 | 1.0405 | 1.0329 | 0.9579 | 0.9649 | 0.8924 | 0.9416 | 0.8520 |
| 40% | Median | 0.4276 | 0.4165 | 0.3666 | 0.3624 | 0.3522 | 0.3435 | 0.2892 | 0.3237 | 0.2876 |
| 40% | Optimal rate | 0.13 | 0.00 | 0.02 | 0.07 | 0.01 | 0.04 | 0.15 | 0.18 | 0.40 |
| 60% | MSE | 1.0056 | 0.9915 | 0.8794 | 0.8749 | 0.9020 | 0.8820 | 0.8467 | 0.8835 | 0.7948 |
| 60% | Median | 0.3621 | 0.3549 | 0.3106 | 0.3078 | 0.3243 | 0.3020 | 0.2557 | 0.3069 | 0.2598 |
| 60% | Optimal rate | 0.06 | 0.04 | 0.02 | 0.07 | 0.03 | 0.04 | 0.15 | 0.14 | 0.45 |
| 80% | MSE | 0.8512 | 0.8336 | 0.7542 | 0.7513 | 0.7522 | 0.7500 | 0.7219 | 0.7282 | 0.6955 |
| 80% | Median | 0.3055 | 0.2970 | 0.2600 | 0.2591 | 0.2576 | 0.2479 | 0.2306 | 0.2392 | 0.2272 |
| 80% | Optimal rate | 0.05 | 0.00 | 0.00 | 0.11 | 0.02 | 0.04 | 0.24 | 0.16 | 0.38 |
| 100% | MSE | 0.8357 | 0.8247 | 0.7105 | 0.7082 | 0.7013 | 0.7173 | 0.7669 | 0.6824 | 0.6628 |
| 100% | Median | 0.2997 | 0.2935 | 0.2478 | 0.2465 | 0.2426 | 0.2375 | 0.2667 | 0.2283 | 0.2184 |
| 100% | Optimal rate | 0.06 | 0.00 | 0.01 | 0.03 | 0.05 | 0.13 | 0.01 | 0.12 | 0.59 |
Considering the heterogeneity and heteroskedasticity information embedded in the dataset, the problem of model uncertainty, and the flexibility and interpretability of PLMs, a jackknife-type weight choice criterion and its feasible form, called the CPLJMA method, are proposed for estimating the CATE. Within this framework, we choose the optimal model weights by minimizing the LOO-CV criterion, with B-splines approximating the nonparametric function, and we demonstrate the asymptotic optimality of the resulting estimator in the sense of minimizing the squared error loss. In addition, since the choice of weights is crucial to the model averaging method, we also study the convergence properties of the weights: when the sample size goes to infinity and at least one candidate model is correctly specified, the sum of the weights our approach assigns to the correct candidate models converges to one in probability. In the simulation section, we examine the finite-sample performance of our estimator and compare it with several other model selection and averaging methods, and we illustrate our method using a real-world dataset. The simulation results indicate that our method possesses some advantages relative to its competitors.
There are still many issues worthy of further discussion. First, our proposed CPLJMA method is valid only for a one-dimensional covariate in the nonparametric part, and it could be further refined and extended to multiple dimensions. Second, the least squares loss used in our analysis is sensitive to outliers. Quantile regression, which is less sensitive to outliers, could therefore be considered as the loss function; however, developing the corresponding model averaging procedure and establishing its asymptotic properties remain challenging, and this area deserves future research.
X. Zhang: Writing—original draft; X. Zhang and J. Li: Writing—review and editing. All authors have read and approved the final version of the manuscript for publication.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors would like to thank the editors and reviewers for their valuable and insightful comments. The authors also thank Professor Guangren Yang for his advice for revising the paper.
The authors declare no conflict of interest.
This Supplementary Material provides detailed proofs of the main theorems stated above.
Lemmas and their proofs
We introduce some lemmas and their proofs before proving the theorems stated in Section 3.
Lemma 1. Provided that Conditions (C1) and (C2) hold, we have $\|Z_{\hat\pi} - Z_\pi\|^2 = O_p(1)$.
Proof. According to the monotonicity of the $L_r$-norm and the Hölder inequality, we have
$$\begin{aligned}\max\Big\{\max_{1\le i\le n}\sigma_i^2,\ \max_{1\le i\le n}\sigma_{t,i}^2\Big\} &= \max\Big\{\max_{1\le i\le n}E(e_i^2 \mid X_i, U_i),\ \max_{1\le i\le n}E(e_{t,i}^2 \mid X_i, U_i)\Big\}\\ &\le \max\Big[\max_{1\le i\le n}\{E(e_i^{4G} \mid X_i, U_i)\}^{\frac{1}{2G}},\ \max_{1\le i\le n}\{E(e_{t,i}^{4G} \mid X_i, U_i)\}^{\frac{1}{2G}}\Big] \le \bar C\end{aligned}$$
almost surely, where the two inequalities are attributed to the Hölder inequality and Condition (C2), respectively. Thus, for any $\epsilon > 0$, there exists an integer $N_\epsilon$ such that $P(\max_{1\le i\le n}\sigma_i^2 > \bar C) \le \epsilon/2$ for all $n \ge N_\epsilon$. Let $M_\epsilon = 2\bar C/\epsilon$. Then we have
$$\begin{aligned}\sup_{n\ge1}P\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon\Big) &= \sup_{n\ge1}\Big\{P\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon,\ \max_{1\le i\le n}\sigma_i^2 \le \bar C\Big) + P\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon,\ \max_{1\le i\le n}\sigma_i^2 > \bar C\Big)\Big\}\\
&\le \sup_{n\ge1}E\Big\{I\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon\Big)I\Big(\max_{1\le i\le n}\sigma_i^2 \le \bar C\Big)\Big\} + \sup_{n\ge N_\epsilon}P\Big(\max_{1\le i\le n}\sigma_i^2 > \bar C\Big)\\
&\le \sup_{n\ge1}E\Big[E\Big\{I\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon\Big)\,\Big|\, X, U\Big\}I\Big(\max_{1\le i\le n}\sigma_i^2 \le \bar C\Big)\Big] + \frac{\epsilon}{2}\\
&= \sup_{n\ge1}E\Big\{P\Big(\frac1n\sum_{i=1}^n e_i^2 > M_\epsilon \,\Big|\, X, U\Big)I\Big(\max_{1\le i\le n}\sigma_i^2 \le \bar C\Big)\Big\} + \frac{\epsilon}{2}\\
&\le \sup_{n\ge1}E\Big\{M_\epsilon^{-1}E\Big(\frac1n\sum_{i=1}^n e_i^2\,\Big|\, X, U\Big)I\Big(\max_{1\le i\le n}\sigma_i^2 \le \bar C\Big)\Big\} + \frac{\epsilon}{2}\\
&= \sup_{n\ge1}E\Big\{M_\epsilon^{-1}\frac1n\sum_{i=1}^n\sigma_i^2\, I\Big(\max_{1\le i\le n}\sigma_i^2 \le \bar C\Big)\Big\} + \frac{\epsilon}{2}\\
&\le M_\epsilon^{-1}\bar C + \frac{\epsilon}{2} = \epsilon.\end{aligned}\tag{5.1}$$
Thus, we have
$$\frac1n\sum_{i=1}^n e_i^2 = O_p(1). \tag{5.2}$$
Similarly, it can be obtained that
$$\frac1n\sum_{i=1}^n e_{t,i}^2 = O_p(1). \tag{5.3}$$
By the Cauchy–Schwarz inequality, we obtain
$$\begin{aligned}\|Z_{\hat\pi} - Z_\pi\|^2 &= \sum_{i=1}^n\Big\{\frac{\delta_i Y_{t,i}}{\hat\pi(X_i,U_i)} - \frac{(1-\delta_i)Y_{c,i}}{1-\hat\pi(X_i,U_i)} - \Big(\frac{\delta_i Y_{t,i}}{\pi(X_i,U_i)} - \frac{(1-\delta_i)Y_{c,i}}{1-\pi(X_i,U_i)}\Big)\Big\}^2\\
&= \sum_{i=1}^n\Big[\Big\{\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big\}Y_{t,i} + \Big\{\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)} - \frac{1-\delta_i}{1-\pi(X_i,U_i)}\Big\}(Y_{t,i} - Y_{c,i})\Big]^2\\
&\le c\Big[\Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big|\Big\}^2\frac1n\sum_{i=1}^n Y_{t,i}^2\\
&\quad + \Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)} - \frac{1-\delta_i}{1-\pi(X_i,U_i)}\Big|\Big\}^2\frac1n\sum_{i=1}^n(Y_{t,i} - Y_{c,i})^2\Big]\\
&\le c\Big[\Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big|\Big\}^2\Big(\frac1n\sum_{i=1}^n\mu_{t,i}^2 + \frac1n\sum_{i=1}^n e_{t,i}^2\Big)\\
&\quad + \Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)} - \frac{1-\delta_i}{1-\pi(X_i,U_i)}\Big|\Big\}^2\Big(\frac1n\sum_{i=1}^n\mu_i^2 + \frac1n\sum_{i=1}^n e_i^2\Big)\Big]\\
&= \Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big|\Big\}^2 O_p(1) + \Big\{\sqrt n\max_{1\le i\le n}\Big|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)} - \frac{1-\delta_i}{1-\pi(X_i,U_i)}\Big|\Big\}^2 O_p(1),\end{aligned}$$
where the last step is due to Condition (C2), (5.2), and (5.3). Lemma 1 holds if we can prove that
$$\sqrt n\max_{1\le i\le n}\Big|\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big| = O_p(1), \tag{5.4}$$
$$\sqrt n\max_{1\le i\le n}\Big|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)} - \frac{1-\delta_i}{1-\pi(X_i,U_i)}\Big| = O_p(1). \tag{5.5}$$
By a Taylor expansion, one has
$$\begin{aligned}&\sqrt n\max_{1\le i\le n}\Big|\frac{\delta_i - \hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))} - \frac{\delta_i - \pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\Big|\\
&\le c\{\min_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\}^{-2}\{1 - \max_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\}^{-2}\\
&\quad\cdot\max_{1\le i\le n}\Big\{\Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\mathrm T}}\Big|_{\theta=\theta_i^*}\Big\|\sqrt n\|\hat\theta_n - \theta_0\| + \Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\mathrm T}}\Big|_{\kappa=\kappa_i^*}\Big\|\,\|\hat\kappa_{\hat\theta_n} - \kappa_0\|\Big\},\end{aligned}$$
where $\theta_i^*$ is a vector between $\hat\theta_n$ and $\theta_0$, and $\kappa_i^*$ is a vector between $\hat\kappa_{\hat\theta_n}$ and $\kappa_0$. Expanding $\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)$ in a Taylor series and considering the property of $\pi(X_i,U_i)$, we have
$$\begin{aligned}\min_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*) &\ge c_\pi - \max_{1\le i\le n}\Big\{\Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\mathrm T}}\Big|_{\theta=\theta_i^{**}}\Big\|\|\hat\theta_n - \theta_0\| + \Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\mathrm T}}\Big|_{\kappa=\kappa_i^{**}}\Big\|\|\hat\kappa_{\hat\theta_n} - \kappa_0\|\Big\},\\
1 - \max_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*) &\ge c_\pi - \max_{1\le i\le n}\Big\{\Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\mathrm T}}\Big|_{\theta=\theta_i^{**}}\Big\|\|\hat\theta_n - \theta_0\| + \Big\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\mathrm T}}\Big|_{\kappa=\kappa_i^{**}}\Big\|\|\hat\kappa_{\hat\theta_n} - \kappa_0\|\Big\},\end{aligned}$$
where $\theta_i^{**}$ is a vector between $\theta_i^*$ and $\theta_0$, and $\kappa_i^{**}$ is a vector between $\kappa_i^*$ and $\kappa_0$. Together with Condition (C1), this shows that (5.4) is valid. Likewise, we determine that (5.5) is valid. Therefore, the proof of Lemma 1 is completed.
Lemma 2. By Condition (10) of Theorem 2.1 in [26], we have
$$\sup_{\omega\in\mathcal H_n}\Big|\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)} - 1\Big| \xrightarrow{\text{a.s.}} 0. \tag{5.6}$$
Together with Condition (C3), for some integer $1 \le G \le \infty$, it follows that
$$M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\{\tilde R_\pi(\omega_m^o)\}^G \xrightarrow{\text{a.s.}} 0, \tag{5.7}$$
where $\tilde\xi_\pi = \inf_{\omega\in\mathcal H_n}\tilde R_\pi(\omega)$.
Lemma 3. Let $\tilde\omega = \mathop{\arg\min}_{\omega\in\mathcal H_n}\{L_n(\omega) + a_n(\omega) + b_n\}$, where $a_n(\omega)$ is a term related to $\omega$ and $b_n$ is a term unrelated to $\omega$. Let $R_n(\omega) = E\{L_n(\omega) \mid X, U\}$. If
$$\sup_{\omega\in\mathcal H_n}\frac{|a_n(\omega)|}{R_n(\omega)} = o_p(1), \tag{5.8}$$
$$\sup_{\omega\in\mathcal H_n}\frac{|R_n(\omega) - L_n(\omega)|}{R_n(\omega)} = o_p(1), \tag{5.9}$$
and there exist a constant $c$ and a positive integer $N^*$ such that, when $n \ge N^*$, $\inf_{\omega\in\mathcal H_n}R_n(\omega) \ge c > 0$ almost surely, then $L_n(\tilde\omega)/\inf_{\omega\in\mathcal H_n}L_n(\omega) \to 1$ in probability.
Lemma 4. For any conformable matrices $B_1$ and $B_2$,
$$\lambda_{\max}\{B_1B_2\} \le \lambda_{\max}\{B_1\}\lambda_{\max}\{B_2\}, \quad \lambda_{\max}\{B_1 + B_2\} \le \lambda_{\max}\{B_1\} + \lambda_{\max}\{B_2\}.$$
Theorems and their proofs
Proof of Theorem 1:
Let $\lambda_{\max}(\cdot)$ denote the largest singular value of a matrix, and let $\Omega_\pi$ denote the conditional covariance matrix of $e_\pi$ given $(X, U)$. From Condition (C2), we have
$$\lambda_{\max}(\Omega_\pi) = O(1). \tag{5.10}$$
By the definition of $R_\pi(\omega)$, it can be shown that
$$R_\pi(\omega) = \|A(\omega)\mu\|^2 + \mathrm{tr}\{P(\omega)\Omega_\pi P(\omega)^{\mathrm T}\}, \tag{5.11}$$
where $A(\omega) = I_n - P(\omega)$. Define the loss function of $\tilde\mu_\pi(\omega)$ as $\tilde L_\pi(\omega) = \|\mu - \tilde\mu_\pi(\omega)\|^2$ and its corresponding risk function as $\tilde R_\pi(\omega) = E\{\tilde L_\pi(\omega) \mid X, U\}$. Similarly, we have $\tilde R_\pi(\omega) = \|\tilde A(\omega)\mu\|^2 + \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\}$, in which $\tilde A(\omega) = I_n - \tilde P(\omega)$. Define
$$V_\pi(\omega) = \|A(\omega)\mu\|^2 + \mathrm{tr}\{P(\omega)\Omega_\pi P(\omega)^{\mathrm T}\} \quad\text{and}\quad \tilde V_\pi(\omega) = \|\tilde A(\omega)\mu\|^2 + \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\}.$$
Then we have
$$\begin{aligned}\frac{L_{\hat\pi}(\hat\omega_{cv})}{\inf_{\omega\in\mathcal H_n}L_{\hat\pi}(\omega)} - 1 &= \sup_{\omega\in\mathcal H_n}\Big\{\frac{L_{\hat\pi}(\hat\omega_{cv})}{L_{\hat\pi}(\omega)} - 1\Big\}\\
&= \sup_{\omega\in\mathcal H_n}\Big\{\frac{L_{\hat\pi}(\hat\omega_{cv})}{V_\pi(\hat\omega_{cv})}\cdot\frac{V_\pi(\hat\omega_{cv})}{\tilde V_\pi(\hat\omega_{cv})}\cdot\frac{\tilde V_\pi(\hat\omega_{cv})}{\tilde L_\pi(\hat\omega_{cv})}\cdot\frac{\tilde L_\pi(\hat\omega_{cv})}{\tilde L_\pi(\omega)}\cdot\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)}\cdot\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)}\cdot\frac{R_\pi(\omega)}{L_{\hat\pi}(\omega)} - 1\Big\}\\
&\le \sup_{\omega\in\mathcal H_n}\Big(\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)}\Big)\sup_{\omega\in\mathcal H_n}\Big(\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)}\Big)\sup_{\omega\in\mathcal H_n}\Big(\frac{\tilde R_\pi(\omega)}{\tilde L_\pi(\omega)}\Big)\sup_{\omega\in\mathcal H_n}\Big(\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)}\Big)\sup_{\omega\in\mathcal H_n}\Big(\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)}\Big)\sup_{\omega\in\mathcal H_n}\Big(\frac{R_\pi(\omega)}{L_{\hat\pi}(\omega)}\Big)\cdot\frac{\tilde L_\pi(\hat\omega_{cv})}{\inf_{\omega\in\mathcal H_n}\tilde L_\pi(\omega)} - 1.\end{aligned}$$
Thus, to prove Theorem 1, it suffices to show that
$$\sup_{\omega\in\mathcal H_n}\Big|\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)} - 1\Big| = o_p(1), \tag{5.12}$$
$$\sup_{\omega\in\mathcal H_n}\Big|\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\Big| = o_p(1), \tag{5.13}$$
$$\sup_{\omega\in\mathcal H_n}\Big|\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)} - 1\Big| = o_p(1), \tag{5.14}$$
$$\frac{\tilde L_\pi(\hat\omega_{cv})}{\inf_{\omega\in\mathcal H_n}\tilde L_\pi(\omega)} - 1 = o_p(1). \tag{5.15}$$
Equation (5.12) follows directly from (5.6) in Lemma 2.
For (5.13), it is noted that
$$\begin{aligned}|\tilde L_\pi(\omega) - \tilde R_\pi(\omega)| &= \big|\|\mu - \tilde\mu_\pi(\omega)\|^2 - \|\tilde A(\omega)\mu\|^2 - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\}\big|\\
&= \big|\|\tilde P(\omega)e_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\} - 2\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega)\tilde P(\omega)e_\pi\big|.\end{aligned}$$
Hence, for (5.13) to hold, it suffices to show that
$$\sup_{\omega\in\mathcal H_n}\frac{\big|\|\tilde P(\omega)e_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\}\big|}{\tilde R_\pi(\omega)} = o_p(1), \tag{5.16}$$
$$\sup_{\omega\in\mathcal H_n}\frac{|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega)\tilde P(\omega)e_\pi|}{\tilde R_\pi(\omega)} = o_p(1). \tag{5.17}$$
In addition, according to Lemma 4 and the properties of $P^{(m)}$, we have
$$\lambda_{\max}\{P^{(m)}\} = \lambda_{\max}\{Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\} \le \lambda_{\max}\{Q\} + \lambda_{\max}\{\tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\} \le 2. \tag{5.18}$$
Note that
$$\tilde P^{(m)} = P^{(m)} - D^{(m)}A^{(m)}, \tag{5.19}$$
which, along with (5.18) and the first part of Condition (C4), leads us to
$$\begin{aligned}\lambda_{\max}\{\tilde P(\omega)\} &\le \sum_{m=1}^{M_n}\omega_m\big[\lambda_{\max}\{P^{(m)}\} + \lambda_{\max}\{-D^{(m)}A^{(m)}\}\big] \le \sum_{m=1}^{M_n}\omega_m\big[2 + \lambda_{\max}\{-D^{(m)}\}\lambda_{\max}\{A^{(m)}\}\big]\\
&\le \sum_{m=1}^{M_n}\omega_m\Big[2 + \max_{1\le i\le n}\frac{h_{m,ii}}{1 - h_{m,ii}}\Big] \le 2 + \frac{\bar h}{1 - \bar h} = 1 + (1 - \bar h)^{-1} = O(1).\end{aligned}\tag{5.20}$$
To prove (5.16), it is necessary only to verify that, for any $\delta > 0$,
$$\begin{aligned}&\Pr\Big\{\sup_{\omega\in\mathcal H_n}\big|\|\tilde P(\omega)e_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\tilde P(\omega)^{\mathrm T}\}\big|\big/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\}\\
&\le \Pr\Big\{\sup_{\omega\in\mathcal H_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\big|e_\pi^{\mathrm T}\tilde P^{(m)\mathrm T}\tilde P^{(s)}e_\pi - \mathrm{tr}\{\Omega_\pi\tilde P^{(s)\mathrm T}\tilde P^{(m)}\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\}\\
&\le \Pr\Big\{\max_{1\le m\le M_n}\max_{1\le s\le M_n}\big|e_\pi^{\mathrm T}\tilde P^{(m)\mathrm T}\tilde P^{(s)}e_\pi - \mathrm{tr}\{\Omega_\pi\tilde P^{(s)\mathrm T}\tilde P^{(m)}\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\}\\
&\le \sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\Pr\Big\{\big|e_\pi^{\mathrm T}\tilde P^{(m)\mathrm T}\tilde P^{(s)}e_\pi - \mathrm{tr}\{\Omega_\pi\tilde P^{(s)\mathrm T}\tilde P^{(m)}\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\}\\
&\le C_1\delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}E\Big\{\big|e_\pi^{\mathrm T}\tilde P^{(m)\mathrm T}\tilde P^{(s)}e_\pi - \mathrm{tr}\{\Omega_\pi\tilde P^{(s)\mathrm T}\tilde P^{(m)}\}\big|^{2G} \,\Big|\, X, U\Big\}\\
&\le C_1\delta^{-2G}\tilde\xi_\pi^{-2G}\lambda_{\max}^G(\Omega_\pi)\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\big|\mathrm{tr}\{\tilde P^{(m)\mathrm T}\tilde P^{(s)}\Omega_\pi\tilde P^{(s)\mathrm T}\tilde P^{(m)}\}\big|^G\\
&\le C_1\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P^{(s)}\}\,M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\big|\mathrm{tr}\{\tilde P^{(m)\mathrm T}\Omega_\pi\tilde P^{(m)}\}\big|^G\\
&\le C_1\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P^{(s)}\}\,M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\{\tilde R_\pi(\omega_m^o)\}^G \to 0, \ \text{as } n\to\infty,\end{aligned}$$
where $C_1$ is a positive constant. The third, fourth, and fifth inequalities are derived from the triangle inequality, Markov's inequality, and (7) of Theorem 2 of [23], respectively. The sixth line follows from the inequality $\mathrm{tr}(B_1B_2) \le \lambda_{\max}(B_1)\mathrm{tr}(B_2)$, the seventh follows from $\mathrm{tr}\{\tilde P^{(m)\mathrm T}\Omega_\pi\tilde P^{(m)}\} \le \tilde R_\pi(\omega_m^o)$, and the convergence to zero is guaranteed by Lemma 2, (5.10), and (5.20). Thus, (5.16) is valid.
By similar arguments, for (5.17), we obtain that
$$\begin{aligned}\Pr\Big\{\sup_{\omega\in\mathcal H_n}\big|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega)\tilde P(\omega)e_\pi\big|\big/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\}
&\le \sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\Pr\big\{\big|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega_m^o)\tilde P(\omega_s^o)e_\pi\big| > \delta\tilde\xi_\pi \,\big|\, X, U\big\}\\
&\le \delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}E\big\{\big|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega_m^o)\tilde P(\omega_s^o)e_\pi\big|^{2G} \,\big|\, X, U\big\}\\
&\le C_2\delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\big\|\tilde P(\omega_s^o)\Omega_\pi^{1/2}\big\|^{2G}\,\big\|\tilde A(\omega_m^o)\mu\big\|^{2G}\\
&\le C_2\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P(\omega_s^o)\}\,M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\big\|\tilde A(\omega_m^o)\mu\big\|^{2G}\\
&\le C_2\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P(\omega_s^o)\}\,M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\{\tilde R_\pi(\omega_m^o)\}^G \to 0, \ \text{as } n\to\infty,\end{aligned}$$
where $C_2$ is a positive constant, and the last inequality is due to the fact that $\|\tilde A(\omega_m^o)\mu\|^2 \le \tilde R_\pi(\omega_m^o)$, which is implied by (5.11). Thus, (5.17) is valid. This completes the proof of (5.13).
By the Cauchy–Schwarz inequality, it can be shown that
$$\Big|\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)} - 1\Big| \le \Big|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\Big| + \frac{2\{L_\pi(\omega)\}^{1/2}\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\|}{R_\pi(\omega)} + \frac{\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\|^2}{R_\pi(\omega)}.$$
Thus, to prove (5.14), it suffices to show that
$$\sup_{\omega\in\mathcal H_n}\Big|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\Big| = o_p(1), \tag{5.21}$$
$$\sup_{\omega\in\mathcal H_n}\frac{\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\|^2}{R_\pi(\omega)} = o_p(1). \tag{5.22}$$
Similarly, using the technique applied in deriving (5.13), it can be shown that (5.21) is valid. By Lemma 1, Lemma 4, Condition (C3), and (5.18), we have
$$\sup_{\omega\in\mathcal H_n}\frac{\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\|^2}{R_\pi(\omega)} = \sup_{\omega\in\mathcal H_n}\frac{\|P(\omega)(Z_{\hat\pi} - Z_\pi)\|^2}{R_\pi(\omega)} \le \xi_\pi^{-1}\sup_{\omega\in\mathcal H_n}\lambda_{\max}^2\{P(\omega)\}\|Z_{\hat\pi} - Z_\pi\|^2 \le 4\xi_\pi^{-1}\|Z_{\hat\pi} - Z_\pi\|^2 \to 0, \ \text{as } n\to\infty.$$
As a result, (5.21) and (5.22) are valid, and thus (5.14) is valid.
By the jackknife criterion in (2.5), a straightforward but careful calculation yields
$$\mathrm{CV}_{\hat\pi}(\omega) = \|Z_{\hat\pi} - \tilde\mu_{\hat\pi}(\omega)\|^2 = \|Z_{\hat\pi} - \mu + \mu - \tilde\mu_\pi(\omega) + \tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2 = \tilde L_\pi(\omega) + \tilde a_n(\omega) + \|Z_{\hat\pi} - \mu\|^2, \tag{5.23}$$
where the term $\|Z_{\hat\pi} - \mu\|^2$ is independent of $\omega$, and
$$\begin{aligned}\tilde a_n(\omega) ={}& \|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2 + 2\{\mu - \tilde\mu_\pi(\omega)\}^{\mathrm T}\{\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\} + 2(Z_{\hat\pi} - Z_\pi)^{\mathrm T}\{\mu - \tilde\mu_\pi(\omega)\}\\
&+ 2(Z_{\hat\pi} - Z_\pi)^{\mathrm T}\{\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\} + 2e_\pi^{\mathrm T}\tilde A(\omega)\mu - 2e_\pi^{\mathrm T}\tilde P(\omega)e_\pi + 2e_\pi^{\mathrm T}\tilde P(\omega)(Z_\pi - Z_{\hat\pi}).\end{aligned}$$
Thus, by Lemma 3, for (5.15) to hold, it only needs to be verified that, as $n \to \infty$,
$$\sup_{\omega\in\mathcal H_n}\frac{|\tilde a_n(\omega)|}{\tilde R_\pi(\omega)} = o_p(1). \tag{5.24}$$
Using the Cauchy–Schwarz inequality, Lemma 4, and (5.20), we obtain that
$$\begin{aligned}|\tilde a_n(\omega)| \le{}& \|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2 + 2\{\tilde L_\pi(\omega)\}^{\frac12}\|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\| + 2\|Z_{\hat\pi} - Z_\pi\|\{\tilde L_\pi(\omega)\}^{\frac12}\\
&+ 2\|Z_{\hat\pi} - Z_\pi\|\|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\| + 2|e_\pi^{\mathrm T}\tilde A(\omega)\mu| + 2|e_\pi^{\mathrm T}\tilde P(\omega)e_\pi| + 2\|\tilde P(\omega)^{\mathrm T}e_\pi\|\|Z_\pi - Z_{\hat\pi}\|,\end{aligned}\tag{5.25}$$
where
$$\|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2 = \|\tilde P(\omega)(Z_{\hat\pi} - Z_\pi)\|^2 \le [\lambda_{\max}\{\tilde P(\omega)\}]^2\|Z_{\hat\pi} - Z_\pi\|^2 = O_p(1)$$
by Lemma 1.
Therefore, for (5.24) to hold, it suffices to prove that
$$\sup_{\omega\in\mathcal H_n}\frac{|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega)e_\pi|}{\tilde R_\pi(\omega)} = o_p(1), \tag{5.26}$$
$$\sup_{\omega\in\mathcal H_n}\frac{|e_\pi^{\mathrm T}\tilde P(\omega)e_\pi|}{\tilde R_\pi(\omega)} = o_p(1), \tag{5.27}$$
$$\sup_{\omega\in\mathcal H_n}\frac{\|\tilde P(\omega)^{\mathrm T}e_\pi\|}{\tilde R_\pi(\omega)} = o_p(1). \tag{5.28}$$
Applying the technique used to derive (5.17), for any $\delta > 0$, we have
$$\begin{aligned}\Pr\Big\{\sup_{\omega\in\mathcal H_n}|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega)e_\pi|/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\} &\le \sum_{m=1}^{M_n}\Pr\big\{|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega_m^o)e_\pi| > \delta\tilde\xi_\pi \,\big|\, X, U\big\}\\
&\le \delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}E\big\{|\mu^{\mathrm T}\tilde A^{\mathrm T}(\omega_m^o)e_\pi|^{2G} \,\big|\, X, U\big\}\\
&\le C_3\delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\|\Omega_\pi^{1/2}\tilde A(\omega_m^o)\mu\|^{2G}\\
&\le C_3\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\{\tilde R_\pi(\omega_m^o)\}^G \to 0, \ \text{as } n\to\infty,\end{aligned}$$
where $C_3$ is a positive constant. By Conditions (C2) and (C3) and (5.10), we know that (5.26) is valid.
It can be observed that
$$|e_\pi^{\mathrm T}\tilde P(\omega)e_\pi| \le |e_\pi^{\mathrm T}\tilde P(\omega)e_\pi - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}| + |\mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}|.$$
Therefore, (5.27) holds if we can prove that
$$\sup_{\omega\in\mathcal H_n}\frac{|e_\pi^{\mathrm T}\tilde P(\omega)e_\pi - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}|}{\tilde R_\pi(\omega)} = o_p(1), \tag{5.29}$$
$$\sup_{\omega\in\mathcal H_n}\frac{|\mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}|}{\tilde R_\pi(\omega)} = o_p(1). \tag{5.30}$$
Similar to (5.26), it can be shown that
$$\begin{aligned}\Pr\Big\{\sup_{\omega\in\mathcal H_n}|e_\pi^{\mathrm T}\tilde P(\omega)e_\pi - \mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}|/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\} &\le \sum_{m=1}^{M_n}\Pr\big\{|e_\pi^{\mathrm T}\tilde P(\omega_m^o)e_\pi - \mathrm{tr}\{\tilde P(\omega_m^o)\Omega_\pi\}| > \delta\tilde\xi_\pi \,\big|\, X, U\big\}\\
&\le \delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}E\big\{|e_\pi^{\mathrm T}\tilde P(\omega_m^o)e_\pi - \mathrm{tr}\{\Omega_\pi\tilde P(\omega_m^o)\}|^{2G} \,\big|\, X, U\big\}\\
&\le C_4\delta^{-2G}\tilde\xi_\pi^{-2G}\lambda_{\max}^G(\Omega_\pi)\sum_{m=1}^{M_n}\big(\mathrm{tr}\{\tilde P(\omega_m^o)^{\mathrm T}\Omega_\pi\tilde P(\omega_m^o)\}\big)^G\\
&\le C_4\delta^{-2G}\lambda_{\max}^G(\Omega_\pi)\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\{\tilde R_\pi(\omega_m^o)\}^G \to 0, \ \text{as } n\to\infty,\end{aligned}$$
where $C_4$ is a positive constant. As a result, (5.29) is valid.
By (5.10), Condition (C4), and the fact that all the diagonal elements of $\tilde P^{(m)}$ are zeros, it is observed that
$$\sup_{\omega\in\mathcal H_n}|\mathrm{tr}\{\tilde P(\omega)\Omega_\pi\}|/\tilde R_\pi(\omega) \le \tilde\xi_\pi^{-1}\max_{1\le m\le M_n}|\mathrm{tr}\{\tilde P^{(m)}\Omega_\pi\}| \le \tilde\xi_\pi^{-1}\lambda_{\max}(\Omega_\pi)\max_{1\le m\le M_n}\mathrm{tr}\{\tilde P^{(m)}\} \to 0, \ \text{as } n\to\infty.$$
Thus, (5.30) is valid.
Similarly, for (5.28), we have that
$$\|\tilde P(\omega)^{\mathrm T}e_\pi\|^2 \le |e_\pi^{\mathrm T}\tilde P(\omega)^{\mathrm T}\tilde P(\omega)e_\pi - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_\pi\tilde P(\omega)\}| + \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_\pi\tilde P(\omega)\}.$$
Then, for (5.28) to hold, we only need to prove
$$\sup_{\omega\in\mathcal H_n}\frac{|e_\pi^{\mathrm T}\tilde P(\omega)^{\mathrm T}\tilde P(\omega)e_\pi - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_\pi\tilde P(\omega)\}|}{\{\tilde R_\pi(\omega)\}^2} = o_p(1), \tag{5.31}$$
$$\sup_{\omega\in\mathcal H_n}\frac{|\mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_\pi\tilde P(\omega)\}|}{\{\tilde R_\pi(\omega)\}^2} = o_p(1). \tag{5.32}$$
By the proof of (5.16), it can be shown that (5.31) is valid. Letting $S^{(m)} = D^{(m)} + I_n$, this together with (5.19) generates
$$\begin{aligned}\mathrm{tr}\{\tilde P^{(m)\mathrm T}\tilde P^{(m)}\} &= \mathrm{tr}\{[P^{(m)} - D^{(m)}A^{(m)}]\tilde P^{(m)}\} = \mathrm{tr}\{[(P^{(m)} - I_n)S^{(m)} + I_n]\tilde P^{(m)}\}\\
&= \mathrm{tr}\{P^{(m)}S^{(m)}\tilde P^{(m)}\} - \mathrm{tr}\{S^{(m)}\tilde P^{(m)}\} = \mathrm{tr}\{P^{(m)}S^{(m)}S^{(m)}(P^{(m)} - I_n)\} + \mathrm{tr}\{P^{(m)}S^{(m)}\}\\
&\le \mathrm{tr}\{P^{(m)}S^{(m)}S^{(m)}P^{(m)}\} + \mathrm{tr}\{P^{(m)}S^{(m)}\}\\
&\le (1 - \bar h)^{-2}\mathrm{tr}\{P^{(m)}\} + (1 - \bar h)^{-1}\mathrm{tr}\{P^{(m)}\} = (d_n + k_m)(1 - \bar h)^{-2}(2 - \bar h),\end{aligned}\tag{5.33}$$
where
$$\mathrm{tr}\{P^{(m)}\} = \mathrm{tr}\{Q\} + \mathrm{tr}\{\tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\} = d_n + k_m \quad\text{and}\quad \mathrm{tr}\{\tilde P^{(m)}\} = 0.$$
Under Lemma 2 and the second part of Condition (C4), we have
$$\tilde\xi_\pi^{-2}(\bar d + \bar k) = \xi_\pi^{-2}(\bar d + \bar k)\,\xi_\pi^2\tilde\xi_\pi^{-2} \le \xi_\pi^{-2}(\bar d + \bar k)\Big\{\sup_{\omega\in\mathcal H_n}\Big|\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\Big| + 1\Big\}^2 \xrightarrow{\text{a.s.}} 0,$$
where $\tilde\xi_\pi = \inf_{\omega\in\mathcal H_n}\tilde R_\pi(\omega)$. This, along with (5.10) and (5.33), implies that
$$\sup_{\omega\in\mathcal H_n}|\mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_\pi\tilde P(\omega)\}|/\{\tilde R_\pi(\omega)\}^2 \le \tilde\xi_\pi^{-2}\lambda_{\max}(\Omega_\pi)\sup_{\omega\in\mathcal H_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\,\mathrm{tr}\{\tilde P^{(m)\mathrm T}\tilde P^{(s)}\} \le \lambda_{\max}(\Omega_\pi)\tilde\xi_\pi^{-2}(\bar d + \bar k)(1 - \bar h)^{-2}(2 - \bar h) \to 0, \ \text{as } n\to\infty,$$
and thus (5.32) is valid. In conclusion, the proof of Theorem 1 is completed.
Proof of Theorem 2:
Define $\psi_n(\omega) = Z_{\hat\pi} - Z_\pi + \hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega) + \tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)$. A simple calculation yields
$$\begin{aligned}\mathrm{CV}_{\hat\pi}(\hat\omega_{cv}) &= \|Z_{\hat\pi} - \tilde\mu_{\hat\pi}(\hat\omega_{cv})\|^2\\
&= \|Z_{\hat\pi} - Z_\pi + Z_\pi - \hat\mu_\pi(\hat\omega_{cv}) + \hat\mu_\pi(\hat\omega_{cv}) - \tilde\mu_\pi(\hat\omega_{cv}) + \tilde\mu_\pi(\hat\omega_{cv}) - \tilde\mu_{\hat\pi}(\hat\omega_{cv})\|^2\\
&= \|\mu - \hat\mu_\pi(\hat\omega_{cv}) + e_\pi + \psi_n(\hat\omega_{cv})\|^2\\
&= \Big\|\hat s_{cv}\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\{\mu - \hat\mu_\pi^{(m)}\} + (1 - \hat s_{cv})\sum_{m=m_0+1}^{M_n}\frac{\hat\omega_{cv,m}}{1 - \hat s_{cv}}\{\mu - \hat\mu_\pi^{(m)}\} + e_\pi + \psi_n(\hat\omega_{cv})\Big\|^2\\
&= \|\hat s_{cv}\{\mu - \hat\mu_\pi(\hat\omega_C)\} + (1 - \hat s_{cv})\{\mu - \hat\mu_\pi(\hat\omega_F)\} + e_\pi + \psi_n(\hat\omega_{cv})\|^2,\end{aligned}\tag{5.34}$$
where $\hat\omega_C = (\hat\omega_{cv,1}, \ldots, \hat\omega_{cv,m_0}, 0, \ldots, 0)^{\mathrm T}/\hat s_{cv} \in \mathcal H_n$ and $\hat\omega_F = (0, \ldots, 0, \hat\omega_{cv,m_0+1}, \ldots, \hat\omega_{cv,M_n})^{\mathrm T}/(1 - \hat s_{cv}) \in \mathcal H_n$. Likewise, we obtain that
$$\mathrm{CV}_{\hat\pi}(\hat\omega_C) = \|Z_{\hat\pi} - Z_\pi + Z_\pi - \hat\mu_\pi(\hat\omega_C) + \hat\mu_\pi(\hat\omega_C) - \tilde\mu_\pi(\hat\omega_C) + \tilde\mu_\pi(\hat\omega_C) - \tilde\mu_{\hat\pi}(\hat\omega_C)\|^2 = \|\mu - \hat\mu_\pi(\hat\omega_C) + e_\pi + \psi_n(\hat\omega_C)\|^2. \tag{5.35}$$
We know that $\mathrm{CV}_{\hat\pi}(\hat\omega_{cv}) \le \mathrm{CV}_{\hat\pi}(\hat\omega_C)$, which, together with (5.34), (5.35), and the Cauchy–Schwarz inequality, implies that
$$\begin{aligned}(1-\hat s_{cv})^2 \le{}& \Big[(1-\hat s_{cv}^2)\|\mu-\hat\mu_\pi(\hat\omega_C)\|^2 + \|\psi_n(\hat\omega_C)\|^2 + 2e_\pi^{\mathrm T}\big[\{\mu-\hat\mu_\pi(\hat\omega_C)\} + \psi_n(\hat\omega_C)\big] + 2\{\mu-\hat\mu_\pi(\hat\omega_C)\}^{\mathrm T}\psi_n(\hat\omega_C)\\
&+ 2\big[\hat s_{cv}\{\mu-\hat\mu_\pi(\hat\omega_C)\} + \psi_n(\hat\omega_{cv})\big]^{\mathrm T}\{\mu-\hat\mu_\pi(\hat\omega_F)\} + \|\psi_n(\hat\omega_{cv})\|^2 + 2e_\pi^{\mathrm T}\{\mu-\hat\mu_\pi(\hat\omega_F)\}\\
&+ 2e_\pi^{\mathrm T}\big[\hat s_{cv}\{\mu-\hat\mu_\pi(\hat\omega_C)\} + \psi_n(\hat\omega_{cv})\big] + 2\hat s_{cv}\{\mu-\hat\mu_\pi(\hat\omega_C)\}^{\mathrm T}\psi_n(\hat\omega_{cv})\Big]\Big/\|\mu-\hat\mu_\pi(\hat\omega_F)\|^2\\
\le{}& \Big[2\|\mu-\hat\mu_\pi(\hat\omega_C)\|^2 + 2\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|^2 + 4\big|e_\pi^{\mathrm T}\{\mu-\hat\mu_\pi(\hat\omega_C)\}\big| + 4\|e_\pi\|\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|\\
&+ 4\|\mu-\hat\mu_\pi(\hat\omega_C)\|\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\| + 2\Big\{\|\mu-\hat\mu_\pi(\hat\omega_C)\| + \sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|\Big\}\|\mu-\hat\mu_\pi(\hat\omega_F)\|\\
&+ 2e_\pi^{\mathrm T}A(\hat\omega_F)\mu - 2e_\pi^{\mathrm T}P(\hat\omega_F)e_\pi\Big]\frac{1}{R_\pi(\hat\omega_F)}\cdot\frac{R_\pi(\hat\omega_F)}{L_\pi(\hat\omega_F)}\\
\le{}& \Big[\Big\{2\|\mu-\hat\mu_\pi(\hat\omega_C)\|^2 + 2\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|^2 + 4\big|e_\pi^{\mathrm T}\{\mu-\hat\mu_\pi(\hat\omega_C)\}\big| + 4\|e_\pi\|\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|\\
&+ 4\|\mu-\hat\mu_\pi(\hat\omega_C)\|\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|\Big\}\xi_{\pi,F}^{-1} + \frac{2|e_\pi^{\mathrm T}A(\hat\omega_F)\mu|}{R_\pi(\hat\omega_F)} + \frac{2e_\pi^{\mathrm T}P(\hat\omega_F)e_\pi}{R_\pi(\hat\omega_F)}\\
&+ 2\xi_{\pi,F}^{-1/2}\Big\{\|\mu-\hat\mu_\pi(\hat\omega_C)\| + \sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|\Big\}\Big\{\sup_{\omega\in\mathcal H_F}\Big|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\Big| + 1\Big\}^{1/2}\Big]\sup_{\omega\in\mathcal H_F}\Big\{\Big|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\Big| + 1\Big\}.\end{aligned}$$
Condition (C6) indicates that, to prove Theorem 2, it suffices to show
$$\xi_{\pi,F}^{-1}\|\mu-\hat\mu_\pi(\hat\omega_C)\|^2 = o_p(1), \tag{5.36}$$
$$\xi_{\pi,F}^{-1}\big|e_\pi^{\mathrm T}\{\mu-\hat\mu_\pi(\hat\omega_C)\}\big| = o_p(1), \tag{5.37}$$
$$\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|^2 = O_p(1), \tag{5.38}$$
$$\xi_{\pi,F}^{-2}\|e_\pi\|^2 = o_p(1), \tag{5.39}$$
$$\sup_{\omega\in\mathcal H_F}\Big|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\Big| = o_p(1), \tag{5.40}$$
$$\sup_{\omega\in\mathcal H_F}\frac{|e_\pi^{\mathrm T}A(\omega)\mu|}{R_\pi(\omega)} = o_p(1), \tag{5.41}$$
$$\sup_{\omega\in\mathcal H_F}\frac{e_\pi^{\mathrm T}P(\omega)e_\pi}{R_\pi(\omega)} = o_p(1). \tag{5.42}$$
For the correct models with $m = 1, 2, \ldots, m_0$, $\mu$ lies in the column space of $X^{(m)*} = (X^{(m)}, K)$, onto which $P^{(m)}$ projects, since
$$\begin{aligned}P^{(m)}P^{(m)} &= \{Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\}\{Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\}\\
&= Q + 2Q\tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T} + \tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T}\\
&= Q + \tilde X^{(m)}(\tilde X^{(m)\mathrm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\mathrm T} = P^{(m)},\end{aligned}$$
where $Q\tilde X^{(m)} = Q(I - Q)X^{(m)} = 0$; hence $P^{(m)}\mu = \mu$. This implies that
$$\begin{aligned}\|\mu-\hat\mu_\pi(\hat\omega_C)\|^2 &= \Big\|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\{\mu - P^{(m)}Z_\pi\}\Big\|^2 = \Big\|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}P^{(m)}e_\pi\Big\|^2\\
&\le \frac12\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\sum_{s=1}^{m_0}\frac{\hat\omega_{cv,s}}{\hat s_{cv}}\big(e_\pi^{\mathrm T}P^{(m)}P^{(m)}e_\pi + e_\pi^{\mathrm T}P^{(s)}P^{(s)}e_\pi\big)\\
&= \sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}e_\pi^{\mathrm T}P^{(m)}P^{(m)}e_\pi \le \max_{1\le m\le m_0}e_\pi^{\mathrm T}P^{(m)}e_\pi.\end{aligned}$$
Thus, for (5.36) to hold, it suffices to show that
$$\xi_{\pi,F}^{-1}\max_{1\le m\le m_0}e_\pi^{\mathrm T}P^{(m)}e_\pi = o_p(1). \tag{5.43}$$
By Markov's inequality, for any $\delta > 0$, we have
$$\begin{aligned}\sup_{n\ge1}P\Big(\max_{1\le m\le m_0}e_\pi^{\mathrm T}P^{(m)}e_\pi > \delta\Big) &\le \sup_{n\ge1}\sum_{m=1}^{m_0}P\big(e_\pi^{\mathrm T}P^{(m)}e_\pi > \delta\big) = \sup_{n\ge1}\sum_{m=1}^{m_0}E\big[E\big\{I\big(e_\pi^{\mathrm T}P^{(m)}e_\pi > \delta\big) \,\big|\, X, U\big\}\big]\\
&= \sup_{n\ge1}\sum_{m=1}^{m_0}E\big\{P\big(e_\pi^{\mathrm T}P^{(m)}e_\pi > \delta \,\big|\, X, U\big)\big\} \le \sup_{n\ge1}\sum_{m=1}^{m_0}E\big[\delta^{-2G}E\big\{(e_\pi^{\mathrm T}P^{(m)}e_\pi)^{2G} \,\big|\, X, U\big\}\big]\\
&\le \sup_{n\ge1}\sum_{m=1}^{m_0}E\big[\delta^{-2G}\{\mathrm{tr}(P^{(m)}\Omega_\pi)\}^{2G}\big] \le \delta^{-2G}\lambda_{\max}^{2G}(\Omega_\pi)m_0(\bar d + \bar k)^{2G},\end{aligned}$$
which, combined with Condition (C6) (taking $\delta$ proportional to $\xi_{\pi,F}$), implies that (5.43) is valid and thus guarantees that (5.36) is valid.
Indeed, (5.37) can be further simplified as follows:
$$\big|e_\pi^{\mathrm T}\{\mu - \hat\mu_\pi(\hat\omega_C)\}\big| = \Big|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}e_\pi^{\mathrm T}P^{(m)}e_\pi\Big| \le \max_{1\le m\le m_0}e_\pi^{\mathrm T}P^{(m)}e_\pi.$$
Thus, by the proof of (5.43), it can be shown that (5.37) is valid.
For (5.38), one can obtain that
$$\begin{aligned}\sup_{\omega\in\mathcal H_n}\|\psi_n(\omega)\|^2 &= \sup_{\omega\in\mathcal H_n}\|Z_{\hat\pi} - Z_\pi + \hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega) + \tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2\\
&\le 3\sup_{\omega\in\mathcal H_n}\big\{\|Z_{\hat\pi} - Z_\pi\|^2 + \|\hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega)\|^2 + \|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\|^2\big\}\\
&\le 3\{1 + (1 - \bar h)^{-2}\}\|Z_{\hat\pi} - Z_\pi\|^2 + 3\sup_{\omega\in\mathcal H_n}\|\hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega)\|^2\\
&= O_p(1) + 3\sup_{\omega\in\mathcal H_n}\|\hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega)\|^2,\end{aligned}\tag{5.44}$$
where the second step uses the Cauchy–Schwarz inequality, the third step follows from (5.20), and the last step is obtained from Lemma 1 and Condition (C7). Therefore, (5.38) holds if we can prove that
$$\sup_{\omega\in\mathcal H_n}\|\hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega)\|^2 = o_p(1). \tag{5.45}$$
By (5.19), Lemma 4, and the Cauchy–Schwarz inequality, we have
$$\begin{aligned}\sup_{\omega\in\mathcal H_n}\|\hat\mu_\pi(\omega) - \tilde\mu_\pi(\omega)\|^2 &= \sup_{\omega\in\mathcal H_n}\|P(\omega)Z_\pi - \tilde P(\omega)Z_\pi\|^2 = \sup_{\omega\in\mathcal H_n}\Big\|\sum_{m=1}^{M_n}\omega_m D^{(m)}A^{(m)}Z_\pi\Big\|^2\\
&\le \frac12\sup_{\omega\in\mathcal H_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\big(Z_\pi^{\mathrm T}A^{(m)}D^{(m)}D^{(m)}A^{(m)}Z_\pi + Z_\pi^{\mathrm T}A^{(s)}D^{(s)}D^{(s)}A^{(s)}Z_\pi\big)\end{aligned}\tag{5.46}$$
$$\le \sup_{\omega\in\mathcal H_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\lambda_{\max}\{A^{(m)}D^{(m)}D^{(m)}A^{(m)}\}Z_\pi^{\mathrm T}Z_\pi \le 2(1 - \bar h)^{-2}\bar h^2 n\Big(\frac1n\mu^{\mathrm T}\mu + \frac1n e_\pi^{\mathrm T}e_\pi\Big), \tag{5.47}$$
where $\lambda_{\max}\{A^{(m)}\} \le 1$ and $\lambda_{\max}\{D^{(m)}\} \le (1-\bar h)^{-1}\bar h$. This, along with Condition (C2), Condition (C7), and (5.46), implies that, to prove (5.45), it suffices to prove
$$\frac1n e_\pi^{\mathrm T}e_\pi = O_p(1). \tag{5.48}$$
Equation (5.48) follows from the same technique used in the proof of Lemma 1; thus, (5.45) holds, and hence (5.38) is valid. Based on (5.48) and Condition (C3), we know that (5.39) is valid.
Under Conditions (C6) and (C7), it is observed that
$$(M_n - m_0)\xi_{\pi,F}^{-2G}\sum_{m=m_0+1}^{M_n}\{R_\pi(\omega_m^o)\}^G \xrightarrow{\text{a.s.}} 0, \quad \bar h \xrightarrow{\text{a.s.}} 0, \quad (\bar d + \bar k)\xi_{\pi,F}^{-2} \xrightarrow{\text{a.s.}} 0.$$
This result implies that Conditions (C3) and (C4) are satisfied for $\{Z_{\pi,i}, X_i, U_i\}_{i=1}^n$ with $\omega \in \mathcal H_F$. Therefore, by arguments analogous to those for (5.13), we directly obtain that (5.40)–(5.42) are valid. This completes the proof of Theorem 2.
[1] B. A. Brumback, Fundamentals of causal inference: With R, CRC Press, 2021.
[2] R. K. Crump, V. J. Hotz, G. W. Imbens, O. A. Mitnik, Nonparametric tests for treatment effect heterogeneity, Rev. Econ. Stat., 90 (2008), 389–405. https://doi.org/10.1162/rest.90.3.389
[3] R. F. Engle, C. W. J. Granger, J. Rice, A. Weiss, Semiparametric estimates of the relation between weather and electricity sales, J. Am. Stat. Assoc., 81 (1986), 247–269. https://doi.org/10.1080/01621459.1986.10478274
[4] J. Fan, Y. Ma, W. Dai, Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models, J. Am. Stat. Assoc., 109 (2014), 1270–1284. https://doi.org/10.1080/01621459.2013.879828
[5] Y. Gao, W. Long, Z. Wang, Estimating average treatment effect by model averaging, Econ. Lett., 135 (2015), 42–45. https://doi.org/10.1016/j.econlet.2015.08.002
[6] B. E. Hansen, Least squares model averaging, Econometrica, 75 (2007), 1175–1189. https://doi.org/10.1111/j.1468-0262.2007.00785.x
[7] B. E. Hansen, J. S. Racine, Jackknife model averaging, J. Econ., 167 (2012), 38–46. https://doi.org/10.1016/j.jeconom.2011.06.019
[8] G. W. Imbens, J. M. Wooldridge, Recent developments in the econometrics of program evaluation, J. Econ. Lit., 47 (2009), 5–86. https://doi.org/10.1257/jel.47.1.5
[9] K. Imai, M. Ratkovic, Estimating treatment effect heterogeneity in randomized program evaluation, Ann. Appl. Stat., 7 (2013), 443–470. https://doi.org/10.1214/12-AOAS593
[10] H. Jo, M. A. Harjoto, The causal effect of corporate governance on corporate social responsibility, J. Bus. Ethics, 106 (2012), 53–72. https://doi.org/10.1007/s10551-011-1052-1
[11] T. Kitagawa, C. Muris, Model averaging in semiparametric estimation of treatment effects, J. Econ., 193 (2016), 271–289. https://doi.org/10.1016/j.jeconom.2016.03.002
[12] K. C. Li, Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: Discrete index set, Ann. Stat., 15 (1987), 958–975. https://doi.org/10.1214/aos/1176350486
[13] Q. Liu, R. Okui, Heteroskedasticity-robust Cp model averaging, Econom. J., 16 (2013), 463–472. https://doi.org/10.1111/ectj.12009
[14] M. Müller, Estimation and testing in generalized partial linear models – A comparative study, Stat. Comput., 11 (2001), 299–309. https://doi.org/10.1023/A:1011981314532
[15] C. A. Rolling, Y. Yang, Model selection for estimating treatment effects, J. R. Stat. Soc. B, 76 (2014), 749–769. https://doi.org/10.1111/rssb.12043
[16] C. A. Rolling, Y. Yang, D. Velez, Combining estimates of conditional treatment effects, Economet. Theor., 35 (2019), 1089–1110. https://doi.org/10.1017/S0266466618000397
[17] P. R. Rosenbaum, D. B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika, 70 (1983), 41–55. https://doi.org/10.2307/2335942
[18] D. B. Rubin, Estimating causal effects of treatments in randomized and nonrandomized studies, J. Educ. Psychol., 66 (1974), 688–701. https://doi.org/10.1037/h0037350
[19] D. B. Rubin, Assignment to treatment group on the basis of a covariate, J. Educ. Behav. Stat., 2 (1977), 1–26. https://doi.org/10.2307/1164933
[20] T. A. Severini, J. G. Staniswalis, Quasi-likelihood estimation in semiparametric models, J. Am. Stat. Assoc., 89 (1994), 501–511. https://doi.org/10.2307/2290852
[21] C. J. Stone, Optimal global rates of convergence for nonparametric regression, Ann. Stat., 10 (1982), 1040–1053. https://doi.org/10.1214/aos/1176345969
[22] L. Tian, A. A. Alizadeh, A. J. Gentles, R. Tibshirani, A simple method for estimating interactions between a treatment and a large number of covariates, J. Am. Stat. Assoc., 109 (2014), 1517–1532. https://doi.org/10.1080/01621459.2014.951443
[23] P. Whittle, Bounds for the moments of linear and quadratic forms in independent variables, Theory Probab. Appl., 5 (1960), 302–305. https://doi.org/10.1137/1105028
[24] Z. Tan, On doubly robust estimation for logistic partially linear models, Stat. Probab. Lett., 155 (2019), 108577. https://doi.org/10.1016/j.spl.2019.108577
[25] J. Zeng, W. Cheng, G. Hu, Y. Rong, Model selection and model averaging for semiparametric partially linear models with missing data, Commun. Stat.-Theor. M., 48 (2019), 381–395. https://doi.org/10.1080/03610926.2017.1410717
[26] X. Zhang, A. T. Wan, G. Zou, Model averaging by jackknife criterion in models with dependent data, J. Econ., 174 (2013), 82–94. https://doi.org/10.1016/j.jeconom.2013.01.004
[27] X. Zhang, W. Wang, Optimal model averaging estimation for partially linear models, Stat. Sin., 29 (2019), 693–718. https://doi.org/10.2139/ssrn.2948380
| Symbol | Description | Correlation with Y |
|--------|-------------|--------------------|
| X1 | glycosylated hemoglobin | 0.7409 |
| X2 | cholesterol/HDL ratio | 0.2989 |
| X3 | waist | 0.2337 |
| X4 | weight | 0.1888 |
| X5 | high-density lipoprotein | -0.1801 |
| X6 | frame (0 if large, 1 if medium, 2 otherwise) | -0.1726 |
| X7 | first systolic blood pressure | 0.1654 |
| X8 | total cholesterol | 0.1514 |
| X9 | hip | 0.1448 |
| X10 | gender (0 if male, 1 otherwise) | -0.0861 |
| X11 | height | 0.0825 |
| X12 | postprandial time when labs were drawn | -0.0485 |
| X13 | first diastolic blood pressure | 0.0257 |
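The ranking in the table above can, in principle, be reproduced by correlating each covariate with the response and sorting by absolute correlation. The snippet below sketches this on synthetic data; the column names are placeholders, not those of the actual dataset.

```python
import numpy as np
import pandas as pd

# Correlate each covariate with the response y and rank by absolute value,
# mirroring how a "Correlation with Y" column would be built.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["y", "x1", "x2", "x3"])
corr = df.drop(columns="y").corrwith(df["y"])
print(corr.reindex(corr.abs().sort_values(ascending=False).index))
```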
| n | Metric | AIC | BIC | SAIC | SBIC | EW | TECV | TEEM | MAPLM | CPLJMA |
|---|--------|-----|-----|------|------|----|------|------|-------|--------|
| 20% | MSE | 2.0044 | 1.9531 | 1.8427 | 1.8182 | 1.9216 | 1.9145 | 1.8024 | 1.8539 | 1.5563 |
| | Median | 0.7534 | 0.7321 | 0.6935 | 0.6824 | 0.7322 | 0.7035 | 0.5806 | 0.6822 | 0.5255 |
| | Optimal rate | 0.13 | 0.04 | 0.01 | 0.11 | 0.05 | 0.02 | 0.11 | 0.13 | 0.40 |
| 40% | MSE | 1.1867 | 1.6152 | 1.0405 | 1.0329 | 0.9579 | 0.9649 | 0.8924 | 0.9416 | 0.8520 |
| | Median | 0.4276 | 0.4165 | 0.3666 | 0.3624 | 0.3522 | 0.3435 | 0.2892 | 0.3237 | 0.2876 |
| | Optimal rate | 0.13 | 0.00 | 0.02 | 0.07 | 0.01 | 0.04 | 0.15 | 0.18 | 0.40 |
| 60% | MSE | 1.0056 | 0.9915 | 0.8794 | 0.8749 | 0.9020 | 0.8820 | 0.8467 | 0.8835 | 0.7948 |
| | Median | 0.3621 | 0.3549 | 0.3106 | 0.3078 | 0.3243 | 0.3020 | 0.2557 | 0.3069 | 0.2598 |
| | Optimal rate | 0.06 | 0.04 | 0.02 | 0.07 | 0.03 | 0.04 | 0.15 | 0.14 | 0.45 |
| 80% | MSE | 0.8512 | 0.8336 | 0.7542 | 0.7513 | 0.7522 | 0.7500 | 0.7219 | 0.7282 | 0.6955 |
| | Median | 0.3055 | 0.2970 | 0.2600 | 0.2591 | 0.2576 | 0.2479 | 0.2306 | 0.2392 | 0.2272 |
| | Optimal rate | 0.05 | 0.00 | 0.00 | 0.11 | 0.02 | 0.04 | 0.24 | 0.16 | 0.38 |
| 100% | MSE | 0.8357 | 0.8247 | 0.7105 | 0.7082 | 0.7013 | 0.7173 | 0.7669 | 0.6824 | 0.6628 |
| | Median | 0.2997 | 0.2935 | 0.2478 | 0.2465 | 0.2426 | 0.2375 | 0.2667 | 0.2283 | 0.2184 |
| | Optimal rate | 0.06 | 0.00 | 0.01 | 0.03 | 0.05 | 0.13 | 0.01 | 0.12 | 0.59 |
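For reference when reading the SAIC and SBIC columns: smoothed information-criterion averaging assigns model $m$ the weight $w_m\propto\exp(-\mathrm{IC}_m/2)$ rather than selecting a single minimizer. A minimal sketch of this standard construction follows; the criterion values are placeholders, not outputs from our study.

```python
import numpy as np

def smoothed_ic_weights(ic_values):
    """Smoothed information-criterion weights: w_m proportional to exp(-IC_m / 2).
    Subtracting the minimum first keeps the exponentials numerically stable."""
    ic = np.asarray(ic_values, dtype=float)
    w = np.exp(-(ic - ic.min()) / 2.0)
    return w / w.sum()

# Placeholder AIC values for five hypothetical candidate models.
print(np.round(smoothed_ic_weights([102.3, 100.1, 101.7, 105.9, 100.4]), 3))
```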