Theory article

Model averaging with causal effects for partially linear models

  • Received: 04 January 2024 Revised: 30 April 2024 Accepted: 06 May 2024 Published: 09 May 2024
  • MSC : 62F40, 62G20, 62G99

  • Treatment effects with heterogeneity and heteroskedasticity are widely studied and applied in many fields, such as statistics and econometrics. The conditional average treatment effect provides an excellent measure of the heterogeneous treatment effect. In this paper, we propose a model averaging estimation for the conditional average treatment effect with partially linear models based on the jackknife-type criterion under heteroscedastic error. Within this context, we provide theoretical justification for our model averaging approach, and we establish asymptotic optimality and weight convergence properties for our model under certain conditions. The performance of our proposed estimator is compared with that of classical estimators by using a Monte Carlo study and empirical analysis.

    Citation: Xiaowei Zhang, Junliang Li. Model averaging with causal effects for partially linear models[J]. AIMS Mathematics, 2024, 9(6): 16392-16421. doi: 10.3934/math.2024794




    1. Introduction

    Causal effects are fundamental to project evaluation across various fields, such as economics, finance, and biomedicine. It is well known that the central issue in project evaluation in these fields is to identify the "causal relationships" that exist between project treatments and project outcomes and to quantify the "causal effects". For example, pharmaceutical companies are interested in the effect of a new drug or device developed to treat a disease, and investment banks are concerned with the profitability of companies that have received significant capital investment. These research questions rely on estimating treatment effects; thus, studying the identification, estimation, and empirical application of treatment effects is crucial and meaningful. Causal effects are also referred to as treatment effects in the literature.

    It is essential, in some cases, to accurately capture heterogeneous treatment effects. For example, a new medicine could be more beneficial for children than for adults; an advertising strategy for sanitary napkins may be more persuasive for women than for men. These results indicate that using knowledge of heterogeneous treatment effects can maximize the value and effectiveness of treatment programs [15]. A commonly used heterogeneous treatment effects measurement is the conditional average treatment effect (CATE), to which much recent attention has been given.

    In several studies, the CATE has been estimated by fitting a parametric model to the relationship between observations and baseline covariates, in conjunction with treatment assignment. Alternatively, the CATE can be estimated by modeling the conditional mean of the potential outcomes with a parametric model based on the baseline covariates for each treatment group; examples include [2,9,19,22]. In practice, however, not all covariates in a dataset exhibit a linear relationship with the response, and some relationships may call for nonparametric modeling. In this context, partially linear models (PLMs), which combine the interpretability of linear models with the flexibility of nonparametric models, are more applicable to such datasets.

    Therefore, we consider a partially linear regression framework [3] in which the distribution of the response $Y_i$ may depend on a binary treatment assignment indicator $\delta_i\in\{0,1\}$ and the baseline covariates $(X_i,U_i)$. To isolate the treatment differences of primary interest, we assume for the observations that

    $$Y_{t,i}-Y_{c,i}=\mu_i+e_i=\sum_{j=1}^{\infty}x_{ij}\beta_j+g(U_i)+e_i,\quad i=1,\dots,n, \tag{1.1}$$

    where $\{Y_{t,i},Y_{c,i}\}$ denotes the pair of potential outcomes associated with $Y_i$, and $\delta_i$ represents the treatment indicator variable, taking $\delta_i=1$ if the individual belongs to the treatment group and $\delta_i=0$ otherwise. $X_i=(x_{i1},x_{i2},\dots)^{\rm T}$ is a countably infinite covariate vector and $U_i$ is a univariate covariate. $\beta=(\beta_1,\beta_2,\dots)^{\rm T}$ is an unknown coefficient vector associated with $X_i$, $g(\cdot)$ is an unknown smooth nonlinear function, and $e_1,\dots,e_n$ are unobservable heteroscedastic random errors independent of $\{X_i,U_i\}_{i=1}^{n}$ with conditional mean $E(e_i\mid X_i,U_i)=0$ and conditional variance $E(e_i^2\mid X_i,U_i)=\sigma_i^2$. A fundamental problem in statistics and causal inference is to improve predictive accuracy based on the observable dataset $\{Y_i,X_i,U_i,\delta_i\}_{i=1}^{n}$.

    Researchers, however, usually entertain multiple candidate models built on different understandings of the observed data. Rolling and Yang [15] proposed the treatment effect cross-validation (TECV) method, a model selection procedure for studying the CATE that identifies the model most suitable for estimating the CATE among multiple candidate models. However, model selection carries the risk of model uncertainty: one cannot know whether the selected model is truly the most suitable one for the dataset, and failing to select the optimal model means discarding useful information, which can lead to unstable estimation. The model averaging method reduces the risk of regression estimation and the bias introduced by selecting a single model, avoids ignoring the useful information contained in the remaining candidate models, and thereby improves prediction accuracy.

    Model averaging methods have been applied to causal inference problems. For example, Gao et al. [5] developed a model averaging method based on the JMA of [7] to estimate the average treatment effect (ATE). Kitagawa and Muris [11] proposed a data-driven approach that averages estimators over candidate specifications to address specification uncertainty in the weighted estimation of propensity scores for the average treatment effect on the treated (ATT). Rolling et al. [16] introduced a model combination technique, treatment effect estimation by mixing (TEEM), designed to amalgamate estimators from various procedures and thereby generate more accurate CATE estimates. Although TEEM is compatible with parametric, nonparametric, or semiparametric statistical models, as well as nonstatistical machine learning procedures and even subjective expert judgment, the approach of [16] may encounter difficulties in finding treatment-control pairs in each cell after partitioning, especially when the number of covariates is large or even moderate. Thus, we discuss a model averaging estimation for the CATE with multiple candidate partially linear regression models.

    To the best of our knowledge, no optimal model averaging estimation has been developed for PLMs based on a jackknife-type criterion to address causal inference problems. Motivated by this, under heteroscedastic error, the primary goal of the current article is to develop a model averaging estimation for the conditional average treatment effect with PLMs based on a jackknife-type criterion, called the CPLJMA method, where C, PL, and JMA represent causal effects, partially linear, and jackknife model averaging, respectively. It is difficult to directly extend the available results to our setting. Among existing optimal model averaging estimations for PLMs, important examples include [27], who proposed optimal model averaging estimation for PLMs with a Mallows-type criterion based on kernel estimation (MAPLM), and Zeng et al. [25], who proposed focused information criteria and frequentist model averaging estimators for semiparametric partially linear models with missing responses and established their theoretical properties. We utilize a jackknife-type weight choice criterion different from theirs and, additionally, establish an extra theorem on weight convergence. Our main contributions are as follows: (i) On a theoretical basis, the proposed estimator is asymptotically optimal in terms of minimizing the squared error loss. (ii) The convergence property of the weights is investigated, and we prove that, provided there is at least one correctly specified candidate model, the sum of the weights assigned to the correct candidate models converges to one as the sample size increases to infinity. In the simulation section, numerical simulations and empirical analysis verify the validity of the proposed model averaging approach and the analytical framework; this provides support for the wide application of such model averaging methods.

    The remainder of this paper is organized as follows. In Section 2, we describe the estimation procedure of the jackknife criterion for the CATE based on PLMs, and we study its theoretical properties in Section 3. Section 4 illustrates the performance of our proposal via simulations and data examples. Concluding remarks are made in Section 5. Technical proofs are deferred to the Supplementary Material.

    2. Model averaging estimation for the CATE

    First, we use the potential outcomes framework [8,18] to define the CATE as the expectation of the individual treatment effect conditional on the observed value of the baseline covariates $(X_i,U_i)$, that is,

    $$\mu_i=E(Y_{t,i}-Y_{c,i}\mid X_i,U_i),$$

    where $Y_{t,i}-Y_{c,i}$ indicates the treatment effect for the $i$th individual; since $Y_{t,i}$ and $Y_{c,i}$ are infeasible to observe simultaneously, this is also labeled the "fundamental problem of causal inference". The main goal of this study is to estimate $\mu_i$ based on the model averaging method.

    Under the causal inference framework, we make the following identifiability assumptions [1]:

    Assumption 1. Consistency: $Y_i=\delta_iY_{t,i}+(1-\delta_i)Y_{c,i}$;

    Assumption 2. Unconfoundedness: $\{Y_{t,i},Y_{c,i}\}\perp\delta_i\mid\{X_i,U_i\}$;

    Assumption 3. Positivity: $0<c_{\pi}\le\pi(X_i,U_i)\le 1-c_{\pi}<1$ almost surely, where $\pi(X_i,U_i)=P(\delta_i=1\mid Y_{t,i},Y_{c,i},X_i,U_i)=P(\delta_i=1\mid X_i,U_i)$ denotes the propensity score and $c_{\pi}$ is a positive constant.

    Assumption 1 links potential outcomes to observed outcomes and requires the potential outcomes to be well defined. Assumption 2 is conditional independence: the treatment assignment indicator $\delta_i$ is independent of the potential outcomes $\{Y_{t,i},Y_{c,i}\}$ given the covariates $X_i$ and $U_i$. It requires that all potential confounding information on the relationship between treatments and potential outcomes is observed in the covariates, thereby precluding unmeasured confounding between treatment assignments and outcomes. Assumption 3 implies that treatment assignments are not deterministic, which is crucial for controlling confounding bias and systematic differences between the treatment and control groups; it also guarantees that $\pi(X_i,U_i)$ and $1-\pi(X_i,U_i)$ are invertible with probability one. [17] referred to the combination of Assumptions 2 and 3 as "strongly ignorable treatment assignment".

    Inverse probability weighting (IPW) based on the potential outcomes framework is a powerful tool for correcting confounding bias. The IPW approach utilizes the inverse of the propensity score to construct weights for the observed outcomes that balance the baseline covariates between groups [10]. Under the strongly ignorable treatment assignment assumption, we define

    $$Z_{\pi,i}=\frac{\delta_iY_i}{\pi(X_i,U_i)}-\frac{(1-\delta_i)Y_i}{1-\pi(X_i,U_i)},$$

    which is conditionally unbiased for $\mu_i$ given $X_i$ and $U_i$. Indeed, after careful calculation, we have

    $$\begin{aligned}E(Z_{\pi,i}\mid X_i,U_i)&=E\left[\frac{\delta_iY_i}{\pi(X_i,U_i)}-\frac{(1-\delta_i)Y_i}{1-\pi(X_i,U_i)}\,\Big|\,X_i,U_i\right]\\&=\frac{E(\delta_i\mid X_i,U_i)}{\pi(X_i,U_i)}E(Y_{t,i}\mid X_i,U_i)-\frac{E(1-\delta_i\mid X_i,U_i)}{1-\pi(X_i,U_i)}E(Y_{c,i}\mid X_i,U_i)\\&=E(Y_{t,i}\mid X_i,U_i)-E(Y_{c,i}\mid X_i,U_i)=\mu_i.\end{aligned} \tag{2.1}$$

    An unreasonable model selected to estimate the CATE may yield unreliable estimates. To provide a more reasonable and robust CATE estimator, we develop a model averaging estimation for the CATE with multiple candidate PLMs.

    As discussed earlier, we can obtain computationally feasible PLMs of heterogeneous causal effects based on treatment-control pairs. In accordance with Eq (2.1) and model (1.1), we let $e_{\pi,i}=Z_{\pi,i}-\mu_i$. Then, we have

    $$Z_{\pi,i}=\mu_i+e_{\pi,i}=\sum_{j=1}^{\infty}x_{ij}\beta_j+g(U_i)+e_{\pi,i}, \tag{2.2}$$

    where $\mu_i$ denotes the CATE for the $i$th individual, $E(e_{\pi,i}\mid X_i,U_i)=0$, and $E(e_{\pi,i}^2\mid X_i,U_i)=\sigma_{\pi,i}^2$. Let us note that $\{Z_{\pi,i},X_i,U_i\}_{i=1}^{n}$ is fully observed when $\pi(X_i,U_i)$ is known.
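    For concreteness, the transformation from the observed data to the pseudo-outcome $Z_{\pi,i}$ takes only a few lines of R. This is a minimal sketch; the function name make_Z and its arguments are our illustrative choices, not the authors'.

        # IPW pseudo-outcome of Eq (2.2): y, delta, and pi_hat are the observed
        # outcomes, treatment indicators, and (estimated) propensity scores.
        make_Z <- function(y, delta, pi_hat) {
          delta * y / pi_hat - (1 - delta) * y / (1 - pi_hat)
        }

    By Assumption 3, pi_hat is bounded away from 0 and 1, so both denominators are safe.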

    Specifically, we consider multiple candidate models for model (2.2) of the form

    $$\mu_i^{(m)}=\sum_{j=1}^{k_m}x_{ij}^{(m)}\beta_j^{(m)}+g(U_i),\quad m=1,\dots,M_n, \tag{2.3}$$

    used for evaluating $\mu_i$, where $x_{ij}^{(m)}$ is the $j$th entry of $X_i^{(m)}$, $X_i^{(m)}$ is a $k_m$-dimensional subvector of $X_i$, $\beta_j^{(m)}$ is the corresponding regression coefficient, $g(\cdot)$ is the unknown function in the nonparametric part, and $M_n$ denotes the total number of candidate models, which is allowed to go to infinity.

    Define $Z_{\pi}=(Z_{\pi,1},\dots,Z_{\pi,n})^{\rm T}\in\mathbb{R}^{n}$ and $X^{(m)}=(X_1^{(m)},\dots,X_n^{(m)})^{\rm T}\in\mathbb{R}^{n\times k_m}$, whose $i$th row $X_i^{(m)\rm T}$ is a $1\times k_m$ vector, and let $g(U)=(g(U_1),\dots,g(U_n))^{\rm T}\in\mathbb{R}^{n}$ and $e_{\pi}=(e_{\pi,1},\dots,e_{\pi,n})^{\rm T}\in\mathbb{R}^{n}$. Then the $m$th candidate model in matrix form is

    $$Z_{\pi}=X^{(m)}\beta^{(m)}+g(U)+e_{\pi}.$$

    To estimate the nonparametric function, we use the B-spline regression method. Let $S_n$ be the space of polynomial splines of degree $l\ge 1$, and let $\{\psi_k,k=1,\dots,d_n\}$ denote a normalized B-spline basis. For any $g_n\in S_n$, we have

    $$g_n(U)=\sum_{k=1}^{d_n}\psi_k(U)\alpha_k=\Psi^{\rm T}(U)\alpha,$$

    for some coefficients $\{\alpha_k\}_{k=1}^{d_n}$, where $\Psi(U)=(\psi_1(U),\dots,\psi_{d_n}(U))^{\rm T}$ and $\alpha=(\alpha_1,\dots,\alpha_{d_n})^{\rm T}$. Here, $d_n$ increases with $n$. We define the $n\times d_n$ matrix $K=(\Psi(U_1),\dots,\Psi(U_n))^{\rm T}$. Then, we assume that the $n\times(k_m+d_n)$ matrix $\mathbb{X}^{(m)}=(X^{(m)},K)$ has full column rank and is associated with the unknown $(k_m+d_n)$-dimensional parameter vector $\gamma^{(m)}=(\beta^{(m)\rm T},\alpha^{\rm T})^{\rm T}$. Thus, we have

    $$\mu_{\pi,n}^{(m)}=\mathbb{X}^{(m)}\gamma^{(m)}=X^{(m)}\beta^{(m)}+K\alpha.$$

    By regressing $Z_{\pi}$ on $\mathbb{X}^{(m)}$, the least squares estimators of $\beta^{(m)}$ and $\alpha$ can be obtained as

    $$\hat\beta^{(m)}=\{X^{(m)\rm T}(I-Q)X^{(m)}\}^{-1}X^{(m)\rm T}(I-Q)Z_{\pi}$$

    and

    $$\hat\alpha=(K^{\rm T}K)^{-1}K^{\rm T}(Z_{\pi}-X^{(m)}\hat\beta^{(m)}),$$

    where $Q=K(K^{\rm T}K)^{-1}K^{\rm T}$ is a symmetric idempotent matrix. Then

    $$\hat\mu_{\pi}^{(m)}=X^{(m)}\hat\beta^{(m)}+K\hat\alpha=\{Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}Z_{\pi}=P^{(m)}Z_{\pi},$$

    where $\tilde X^{(m)}=(I-Q)X^{(m)}$ and $P^{(m)}=Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}$ is a symmetric idempotent matrix. Then the corresponding model averaging estimator of $\mu=(\mu_1,\dots,\mu_n)^{\rm T}$ can be formulated as

    $$\hat\mu_{\pi}(\omega)=\sum_{m=1}^{M_n}\omega_m\hat\mu_{\pi}^{(m)}=P(\omega)Z_{\pi}, \tag{2.4}$$

    where $P(\omega)=\sum_{m=1}^{M_n}\omega_mP^{(m)}$ and $\omega=(\omega_1,\dots,\omega_{M_n})^{\rm T}$ is a weight vector belonging to the continuous set $\mathcal{H}_n=\{\omega\in[0,1]^{M_n}:\sum_{m=1}^{M_n}\omega_m=1\}$.
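    The hat matrix $P^{(m)}$ is straightforward to assemble. The R sketch below uses the bs function from the splines package to build $K$; the function name plm_hat_matrix and the df default are our illustrative choices, not part of the paper.

        library(splines)
        # Hat matrix P^(m) of the m-th candidate PLM: projection onto the
        # spline basis of U plus the partialled-out linear covariates X_m.
        plm_hat_matrix <- function(X_m, U, df = 3) {
          K  <- bs(U, df = df)                    # n x d_n B-spline basis
          Q  <- K %*% solve(crossprod(K), t(K))   # Q = K (K'K)^{-1} K'
          Xt <- (diag(nrow(K)) - Q) %*% X_m       # X~ = (I - Q) X^(m)
          Q + Xt %*% solve(crossprod(Xt), t(Xt))  # P^(m)
        }

    Applying plm_hat_matrix to each candidate design matrix yields the $M_n$ matrices that enter $P(\omega)$.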

    Notably, the choice of the weight vector is crucial in the model averaging method. Thus, we consider a jackknife-type criterion to choose the weight vector $\omega$ for (2.4) in the PLMs framework. Specifically, leave-one-out cross-validation (LOO-CV) is used to estimate $\mu$, and the resulting estimator in the $m$th candidate model is given by

    $$\tilde\mu_{\pi}^{(m)}=\tilde P^{(m)}Z_{\pi}\quad\text{and}\quad\tilde P^{(m)}=P^{(m)}-D^{(m)}A^{(m)},$$

    where $D^{(m)}=\mathrm{diag}(D_{11}^{(m)},\dots,D_{nn}^{(m)})\in\mathbb{R}^{n\times n}$ with $i$th diagonal element $D_{ii}^{(m)}=h_{m,ii}/(1-h_{m,ii})$, $A^{(m)}=I_n-P^{(m)}$, and $h_{m,ii}$ is the $i$th diagonal entry of $P^{(m)}$. Thus, the jackknife-type model averaging estimator is

    $$\tilde\mu_{\pi}(\omega)=\sum_{m=1}^{M_n}\omega_m\tilde\mu_{\pi}^{(m)}=\tilde P(\omega)Z_{\pi},$$

    where $\tilde P(\omega)=\sum_{m=1}^{M_n}\omega_m\tilde P^{(m)}$. Then, the weight choice criterion is

    $$CV_{\pi}(\omega)=\|Z_{\pi}-\tilde\mu_{\pi}(\omega)\|^2. \tag{2.5}$$

    The optimal weight vector is obtained by minimizing the criterion in (2.5) over the space $\mathcal{H}_n$. However, this minimization is infeasible in real-world data analysis because $\pi(X_i,U_i)$ is generally unknown. In our modeling framework, we estimate it by adopting the logistic partially linear models (LPLMs) in [24],

    $$\hat\pi(X_i,U_i)=\frac{e^{X_i^{\rm T}\theta+\kappa(U_i)}}{1+e^{X_i^{\rm T}\theta+\kappa(U_i)}},\quad i=1,\dots,n, \tag{2.6}$$

    which rely on the generalized partially linear models (GPLMs) in [14], where the coefficients $\theta$ of the linear part and the nonparametric part $\kappa(\cdot)$ are estimated using B-spline basis estimation. This method is implemented with the "sgplm1" function within the R package "gplm", with the degrees of freedom set to "df=3"; further elucidation of this choice is provided in the subsequent numerical simulations. Then, $\hat\pi(X_i,U_i)$ is substituted for $\pi(X_i,U_i)$ in $Z_{\pi}$ to obtain $Z_{\hat\pi}$, and a feasible counterpart of $CV_{\pi}(\omega)$ in (2.5) becomes

    $$CV_{\hat\pi}(\omega)=\|Z_{\hat\pi}-\tilde\mu_{\hat\pi}(\omega)\|^2.$$
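    As a rough stand-in for the sgplm1 fit used to produce $\hat\pi$ above, the same linear-plus-smooth index in (2.6) can be approximated by a logistic GLM with a B-spline expansion of $U$; this sketch is our simplification, not the authors' implementation.

        library(splines)
        # Approximate the LPLM propensity model (2.6): logistic regression
        # with a linear term in X and a df = 3 spline basis in U.
        fit_pscore <- function(delta, X, U) {
          fit <- glm(delta ~ X + bs(U, df = 3), family = binomial)
          fitted(fit)   # hat pi(X_i, U_i), i = 1, ..., n
        }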

    The optimal weights $\hat\omega_{cv}$ are obtained by selecting $\omega\in\mathcal{H}_n$ to minimize the jackknife-type criterion:

    $$\hat\omega_{cv}=\arg\min_{\omega\in\mathcal{H}_n}CV_{\hat\pi}(\omega).$$

    Substituting $\hat\omega_{cv}$ into (2.4), with $\pi(X_i,U_i)$ replaced by $\hat\pi(X_i,U_i)$, yields the optimal model averaging estimator of $\mu$ as $\hat\mu_{\hat\pi}(\hat\omega_{cv})$. Minimizing $CV_{\hat\pi}(\omega)$ is a quadratic programming problem with respect to $\omega$, that is, $CV_{\hat\pi}(\omega)=\omega^{\rm T}H_{\hat\pi}^{\rm T}H_{\hat\pi}\omega$, where $H_{\hat\pi}=(Z_{\hat\pi}-\tilde P^{(1)}Z_{\hat\pi},\dots,Z_{\hat\pi}-\tilde P^{(M_n)}Z_{\hat\pi})$.
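    Because the criterion is a quadratic form in $\omega$ with a simplex constraint, standard quadratic programming solvers apply. A minimal R sketch using solve.QP from the quadprog package follows; the small ridge added to the quadratic term is only a numerical safeguard, and the function name jma_weights is ours.

        library(quadprog)
        # Jackknife weight selection: minimize w' H'H w over the simplex.
        # P_list holds the hat matrices P^(m); Z is the pseudo-outcome Z_pihat.
        jma_weights <- function(P_list, Z) {
          n <- length(Z)
          H <- sapply(P_list, function(P) {
            h  <- diag(P)
            Pt <- P - diag(h / (1 - h)) %*% (diag(n) - P)  # tilde P^(m)
            drop(Z - Pt %*% Z)                             # LOO residual vector
          })
          M <- ncol(H)
          A <- cbind(rep(1, M), diag(M))     # sum(w) = 1 (equality), w >= 0
          solve.QP(Dmat = crossprod(H) + 1e-8 * diag(M), dvec = rep(0, M),
                   Amat = A, bvec = c(1, rep(0, M)), meq = 1)$solution
        }

    The returned vector is $\hat\omega_{cv}$, and $P(\hat\omega_{cv})Z_{\hat\pi}$ then gives the averaged CATE estimate.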

    3. Theoretical properties

    In this section, we focus on the theoretical properties of our proposed model averaging method. In Subsection 3.1, we prove the asymptotic optimality of the model averaging estimator $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ by showing that the selected weight vector $\hat\omega_{cv}$ yields a squared error that is asymptotically identical to that of the infeasible optimal weight vector. Subsection 3.2 concerns the convergence property of the optimal weight vector $\hat\omega_{cv}$: when the sample size tends to infinity, the sum of the weights that the proposed method assigns to the correct models converges to one in probability.

    3.1. Asymptotic optimality

    Before introducing the theoretical properties, we first define some notation. Let $\mu_{t,i}=E(Y_{t,i}\mid X_i,U_i)$ and $e_{t,i}=Y_{t,i}-\mu_{t,i}$ denote the conditional expectation and the random error of the treatment group, respectively, with $\sigma_{t,i}^2=E(e_{t,i}^2\mid X_i,U_i)$. The loss function and the corresponding risk function of $\hat\mu_{\pi}(\omega)$ are defined as

    $$L_{\pi}(\omega)=\|\mu-\hat\mu_{\pi}(\omega)\|^2\quad\text{and}\quad R_{\pi}(\omega)=E\{L_{\pi}(\omega)\mid X,U\},$$

    respectively, where $\|\cdot\|$ is the Euclidean norm. Let $\xi_{\pi}=\inf_{\omega\in\mathcal{H}_n}R_{\pi}(\omega)$, $\bar d=d_n$, $\bar k=\max_{1\le m\le M_n}k_m$, and $\bar h=\max_{1\le m\le M_n}\max_{1\le i\le n}h_{m,ii}$. The following conditions are assumed to hold as $n\to\infty$.

    (C1) $\sqrt{n}\|\hat\theta_n-\theta_0\|=O_p(1)$ and $\|\hat\kappa_{\hat\theta_n}-\kappa_0\|=o_p(n^{-1/4})$, in which $\theta_0$ and $\kappa_0$ are the true values of $\theta$ and $\kappa$, respectively, and the first derivatives of $\hat\pi(X_i,U_i;\theta,\kappa)$ with respect to $\theta$ and $\kappa$ are continuous and bounded.

    (C2) For some integer $G\ge 1$,

    $$\max_i\left\{E(e_i^{4G}\mid X_i,U_i),\,E(e_{t,i}^{4G}\mid X_i,U_i),\,|\mu_i|,\,|\mu_{t,i}|\right\}\le\bar C<\infty\quad a.s.,$$

    where $i=1,\dots,n$, and $\bar C$ is a positive constant.

    (C3) For some integer $G\ge 1$,

    $$M_n\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}\xrightarrow{a.s.}0,$$

    where $\omega_m^o$ denotes the $M_n\times1$ weight vector whose $m$th element is one and whose other elements are zeros.

    (C4) $\bar h\xrightarrow{a.s.}0$ and $(\bar d+\bar k)\xi_{\pi}^{-2}\xrightarrow{a.s.}0$.

    (C5) The functions $\{g_j(U)\}_{j=1}^{n}$ belong to a class of functions $\mathcal{F}$ whose $r$th derivative $g_j^{(r)}$ exists and is Lipschitz of order $\eta$,

    $$\mathcal{F}=\left\{g_j(\cdot):\left|g_j^{(r)}(s)-g_j^{(r)}(t)\right|\le G^*|s-t|^{\eta}\ \text{for}\ s,t\in[a,b]\right\},$$

    for some positive constant $G^*$, where $r$ is a nonnegative integer and $\eta\in(0,1]$, such that $\upsilon=r+\eta>0.5$.

    Condition (C1), a commonly used restriction for GPLMs, requires $\sqrt{n}$-consistent estimation of the parametric component $\hat\theta_n$; the nonparametric component $\hat\kappa_{\hat\theta_n}$ is viewed as a function of the parametric component to achieve consistency. This restriction is reasonable [20]. Condition (C2) is a moment condition on the random errors together with boundedness of $\{\mu_i,\mu_{t,i}\}_{i=1}^{n}$. Condition (C3) is a convergence condition that restricts the circumstances in which our asymptotic result applies. A prerequisite for Condition (C3) to hold is $\xi_{\pi}\to\infty$, which requires that no finite-dimensional correct model exists in the class of candidate models [6]. It also requires that $M_n$ and $\max_{1\le m\le M_n}R_{\pi}(\omega_m^o)$ go to infinity slowly enough. Condition (C4), an assumption that excludes extremely unbalanced design matrices as candidate models, is widely imposed in studies of optimal model averaging based on cross-validation, such as [7,12], among others. The B-spline approximation in PLMs requires Condition (C5), with reference to [4,21], which is a regularity condition requiring the nonparametric function to be sufficiently smooth.

    Theorem 1. If Conditions (C1)–(C5) are satisfied, then

    $$\frac{L_{\hat\pi}(\hat\omega_{cv})}{\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)}\xrightarrow{P}1.$$

    Theorem 1 establishes the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ in the sense of minimizing the squared error loss: the squared error attained with the model weights $\hat\omega_{cv}$ selected by the LOO-CV criterion is asymptotically equal to that of the infeasible optimal weight vector.

    3.2. Convergence of the weights

    In this subsection, we concentrate on the convergence properties of the optimal weights in model averaging. It should be noted that, in this article, the $m$th candidate model in (2.3) is deemed correctly specified, or a correct model, if there exists $\beta^{(m)}$ such that $\mu=X^{(m)}\beta^{(m)}+g(U)$; otherwise, model (2.3) is said to be misspecified, or an incorrect model.

    We first introduce some notation. $\hat s_{cv}$ is defined as the sum of the weights assigned to the correct candidate models by our proposed method, that is, $\hat s_{cv}=\sum_{m=1}^{m_0}\hat\omega_{cv,m}$, where $m_0$ indicates that the first $m_0$ candidate models are all correctly specified. Let $\mathcal{H}_F=\{\omega\in[0,1]^{M_n}:\sum_{m=m_0+1}^{M_n}\omega_m=1\}$ be the weight set of all incorrect candidate models, and let $\xi_{\pi,F}=\inf_{\omega\in\mathcal{H}_F}R_{\pi}(\omega)$ be the optimal risk when all the weight is assigned to the misspecified candidate models. We specify some further conditions, again as $n\to\infty$.

    (C6) For some integer $G\ge 1$,

    $$\xi_{\pi,F}^{-2G}\max\left\{m_0(\bar d+\bar k)^{2G},\,(M_n-m_0)\sum_{m=m_0+1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}\right\}\xrightarrow{a.s.}0.$$

    (C7) $\bar h=O(n^{-1/2})$.

    Condition (C6) requires $\xi_{\pi,F}^{2G}$ to grow at a rate no slower than $m_0(\bar d+\bar k)^{2G}$ and $(M_n-m_0)\sum_{m=m_0+1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}$. It is worth noting that if $(M_n-m_0)\sum_{m=m_0+1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}$ is larger than $m_0(\bar d+\bar k)^{2G}$, then Condition (C6) is identical to $(M_n-m_0)\xi_{\pi,F}^{-2G}\sum_{m=m_0+1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}\xrightarrow{a.s.}0$ and is thus analogous to Condition (C3). Condition (C7), excluding peculiar models as candidate models, is from [13].

    Theorem 2. If Conditions (C1)–(C7) are satisfied, then

    $$\hat s_{cv}\xrightarrow{p}1.$$

    Theorem 2 indicates that, as the sample size goes to infinity, the sum of the weights that the model averaging estimator $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ assigns to the correct models converges to one in probability, so the incorrect models are automatically excluded.

    4. Numerical studies

    To illustrate the theoretical properties of Section 3, in this section we conduct two Monte Carlo experiments on the finite-sample performance, where Case 1 verifies the asymptotic optimality of Theorem 1 with all candidate models misspecified, and Case 2 justifies the weight convergence property of Theorem 2 with at least one correct candidate model. In addition, the superiority of our method is illustrated by applying it to the Diabetes dataset. For comparison, we also consider several relevant existing methods as competitors to our CPLJMA approach: the model selection methods AIC, BIC, and treatment effect cross-validation (TECV) proposed by [15]; the information-criterion-based model averaging methods SAIC and SBIC; the equal weight method (EW); treatment effect estimation by mixing (TEEM); and the Mallows averaging of partially linear models (MAPLM). We calculate the mean squared error (MSE) to assess the performance of the estimators, defined as

    $$\mathrm{MSE}=\frac{1}{nD}\sum_{d=1}^{D}\left\|\{\hat\mu(\omega)\}^{(d)}-\mu^{(d)}\right\|^2,$$

    where $\mu^{(d)}$ and $\{\hat\mu(\omega)\}^{(d)}$ are the CATE and the model averaging estimator in the $d$th replicate, respectively, and $D$ denotes the number of replicates of the simulation. Additionally, in the empirical analysis we calculate $\mathrm{MSE}_{\rm median}=\mathrm{median}_{d=1,\dots,D}\,\mathrm{MSE}^{(d)}$ and the optimal rate, which is the percentage of replicates in which a method attains the smallest MSE value. In complement to these numerical investigations, we also examine the choice of the number of (interior) knots: like the pivotal role of bandwidth in kernel smoothing, these knots, akin to tuning parameters, have a remarkable influence on the smoothness and adaptability of our spline models. The details of the numerical study and its results are described in the following subsections.
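    The MSE above is simple to compute across replicates; a one-line R helper (our naming) is

        # MSE over D replicates: mu_hat and mu are n x D matrices holding the
        # estimated and true CATEs, one column per replicate.
        mse <- function(mu_hat, mu) mean(colSums((mu_hat - mu)^2)) / nrow(mu)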

    Case 1: Without correct candidate models

    The data-generating process (DGP) is as follows:

    $$Y_{t,i}-Y_{c,i}=\mu_i+e_i=\sum_{j=1}^{500}X_{ij}\beta_j+g(U_i)+0.5X_{i2}^2+e_{t,i}-e_{c,i},$$

    where $X_{i1}=1$. The covariates of the linear part, $\{X_{ij}\}_{j=2}^{500}$, are generated from a multivariate normal distribution with mean 0 and covariance $0.5^{|j_1-j_2|}$ between $X_{ij_1}$ and $X_{ij_2}$. The associated coefficients in the linear component are taken as $\beta_j=1/j$. The coefficient function is $g(U_i)=\sin(2\pi U_i)$, where $U_i\sim\mathrm{Uniform}[0,1]$. $\{e_{t,i},e_{c,i}\}$ are independent random errors distributed as $N(0,\sigma^2X_{i2}^2)$, where the parameter $\sigma^2$ is chosen via $R^2=\mathrm{var}(\mu_i)/\mathrm{var}(Y_{t,i}-Y_{c,i})$, which varies on a grid between 0.1 and 0.9. Then

    $$\mu_i=\sum_{j=1}^{500}j^{-1}X_{ij}+\sin(2\pi U_i)+0.5X_{i2}^2\quad\text{and}\quad e_i=e_{t,i}-e_{c,i}. \tag{4.1}$$

    We rescaled $\mu_i$ to have unit variance so that the expected $R^2$ equals $1/(1+\sigma^2)$ for the unknown model. It is clear from (4.1) that every candidate model considered in this case is misspecified. In addition, to obtain $\{\delta_i\}_{i=1}^{n}$, the propensity score is taken as

    $$\pi(X_i,U_i)=\frac{\exp(0.75X_{i2}+\sin(2\pi U_i))}{1+\exp(0.75X_{i2}+\sin(2\pi U_i))}.$$

    Thus, we obtain $\{Y_i\}_{i=1}^{n}$. For $\hat\pi(X_i,U_i)$, we use the LPLMs in (2.6) to approximate the coefficients of the linear part and the form of the nonparametric part.
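    A compact R sketch of a Case-1-style draw follows (with $J=50$ linear covariates instead of 500 for speed). The text does not fully specify how $Y_{t,i}$ and $Y_{c,i}$ individually split $\mu_i+e_i$, so taking $Y_{c,i}$ as pure noise is our assumption for illustration.

        # Case-1-style DGP sketch: heteroscedastic errors, logistic propensity.
        gen_case1 <- function(n, sigma, J = 50) {
          U  <- runif(n)
          R  <- chol(0.5^abs(outer(1:(J - 1), 1:(J - 1), "-")))
          X  <- cbind(1, matrix(rnorm(n * (J - 1)), n) %*% R)  # X_{i1} = 1
          mu <- drop(X %*% (1 / (1:J))) + sin(2 * pi * U) + 0.5 * X[, 2]^2
          et <- rnorm(n, sd = sigma * abs(X[, 2]))             # N(0, sigma^2 X_{i2}^2)
          ec <- rnorm(n, sd = sigma * abs(X[, 2]))
          ps <- plogis(0.75 * X[, 2] + sin(2 * pi * U))        # propensity score
          d  <- rbinom(n, 1, ps)
          Y  <- d * (mu + et) + (1 - d) * ec   # assumption: Y_t = mu + e_t, Y_c = e_c
          list(Y = Y, X = X, U = U, delta = d, mu = mu)
        }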

    First, we delve into the influence of the interior knots within the B-spline basis on the performance of our proposed CPLJMA approach. By employing the "bs(,df)" function from the R package "splines", we generate an appropriate B-spline basis matrix. Here, the degrees-of-freedom parameter, denoted df, is a crucial factor, determined by df = 3 + the number of interior knots.

    Figure 1 shows the variation of the MSE as the number of interior knots varies, for a sample size of $n=300$ and $R^2=0.5$, under both nested and nonnested settings. In the nested setting, the $m$th candidate model comprises the first $m$ linear variables in $\{X_{ij}\}_{j=1}^{500}$, and the number of candidate models $M_n$ is determined as the nearest integer to $3n^{1/3}$, resulting in $M_n=20$. In the nonnested setting, the linear components of all candidate models are subsets of $\{X_{i1},\dots,X_{i5}\}$, disregarding the remaining $X_i$ variables, thereby yielding a total of $2^5-1=31$ candidate models. As depicted in the figure, the MSE increases as the number of knots grows, potentially exacerbating overfitting. Consequently, we opt for df = 3 as the degrees of freedom for the CPLJMA method, ensuring a balance between model flexibility and susceptibility to overfitting.

    Figure 1.  The curves of the median MSE against the number of knots for Case 1, with $n=300$ and $R^2=0.5$.

    The greater the number of covariates, the more tedious the computation. Therefore, in substantiating the theoretical properties outlined in Theorem 1 through numerical simulations, we adopt the nested setting. Accordingly, for sample sizes $n=75,150,300$, and $600$, we have $M_n=12,15,20$, and $25$, respectively.

    Figure 2 provides a numerical inspection of the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ in Theorem 1 by showing the mean of $\mathrm{LR}=L_{\hat\pi}(\hat\omega_{cv})/\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)$ for different sample sizes and various $R^2$. In a simulation based on 100 replications, we observe that the mean curve of LR decreases and converges to one as $n$ increases. This intuitively demonstrates the asymptotic optimality of CPLJMA.

    Figure 2.  Evaluating the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{cv})$ under nested models: all candidate models are misspecified.

    Figure 3 shows the MSE ratio curves for the CATE estimators under consideration, where the MSE of AIC is used as the denominator, so that AIC has a constant entry of 1.00. Generally, our proposed CPLJMA outperforms its competitors in terms of the MSE ratio when $R^2$ or $n$ is small or moderate, particularly because it is difficult to identify the optimal model when there is considerable noise in the data; the advantage of model averaging is that, by not relying on a single model, it provides protection against poor model selection. As expected, SAIC and SBIC invariably yield more accurate results than their respective model selection rivals. In short, to some extent, CPLJMA is superior to its competitors.

    Figure 3.  Performance of the various estimators under nested models: all candidate models are misspecified.

    Case 2: With correct candidate models

    The DGP is generated from

    $$Y_{t,i}-Y_{c,i}=\mu_i+e_i=\sum_{j=1}^{5}X_{ij}\beta_j+g(U_i)+e_{t,i}-e_{c,i},$$

    where the vector of covariates $X_i=(X_{i1},\dots,X_{i5})^{\rm T}$ consists of independent standard normal $N(0,1)$ components, $U_i$ is distributed as $U[-1,1]$, and the corresponding coefficients and nonparametric function are $\beta_j=1/j$ and $g(U_i)=1.2U_i$, respectively. The settings for $\{e_{t,i},e_{c,i}\}$, $\sigma$, $R^2$, and df are the same as those in Case 1. Thus, we can obtain

    $$\mu_i=\sum_{j=1}^{5}j^{-1}X_{ij}+1.2U_i.$$

    The propensity score is taken as

    $$\pi(X_i,U_i)=\frac{\exp(0.75X_{i3}+1.2U_i)}{1+\exp(0.75X_{i3}+1.2U_i)}.$$

    Based on the LPLMs in (2.6), we can obtain $\{\hat\pi(X_i,U_i)\}_{i=1}^{n}$ and thus $\{Z_{\hat\pi,i}\}_{i=1}^{n}$. In this case, we consider nonnested models: the linear parts of all candidate models are constructed from all possible combinations of $\{X_{i1},X_{i2},X_{i3},X_{i4},X_{i5}\}$; thus, $M_n=2^5-1=31$. The sample size is taken as $n=75,150,300,600$. The results for the convergence of the model weights $\hat\omega_{cv}$ and the MSE ratios of the above methods are given in Figures 4 and 5, respectively, based on 100 replications.

    Figure 4.  Evaluating the convergence of the model weights $\hat\omega_{cv}$ under nonnested models: the class of candidate models contains correct models.
    Figure 5.  MSE ratio curves for the various methods under nonnested models: the class of candidate models contains correct models.

    Figure 4 clearly shows that the sum of the weights assigned to the correct candidate models tends to one as the sample size $n$ and $R^2$ increase. This intuitively confirms the convergence of $\hat\omega_{cv}$ presented in Theorem 2.

    As shown in Figure 5, we again compute the MSE ratio using AIC as the denominator. In most cases, the MSE ratios demonstrate the merits of our approach over its competitors. As $R^2$ increases progressively, the MSE ratios for all scenarios tend to decrease, as expected, and increasing the sample size also improves the performance of all approaches. Overall, CPLJMA still outperforms the existing methods considered.

    Our proposed method is applied to the Diabetes dataset from Dr. John Schorling, Department of Medicine, University of Virginia School of Medicine. The original data consist of 19 covariates on 403 subjects, drawn from 1046 subjects interviewed in a study. Due to missing data, we selected 16 covariates and 366 respondents as the dataset for the current case study, 175 of whom resided in Buckingham and 191 of whom did not.

    Our analysis takes the outcome variable $Y$ to be stabilized glucose. The treatment indicator variable $\delta$ takes the value 1 if a person resides in Buckingham and 0 otherwise. We calculated the Pearson correlation coefficients of the 14 baseline covariates $X$ and $U$ with $Y$ and ranked them in descending order of correlation strength. Let us note that all continuous covariates $X$ and $Y$ are standardized to have mean zero and variance one, and $U$ is scaled to $[0,1]$. See Table 1 for details.

    Table 1.  Pearson correlation coefficient of each linear covariate with Y.
    Symbol Description Correlation with Y
    X1 glycosolated hemoglobin 0.7409
    X2 cholesterol/HDL ratio 0.2989
    X3 waist 0.2337
    X4 weight 0.1888
    X5 high density lipoprotein -0.1801
    X6 frame (0 if large, 1 if medium, 2 otherwise) -0.1726
    X7 first systolic blood pressure 0.1654
    X8 total cholesterol 0.1514
    X9 hip 0.1448
    X10 gender (0 if male, 1 if otherwise) -0.0861
    X11 height 0.0825
    X12 postprandial time when labs were drawn -0.0485
    X13 first diastolic blood pressure 0.0257


    We assume that the candidate models consist of nested models constructed from the covariates in $\{X_1,\dots,X_{13},U\}$ with intercept terms, where the baseline covariates $X$ form the linear part and age is the covariate $U$ in the nonparametric part. Accordingly, there are 13 well-prepared candidate models. To implement our proposal, the propensity score $\pi(X_i,U_i)$ is again estimated by the LPLMs.

    We conduct a "guided simulation experiment" to evaluate the performance of our proposal and that of its competitors. In particular, we use the largest candidate model, containing all covariates, as the guide model, and let $m$ denote the index of that model in the class of candidate models. Based on the $m$th candidate model and the original dataset $\{Y_i,X_i,U_i,\delta_i\}_{i=1}^{n}$, we can obtain a simulation dataset $\{Y_i^{(m)},X_i,\delta_i,U_i\}_{i=1}^{n}$. Thus,

    $$Y_i^{(m)}=\delta_iY_{t,i}^{(m)}+(1-\delta_i)Y_{c,i}^{(m)},\quad Y_{t,i}^{(m)}=X_i^{(m)\rm T}\hat\rho^{(m)}+f^{(m)}(U_i^{(m)})+e_{t,i}^{(m)},\quad Y_{c,i}^{(m)}=X_i^{(m)\rm T}\hat\eta^{(m)}+h^{(m)}(U_i^{(m)})+e_{c,i}^{(m)}, \tag{4.2}$$

    where $\hat\rho^{(m)}$ and $\hat\eta^{(m)}$ are the regression coefficient estimates for the linear part, $f^{(m)}(U_i^{(m)})$ and $h^{(m)}(U_i^{(m)})$ are the estimates for the nonparametric part, and $\{e_{t,i}^{(m)},e_{c,i}^{(m)}\}_{i=1}^{n}$ are drawn from $N(0,1)$. Therefore, the "true" CATE $\mu_i$ is known in this analysis dataset, namely,

    $$\mu_i=X_i^{(m)\rm T}\hat\rho^{(m)}+f^{(m)}(U_i^{(m)})-\left[X_i^{(m)\rm T}\hat\eta^{(m)}+h^{(m)}(U_i^{(m)})\right].$$

    We randomly selected samples comprising 20%, 40%, 60%, 80%, and 100% of the dataset and describe the performance of the proposed CPLJMA and its competitors by the MSE, $\mathrm{MSE}_{\rm median}$, and the optimal rate, based on 100 replications.

    The results are displayed in Table 2. Our approach produces a lower MSE and median and a higher optimal rate than its competitors across all sample sizes considered. As expected, the averaging methods based on information criteria perform better than the model selection methods. To some extent, our proposal has a clear advantage over its competitors in solving practical problems.

    Table 2.  The MSE, Median and the Optimal rate across 100 repetitions under nested models.
    n Method AIC BIC SAIC SBIC EW TECV TEEM MAPLM CPLJMA
    20% MSE 2.0044 1.9531 1.8427 1.8182 1.9216 1.9145 1.8024 1.8539 1.5563
    Median 0.7534 0.7321 0.6935 0.6824 0.7322 0.7035 0.5806 0.6822 0.5255
    Optimal rate 0.13 0.04 0.01 0.11 0.05 0.02 0.11 0.13 0.40
    40% MSE 1.1867 1.6152 1.0405 1.0329 0.9579 0.9649 0.8924 0.9416 0.8520
    Median 0.4276 0.4165 0.3666 0.3624 0.3522 0.3435 0.2892 0.3237 0.2876
    Optimal rate 0.13 0.00 0.02 0.07 0.01 0.04 0.15 0.18 0.40
    60% MSE 1.0056 0.9915 0.8794 0.8749 0.9020 0.8820 0.8467 0.8835 0.7948
    Median 0.3621 0.3549 0.3106 0.3078 0.3243 0.3020 0.2557 0.3069 0.2598
    Optimal rate 0.06 0.04 0.02 0.07 0.03 0.04 0.15 0.14 0.45
    80% MSE 0.8512 0.8336 0.7542 0.7513 0.7522 0.7500 0.7219 0.7282 0.6955
    Median 0.3055 0.2970 0.2600 0.2591 0.2576 0.2479 0.2306 0.2392 0.2272
    Optimal rate 0.05 0.00 0.00 0.11 0.02 0.04 0.24 0.16 0.38
    100% MSE 0.8357 0.8247 0.7105 0.7082 0.7013 0.7173 0.7669 0.6824 0.6628
    Median 0.2997 0.2935 0.2478 0.2465 0.2426 0.2375 0.2667 0.2283 0.2184
    Optimal rate 0.06 0.00 0.01 0.03 0.05 0.13 0.01 0.12 0.59


    5. Concluding remarks

    Considering the heterogeneity and heteroskedasticity information embedded in the dataset, the problem of model uncertainty, and the flexibility and interpretability of PLMs, a jackknife-type weight selection criterion and its feasible form, together called the CPLJMA method, are proposed for estimating the CATE. Within this context, we consider the optimal model weights chosen by minimizing the LOO-CV criterion, in which B-splines approximate the nonparametric function, and we demonstrate the asymptotic optimality of the resulting estimator in the sense of minimizing the approximate risk with squared error loss. In addition, since the choice of weights is crucial to the model averaging method, we consider the convergence properties of the weights: when the sample size goes to infinity and at least one candidate model is correctly specified, the sum of the weights our approach assigns to the correct candidate models converges to one in probability. In the simulation section, we examine the finite-sample performance of our estimator and compare it with several other model selection and averaging methods, and we illustrate our method using a real-world dataset. The simulation results indicate that our method possesses clear advantages relative to its competitors.

    There are still many issues worthy of further discussion. First, our proposed CPLJMA method is valid only for a one-dimensional covariate in the nonparametric part, and it could be refined and extended to multiple dimensions. Second, the least squares loss used in our analysis is sensitive to outliers; quantile regression, which is less sensitive to outliers, could be considered as an alternative loss function, but developing the corresponding model averaging procedure and establishing its asymptotic properties remain challenging. These areas deserve future research.

    X. Zhang: Writing—original draft; X. Zhang and J. Li: Writing—review and editing. All authors have read and approved the final version of the manuscript for publication.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors would like to thank the editors and reviewers for their valuable and insightful comments. The authors also thank Professor Guangren Yang for his advice for revising the paper.

    The authors declare no conflict of interest.

    This Supplementary Material provides detailed proofs of the main theorems stated above.

    Lemmas and their proofs

    We introduce some lemmas and their proofs before proving the theorems in Section 3.

    Lemma 1. Provided that Conditions (C1) and (C2) hold, we have $\|Z_{\hat\pi}-Z_{\pi}\|^2=O_p(1)$.

    Proof. According to the monotonicity of the $L_r$-norm and the Hölder inequality, we have

    $$\max\left\{\max_{1\le i\le n}\sigma_i^2,\max_{1\le i\le n}\sigma_{t,i}^2\right\}=\max\left\{\max_{1\le i\le n}E(e_i^2\mid X_i,U_i),\max_{1\le i\le n}E(e_{t,i}^2\mid X_i,U_i)\right\}\le\max\left[\max_{1\le i\le n}\left\{E(e_i^{4G}\mid X_i,U_i)\right\}^{\frac{1}{2G}},\max_{1\le i\le n}\left\{E(e_{t,i}^{4G}\mid X_i,U_i)\right\}^{\frac{1}{2G}}\right]\le\bar C$$

    almost surely, where the first inequality is attributed to the Hölder inequality and the last to Condition (C2). Thus, for any $\epsilon>0$, there exists an integer $N_{\epsilon}$ such that $P(\max_{1\le i\le n}\sigma_i^2>\bar C)\le\epsilon/2$ for all $n\ge N_{\epsilon}$. Let $M_{\epsilon}=2\bar C/\epsilon$. Then we have

    $$\begin{aligned}\sup_{n\ge1}P\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon}\right)&=\sup_{n\ge1}\left\{P\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon},\max_{1\le i\le n}\sigma_i^2\le\bar C\right)+P\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon},\max_{1\le i\le n}\sigma_i^2>\bar C\right)\right\}\\&\le\sup_{n\ge1}E\left\{I\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon}\right)I\left(\max_{1\le i\le n}\sigma_i^2\le\bar C\right)\right\}+\sup_{n\ge N_{\epsilon}}P\left(\max_{1\le i\le n}\sigma_i^2>\bar C\right)\\&\le\sup_{n\ge1}E\left[E\left\{I\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon}\right)\Big|X,U\right\}I\left(\max_{1\le i\le n}\sigma_i^2\le\bar C\right)\right]+\frac{\epsilon}{2}\\&=\sup_{n\ge1}E\left\{P\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2>M_{\epsilon}\Big|X,U\right)I\left(\max_{1\le i\le n}\sigma_i^2\le\bar C\right)\right\}+\frac{\epsilon}{2}\\&\le\sup_{n\ge1}E\left\{M_{\epsilon}^{-1}E\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2\Big|X,U\right)I\left(\max_{1\le i\le n}\sigma_i^2\le\bar C\right)\right\}+\frac{\epsilon}{2}\\&=\sup_{n\ge1}E\left\{M_{\epsilon}^{-1}\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2\,I\left(\max_{1\le i\le n}\sigma_i^2\le\bar C\right)\right\}+\frac{\epsilon}{2}\\&\le M_{\epsilon}^{-1}\bar C+\frac{\epsilon}{2}=\epsilon.\end{aligned} \tag{5.1}$$

    Thus, we have

    $$\frac{1}{n}\sum_{i=1}^{n}e_i^2=O_p(1). \tag{5.2}$$

    Similarly, it can be obtained that

    $$\frac{1}{n}\sum_{i=1}^{n}e_{t,i}^2=O_p(1). \tag{5.3}$$

    By the Cauchy-Schwarz inequality, we obtain

    $$\begin{aligned}\|Z_{\hat\pi}-Z_{\pi}\|^2&=\sum_{i=1}^{n}\left\{\frac{\delta_i}{\hat\pi(X_i,U_i)}Y_{t,i}-\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}Y_{c,i}-\left(\frac{\delta_i}{\pi(X_i,U_i)}Y_{t,i}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}Y_{c,i}\right)\right\}^2\\&=\sum_{i=1}^{n}\left[\left\{\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right\}Y_{t,i}+\left\{\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}\right\}(Y_{t,i}-Y_{c,i})\right]^2\\&\le c\left[\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right|\right\}^2\frac{1}{n}\sum_{i=1}^{n}Y_{t,i}^2+\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}\right|\right\}^2\frac{1}{n}\sum_{i=1}^{n}(Y_{t,i}-Y_{c,i})^2\right]\\&\le c\left[\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right|\right\}^2\left(\frac{1}{n}\sum_{i=1}^{n}\mu_{t,i}^2+\frac{1}{n}\sum_{i=1}^{n}e_{t,i}^2\right)+\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}\right|\right\}^2\left(\frac{1}{n}\sum_{i=1}^{n}\mu_i^2+\frac{1}{n}\sum_{i=1}^{n}e_i^2\right)\right]\\&=\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right|\right\}^2O_p(1)+\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}\right|\right\}^2O_p(1),\end{aligned}$$

    where the last equation is due to Condition (C2), (5.2), and (5.3). Lemma 1 holds if we can prove that

    $$\sqrt{n}\max_{1\le i\le n}\left|\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right|=O_p(1), \tag{5.4}$$
    $$\sqrt{n}\max_{1\le i\le n}\left|\frac{1-\delta_i}{1-\hat\pi(X_i,U_i)}-\frac{1-\delta_i}{1-\pi(X_i,U_i)}\right|=O_p(1). \tag{5.5}$$

    By a Taylor expansion, one has

    $$\begin{aligned}&\sqrt{n}\max_{1\le i\le n}\left|\frac{\delta_i-\hat\pi(X_i,U_i)}{\hat\pi(X_i,U_i)(1-\hat\pi(X_i,U_i))}-\frac{\delta_i-\pi(X_i,U_i)}{\pi(X_i,U_i)(1-\pi(X_i,U_i))}\right|\\&\le c\left\{\min_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\right\}^{-2}\left\{1-\max_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\right\}^{-2}\max_{1\le i\le n}\left\{\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\rm T}}\Big|_{\theta=\theta_i^*}\right\|\sqrt{n}\|\hat\theta_n-\theta_0\|+\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\rm T}}\Big|_{\kappa=\kappa_i^*}\right\|\|\hat\kappa_{\hat\theta_n}-\kappa_0\|\right\},\end{aligned}$$

    where $\theta_i^*$ is a vector between $\hat\theta_n$ and $\theta_0$, and $\kappa_i^*$ lies between $\hat\kappa_{\hat\theta_n}$ and $\kappa_0$. Expanding $\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)$ in a Taylor series and using the boundedness property of $\pi(X_i,U_i)$ in Assumption 3, we have

    $$\min_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\ge c_{\pi}-\max_{1\le i\le n}\left\{\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\rm T}}\Big|_{\theta=\theta_i^*}\right\|\|\hat\theta_n-\theta_0\|+\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\rm T}}\Big|_{\kappa=\kappa_i^*}\right\|\|\hat\kappa_{\hat\theta_n}-\kappa_0\|\right\},$$
    $$1-\max_{1\le i\le n}\hat\pi(X_i,U_i;\theta_i^*,\kappa_i^*)\ge c_{\pi}-\max_{1\le i\le n}\left\{\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\theta^{\rm T}}\Big|_{\theta=\theta_i^*}\right\|\|\hat\theta_n-\theta_0\|+\left\|\frac{\partial\hat\pi(X_i,U_i;\theta,\kappa)}{\partial\kappa^{\rm T}}\Big|_{\kappa=\kappa_i^*}\right\|\|\hat\kappa_{\hat\theta_n}-\kappa_0\|\right\}.$$

    Together with Condition (C1), this shows that (5.4) is valid. Likewise, (5.5) is valid. Therefore, the proof of Lemma 1 is completed.

    Lemma 2. By Condition (10) of Theorem 2.1 in [26], we have

    $$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde R_{\pi}(\omega)}{R_{\pi}(\omega)}-1\right|\xrightarrow{a.s.}0. \tag{5.6}$$

    Together with Condition (C3), for the same integer $G\ge1$, it holds that

    $$M_n\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{\tilde R_{\pi}(\omega_m^o)\}^{G}\xrightarrow{a.s.}0, \tag{5.7}$$

    where $\tilde R_{\pi}(\omega)$ and $\tilde\xi_{\pi}=\inf_{\omega\in\mathcal{H}_n}\tilde R_{\pi}(\omega)$ are defined in the proof of Theorem 1 below.

    Lemma 3. Let $\tilde\omega=\arg\min_{\omega\in\mathcal{H}_n}\{L_n(\omega)+a_n(\omega)+b_n\}$, where $a_n(\omega)$ is a term related to $\omega$ and $b_n$ is a term unrelated to $\omega$. Let $R_n(\omega)=E\{L_n(\omega)\mid X,U\}$. If

    $$\sup_{\omega\in\mathcal{H}_n}\frac{|a_n(\omega)|}{R_n(\omega)}=o_p(1), \tag{5.8}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{|R_n(\omega)-L_n(\omega)|}{R_n(\omega)}=o_p(1), \tag{5.9}$$

    and there exist a constant $c$ and a positive integer $N$ such that, for $n\ge N$, $\inf_{\omega\in\mathcal{H}_n}R_n(\omega)\ge c>0$ almost surely, then $L_n(\tilde\omega)/\inf_{\omega\in\mathcal{H}_n}L_n(\omega)\to1$ in probability.

    Lemma 4. For any conformable matrices $B_1$ and $B_2$,

    $$\lambda_{\max}\{B_1B_2\}\le\lambda_{\max}\{B_1\}\lambda_{\max}\{B_2\},\qquad\lambda_{\max}\{B_1+B_2\}\le\lambda_{\max}\{B_1\}+\lambda_{\max}\{B_2\}.$$

    Theorems and their proofs

    Proof of Theorem 1. Let $\lambda_{\max}(\cdot)$ denote the largest singular value of a matrix, and let $\Omega_{\pi}=\mathrm{diag}(\sigma_{\pi,1}^2,\dots,\sigma_{\pi,n}^2)$ denote the conditional covariance matrix of $e_{\pi}$. From Condition (C2), we have

    $$\lambda_{\max}(\Omega_{\pi})=O(1). \tag{5.10}$$

    By the definition of $R_{\pi}(\omega)$, it can be shown that

    $$R_{\pi}(\omega)=\|A(\omega)\mu\|^2+\mathrm{tr}\{P(\omega)\Omega_{\pi}P(\omega)^{\rm T}\}, \tag{5.11}$$

    where $A(\omega)=I_n-P(\omega)$. Define the loss function of $\tilde\mu_{\pi}(\omega)$ as $\tilde L_{\pi}(\omega)=\|\mu-\tilde\mu_{\pi}(\omega)\|^2$ and its corresponding risk function as $\tilde R_{\pi}(\omega)=E\{\tilde L_{\pi}(\omega)\mid X,U\}$. Similarly, we have $\tilde R_{\pi}(\omega)=\|\tilde A(\omega)\mu\|^2+\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}$, in which $\tilde A(\omega)=I_n-\tilde P(\omega)$. Define

    $$V_{\pi}(\omega)=\|A(\omega)\mu\|^2+\mathrm{tr}\{P(\omega)\Omega_{\pi}P(\omega)^{\rm T}\}\quad\text{and}\quad\tilde V_{\pi}(\omega)=\|\tilde A(\omega)\mu\|^2+\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}.$$

    Then we have

    $$\begin{aligned}\frac{L_{\hat\pi}(\hat\omega_{cv})}{\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)}-1&=\sup_{\omega\in\mathcal{H}_n}\left\{\frac{L_{\hat\pi}(\hat\omega_{cv})}{L_{\hat\pi}(\omega)}-1\right\}\\&=\sup_{\omega\in\mathcal{H}_n}\left\{\frac{L_{\hat\pi}(\hat\omega_{cv})}{V_{\pi}(\hat\omega_{cv})}\cdot\frac{V_{\pi}(\hat\omega_{cv})}{\tilde V_{\pi}(\hat\omega_{cv})}\cdot\frac{\tilde V_{\pi}(\hat\omega_{cv})}{\tilde L_{\pi}(\hat\omega_{cv})}\cdot\frac{\tilde L_{\pi}(\hat\omega_{cv})}{\tilde L_{\pi}(\omega)}\cdot\frac{\tilde L_{\pi}(\omega)}{\tilde R_{\pi}(\omega)}\cdot\frac{\tilde R_{\pi}(\omega)}{R_{\pi}(\omega)}\cdot\frac{R_{\pi}(\omega)}{L_{\hat\pi}(\omega)}-1\right\}\\&\le\sup_{\omega\in\mathcal{H}_n}\left(\frac{L_{\hat\pi}(\omega)}{R_{\pi}(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{R_{\pi}(\omega)}{\tilde R_{\pi}(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde R_{\pi}(\omega)}{\tilde L_{\pi}(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde L_{\pi}(\omega)}{\tilde R_{\pi}(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde R_{\pi}(\omega)}{R_{\pi}(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{R_{\pi}(\omega)}{L_{\hat\pi}(\omega)}\right)\cdot\frac{\tilde L_{\pi}(\hat\omega_{cv})}{\inf_{\omega\in\mathcal{H}_n}\tilde L_{\pi}(\omega)}-1.\end{aligned}$$

    Thus, to prove Theorem 1, it suffices to show that

    $$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde R_{\pi}(\omega)}{R_{\pi}(\omega)}-1\right|=o_p(1), \tag{5.12}$$
    $$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde L_{\pi}(\omega)}{\tilde R_{\pi}(\omega)}-1\right|=o_p(1), \tag{5.13}$$
    $$\sup_{\omega\in\mathcal{H}_n}\left|\frac{L_{\hat\pi}(\omega)}{R_{\pi}(\omega)}-1\right|=o_p(1), \tag{5.14}$$
    $$\frac{\tilde L_{\pi}(\hat\omega_{cv})}{\inf_{\omega\in\mathcal{H}_n}\tilde L_{\pi}(\omega)}-1=o_p(1). \tag{5.15}$$

    Equation (5.12) follows directly from (5.6) in Lemma 2.

    For (5.13), it is noted that

    $$|\tilde L_{\pi}(\omega)-\tilde R_{\pi}(\omega)|=\left|\|\mu-\tilde\mu_{\pi}(\omega)\|^2-\|\tilde A(\omega)\mu\|^2-\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}\right|=\left|\|\tilde P(\omega)e_{\pi}\|^2-\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}-2\mu^{\rm T}\tilde A^{\rm T}(\omega)\tilde P(\omega)e_{\pi}\right|.$$

    Hence, for (5.13) to hold, it suffices to show that

    $$\sup_{\omega\in\mathcal{H}_n}\frac{\left|\|\tilde P(\omega)e_{\pi}\|^2-\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}\right|}{\tilde R_{\pi}(\omega)}=o_p(1), \tag{5.16}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{\left|\mu^{\rm T}\tilde A^{\rm T}(\omega)\tilde P(\omega)e_{\pi}\right|}{\tilde R_{\pi}(\omega)}=o_p(1). \tag{5.17}$$

    In addition, according to Lemma 4 and the properties of $P^{(m)}$, we have

    $$\lambda_{\max}\{P^{(m)}\}=\lambda_{\max}\{Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}\le\lambda_{\max}\{Q\}+\lambda_{\max}\{\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}\le2. \tag{5.18}$$

    Note that

    $$\tilde P^{(m)}=P^{(m)}-D^{(m)}A^{(m)}, \tag{5.19}$$

    which, along with (5.18) and the first part of Condition (C4), leads us to

    $$\lambda_{\max}\{\tilde P(\omega)\}\le\sum_{m=1}^{M_n}\omega_m\left[\lambda_{\max}\{P^{(m)}\}+\lambda_{\max}\{D^{(m)}A^{(m)}\}\right]\le\sum_{m=1}^{M_n}\omega_m\left[2+\lambda_{\max}\{D^{(m)}\}\lambda_{\max}\{A^{(m)}\}\right]\le\sum_{m=1}^{M_n}\omega_m\left[2+\max_{1\le i\le n}\frac{h_{m,ii}}{1-h_{m,ii}}\right]\le1+(1-\bar h)^{-1}=O(1). \tag{5.20}$$

    To prove (5.16), it is necessary only to verify that, for any $\delta>0$,

    $$\begin{aligned}&\Pr\left\{\sup_{\omega\in\mathcal{H}_n}\left|\|\tilde P(\omega)e_{\pi}\|^2-\mathrm{tr}\{\tilde P(\omega)\Omega_{\pi}\tilde P(\omega)^{\rm T}\}\right|/\tilde R_{\pi}(\omega)>\delta\,\Big|\,X,U\right\}\\&\le\Pr\left\{\sup_{\omega\in\mathcal{H}_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\left|e_{\pi}^{\rm T}\tilde P^{(m)\rm T}\tilde P^{(s)}e_{\pi}-\mathrm{tr}\{\Omega_{\pi}\tilde P^{(s)\rm T}\tilde P^{(m)}\}\right|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le\Pr\left\{\max_{1\le m\le M_n}\max_{1\le s\le M_n}\left|e_{\pi}^{\rm T}\tilde P^{(m)\rm T}\tilde P^{(s)}e_{\pi}-\mathrm{tr}\{\Omega_{\pi}\tilde P^{(s)\rm T}\tilde P^{(m)}\}\right|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\Pr\left\{\left|e_{\pi}^{\rm T}\tilde P^{(m)\rm T}\tilde P^{(s)}e_{\pi}-\mathrm{tr}\{\Omega_{\pi}\tilde P^{(s)\rm T}\tilde P^{(m)}\}\right|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le C_1\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}E\left\{\left|e_{\pi}^{\rm T}\tilde P^{(m)\rm T}\tilde P^{(s)}e_{\pi}-\mathrm{tr}\{\Omega_{\pi}\tilde P^{(s)\rm T}\tilde P^{(m)}\}\right|^{2G}\,\Big|\,X,U\right\}\\&\le C_1\delta^{-2G}\tilde\xi_{\pi}^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\left|\mathrm{tr}\{\tilde P^{(m)\rm T}\tilde P^{(s)}\Omega_{\pi}\tilde P^{(s)\rm T}\tilde P^{(m)}\}\right|^{G}\\&\le C_1\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P^{(s)}\}\,M_n\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\left|\mathrm{tr}\{\tilde P^{(m)\rm T}\Omega_{\pi}\tilde P^{(m)}\}\right|^{G}\\&\le C_1\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P^{(s)}\}\,M_n\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{\tilde R_{\pi}(\omega_m^o)\}^{G}\to0,\ \text{as}\ n\to\infty,\end{aligned}$$

    where $C_1$ is a positive constant. The third, fourth, and fifth inequalities are derived from the triangle inequality, Markov's inequality, and (7) of Theorem 2 of [23], respectively. The sixth line follows from the inequality $\mathrm{tr}(B_1B_2)\le\lambda_{\max}(B_1)\mathrm{tr}(B_2)$, the seventh from $\mathrm{tr}\{\tilde P^{(m)\rm T}\Omega_{\pi}\tilde P^{(m)}\}\le\tilde R_{\pi}(\omega_m^o)$, and the final convergence to zero is guaranteed by Lemma 2, (5.10), and (5.20). Thus, (5.16) is valid.

    By similar arguments, for (5.17), we obtain

    $$\begin{aligned}\Pr\left\{\sup_{\omega\in\mathcal{H}_n}\left|\mu^{\rm T}\tilde A^{\rm T}(\omega)\tilde P(\omega)e_{\pi}\right|/\tilde R_{\pi}(\omega)>\delta\,\Big|\,X,U\right\}&\le\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\Pr\left\{\left|\mu^{\rm T}\tilde A^{\rm T}(\omega_m^o)\tilde P(\omega_s^o)e_{\pi}\right|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}E\left\{\left|\mu^{\rm T}\tilde A^{\rm T}(\omega_m^o)\tilde P(\omega_s^o)e_{\pi}\right|^{2G}\,\Big|\,X,U\right\}\\&\le C_2\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\left\|\Omega_{\pi}^{1/2}\tilde P^{\rm T}(\omega_s^o)\tilde A(\omega_m^o)\mu\right\|^{2G}\\&\le C_2\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P(\omega_s^o)\}\,M_n\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\left\|\tilde A(\omega_m^o)\mu\right\|^{2G}\\&\le C_2\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\max_{1\le s\le M_n}\lambda_{\max}^{2G}\{\tilde P(\omega_s^o)\}\,M_n\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{\tilde R_{\pi}(\omega_m^o)\}^{G}\to0,\ \text{as}\ n\to\infty,\end{aligned}$$

    where $C_2$ is a positive constant, and the last inequality is due to the fact that $\|\tilde A(\omega_m^o)\mu\|^2\le\tilde R_{\pi}(\omega_m^o)$, which is implied by (5.11). Thus, (5.17) is valid. This completes the proof of (5.13).

    By the Cauchy-Schwarz inequality, it can be shown that

    $$\left|\frac{L_{\hat\pi}(\omega)}{R_{\pi}(\omega)}-1\right|\le\left|\frac{L_{\pi}(\omega)}{R_{\pi}(\omega)}-1\right|+\frac{2\{L_{\pi}(\omega)\}^{1/2}\|\hat\mu_{\pi}(\omega)-\hat\mu_{\hat\pi}(\omega)\|}{R_{\pi}(\omega)}+\frac{\|\hat\mu_{\pi}(\omega)-\hat\mu_{\hat\pi}(\omega)\|^2}{R_{\pi}(\omega)}.$$

    Thus, to prove (5.14), it suffices to show that

    $$\sup_{\omega\in\mathcal{H}_n}\left|\frac{L_{\pi}(\omega)}{R_{\pi}(\omega)}-1\right|=o_p(1), \tag{5.21}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{\|\hat\mu_{\pi}(\omega)-\hat\mu_{\hat\pi}(\omega)\|^2}{R_{\pi}(\omega)}=o_p(1). \tag{5.22}$$

    Using the technique employed in deriving (5.13), it can be shown that (5.21) is valid. By Lemma 1, Lemma 4, Condition (C3), and (5.18), we have

    $$\sup_{\omega\in\mathcal{H}_n}\frac{\|\hat\mu_{\pi}(\omega)-\hat\mu_{\hat\pi}(\omega)\|^2}{R_{\pi}(\omega)}=\sup_{\omega\in\mathcal{H}_n}\frac{\|P(\omega)(Z_{\hat\pi}-Z_{\pi})\|^2}{R_{\pi}(\omega)}\le\xi_{\pi}^{-1}\sup_{\omega\in\mathcal{H}_n}\lambda_{\max}^2\{P(\omega)\}\|Z_{\hat\pi}-Z_{\pi}\|^2\le4\xi_{\pi}^{-1}\|Z_{\hat\pi}-Z_{\pi}\|^2\to0,\ \text{as}\ n\to\infty.$$

    As a result, (5.21) and (5.22) are valid, and thus (5.14) is valid.

    By the jackknife criterion in (2.5), straightforward and careful calculation yields

    $$CV_{\hat\pi}(\omega)=\|Z_{\hat\pi}-\tilde\mu_{\hat\pi}(\omega)\|^2=\|Z_{\hat\pi}-\mu+\mu-\tilde\mu_{\pi}(\omega)+\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2=\tilde L_{\pi}(\omega)+\tilde a_n(\omega)+\|Z_{\hat\pi}-\mu\|^2, \tag{5.23}$$

    where the term $\|Z_{\hat\pi}-\mu\|^2$ is independent of $\omega$, and

    $$\begin{aligned}\tilde a_n(\omega)&=\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2+2\{\mu-\tilde\mu_{\pi}(\omega)\}^{\rm T}\{\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\}+2(Z_{\hat\pi}-Z_{\pi})^{\rm T}\{\mu-\tilde\mu_{\pi}(\omega)\}\\&\quad+2(Z_{\hat\pi}-Z_{\pi})^{\rm T}\{\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\}+2e_{\pi}^{\rm T}\tilde A(\omega)\mu-2e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}+2e_{\pi}^{\rm T}\tilde P(\omega)(Z_{\pi}-Z_{\hat\pi}).\end{aligned}$$

    Thus, by Lemma 3, for (5.15) to hold, it only needs to be verified that, as $n\to\infty$,

    $$\sup_{\omega\in\mathcal{H}_n}\frac{|\tilde a_n(\omega)|}{\tilde R_{\pi}(\omega)}=o_p(1). \tag{5.24}$$

    Using the Cauchy-Schwarz inequality, Lemma 4, and (5.20), we obtain

    $$\begin{aligned}|\tilde a_n(\omega)|&\le\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2+2\{\tilde L_{\pi}(\omega)\}^{\frac12}\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|+2\|Z_{\hat\pi}-Z_{\pi}\|\{\tilde L_{\pi}(\omega)\}^{\frac12}\\&\quad+2\|Z_{\hat\pi}-Z_{\pi}\|\,\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|+2|e_{\pi}^{\rm T}\tilde A(\omega)\mu|+2|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}|+2\|\tilde P(\omega)^{\rm T}e_{\pi}\|\,\|Z_{\pi}-Z_{\hat\pi}\|,\end{aligned} \tag{5.25}$$

    where

    $$\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2=\|\tilde P(\omega)(Z_{\hat\pi}-Z_{\pi})\|^2\le[\lambda_{\max}\{\tilde P(\omega)\}]^2\|Z_{\hat\pi}-Z_{\pi}\|^2=O_p(1).$$

    Therefore, for (5.24) to hold, it suffices to prove that

    $$\sup_{\omega\in\mathcal{H}_n}\frac{|\mu^{\rm T}\tilde A^{\rm T}(\omega)e_{\pi}|}{\tilde R_{\pi}(\omega)}=o_p(1), \tag{5.26}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}|}{\tilde R_{\pi}(\omega)}=o_p(1), \tag{5.27}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{\|\tilde P(\omega)^{\rm T}e_{\pi}\|}{\tilde R_{\pi}(\omega)}=o_p(1). \tag{5.28}$$

    We again apply the technique used to derive (5.17). For any $\delta>0$, we have

    $$\begin{aligned}\Pr\left\{\sup_{\omega\in\mathcal{H}_n}|\mu^{\rm T}\tilde A^{\rm T}(\omega)e_{\pi}|/\tilde R_{\pi}(\omega)>\delta\,\Big|\,X,U\right\}&\le\sum_{m=1}^{M_n}\Pr\left\{|\mu^{\rm T}\tilde A^{\rm T}(\omega_m^o)e_{\pi}|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}E\left\{|\mu^{\rm T}\tilde A^{\rm T}(\omega_m^o)e_{\pi}|^{2G}\,\Big|\,X,U\right\}\\&\le C_3\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\left\|\Omega_{\pi}^{1/2}\tilde A(\omega_m^o)\mu\right\|^{2G}\\&\le C_3\delta^{-2G}\tilde\xi_{\pi}^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\sum_{m=1}^{M_n}\left\|\tilde A(\omega_m^o)\mu\right\|^{2G}\\&\le C_3\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{\tilde R_{\pi}(\omega_m^o)\}^{G}\to0,\ \text{as}\ n\to\infty,\end{aligned}$$

    where $C_3$ is a positive constant. By Conditions (C2) and (C3) and (5.10), we know that (5.26) is valid.

    It can be observed that

    $$|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}|\le|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}-\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|+|\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|.$$

    Therefore, (5.27) holds if we can prove that

    $$\sup_{\omega\in\mathcal{H}_n}\frac{|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}-\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|}{\tilde R_{\pi}(\omega)}=o_p(1), \tag{5.29}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{|\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|}{\tilde R_{\pi}(\omega)}=o_p(1). \tag{5.30}$$

    Similar to (5.26), it can be shown that

    $$\begin{aligned}\Pr\left\{\sup_{\omega\in\mathcal{H}_n}|e_{\pi}^{\rm T}\tilde P(\omega)e_{\pi}-\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|/\tilde R_{\pi}(\omega)>\delta\,\Big|\,X,U\right\}&\le\sum_{m=1}^{M_n}\Pr\left\{|e_{\pi}^{\rm T}\tilde P(\omega_m^o)e_{\pi}-\mathrm{tr}(\tilde P(\omega_m^o)\Omega_{\pi})|>\delta\tilde\xi_{\pi}\,\Big|\,X,U\right\}\\&\le\delta^{-2G}\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}E\left\{|e_{\pi}^{\rm T}\tilde P(\omega_m^o)e_{\pi}-\mathrm{tr}(\tilde P(\omega_m^o)\Omega_{\pi})|^{2G}\,\Big|\,X,U\right\}\\&\le C_4\delta^{-2G}\tilde\xi_{\pi}^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\sum_{m=1}^{M_n}\left(\mathrm{tr}\{\tilde P(\omega_m^o)^{\rm T}\Omega_{\pi}\tilde P(\omega_m^o)\}\right)^{G}\\&\le C_4\delta^{-2G}\lambda_{\max}^{G}(\Omega_{\pi})\tilde\xi_{\pi}^{-2G}\sum_{m=1}^{M_n}\{\tilde R_{\pi}(\omega_m^o)\}^{G}\to0,\ \text{as}\ n\to\infty,\end{aligned}$$

    where $C_4$ is a positive constant. As a result, (5.29) is valid.

    By (5.10), Condition (C4), and the fact that all the diagonal elements of $\tilde P^{(m)}$ are zeros, it is observed that

    $$\sup_{\omega\in\mathcal{H}_n}|\mathrm{tr}(\tilde P(\omega)\Omega_{\pi})|/\tilde R_{\pi}(\omega)\le\tilde\xi_{\pi}^{-1}\max_{1\le m\le M_n}|\mathrm{tr}(\tilde P^{(m)}\Omega_{\pi})|\le\tilde\xi_{\pi}^{-1}\max_{1\le m\le M_n}|\lambda_{\max}(\Omega_{\pi})\mathrm{tr}(\tilde P^{(m)})|\le\tilde\xi_{\pi}^{-1}\lambda_{\max}(\Omega_{\pi})\max_{1\le m\le M_n}\mathrm{tr}(\tilde P^{(m)})\to0,\ \text{as}\ n\to\infty.$$

    Thus, (5.30) is valid.

    Similarly, for (5.28), we have

    $$\|\tilde P(\omega)^{\rm T}e_{\pi}\|^2\le\left|e_{\pi}^{\rm T}\tilde P(\omega)^{\rm T}\tilde P(\omega)e_{\pi}-\mathrm{tr}\{\tilde P(\omega)^{\rm T}\Omega_{\pi}\tilde P(\omega)\}\right|+\mathrm{tr}\{\tilde P(\omega)^{\rm T}\Omega_{\pi}\tilde P(\omega)\}.$$

    Then, for (5.28) to hold, we only need to prove

    $$\sup_{\omega\in\mathcal{H}_n}\frac{\left|e_{\pi}^{\rm T}\tilde P(\omega)^{\rm T}\tilde P(\omega)e_{\pi}-\mathrm{tr}\{\tilde P(\omega)^{\rm T}\Omega_{\pi}\tilde P(\omega)\}\right|}{\{\tilde R_{\pi}(\omega)\}^2}=o_p(1), \tag{5.31}$$
    $$\sup_{\omega\in\mathcal{H}_n}\frac{\left|\mathrm{tr}\{\tilde P(\omega)^{\rm T}\Omega_{\pi}\tilde P(\omega)\}\right|}{\{\tilde R_{\pi}(\omega)\}^2}=o_p(1). \tag{5.32}$$

    By the proof of (5.16), it can be shown that (5.31) is valid. Letting $S^{(m)}=D^{(m)}+I_n$, this together with (5.19) generates

    $$\begin{aligned}\mathrm{tr}\{\tilde P^{(m)\rm T}\tilde P^{(m)}\}&=\mathrm{tr}\{[P^{(m)}-D^{(m)}A^{(m)}]\tilde P^{(m)}\}=\mathrm{tr}\{[(P^{(m)}-I_n)S^{(m)}+I_n]\tilde P^{(m)}\}\\&=\mathrm{tr}\{P^{(m)}S^{(m)}\tilde P^{(m)}\}-\mathrm{tr}\{S^{(m)}\tilde P^{(m)}\}\\&=\mathrm{tr}\{P^{(m)}S^{(m)}S^{(m)}(P^{(m)}-I_n)\}+\mathrm{tr}\{P^{(m)}S^{(m)}\}\\&\le\mathrm{tr}\{P^{(m)}S^{(m)}S^{(m)}P^{(m)}\}+\mathrm{tr}\{P^{(m)}S^{(m)}\}\\&\le\mathrm{tr}\{P^{(m)}\}(1-\bar h)^{-2}+\mathrm{tr}\{P^{(m)}\}(1-\bar h)^{-1}\\&\le(d_n+k_m)(1-\bar h)^{-2}(2-\bar h),\end{aligned} \tag{5.33}$$

    where $\mathrm{tr}\{S^{(m)}\tilde P^{(m)}\}=0$ because $S^{(m)}$ is diagonal and the diagonal elements of $\tilde P^{(m)}$ are zeros, and

    $$\mathrm{tr}\{P^{(m)}\}=\mathrm{tr}\{Q\}+\mathrm{tr}\{\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}=d_n+k_m,\qquad\mathrm{tr}\{\tilde P^{(m)}\}=0.$$

    Under Lemma 2 and the second part of Condition (C4), we have

    $$\tilde\xi_{\pi}^{-2}(\bar d+\bar k)=\xi_{\pi}^{-2}(\bar d+\bar k)\frac{\xi_{\pi}^{2}}{\tilde\xi_{\pi}^{2}}\le\xi_{\pi}^{-2}(\bar d+\bar k)\left\{\sup_{\omega\in\mathcal{H}_n}\left|\frac{R_{\pi}(\omega)}{\tilde R_{\pi}(\omega)}-1\right|+1\right\}^2\xrightarrow{a.s.}0,$$

    where $\tilde\xi_{\pi}=\inf_{\omega\in\mathcal{H}_n}\tilde R_{\pi}(\omega)$. This, along with (5.10) and (5.33), implies that

    $$\sup_{\omega\in\mathcal{H}_n}\frac{\left|\mathrm{tr}\{\tilde P(\omega)^{\rm T}\Omega_{\pi}\tilde P(\omega)\}\right|}{\{\tilde R_{\pi}(\omega)\}^2}\le\tilde\xi_{\pi}^{-2}\sup_{\omega\in\mathcal{H}_n}\left|\lambda_{\max}(\Omega_{\pi})\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\mathrm{tr}\{\tilde P^{(m)\rm T}\tilde P^{(s)}\}\right|\le\lambda_{\max}(\Omega_{\pi})\tilde\xi_{\pi}^{-2}(\bar d+\bar k)(1-\bar h)^{-2}(2-\bar h)\to0,\ \text{as}\ n\to\infty,$$

    and thus (5.32) is valid. In conclusion, the proof of Theorem 1 is completed.

    Proof of Theorem 2:

    Define $\psi_n(\omega)=Z_{\hat\pi}-Z_{\pi}+\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)+\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)$. A simple calculation yields

    $$\begin{aligned}CV_{\hat\pi}(\hat\omega_{cv})&=\|Z_{\hat\pi}-\tilde\mu_{\hat\pi}(\hat\omega_{cv})\|^2\\&=\|Z_{\hat\pi}-Z_{\pi}+Z_{\pi}-\hat\mu_{\pi}(\hat\omega_{cv})+\hat\mu_{\pi}(\hat\omega_{cv})-\tilde\mu_{\pi}(\hat\omega_{cv})+\tilde\mu_{\pi}(\hat\omega_{cv})-\tilde\mu_{\hat\pi}(\hat\omega_{cv})\|^2\\&=\|\mu-\hat\mu_{\pi}(\hat\omega_{cv})+e_{\pi}+\psi_n(\hat\omega_{cv})\|^2\\&=\left\|\hat s_{cv}\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\{\mu-\hat\mu_{\pi}^{(m)}\}+(1-\hat s_{cv})\sum_{m=m_0+1}^{M_n}\frac{\hat\omega_{cv,m}}{1-\hat s_{cv}}\{\mu-\hat\mu_{\pi}^{(m)}\}+e_{\pi}+\psi_n(\hat\omega_{cv})\right\|^2\\&=\|\hat s_{cv}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}+(1-\hat s_{cv})\{\mu-\hat\mu_{\pi}(\hat\omega_F)\}+e_{\pi}+\psi_n(\hat\omega_{cv})\|^2,\end{aligned} \tag{5.34}$$

    where $\hat\omega_C=(\hat\omega_{cv,1},\dots,\hat\omega_{cv,m_0},0,\dots,0)^{\rm T}/\hat s_{cv}\in\mathcal{H}_n$ and $\hat\omega_F=(0,\dots,0,\hat\omega_{cv,m_0+1},\dots,\hat\omega_{cv,M_n})^{\rm T}/(1-\hat s_{cv})\in\mathcal{H}_n$. Likewise, we obtain

    $$CV_{\hat\pi}(\hat\omega_C)=\|Z_{\hat\pi}-Z_{\pi}+Z_{\pi}-\hat\mu_{\pi}(\hat\omega_C)+\hat\mu_{\pi}(\hat\omega_C)-\tilde\mu_{\pi}(\hat\omega_C)+\tilde\mu_{\pi}(\hat\omega_C)-\tilde\mu_{\hat\pi}(\hat\omega_C)\|^2=\|\mu-\hat\mu_{\pi}(\hat\omega_C)+e_{\pi}+\psi_n(\hat\omega_C)\|^2. \tag{5.35}$$

    We know that $CV_{\hat\pi}(\hat\omega_{cv})\le CV_{\hat\pi}(\hat\omega_C)$, which, with (5.34), (5.35), and the Cauchy-Schwarz inequality, implies that

    $$\begin{aligned}(1-\hat s_{cv})^2&\le\Big[(1-\hat s_{cv}^2)\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|^2+\|\psi_n(\hat\omega_C)\|^2+2e_{\pi}^{\rm T}\big[\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}+\psi_n(\hat\omega_C)\big]+2\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}^{\rm T}\psi_n(\hat\omega_C)\\&\quad-2\big[\hat s_{cv}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}+\psi_n(\hat\omega_{cv})\big]^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_F)\}-\|\psi_n(\hat\omega_{cv})\|^2-2e_{\pi}^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_F)\}\\&\quad-2e_{\pi}^{\rm T}\big[\hat s_{cv}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}+\psi_n(\hat\omega_{cv})\big]-2\hat s_{cv}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}^{\rm T}\psi_n(\hat\omega_{cv})\Big]\Big/\|\mu-\hat\mu_{\pi}(\hat\omega_F)\|^2\\&\le\Big[2\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|^2+2\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|^2+4|e_{\pi}^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}|+4\|e_{\pi}\|\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|\\&\quad+4\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|+2\Big\{\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|+\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|\Big\}\|\mu-\hat\mu_{\pi}(\hat\omega_F)\|\\&\quad+2|e_{\pi}^{\rm T}A(\hat\omega_F)\mu|+2|e_{\pi}^{\rm T}P(\hat\omega_F)e_{\pi}|\Big]\frac{1}{R_{\pi}(\hat\omega_F)}\cdot\frac{R_{\pi}(\hat\omega_F)}{L_{\pi}(\hat\omega_F)}\\&\le\Big[\Big\{2\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|^2+2\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|^2+4|e_{\pi}^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}|+4\|e_{\pi}\|\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|\\&\quad+4\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|\Big\}\xi_{\pi,F}^{-1}+2\sup_{\omega\in\mathcal{H}_F}\frac{|e_{\pi}^{\rm T}A(\omega)\mu|}{R_{\pi}(\omega)}+2\sup_{\omega\in\mathcal{H}_F}\frac{e_{\pi}^{\rm T}P(\omega)e_{\pi}}{R_{\pi}(\omega)}\\&\quad+2\xi_{\pi,F}^{-1/2}\Big\{\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|+\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|\Big\}\Big\{\sup_{\omega\in\mathcal{H}_F}\Big|\frac{L_{\pi}(\omega)}{R_{\pi}(\omega)}-1\Big|+1\Big\}^{1/2}\Big]\times\sup_{\omega\in\mathcal{H}_F}\left\{\Big|\frac{L_{\pi}(\omega)}{R_{\pi}(\omega)}-1\Big|+1\right\}.\end{aligned}$$

    Condition (C6) indicates that, to prove Theorem 2, it suffices to show

    $$\xi_{\pi,F}^{-1}\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|^2=o_p(1), \tag{5.36}$$
    $$\xi_{\pi,F}^{-1}|e_{\pi}^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}|=o_p(1), \tag{5.37}$$
    $$\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|^2=O_p(1), \tag{5.38}$$
    $$\xi_{\pi,F}^{-2}\|e_{\pi}\|^2=o_p(1), \tag{5.39}$$
    $$\sup_{\omega\in\mathcal{H}_F}\left|\frac{L_{\pi}(\omega)}{R_{\pi}(\omega)}-1\right|=o_p(1), \tag{5.40}$$
    $$\sup_{\omega\in\mathcal{H}_F}\frac{|e_{\pi}^{\rm T}A(\omega)\mu|}{R_{\pi}(\omega)}=o_p(1), \tag{5.41}$$
    $$\sup_{\omega\in\mathcal{H}_F}\frac{e_{\pi}^{\rm T}P(\omega)e_{\pi}}{R_{\pi}(\omega)}=o_p(1). \tag{5.42}$$

    For the correct models with $m=1,2,\dots,m_0$, we have $\mu=X^{(m)}\beta^{(m)}+g(U)$, which lies in the column space of $\mathbb{X}^{(m)}=(X^{(m)},K)$, so that

    $$P^{(m)}P^{(m)}=\{Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}\{Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\}=Q+2Q\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}=Q+\tilde X^{(m)}(\tilde X^{(m)\rm T}\tilde X^{(m)})^{-1}\tilde X^{(m)\rm T}=P^{(m)}$$

    and $P^{(m)}\mu=\mu$, where $Q\tilde X^{(m)}=Q(I-Q)X^{(m)}=0$. This implies that

    $$\|\mu-\hat\mu_{\pi}(\hat\omega_C)\|^2=\left\|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\{\mu-P^{(m)}Z_{\pi}\}\right\|^2=\left\|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}P^{(m)}e_{\pi}\right\|^2\le\frac12\sum_{m=1}^{m_0}\sum_{s=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}\frac{\hat\omega_{cv,s}}{\hat s_{cv}}\left(e_{\pi}^{\rm T}P^{(m)}P^{(m)}e_{\pi}+e_{\pi}^{\rm T}P^{(s)}P^{(s)}e_{\pi}\right)=\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}e_{\pi}^{\rm T}P^{(m)}P^{(m)}e_{\pi}\le\max_{1\le m\le m_0}e_{\pi}^{\rm T}P^{(m)}e_{\pi}.$$

    Thus, for (5.36) to hold, we need

    $$\xi_{\pi,F}^{-1}\max_{1\le m\le m_0}e_{\pi}^{\rm T}P^{(m)}e_{\pi}=o_p(1). \tag{5.43}$$

    By Markov's inequality, for any $\delta>0$, we have

    $$\begin{aligned}P\left(\xi_{\pi,F}^{-1}\max_{1\le m\le m_0}e_{\pi}^{\rm T}P^{(m)}e_{\pi}>\delta\right)&\le\sum_{m=1}^{m_0}P\left(e_{\pi}^{\rm T}P^{(m)}e_{\pi}>\delta\xi_{\pi,F}\right)=\sum_{m=1}^{m_0}E\left[E\left\{I\left(e_{\pi}^{\rm T}P^{(m)}e_{\pi}>\delta\xi_{\pi,F}\right)\Big|X,U\right\}\right]\\&=\sum_{m=1}^{m_0}E\left\{P\left(e_{\pi}^{\rm T}P^{(m)}e_{\pi}>\delta\xi_{\pi,F}\Big|X,U\right)\right\}\le\sum_{m=1}^{m_0}E\left[(\delta\xi_{\pi,F})^{-2G}E\left\{(e_{\pi}^{\rm T}P^{(m)}e_{\pi})^{2G}\Big|X,U\right\}\right]\\&\le c\sum_{m=1}^{m_0}E\left[(\delta\xi_{\pi,F})^{-2G}\left[\mathrm{tr}\{P^{(m)}\Omega_{\pi}\}\right]^{2G}\right]\le c\,\delta^{-2G}\lambda_{\max}^{2G}(\Omega_{\pi})\,\xi_{\pi,F}^{-2G}m_0(\bar d+\bar k)^{2G},\end{aligned}$$

    which is $o_p(1)$ under Condition (C6), implying that (5.43) is valid and thus that (5.36) holds.

    Indeed, (5.37) simplifies as follows:

    $$|e_{\pi}^{\rm T}\{\mu-\hat\mu_{\pi}(\hat\omega_C)\}|=\left|\sum_{m=1}^{m_0}\frac{\hat\omega_{cv,m}}{\hat s_{cv}}e_{\pi}^{\rm T}P^{(m)}e_{\pi}\right|\le\max_{1\le m\le m_0}e_{\pi}^{\rm T}P^{(m)}e_{\pi}.$$

    Thus, by the proof of (5.43), it can be shown that (5.37) is valid.

    For (5.38), one can obtain

    $$\begin{aligned}\sup_{\omega\in\mathcal{H}_n}\|\psi_n(\omega)\|^2&=\sup_{\omega\in\mathcal{H}_n}\|Z_{\hat\pi}-Z_{\pi}+\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)+\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2\\&\le2\sup_{\omega\in\mathcal{H}_n}\left\{\|Z_{\hat\pi}-Z_{\pi}\|^2+\|\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)\|^2+\|\tilde\mu_{\pi}(\omega)-\tilde\mu_{\hat\pi}(\omega)\|^2\right\}\\&\le2\left\{1+(1-\bar h)^{-2}\right\}\|Z_{\hat\pi}-Z_{\pi}\|^2+2\sup_{\omega\in\mathcal{H}_n}\|\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)\|^2\\&=O_p(1)+2\sup_{\omega\in\mathcal{H}_n}\|\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)\|^2, \tag{5.44}\end{aligned}$$

    where the second step is derived from the Cauchy-Schwarz inequality, the third step from (5.20), and the last step from Lemma 1 and Condition (C7). Therefore, (5.38) holds if we can prove that

    $$\sup_{\omega\in\mathcal{H}_n}\|\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)\|^2=O_p(1). \tag{5.45}$$

    By (5.19), Lemma 4, and the Cauchy-Schwarz inequality, we have

    $$\sup_{\omega\in\mathcal{H}_n}\|\hat\mu_{\pi}(\omega)-\tilde\mu_{\pi}(\omega)\|^2=\sup_{\omega\in\mathcal{H}_n}\|P(\omega)Z_{\pi}-\tilde P(\omega)Z_{\pi}\|^2=\sup_{\omega\in\mathcal{H}_n}\left\|\sum_{m=1}^{M_n}\omega_mD^{(m)}A^{(m)}Z_{\pi}\right\|^2\le\frac12\sup_{\omega\in\mathcal{H}_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\left(Z_{\pi}^{\rm T}A^{(m)}D^{(m)}D^{(m)}A^{(m)}Z_{\pi}+Z_{\pi}^{\rm T}A^{(s)}D^{(s)}D^{(s)}A^{(s)}Z_{\pi}\right) \tag{5.46}$$
    $$\le\sup_{\omega\in\mathcal{H}_n}\sum_{m=1}^{M_n}\sum_{s=1}^{M_n}\omega_m\omega_s\lambda_{\max}\{A^{(m)}D^{(m)}D^{(s)}A^{(s)}\}Z_{\pi}^{\rm T}Z_{\pi}\le2(1-\bar h)^{-2}\bar h^2\,n\left(\frac{1}{n}\mu^{\rm T}\mu+\frac{1}{n}e_{\pi}^{\rm T}e_{\pi}\right), \tag{5.47}$$

    where $\lambda_{\max}\{A^{(m)}\}\le1$. This, along with Condition (C2), Condition (C4), and (5.46), implies that, to prove (5.44), it only suffices to prove

    $$\frac{1}{n}e_{\pi}^{\rm T}e_{\pi}=O_p(1). \tag{5.48}$$

    Likewise, (5.48) follows from the same technique used in the proof of Lemma 1; thus (5.45) and hence (5.38) are valid. Based on (5.48) and Condition (C6), we know that (5.39) is valid.

    Under Conditions (C6) and (C7), it is observed that

    $$(M_n-m_0)\xi_{\pi,F}^{-2G}\sum_{m=m_0+1}^{M_n}\{R_{\pi}(\omega_m^o)\}^{G}\xrightarrow{a.s.}0,\qquad\bar h\xrightarrow{a.s.}0,\qquad(\bar d+\bar k)\xi_{\pi,F}^{-2}\xrightarrow{a.s.}0.$$

    This result implies that Conditions (C3) and (C4) are satisfied for $\{Z_{\pi,i},X_i,U_i\}_{i=1}^{n}$ with $\omega\in\mathcal{H}_F$. Therefore, by arguments analogous to those for (5.13), we directly obtain that (5.40)–(5.42) are valid. This completes the proof of Theorem 2.



    References

    [1] B. A. Brumback, Fundamentals of causal inference: With R, CRC Press, 2021. https://doi.org/10.1080/01621459.2023.2287599
    [2] R. K. Crump, V. J. Hotz, G. W. Imbens, O. A. Mitnik, Nonparametric tests for treatment effect heterogeneity, Rev. Econ. Stat., 90 (2008), 389–405. https://doi.org/10.1162/rest.90.3.389
    [3] R. F. Engle, C. W. J. Granger, J. Rice, A. Weiss, Semiparametric estimates of the relation between weather and electricity sales, J. Am. Stat. Assoc., 81 (1986), 247–269. https://doi.org/10.1080/01621459.1986.10478274
    [4] J. Fan, Y. Ma, W. Dai, Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models, J. Am. Stat. Assoc., 109 (2014), 1270–1284. https://doi.org/10.1080/01621459.2013.879828
    [5] Y. Gao, W. Long, Z. Wang, Estimating average treatment effect by model averaging, Econ. Lett., 135 (2015), 42–45. https://doi.org/10.1016/j.econlet.2015.08.002
    [6] B. E. Hansen, Least squares model averaging, Econometrica, 75 (2007), 1175–1189. https://doi.org/10.1111/j.1468-0262.2007.00785.x
    [7] B. E. Hansen, J. S. Racine, Jackknife model averaging, J. Econ., 167 (2012), 38–46. https://doi.org/10.1016/j.jeconom.2011.06.019
    [8] G. W. Imbens, J. M. Wooldridge, Recent developments in the econometrics of program evaluation, J. Econ. Lit., 47 (2009), 5–86. https://doi.org/10.1257/jel.47.1.5
    [9] K. Imai, M. Ratkovic, Estimating treatment effect heterogeneity in randomized program evaluation, Ann. Appl. Stat., 7 (2013), 443–470. https://doi.org/10.1214/12-AOAS593
    [10] H. Jo, M. A. Harjoto, The causal effect of corporate governance on corporate social responsibility, J. Bus. Ethics, 106 (2012), 53–72. https://doi.org/10.1007/s10551-011-1052-1
    [11] T. Kitagawa, C. Muris, Model averaging in semiparametric estimation of treatment effects, J. Econ., 193 (2016), 271–289. https://doi.org/10.1016/j.jeconom.2016.03.002
    [12] K. C. Li, Asymptotic optimality for $C_p$, $C_L$, cross-validation and generalized cross-validation: Discrete index set, Ann. Stat., 15 (1987), 958–975. https://doi.org/10.1214/aos/1176350486
    [13] Q. Liu, R. Okui, Heteroskedasticity-robust $C_p$ model averaging, Econom. J., 16 (2013), 463–472. https://doi.org/10.1111/ectj.12009
    [14] M. Müller, Estimation and testing in generalized partial linear models – A comparative study, Stat. Comput., 11 (2001), 299–309. https://doi.org/10.1023/A:1011981314532
    [15] C. A. Rolling, Y. Yang, Model selection for estimating treatment effects, J. R. Stat. Soc. B, 76 (2014), 749–769. https://doi.org/10.1111/rssb.12043
    [16] C. A. Rolling, Y. Yang, D. Velez, Combining estimates of conditional treatment effects, Economet. Theor., 35 (2019), 1089–1110. https://doi.org/10.1017/S0266466618000397
    [17] P. R. Rosenbaum, D. B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika, 70 (1983), 41–55. https://doi.org/10.2307/2335942
    [18] D. B. Rubin, Estimating causal effects of treatments in randomized and nonrandomized studies, J. Educ. Psychol., 66 (1974), 688–701. https://doi.org/10.1037/h0037350
    [19] D. B. Rubin, Assignment to treatment group on the basis of a covariate, J. Educ. Behav. Stat., 2 (1977), 1–26. https://doi.org/10.2307/1164933
    [20] T. A. Severini, J. G. Staniswalis, Quasi-likelihood estimation in semiparametric models, J. Am. Stat. Assoc., 89 (1994), 501–511. https://doi.org/10.2307/2290852
    [21] C. J. Stone, Optimal global rates of convergence for nonparametric regression, Ann. Stat., 10 (1982), 1040–1053. https://doi.org/10.1214/aos/1176345969
    [22] L. Tian, A. A. Alizadeh, A. J. Gentles, R. Tibshirani, A simple method for estimating interactions between a treatment and a large number of covariates, J. Am. Stat. Assoc., 109 (2014), 1517–1532. https://doi.org/10.1080/01621459.2014.951443
    [23] P. Whittle, Bounds for the moments of linear and quadratic forms in independent variables, Theory Probab. Appl., 5 (1960), 302–305. https://doi.org/10.1137/1105028
    [24] Z. Tan, On doubly robust estimation for logistic partially linear models, Stat. Probab. Lett., 155 (2019), 108577. https://doi.org/10.1016/j.spl.2019.108577
    [25] J. Zeng, W. Cheng, G. Hu, Y. Rong, Model selection and model averaging for semiparametric partially linear models with missing data, Commun. Stat.-Theor. M., 48 (2019), 381–395. https://doi.org/10.1080/03610926.2017.1410717
    [26] X. Zhang, A. T. Wan, G. Zou, Model averaging by jackknife criterion in models with dependent data, J. Econ., 174 (2013), 82–94. https://doi.org/10.1016/j.jeconom.2013.01.004
    [27] X. Zhang, W. Wang, Optimal model averaging estimation for partially linear models, Stat. Sin., 29 (2019), 693–718. https://doi.org/10.2139/ssrn.2948380
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)