1. Introduction
The model considered in this paper is a classical semi-parametric model, the varying-coefficient partially linear model, which takes the following form
$$Y = X^{T}\theta(T) + Z^{T}\beta + \varepsilon, \qquad (1.1)$$
where $Y$ is the response variable, $X$, $Z$ and $T$ are $q$-dimensional, $p$-dimensional and one-dimensional covariates, respectively, $\beta=(\beta_1,\cdots,\beta_p)^{T}$ is a $p$-dimensional unknown parameter vector, $\theta(\cdot)=(\theta_1(\cdot),\cdots,\theta_q(\cdot))^{T}$ is a $q$-dimensional vector of unknown non-parametric functions, and $\varepsilon$ is the random error satisfying $E(\varepsilon\mid X,Z,T)=0$. Model (1.1) has been well studied by many statisticians; see, for example, Fan and Huang [1], You and Zhou [2], Huang and Zhang [3], Zhao [4], and Feng, Zhang and Lu [5], among others.
In practical applications, missing data are frequently encountered in almost all research areas, such as the psychological sciences, medical studies, and industrial and agricultural production. The complete-case (CC) method loses estimation efficiency because it discards the information in the incompletely observed cases, and it may yield biased results if the data are not missing completely at random; for details, see Little and Rubin [6]. The inverse probability weighted (IPW) method, dating back to Horvitz and Thompson [7], is another frequently used method, and it can also be applied to the case of missing covariates. It weights the fully observed data by the inverse of the selection probability, and under the missing at random (MAR) assumption it is unbiased. This method has attracted much attention in statistical analysis with missing data, but it still does not make full use of the incomplete data. The imputation method, introduced by Yates [8], is a popular way to handle missing responses in many studies. The idea of imputation is to fill in each missing value with a suitable value and then to apply standard methods to the observed and imputed values together. This can improve the efficiency of the resulting estimators; see, for example, Cheng [9], Wang and Rao [10,11], Wang, Linton and Härdle [12], and so forth. To further improve the estimation efficiency, Robins, Rotnitzky and Zhao [13] propose an augmented inverse probability weighted (AIPW) method. This method is doubly robust: if the models for the selection probability and the conditional expectation function are both correctly specified, the resulting estimator attains the semi-parametric efficiency bound, and if either of the two assumed models is correctly specified, the estimator is consistent; see the details in Robins and Rotnitzky [14] and Scharfstein, Rotnitzky and Robins [15]. Over the subsequent decade, doubly robust estimation was studied extensively; see, for example, Kang and Schafer [16], Qin, Shao and Zhang [17], Cao, Tsiatis and Davidian [18], Han [19], and Rotnitzky et al. [20].
However, double robustness does not provide sufficient protection for estimation consistency, since it allows only one model for the selection probability and one for the conditional expectation function, and it is often risky to assume that one of these two models is correctly specified when the data generating process is unknown. Noting this, Han and Wang [21] propose a multiple robust estimator for the population mean when the response variable is subject to ignorable missingness. They postulate multiple models for both the selection probability and the outcome regression, and the resulting estimator is consistent if any of the multiple models is correctly specified; it attains the semi-parametric efficiency bound when one selection probability model and one outcome regression model are correctly specified, without requiring knowledge of which models are correct. For details, see Han and Wang [21]. Subsequently, Han [22] studies the multiple robust estimator for the linear regression model; he discusses the numerical implementation of the proposed method through a modified Newton-Raphson algorithm, derives the asymptotic distribution of the resulting estimator, and provides some ways to improve the estimation efficiency. Later, Sun, Wang and Han [23] propose multiple robust kernel estimating equations (MRKEEs) for nonparametric regression, demonstrate their multiple robustness, and show that the resulting estimator achieves the optimal efficiency within the class of augmented inverse propensity weighted (AIPW) kernel estimators when the candidate sets include correctly specified models for both the missingness mechanism and the outcome regression. Please refer to Sun, Wang and Han [23] for more discussion. In addition, multiple robust estimation with nonignorably missing data has been studied recently; see, for example, Han [24] and Li, Yang and Han [25].
To the best of our knowledge, multiple robust estimation for the parameters of the varying-coefficient partially linear model with responses missing at random has not been studied. Therefore, in this paper, applying the ideas of Han [22] and Sun, Wang and Han [23], we consider a multiple robust estimation method for the parameters of the varying-coefficient partially linear model with missing responses, and the proposed method is shown to be superior to existing competitors via simulation studies.
This paper is organized as follows. The proposed estimation technique and its multiple robustness are presented in Section 2. Numerical simulation studies are conducted in Section 3 to examine the performance of the proposed method. The technical proofs are provided in Section 4. Conclusions are summarized in Section 5.
2. The proposed estimator
Suppose the available incomplete data $\{(R_i,Y_i,X_i,Z_i,T_i), i=1,2,\cdots,n\}$ is a random sample from model (1.1), that is,
$$Y_i = X_i^{T}\theta(T_i) + Z_i^{T}\beta + \varepsilon_i, \quad i=1,2,\cdots,n, \qquad (2.1)$$
where $R_i$ is an indicator variable with $R_i=1$ if $Y_i$ is observed and $R_i=0$ if $Y_i$ is missing. The covariates $X_i$, $Z_i$ and $T_i$ are always observed. Following Han [22] and Sun, Wang and Han [23], we also suppose that auxiliary variables $S_i$ related to $(R_i,Y_i,X_i,Z_i,T_i)$ are available. As Han [22] points out, the auxiliary variables do not enter the regression model and are not of direct statistical interest, but they can reduce the impact of missing data on estimation and improve the estimation efficiency. Let $V_i=(X_i^{T},Z_i^{T})^{T}$ denote the covariates. The missing mechanism we assume in this paper is the MAR mechanism, which is commonly used in practice. Specifically, given the covariates $V_i$, $T_i$ and the available auxiliary variables $S_i$, the missingness of $Y_i$ is independent of $Y_i$, that is,
$$P(R_i=1\mid Y_i,V_i,T_i,S_i) = P(R_i=1\mid V_i,T_i,S_i) = \pi(V_i,S_i).$$
Here we assume that $\pi(\cdot)$ depends only on $V$ and $S$.
We first construct the estimator of the varying-coefficient functions $\theta(\cdot)$. For any $t$ in a small neighborhood of $t_0$, applying a local linear approximation to $\theta_j(t)$, $j=1,2,\cdots,q$, we have
$$\theta_j(t)\approx\theta_j(t_0)+\theta_j'(t_0)(t-t_0)\triangleq a_j+b_j(t-t_0).$$
Suppose the parameter $\beta$ is known. Then, minimizing the following objective function
$$\sum_{i=1}^{n}R_i\Big[Y_i-Z_i^{T}\beta-\sum_{j=1}^{q}\{a_j+b_j(T_i-t_0)\}X_{ij}\Big]^{2}K_h(T_i-t_0)$$
with respect to $(a_j,b_j)$, $j=1,2,\cdots,q$, we can obtain the estimator of $\theta(t)$ at $t_0$, where $K_h(\cdot)=h^{-1}k(\cdot/h)$, $k(\cdot)$ is a kernel function, and $h$ is the bandwidth. Let
and
then the estimator of the coefficient functions θ(t) at t0 is given by
Substituting (2.3) into (2.1), we obtain
where $\tilde{Y}_i=Y_i-X_i^{T}\hat{g}(T_i)$ and $\tilde{Z}_i=Z_i-\hat{\mu}^{T}(T_i)X_i$, with $\hat{g}(t)=\sum_{k=1}^{n}S_k(t)Y_k$ and $\hat{\mu}(t)=\sum_{k=1}^{n}S_k(t)Z_k^{T}$.
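To make the profiling step concrete, the following sketch (ours, not code from the paper) computes the local linear smoother weights $S_k(t)$ under an assumed Gaussian kernel and forms the partial residuals $\tilde{Y}_i$ and $\tilde{Z}_i$ appearing in (2.4); the smoother is built from the complete cases, consistent with the population quantities $\Phi(t)$ and $\Psi(t)$ in Section 4, and all array names are illustrative.

```python
import numpy as np

def local_linear_weights(X, T, t0, h):
    """Local linear smoother matrix S(t0) (q x n): the first q rows of
    (D^T W D)^{-1} D^T W, with rows of D equal to (X_k^T, ((T_k - t0)/h) X_k^T)
    and W = diag{K_h(T_k - t0)}.  A Gaussian kernel is assumed here."""
    n, q = X.shape
    u = (T - t0) / h
    w = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)   # K_h(T_k - t0)
    D = np.hstack([X, X * u[:, None]])                       # n x 2q design
    DtW = D.T * w                                            # 2q x n
    return np.linalg.solve(DtW @ D, DtW)[:q, :]              # keep the level part

def profile_residuals(Y, X, Z, T, R, h):
    """Partial residuals of (2.4), with the smoother built from the complete
    cases (R_k = 1), since Y_k enters g_hat(t) = sum_k S_k(t) Y_k."""
    obs = R == 1
    Xo, Yo, Zo, To = X[obs], Y[obs], Z[obs], T[obs]
    n = len(Y)
    Y_t = np.full(n, np.nan)                                 # Y~_i only exists for observed Y_i
    Z_t = np.empty_like(Z, dtype=float)
    for i in range(n):
        S = local_linear_weights(Xo, To, T[i], h)            # q x m smoother weights
        g_hat = S @ Yo                                       # g_hat(T_i)
        mu_hat = S @ Zo                                      # mu_hat(T_i), q x p
        Z_t[i] = Z[i] - mu_hat.T @ X[i]
        if R[i] == 1:
            Y_t[i] = Y[i] - X[i] @ g_hat
    return Y_t, Z_t
```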
For model (2.4), using the complete data, the CC estimator of $\beta$ can be obtained by solving the following estimating equation
$$\sum_{i=1}^{n}R_i\hat{\xi}_i(\beta)=0,$$
where $\hat{\xi}_i(\beta)=\tilde{Z}_i(\tilde{Y}_i-\tilde{Z}_i^{T}\beta)$.
From Little and Rubin [6] we know that the CC estimator may be biased unless the missing mechanism is missing completely at random. Following the work of Robins, Rotnitzky and Zhao [13], the doubly robust estimator $\hat{\beta}_{AIPW}$ of $\beta$ can be defined by
where $\hat{\pi}(V_i,S_i)$ is an estimate of $\pi(V_i,S_i)$ and $\eta_i(\beta)=E[\hat{\xi}_i(\beta)\mid V_i,T_i,S_i]$. $\hat{\beta}_{AIPW}$ offers improved protection of consistency, but in practice it is still quite risky to assume that one of the two working models is correctly specified. Inspired by Han [22] and Sun, Wang and Han [23], we next develop the multiple robust estimation of $\beta$.
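Since $\hat{\xi}_i(\beta)=\tilde{Z}_i(\tilde{Y}_i-\tilde{Z}_i^{T}\beta)$ is linear in $\beta$, the CC, IPW and AIPW estimating equations all reduce to weighted normal equations that can be solved in closed form. The sketch below is ours and only illustrative: the AIPW form given in the comments is the standard augmentation (the paper's exact display is not reproduced here), and the arrays Z_t, Y_t, R, pi_hat and Y_aug are hypothetical inputs, for instance coming from the previous sketch and from fitted working models.

```python
import numpy as np

def solve_linear_ee(Z_t, targets, weights):
    """Solve sum_i sum_l w_{l,i} Z~_i (y_{l,i} - Z~_i^T beta) = 0 for beta.
    'targets' and 'weights' are lists of length-n arrays, one pair per term
    of the estimating equation; entries with zero weight may be NaN."""
    p = Z_t.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for y_star, w in zip(targets, weights):
        y_safe = np.where(w != 0, y_star, 0.0)        # zero-weight terms drop out
        A += (Z_t * w[:, None]).T @ Z_t
        b += (Z_t * w[:, None]).T @ y_safe
    return np.linalg.solve(A, b)

# Complete-case:  sum_i R_i xi_i(beta) = 0
#   beta_cc = solve_linear_ee(Z_t, [Y_t], [R.astype(float)])
# IPW:  sum_i (R_i / pi_hat_i) xi_i(beta) = 0
#   beta_ipw = solve_linear_ee(Z_t, [Y_t], [R / pi_hat])
# AIPW (standard form, assumed):  the augmentation term uses the model-based
# target Y_aug_i = m_hat_i - X_i^T g_hat(T_i), with m_hat_i an estimate of
# E(Y_i | V_i, T_i, S_i), weighted by (1 - R_i / pi_hat_i):
#   beta_aipw = solve_linear_ee(Z_t, [Y_t, Y_aug], [R / pi_hat, 1 - R / pi_hat])
```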
Suppose there are $J$ candidate models for $\pi(V,S)$ and $K$ candidate models for $E(Y\mid V,T,S)$. Let $\mathcal{P}=\{\pi^{j}(\alpha^{j}): j=1,\cdots,J\}$ and $\mathcal{F}=\{a^{k}(\gamma^{k}): k=1,\cdots,K\}$ denote these two sets of models, respectively, where $\alpha^{j}$ and $\gamma^{k}$ are the corresponding parameters.
Let $\hat{\alpha}^{j}$ and $\hat{\gamma}^{k}$ be the estimators of $\alpha^{j}$ and $\gamma^{k}$, respectively. Usually, $\hat{\alpha}^{j}$ can be obtained by maximizing the binomial likelihood
$$\prod_{i=1}^{n}\{\pi_i^{j}(\alpha^{j})\}^{R_i}\{1-\pi_i^{j}(\alpha^{j})\}^{1-R_i}.$$
By the MAR assumption, $Y$ and $R$ are conditionally independent given $(V,T,S)$, that is, $E(Y\mid V,T,S)=E(Y\mid R=1,V,T,S)$. Therefore, fitting the model $a^{k}(\gamma^{k})$ with the completely observed data yields $\hat{\gamma}^{k}$. Let $\hat{\beta}^{k}$ be the solution of
Clearly, $\hat{\beta}^{k}$ is an estimate of $\beta$.
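As an illustration of how the working models might be fitted, the sketch below maximizes the binomial likelihood of one candidate selection model $\pi^{j}(\alpha^{j})$ by Newton-Raphson and fits one outcome model $a^{k}(\gamma^{k})$ by least squares on the complete cases; the design matrices W are hypothetical (each column holding the covariates entering the corresponding working model, plus an intercept). With the fitted values $a^{k}(\hat{\gamma}^{k})$ in place of the response, $\hat{\beta}^{k}$ can then be obtained with the same kind of linear solver as above.

```python
import numpy as np

def fit_selection_model(W, R, n_iter=50, tol=1e-8):
    """Maximize the binomial likelihood prod_i pi_i^{R_i} (1 - pi_i)^{1 - R_i}
    for a logistic working model pi_i = expit(W_i^T alpha) by Newton-Raphson."""
    alpha = np.zeros(W.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-W @ alpha))
        grad = W.T @ (R - pi)                                 # score vector
        hess = -(W * (pi * (1.0 - pi))[:, None]).T @ W        # Hessian of the log-likelihood
        step = np.linalg.solve(hess, grad)
        alpha -= step                                         # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def fit_outcome_model(W, Y, R):
    """Least-squares fit of one outcome working model using the complete cases
    only, which is valid under MAR since E(Y|V,T,S) = E(Y|R=1,V,T,S)."""
    obs = R == 1
    gamma, *_ = np.linalg.lstsq(W[obs], Y[obs], rcond=None)
    return gamma
```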
Next, let $m=\sum_{i=1}^{n}R_i$ denote the number of observed responses. Without loss of generality, assume $R_1=\cdots=R_m=1$ and $R_{m+1}=\cdots=R_n=0$. Let $\omega(V,S)=1/\pi(V,S)$. Similar to Han [22], the following equalities hold
where $j=1,\cdots,J$, $k=1,\cdots,K$, and $U^{k}(\beta,\gamma^{k})=\{Z-\mu^{T}(T)X\}\{a^{k}(\gamma^{k})-X^{T}\theta(T)-Z^{T}\beta\}$. Therefore, the weights $\omega_i$, $i=1,\cdots,m$, can be defined by
where
Following the empirical likelihood method, we use the Lagrange multiplier technique to maximize $\prod_{i=1}^{m}\omega_i$ subject to the above constraints, and we take the solution as the weights $\omega_i$ ($i=1,\cdots,m$) for estimating the parameter $\beta$. For ease of presentation, let $\hat{\alpha}^{T}=\{(\hat{\alpha}^{1})^{T},\cdots,(\hat{\alpha}^{J})^{T}\}$, $\hat{\beta}^{T}=\{(\hat{\beta}^{1})^{T},\cdots,(\hat{\beta}^{K})^{T}\}$, $\hat{\gamma}^{T}=\{(\hat{\gamma}^{1})^{T},\cdots,(\hat{\gamma}^{K})^{T}\}$, and $\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})^{T}=[\pi_i^{1}(\hat{\alpha}^{1})-\nu^{1}(\hat{\alpha}^{1}),\cdots,\pi_i^{J}(\hat{\alpha}^{J})-\nu^{J}(\hat{\alpha}^{J}),\{\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})-\eta^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\}^{T},\cdots,\{\hat{U}_i^{K}(\hat{\beta}^{K},\hat{\gamma}^{K})-\eta^{K}(\hat{\beta}^{K},\hat{\gamma}^{K})\}^{T}]$. By empirical likelihood theory, we have
where $\hat{\rho}^{T}=(\hat{\rho}_1,\cdots,\hat{\rho}_{J+pK})$ is the $(J+pK)$-dimensional Lagrange multiplier, which is the solution of
Due to the non-negativity of the weights $\hat{\omega}_i$, $\hat{\rho}$ satisfies
So we can solve the equation
to obtain the multiple robust estimator of the parameter $\beta$, denoted by $\hat{\beta}_{MR}$.
The Lagrange multiplier $\hat{\rho}$ is essential in computing the weights $\hat{\omega}_i$. The algorithm we use is similar to that of Han [22]; the details are omitted here and can be found therein.
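As a concrete starting point, here is a minimal sketch of our own, following the generic empirical likelihood recipe rather than reproducing the modified Newton-Raphson algorithm of Han [22]. It solves for the Lagrange multiplier by a damped Newton iteration under the standard empirical likelihood form $\hat{\omega}_i=1/[m\{1+\hat{\rho}^{T}\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})\}]$, which is assumed here, and then computes $\hat{\beta}_{MR}$ from the weighted, linear estimating equation; G is the $m\times(J+pK)$ matrix whose rows are $\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})$ over the complete cases.

```python
import numpy as np

def el_weights(G, n_iter=100, tol=1e-10):
    """Empirical-likelihood weights on the complete cases: maximize prod_i w_i
    subject to sum_i w_i = 1 and sum_i w_i G_i = 0, giving (assumed form)
    w_i = 1 / (m * (1 + rho^T G_i)).  rho is found by damped Newton on the
    convex dual  -sum_i log(1 + rho^T G_i)."""
    m, d = G.shape
    rho = np.zeros(d)
    for _ in range(n_iter):
        denom = 1.0 + G @ rho
        grad = -(G / denom[:, None]).sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        hess = (G / denom[:, None] ** 2).T @ G
        step = np.linalg.solve(hess + 1e-10 * np.eye(d), grad)
        t = 1.0
        while np.any(1.0 + G @ (rho - t * step) <= 0.0):      # step-halving keeps weights positive
            t /= 2.0
        rho -= t * step
    return 1.0 / (m * (1.0 + G @ rho)), rho

def mr_estimate(Z_t_obs, Y_t_obs, w):
    """beta_MR solves sum_i w_i Z~_i (Y~_i - Z~_i^T beta) = 0 over the
    complete cases, which is linear in beta."""
    A = (Z_t_obs * w[:, None]).T @ Z_t_obs
    b = (Z_t_obs * w[:, None]).T @ Y_t_obs
    return np.linalg.solve(A, b)

# Usage with hypothetical arrays:
#   w, rho = el_weights(G)
#   beta_mr = mr_estimate(Z_t[R == 1], Y_t[R == 1], w)
```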
The multiple robustness of $\hat{\beta}_{MR}$ is given by the following theorem.
Theorem 2.1. Suppose that conditions C1–C5 in Section 4 hold. If $\mathcal{P}$ contains a correctly specified model for $\pi(V,S)$, or $\mathcal{F}$ contains a correctly specified model for $E(Y\mid V,T,S)$, then $\sum_{i=1}^{m}\hat{\omega}_i\hat{\xi}_i(\beta)\stackrel{P}{\longrightarrow}0$ as $n\to\infty$.
3. Simulation study
In this section, we conduct numerical simulations to evaluate the feasibility of the proposed method and the finite sample performance of the proposed estimator $\hat{\beta}_{MR}$. Several performance indices of the multiple robust, inverse probability weighted and augmented inverse probability weighted estimates are compared under different sample sizes.
We consider five mutually independent covariates, namely $X\sim N(0,1)$, $T\sim U(0,1)$, $Z_1\sim N(1,5)$, $Z_2\sim B(0.5,1)$ and $Z_3\sim N(0,1)$. The response variable is generated by the model $Y=X^{T}\theta(T)+Z^{T}\beta+\varepsilon$, where $\theta(t)=\sin(\pi t)$ and $\beta=(1,1,2)^{T}$. In addition, we consider three auxiliary variables, namely $S^{(1)}=1+Z^{(1)}-Z^{(2)}+\varepsilon_1$, $S^{(2)}=I\{S^{(1)}+0.4\varepsilon_2>2.8\}$ and $S^{(3)}=\exp[\{S^{(1)}/9\}^{2}]+\varepsilon_3$, where $I(\cdot)$ is an indicator function and $(\varepsilon,\varepsilon_1,\varepsilon_2,\varepsilon_3)^{T}\sim N(0,\Sigma)$. The diagonal elements of $\Sigma$ are $1,0.5,1,2$, the elements at positions $(1,2)$ and $(2,1)$ are $0.5$, and the remaining elements are all $0$. The selection probability is $\mathrm{logit}\{\pi(V,S)\}=3.5-5S^{(2)}$, under which approximately 34% of the subjects have missing $Y$. The correctly specified models for $\pi(V,S)$ and $E(Y\mid V,T,S)$ are $\mathrm{logit}\{\pi^{1}(\alpha^{1})\}=\alpha_1^{1}+\alpha_2^{1}S^{(2)}$ and $a^{1}(\gamma^{1})=X^{T}\theta(T)+\gamma_1^{1}Z_1+\gamma_2^{1}Z_2+\gamma_3^{1}Z_3+\gamma_4^{1}S^{(3)}$, respectively. In addition, we also use two incorrect models in the simulation, namely $\mathrm{logit}\{\pi^{2}(\alpha^{2})\}=\alpha_1^{2}+\alpha_2^{2}Z_1+\alpha_3^{2}Z_2+\alpha_4^{2}Z_3$ and $a^{2}(\gamma^{2})=X^{T}(-4T^{2}+4T)+\gamma_1^{2}Z_1+\gamma_2^{2}Z_2+\gamma_3^{2}Z_3+\gamma_4^{2}S^{(3)}$. For simplicity, we use the rule-of-thumb bandwidth when estimating the nonparametric functions, that is, $h=1.06\,\{\min(qr,sig)\}\,n^{-1/5}$, where $sig$ is the standard deviation of the covariate $T$, $qr=(Q_3-Q_1)/1.34$, and $Q_1$ and $Q_3$ are the first and third quartiles, respectively. In the simulation, we generate random samples of size $n=200$ and $n=500$, and repeat the process 500 times to compute the average biases, mean squared errors (MSEs), root mean squared errors (RMSEs) and median absolute errors (MAEs).
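The data-generating mechanism above can be reproduced with the short script below (our own illustration, not the authors' code); where the description is ambiguous, our reading is stated in the comments, e.g. $N(1,5)$ is taken as mean 1 and standard deviation 5, $B(0.5,1)$ as Bernoulli(0.5), and $Z^{(1)},Z^{(2)}$ as $Z_1,Z_2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(n):
    """One simulated data set following the design of this section."""
    X = rng.normal(0.0, 1.0, (n, 1))
    T = rng.uniform(0.0, 1.0, n)
    Z1 = rng.normal(1.0, 5.0, n)                  # N(1,5) read as mean 1, sd 5 (assumption)
    Z2 = rng.binomial(1, 0.5, n).astype(float)    # B(0.5,1) read as Bernoulli(0.5) (assumption)
    Z3 = rng.normal(0.0, 1.0, n)
    Z = np.column_stack([Z1, Z2, Z3])
    Sigma = np.diag([1.0, 0.5, 1.0, 2.0])
    Sigma[0, 1] = Sigma[1, 0] = 0.5
    eps = rng.multivariate_normal(np.zeros(4), Sigma, n)   # (eps, eps1, eps2, eps3)
    beta = np.array([1.0, 1.0, 2.0])
    Y = X[:, 0] * np.sin(np.pi * T) + Z @ beta + eps[:, 0]
    S1 = 1.0 + Z1 - Z2 + eps[:, 1]                # Z^(1), Z^(2) read as Z1, Z2 (assumption)
    S2 = (S1 + 0.4 * eps[:, 2] > 2.8).astype(float)
    S3 = np.exp((S1 / 9.0) ** 2) + eps[:, 3]
    pi = 1.0 / (1.0 + np.exp(-(3.5 - 5.0 * S2)))  # logit pi(V,S) = 3.5 - 5 S^(2)
    R = rng.binomial(1, pi)                       # roughly 34% of the Y_i missing
    return X, T, Z, Y, np.column_stack([S1, S2, S3]), R

def rule_of_thumb_bandwidth(T):
    """h = 1.06 * min(qr, sig) * n^(-1/5), with qr = (Q3 - Q1) / 1.34."""
    q1, q3 = np.quantile(T, [0.25, 0.75])
    return 1.06 * min((q3 - q1) / 1.34, np.std(T)) * len(T) ** (-0.2)
```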
To verify the superiority of the multiple robust estimation method, we report the computed indices of the estimates of $\beta$ under different estimation methods, namely the inverse probability weighted estimates $\hat{\beta}_{IPW}$, the augmented inverse probability weighted estimates $\hat{\beta}_{AIPW}$ and the multiple robust estimates $\hat{\beta}_{MR}$. To distinguish the estimators constructed from different methods and models, each estimator is assigned a name of the form "Method-0000", where each digit of the four-digit number, from left to right, indicates whether $\pi^{1}(\alpha^{1})$, $\pi^{2}(\alpha^{2})$, $a^{1}(\gamma^{1})$ or $a^{2}(\gamma^{2})$ is used in the construction (1 means yes, 0 means no). The simulation results are reported in Tables 1 and 2 for sample sizes $n=200$ and $n=500$, respectively.
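The naming convention can be made explicit with a one-line helper (illustrative only):

```python
def estimator_label(method, use_pi1, use_pi2, use_a1, use_a2):
    """Build a 'Method-0000' label: the four digits flag, in order, whether
    pi^1(alpha^1), pi^2(alpha^2), a^1(gamma^1), a^2(gamma^2) are used."""
    flags = (use_pi1, use_pi2, use_a1, use_a2)
    return method + "-" + "".join(str(int(f)) for f in flags)

# e.g. estimator_label("AIPW", False, True, False, True) returns "AIPW-0101"
```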
It can be seen from the two tables that, regardless of the estimation method, the larger the sample size, the better the estimation performance. When the models for the selection probability and the conditional expectation are all specified correctly, the results obtained by the multiple robust, inverse probability weighted and augmented inverse probability weighted methods differ little, but the multiple robust estimator performs better in terms of MSE. When all the models for the selection probability and the conditional expectation are specified incorrectly, AIPW-0101 performs unsatisfactorily and the resulting estimators have larger biases, whereas our proposed MRE-0101, despite using two incorrect models, still produces better estimates. The interesting observation that $\hat{\beta}_{MR}$ seems to provide a reasonable (or at least not too poor) estimate of $\beta$ even when no model is correctly specified is similar to that in Han [22]. In a word, our proposed multiple robust estimation method is clearly better than the two competitors.
4. Proofs
Before giving the proof of Theorem 2.1, we first present some notation and auxiliary results.
Let $\Phi(t)=E[RXZ^{T}\mid T=t]$ and $\Psi(t)=E[RXX^{T}\mid T=t]$; then
Substituting (4.1) into (2.1), we obtain
where $\check{Y}_i=Y_i-X_i^{T}g(T_i)$ and $\check{Z}_i=Z_i-\mu^{T}(T_i)X_i$, with $g(T_i)=\{\Psi(T_i)\}^{-1}E[R_iX_iY_i\mid T_i]$ and $\mu(T_i)=\{\Psi(T_i)\}^{-1}\Phi(T_i)$. For model (4.2), using the complete data, the CC estimator of $\beta$ can be obtained by solving the following estimating equation
$$\sum_{i=1}^{n}R_i\xi_i(\beta)=0,$$
where $\xi_i(\beta)=\check{Z}_i(\check{Y}_i-\check{Z}_i^{T}\beta)=\{Z_i-\mu^{T}(T_i)X_i\}[Y_i-X_i^{T}g(T_i)-\{Z_i-\mu^{T}(T_i)X_i\}^{T}\beta]$, and $E[\xi_i(\beta)]=0$.
Throughout, let $C$ denote a generic positive constant that may take different values at different places, and assume that the following conditions C1–C5 hold.
C1. The bandwidth $h$ satisfies $h=Cn^{-1/5}$, so that $h\to 0$ and $nh\to\infty$ as $n\to\infty$.
C2. The kernel function $k(\cdot)$ is a symmetric probability density function with $\int t^{2}k(t)\,dt\neq 0$ and $\int t^{4}k(t)\,dt<\infty$.
C3. For each $t\in(0,1)$, $f(t)$, $\Phi(t)$, $\Psi(t)$ and $\theta(t)$ are twice continuously differentiable at $t$, where $f(t)$ is the density function of $T$.
C4. $\sup_{0\le t\le 1}E[\varepsilon_i^{4}\mid T_i=t]<\infty$ and $\sup_{0\le t\le 1}E[X_{ir}^{4}\mid T_i=t]<\infty$, and both are continuous in $t$, where $X_{ir}$ is the $r$-th component of $X_i$, $i=1,\cdots,n$, $r=1,\cdots,q$.
C5. For each given $t$, $\Psi(t)$ is a positive definite matrix.
Next, a lemma is needed in the proof of Theorem 2.1; its proof can be found in Zhao [4].
Lemma 4.1. Suppose conditions C1–C5 hold. Then we have
where $C_n=h^{2}+\{\log(1/h)/(nh)\}^{1/2}$.
Proof of Theorem 2.1: First assume that $\mathcal{P}$ contains a correctly specified model for $\pi(V,S)$. Without loss of generality, let $\pi^{1}(\alpha^{1})$ be this model and let $\alpha_0^{1}$ denote the true value of $\alpha^{1}$, that is, $\pi^{1}(\alpha_0^{1})=\pi(V,S)$. We now use the theory of empirical likelihood to prove that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
Following the method in Han [22], we establish the relationship between the weights $\hat{\omega}_i$ and the empirical likelihood on the biased sample. Let $p_i$ denote the conditional empirical probability on the biased sample $\{(Y_i,X_i,Z_i,T_i,S_i): R_i=1, i=1,\cdots,m\}$. Based on (2.8), (2.9) and $\omega(V,S)=1/\pi^{1}(\alpha_0^{1})$, a more reasonable value of $p_i$ can be given by the following constrained optimization problem:
Using the Lagrange multiplier method again, we get
where $\hat{\lambda}^{T}=(\hat{\lambda}_1,\cdots,\hat{\lambda}_{J+pK})$ is the $(J+pK)$-dimensional Lagrange multiplier and satisfies
Due to the non-negativity of $\hat{p}_i$, $\hat{\lambda}$ satisfies $1+\hat{\lambda}^{T}\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})/\pi_i^{1}(\hat{\alpha}^{1})>0$, $i=1,\cdots,m$. Since
the solution $\hat{\rho}$ of (2.11) can be written as $\hat{\rho}_1=(\hat{\lambda}_1+1)/\nu^{1}(\hat{\alpha}^{1})$ and $\hat{\rho}_l=\hat{\lambda}_l/\nu^{1}(\hat{\alpha}^{1})$, $l=2,\cdots,J+pK$. Therefore,
Similar to White [24], let $\alpha_*^{j}$, $\beta_*^{k}$ and $\gamma_*^{k}$ be the minimizers of the corresponding Kullback-Leibler distances, respectively. Then $\hat{\alpha}^{j}\stackrel{P}{\longrightarrow}\alpha_*^{j}$, $\hat{\beta}^{k}\stackrel{P}{\longrightarrow}\beta_*^{k}$, $\hat{\gamma}^{k}\stackrel{P}{\longrightarrow}\gamma_*^{k}$, and $n^{1/2}(\hat{\alpha}^{j}-\alpha_*^{j})$, $n^{1/2}(\hat{\beta}^{k}-\beta_*^{k})$ and $n^{1/2}(\hat{\gamma}^{k}-\gamma_*^{k})$ are bounded in probability. At the same time, $\nu^{j}(\hat{\alpha}^{j})\stackrel{P}{\longrightarrow}\nu_*^{j}$ and $\eta^{k}(\hat{\beta}^{k},\hat{\gamma}^{k})\stackrel{P}{\longrightarrow}\eta_*^{k}$, where $\nu_*^{j}=E[\pi^{j}(\alpha_*^{j})]$ and $\eta_*^{k}=E[U^{k}(\beta_*^{k},\gamma_*^{k})]$. Generally speaking, when the model $\pi^{j}(\alpha^{j})$ for $\pi(V,S)$ is correctly specified, we have $\pi^{j}(\alpha_*^{j})=\pi(V,S)$, and when the model $a^{k}(\gamma^{k})$ for $E(Y\mid V,T,S)$ is correctly specified, we have $a^{k}(\gamma_*^{k})=E(Y\mid V,T,S)$. Let $\alpha_*^{T}=\{(\alpha_*^{1})^{T},\cdots,(\alpha_*^{J})^{T}\}$, $\beta_*^{T}=\{(\beta_*^{1})^{T},\cdots,(\beta_*^{K})^{T}\}$, $\gamma_*^{T}=\{(\gamma_*^{1})^{T},\cdots,(\gamma_*^{K})^{T}\}$, and suppose $\hat{\rho}\stackrel{P}{\longrightarrow}\rho_*$.
By empirical likelihood theory, $\hat{\lambda}\stackrel{P}{\longrightarrow}0$, and according to the appendix of Han [22], $\hat{\lambda}=O_p(n^{-1/2})$. Since the model $\pi^{1}(\alpha^{1})$ is correctly specified, we have $m/n\stackrel{P}{\longrightarrow}\nu_*^{1}$, and
Referring to Zhao [4], since
and $E[X_i\varepsilon_i]=0$, $E[\{Z_i-\mu^{T}(T_i)X_i\}X_i^{T}]=0$, we have
Combining conditions C1, C4, C5 and Lemma 4.1, we have
That is, $\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi_i^{1}(\alpha_*^{1})}\hat{\xi}_i(\beta)\stackrel{P}{\longrightarrow}\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi_i^{1}(\alpha_*^{1})}\xi_i(\beta)$. Then we have
Therefore, as $n\to\infty$, $\beta$ solves Eq (2.13) in the limit, which shows that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
Next, suppose that $\mathcal{F}$ contains a correctly specified model for $E(Y\mid V,T,S)$. Without loss of generality, let $a^{1}(\gamma^{1})$ be this model and $\gamma_0^{1}$ be the true value of $\gamma^{1}$, that is, $a^{1}(\gamma_0^{1})=E(Y\mid V,T,S)$, and hence $\gamma_*^{1}=\gamma_0^{1}$. One of the previous constraints is actually
and $\hat{\beta}^{1}\stackrel{P}{\longrightarrow}\beta_*^{1}=\beta$, so we obtain $\frac{1}{n}\sum_{i=1}^{n}\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\stackrel{P}{\longrightarrow}0$.
Let $g(\alpha_*,\beta_*,\gamma_*)^{T}=[\pi^{1}(\alpha_*^{1})-\nu_*^{1},\cdots,\pi^{J}(\alpha_*^{J})-\nu_*^{J},\{U^{1}(\beta_*^{1},\gamma_*^{1})-\eta_*^{1}\}^{T},\cdots,\{U^{K}(\beta_*^{K},\gamma_*^{K})-\eta_*^{K}\}^{T}]$. Since $\frac{1}{n}\sum_{i=1}^{n}\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\stackrel{P}{\longrightarrow}\frac{1}{n}\sum_{i=1}^{n}U_i^{1}(\beta,\gamma_0^{1})$, $\big\|\frac{1}{n}\sum_{i=1}^{n}[\hat{\xi}_i(\beta)-\xi_i(\beta)]\big\|\stackrel{P}{\longrightarrow}0$, and $E[U^{1}(\beta,\gamma_0^{1})]=0$, we have
This shows that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
This completes the proof of Theorem 2.1.
5. Conclusions
In this article, we have proposed multiple robust estimators for the parameters of the varying-coefficient partially linear model with responses missing at random, and the multiple robustness of our proposal has been established theoretically under some regularity conditions. Our simulation studies, summarized in Tables 1 and 2, demonstrate the superiority of the multiple robust estimation method. Finally, we point out some problems for future research. First, we have only discussed the multiple robust estimation of the parametric part; the estimation of the nonparametric coefficient functions can be explored in future studies. Second, based on the model in this article, how to obtain robust estimators of the parameters when the missing mechanism is nonignorable is also worth studying.
Acknowledgments
This research is supported by the Natural Science Foundation projects (ZR2021MA077 and ZR2019MA016) of Shandong Province of China.
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.