1. Introduction
The model considered in this paper is a classical semi-parametric model, the varying-coefficient partially linear model, which takes the following form
$$Y = X^{T}\theta(T) + Z^{T}\beta + \varepsilon, \qquad (1.1)$$
where $Y$ is the response variable, $X$, $Z$ and $T$ are $q$-dimensional, $p$-dimensional and one-dimensional covariates, respectively, $\beta=(\beta_1,\cdots,\beta_p)^{T}$ is a $p$-dimensional unknown parameter vector, $\theta(\cdot)=(\theta_1(\cdot),\cdots,\theta_q(\cdot))^{T}$ is a $q$-dimensional vector of unknown non-parametric functions, and $\varepsilon$ is the random error satisfying $E(\varepsilon\mid X,Z,T)=0$. Model (1.1) has been well studied by many statisticians; see, for example, Fan and Huang [1], You and Zhou [2], Huang and Zhang [3], Zhao [4], and Feng, Zhang and Lu [5], among others.
In practical applications, missing data are frequently encountered in almost all research areas, such as the psychological sciences, medical studies, and industrial and agricultural production. The complete-case (CC) method loses estimation efficiency because it discards the information in the incompletely observed cases, and it may yield biased results if the data are not missing completely at random; for details, see Little and Rubin [6]. The inverse probability weighted (IPW) method, dating back to Horvitz and Thompson [7], is another frequently used method, and it can also be applied to the case of missing covariates. It weights the fully observed data by the inverse of the selection probability, and under the missing at random (MAR) assumption it is unbiased. This method has attracted much attention in statistical analysis with missing data, but it still does not make full use of the incomplete data. The imputation method, introduced by Yates [8], is a popular way to handle missing responses in many studies. The idea of imputation is to fill in each missing value with a suitable value and then to apply standard methods to the observed and imputed values together. This can improve the efficiency of the resulting estimators; see, for example, Cheng [9], Wang and Rao [10,11], Wang, Linton and Härdle [12], and so forth. To further improve the estimation efficiency, Robins, Rotnitzky and Zhao [13] propose an augmented inverse probability weighted (AIPW) method. This method is doubly robust: if the models for the selection probability and the conditional expectation function are both correctly specified, the resulting estimator attains the semi-parametric efficiency bound, and if either of the two assumed models is correctly specified, the estimator is consistent; see the details in Robins and Rotnitzky [14] and Scharfstein, Rotnitzky and Robins [15]. Over the subsequent decade, doubly robust estimation was studied extensively; see, for example, Kang and Schafer [16], Qin, Shao and Zhang [17], Cao, Tsiatis and Davidian [18], Han [19], and Rotnitzky et al. [20].
However, double robustness does not provide sufficient protection for estimation consistency, since it allows only one model for the selection probability and one for the conditional expectation function, and it is often risky to assume that one of these two models is correctly specified when the data generating process is unknown. Noting this, Han and Wang [21] propose a multiple robust estimator for the population mean when the response variable is subject to ignorable missingness. They postulate multiple models for both the selection probability and the outcome regression, and the resulting estimator is consistent if any of the multiple models is correctly specified; it attains the semi-parametric efficiency bound when one selection probability model and one outcome regression model are correctly specified, without requiring knowledge of which models are correct. For details, see Han and Wang [21]. Subsequently, Han [22] studies the multiple robust estimator for the linear regression model; he discusses the numerical implementation of the proposed method through a modified Newton-Raphson algorithm, derives the asymptotic distribution of the resulting estimator, and provides some ways to improve the estimation efficiency. Later, Sun, Wang and Han [23] propose multiple robust kernel estimating equations (MRKEEs) for nonparametric regression, demonstrate their multiple robustness, and show that the resulting estimator achieves the optimal efficiency within the class of augmented inverse propensity weighted (AIPW) kernel estimators when the candidate sets include correctly specified models for both the missingness mechanism and the outcome regression. Please refer to Sun, Wang and Han [23] for more discussion. In addition, multiple robust estimation with nonignorably missing data has been studied recently; see, for example, Han [24] and Li, Yang and Han [25].
To the best of our knowledge, multiple robust estimation for the parameters of the varying-coefficient partially linear model with responses missing at random has not been studied. Therefore, in this paper, applying the ideas of Han [22] and Sun, Wang and Han [23], we consider a multiple robust estimation method for the parameters of the varying-coefficient partially linear model with missing responses, and the proposed method is shown to be superior to existing competitors via simulation studies.
This paper is organized as follows. The proposed estimation technique and its multiple robustness are presented in Section 2. Numerical simulation studies are conducted in Section 3 to examine the performance of the proposed method. The technical proofs are provided in Section 4. Conclusions are summarized in Section 5.
2. The proposed estimator
Suppose the available incomplete data $\{(R_i,Y_i,X_i,Z_i,T_i), i=1,2,\cdots,n\}$ is a random sample from model (1.1), that is,
$$Y_i = X_i^{T}\theta(T_i) + Z_i^{T}\beta + \varepsilon_i, \quad i=1,2,\cdots,n, \qquad (2.1)$$
where $R_i$ is an indicator variable with $R_i=1$ if $Y_i$ is observed and $R_i=0$ if $Y_i$ is missing. The covariates $X_i$, $Z_i$ and $T_i$ are always observed. Following Han [22] and Sun, Wang and Han [23], we also suppose that auxiliary variables $S_i$ related to $(R_i,Y_i,X_i,Z_i,T_i)$ are available. As Han [22] points out, the auxiliary variables do not enter the regression model and are not of direct statistical interest, but they can reduce the impact of missing data on estimation and improve the estimation efficiency. Let $V_i=(X_i^{T},Z_i^{T})^{T}$ denote the covariates. The missing mechanism we assume in this paper is the MAR mechanism, which is commonly used in practice. Specifically, given the covariates $V_i$, $T_i$ and the available auxiliary variables $S_i$, the missingness of $Y_i$ is independent of $Y_i$, that is,
$$P(R_i=1\mid Y_i,V_i,T_i,S_i) = P(R_i=1\mid V_i,T_i,S_i) = \pi(V_i,S_i).$$
Here we assume that $\pi(\cdot)$ depends only on $V$ and $S$.
We first construct the estimator of the varying-coefficient functions $\theta(\cdot)$. For any $t$ in a small neighborhood of $t_0$, applying a local linear approximation to $\theta_j(t)$, $j=1,2,\cdots,q$, we have
$$\theta_j(t)\approx\theta_j(t_0)+\theta_j'(t_0)(t-t_0)\triangleq a_j+b_j(t-t_0).$$
Suppose the parameter $\beta$ is known. Then, minimizing the following objective function
$$\sum_{i=1}^{n}R_i\Big[Y_i-Z_i^{T}\beta-\sum_{j=1}^{q}\{a_j+b_j(T_i-t_0)\}X_{ij}\Big]^{2}K_h(T_i-t_0)$$
with respect to $(a_j,b_j)$, $j=1,2,\cdots,q$, we can obtain the estimator of $\theta(t)$ at $t_0$, where $K_h(\cdot)=h^{-1}k(\cdot/h)$, $k(\cdot)$ is a kernel function, and $h$ is the bandwidth. Let
and
then the estimator of the coefficient functions θ(t) at t0 is given by
Substituting (2.3) into (2.1), we obtain
where $\tilde{Y}_i=Y_i-X_i^{T}\hat{g}(T_i)$ and $\tilde{Z}_i=Z_i-\hat{\mu}^{T}(T_i)X_i$, with $\hat{g}(t)=\sum_{k=1}^{n}S_k(t)Y_k$ and $\hat{\mu}(t)=\sum_{k=1}^{n}S_k(t)Z_k^{T}$.
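To make the profiling step concrete, the following sketch (ours, not code from the paper) computes the local linear smoother weights $S_k(t)$ under an assumed Gaussian kernel and forms the partial residuals $\tilde{Y}_i$ and $\tilde{Z}_i$ appearing in (2.4); the smoother is built from the complete cases, consistent with the population quantities $\Phi(t)$ and $\Psi(t)$ in Section 4, and all array names are illustrative.

```python
import numpy as np

def local_linear_weights(X, T, t0, h):
    """Local linear smoother matrix S(t0) (q x n): the first q rows of
    (D^T W D)^{-1} D^T W, with rows of D equal to (X_k^T, ((T_k - t0)/h) X_k^T)
    and W = diag{K_h(T_k - t0)}.  A Gaussian kernel is assumed here."""
    n, q = X.shape
    u = (T - t0) / h
    w = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)   # K_h(T_k - t0)
    D = np.hstack([X, X * u[:, None]])                       # n x 2q design
    DtW = D.T * w                                            # 2q x n
    return np.linalg.solve(DtW @ D, DtW)[:q, :]              # keep the level part

def profile_residuals(Y, X, Z, T, R, h):
    """Partial residuals of (2.4), with the smoother built from the complete
    cases (R_k = 1), since Y_k enters g_hat(t) = sum_k S_k(t) Y_k."""
    obs = R == 1
    Xo, Yo, Zo, To = X[obs], Y[obs], Z[obs], T[obs]
    n = len(Y)
    Y_t = np.full(n, np.nan)                                 # Y~_i only exists for observed Y_i
    Z_t = np.empty_like(Z, dtype=float)
    for i in range(n):
        S = local_linear_weights(Xo, To, T[i], h)            # q x m smoother weights
        g_hat = S @ Yo                                       # g_hat(T_i)
        mu_hat = S @ Zo                                      # mu_hat(T_i), q x p
        Z_t[i] = Z[i] - mu_hat.T @ X[i]
        if R[i] == 1:
            Y_t[i] = Y[i] - X[i] @ g_hat
    return Y_t, Z_t
```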
For model (2.4), using the complete data, the CC estimator of $\beta$ can be obtained by solving the following estimating equation
$$\sum_{i=1}^{n}R_i\hat{\xi}_i(\beta)=0,$$
where $\hat{\xi}_i(\beta)=\tilde{Z}_i(\tilde{Y}_i-\tilde{Z}_i^{T}\beta)$.
From Little and Rubin [6] we know that the CC estimator may be biased unless the missing mechanism is missing completely at random. Following the work of Robins, Rotnitzky and Zhao [13], the doubly robust estimator $\hat{\beta}_{AIPW}$ of $\beta$ can be defined by
where $\hat{\pi}(V_i,S_i)$ is an estimate of $\pi(V_i,S_i)$ and $\eta_i(\beta)=E[\hat{\xi}_i(\beta)\mid V_i,T_i,S_i]$. $\hat{\beta}_{AIPW}$ offers improved protection of consistency, but in practice it is still quite risky to assume that one of the two working models is correctly specified. Inspired by Han [22] and Sun, Wang and Han [23], we next develop the multiple robust estimation of $\beta$.
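Since $\hat{\xi}_i(\beta)=\tilde{Z}_i(\tilde{Y}_i-\tilde{Z}_i^{T}\beta)$ is linear in $\beta$, the CC, IPW and AIPW estimating equations all reduce to weighted normal equations that can be solved in closed form. The sketch below is ours and only illustrative: the AIPW form given in the comments is the standard augmentation (the paper's exact display is not reproduced here), and the arrays Z_t, Y_t, R, pi_hat and Y_aug are hypothetical inputs, for instance coming from the previous sketch and from fitted working models.

```python
import numpy as np

def solve_linear_ee(Z_t, targets, weights):
    """Solve sum_i sum_l w_{l,i} Z~_i (y_{l,i} - Z~_i^T beta) = 0 for beta.
    'targets' and 'weights' are lists of length-n arrays, one pair per term
    of the estimating equation; entries with zero weight may be NaN."""
    p = Z_t.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for y_star, w in zip(targets, weights):
        y_safe = np.where(w != 0, y_star, 0.0)        # zero-weight terms drop out
        A += (Z_t * w[:, None]).T @ Z_t
        b += (Z_t * w[:, None]).T @ y_safe
    return np.linalg.solve(A, b)

# Complete-case:  sum_i R_i xi_i(beta) = 0
#   beta_cc = solve_linear_ee(Z_t, [Y_t], [R.astype(float)])
# IPW:  sum_i (R_i / pi_hat_i) xi_i(beta) = 0
#   beta_ipw = solve_linear_ee(Z_t, [Y_t], [R / pi_hat])
# AIPW (standard form, assumed):  the augmentation term uses the model-based
# target Y_aug_i = m_hat_i - X_i^T g_hat(T_i), with m_hat_i an estimate of
# E(Y_i | V_i, T_i, S_i), weighted by (1 - R_i / pi_hat_i):
#   beta_aipw = solve_linear_ee(Z_t, [Y_t, Y_aug], [R / pi_hat, 1 - R / pi_hat])
```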
Suppose there are $J$ candidate models for $\pi(V,S)$ and $K$ candidate models for $E(Y\mid V,T,S)$. Let $\mathcal{P}=\{\pi^{j}(\alpha^{j}): j=1,\cdots,J\}$ and $\mathcal{F}=\{a^{k}(\gamma^{k}): k=1,\cdots,K\}$ denote these two sets of models, respectively, where $\alpha^{j}$ and $\gamma^{k}$ are the corresponding parameters.
Let $\hat{\alpha}^{j}$ and $\hat{\gamma}^{k}$ be the estimators of $\alpha^{j}$ and $\gamma^{k}$, respectively. Usually, $\hat{\alpha}^{j}$ can be obtained by maximizing the binomial likelihood
$$\prod_{i=1}^{n}\{\pi_i^{j}(\alpha^{j})\}^{R_i}\{1-\pi_i^{j}(\alpha^{j})\}^{1-R_i}.$$
By the MAR assumption, $Y$ and $R$ are conditionally independent given $(V,T,S)$, that is, $E(Y\mid V,T,S)=E(Y\mid R=1,V,T,S)$. Therefore, fitting the model $a^{k}(\gamma^{k})$ with the completely observed data yields $\hat{\gamma}^{k}$. Let $\hat{\beta}^{k}$ be the solution of
Clearly, $\hat{\beta}^{k}$ is an estimate of $\beta$.
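As an illustration of how the working models might be fitted, the sketch below maximizes the binomial likelihood of one candidate selection model $\pi^{j}(\alpha^{j})$ by Newton-Raphson and fits one outcome model $a^{k}(\gamma^{k})$ by least squares on the complete cases; the design matrices W are hypothetical (each column holding the covariates entering the corresponding working model, plus an intercept). With the fitted values $a^{k}(\hat{\gamma}^{k})$ in place of the response, $\hat{\beta}^{k}$ can then be obtained with the same kind of linear solver as above.

```python
import numpy as np

def fit_selection_model(W, R, n_iter=50, tol=1e-8):
    """Maximize the binomial likelihood prod_i pi_i^{R_i} (1 - pi_i)^{1 - R_i}
    for a logistic working model pi_i = expit(W_i^T alpha) by Newton-Raphson."""
    alpha = np.zeros(W.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-W @ alpha))
        grad = W.T @ (R - pi)                                 # score vector
        hess = -(W * (pi * (1.0 - pi))[:, None]).T @ W        # Hessian of the log-likelihood
        step = np.linalg.solve(hess, grad)
        alpha -= step                                         # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def fit_outcome_model(W, Y, R):
    """Least-squares fit of one outcome working model using the complete cases
    only, which is valid under MAR since E(Y|V,T,S) = E(Y|R=1,V,T,S)."""
    obs = R == 1
    gamma, *_ = np.linalg.lstsq(W[obs], Y[obs], rcond=None)
    return gamma
```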
Next, let $m=\sum_{i=1}^{n}R_i$ denote the number of observed responses. Without loss of generality, assume $R_1=\cdots=R_m=1$ and $R_{m+1}=\cdots=R_n=0$. Let $\omega(V,S)=1/\pi(V,S)$. Similar to Han [22], the following equalities hold
where $j=1,\cdots,J$, $k=1,\cdots,K$, and $U^{k}(\beta,\gamma^{k})=\{Z-\mu^{T}(T)X\}\{a^{k}(\gamma^{k})-X^{T}\theta(T)-Z^{T}\beta\}$. Therefore, the weights $\omega_i$, $i=1,\cdots,m$, can be defined by
where
Following the empirical likelihood method, we use the Lagrange multiplier technique to maximize $\prod_{i=1}^{m}\omega_i$ subject to the above constraints, and we take the solution as the weights $\omega_i$ ($i=1,\cdots,m$) for estimating the parameter $\beta$. For ease of presentation, let $\hat{\alpha}^{T}=\{(\hat{\alpha}^{1})^{T},\cdots,(\hat{\alpha}^{J})^{T}\}$, $\hat{\beta}^{T}=\{(\hat{\beta}^{1})^{T},\cdots,(\hat{\beta}^{K})^{T}\}$, $\hat{\gamma}^{T}=\{(\hat{\gamma}^{1})^{T},\cdots,(\hat{\gamma}^{K})^{T}\}$, and $\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})^{T}=[\pi_i^{1}(\hat{\alpha}^{1})-\nu^{1}(\hat{\alpha}^{1}),\cdots,\pi_i^{J}(\hat{\alpha}^{J})-\nu^{J}(\hat{\alpha}^{J}),\{\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})-\eta^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\}^{T},\cdots,\{\hat{U}_i^{K}(\hat{\beta}^{K},\hat{\gamma}^{K})-\eta^{K}(\hat{\beta}^{K},\hat{\gamma}^{K})\}^{T}]$. By empirical likelihood theory, we have
where $\hat{\rho}^{T}=(\hat{\rho}_1,\cdots,\hat{\rho}_{J+pK})$ is the $(J+pK)$-dimensional Lagrange multiplier, which is the solution of
Due to the non-negativity of the weights $\hat{\omega}_i$, $\hat{\rho}$ satisfies
So we can solve the equation
to obtain the multiple robust estimator of the parameter $\beta$, denoted by $\hat{\beta}_{MR}$.
The Lagrange multiplier $\hat{\rho}$ is essential in computing the weights $\hat{\omega}_i$. The algorithm we use is similar to that of Han [22]; the details are omitted here and can be found therein.
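As a concrete starting point, here is a minimal sketch of our own, following the generic empirical likelihood recipe rather than reproducing the modified Newton-Raphson algorithm of Han [22]. It solves for the Lagrange multiplier by a damped Newton iteration under the standard empirical likelihood form $\hat{\omega}_i=1/[m\{1+\hat{\rho}^{T}\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})\}]$, which is assumed here, and then computes $\hat{\beta}_{MR}$ from the weighted, linear estimating equation; G is the $m\times(J+pK)$ matrix whose rows are $\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})$ over the complete cases.

```python
import numpy as np

def el_weights(G, n_iter=100, tol=1e-10):
    """Empirical-likelihood weights on the complete cases: maximize prod_i w_i
    subject to sum_i w_i = 1 and sum_i w_i G_i = 0, giving (assumed form)
    w_i = 1 / (m * (1 + rho^T G_i)).  rho is found by damped Newton on the
    convex dual  -sum_i log(1 + rho^T G_i)."""
    m, d = G.shape
    rho = np.zeros(d)
    for _ in range(n_iter):
        denom = 1.0 + G @ rho
        grad = -(G / denom[:, None]).sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        hess = (G / denom[:, None] ** 2).T @ G
        step = np.linalg.solve(hess + 1e-10 * np.eye(d), grad)
        t = 1.0
        while np.any(1.0 + G @ (rho - t * step) <= 0.0):      # step-halving keeps weights positive
            t /= 2.0
        rho -= t * step
    return 1.0 / (m * (1.0 + G @ rho)), rho

def mr_estimate(Z_t_obs, Y_t_obs, w):
    """beta_MR solves sum_i w_i Z~_i (Y~_i - Z~_i^T beta) = 0 over the
    complete cases, which is linear in beta."""
    A = (Z_t_obs * w[:, None]).T @ Z_t_obs
    b = (Z_t_obs * w[:, None]).T @ Y_t_obs
    return np.linalg.solve(A, b)

# Usage with hypothetical arrays:
#   w, rho = el_weights(G)
#   beta_mr = mr_estimate(Z_t[R == 1], Y_t[R == 1], w)
```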
The multiple robustness of $\hat{\beta}_{MR}$ is given by the following theorem.
Theorem 2.1. Suppose that conditions C1–C5 in Section 4 hold. If $\mathcal{P}$ contains a correctly specified model for $\pi(V,S)$, or $\mathcal{F}$ contains a correctly specified model for $E(Y\mid V,T,S)$, then $\sum_{i=1}^{m}\hat{\omega}_i\hat{\xi}_i(\beta)\stackrel{P}{\longrightarrow}0$ as $n\to\infty$.
3. Simulation study
In this section, we conduct numerical simulations to evaluate the feasibility of the proposed method and the finite sample performance of the proposed estimator $\hat{\beta}_{MR}$. Several performance indices of the multiple robust, inverse probability weighted and augmented inverse probability weighted estimates are compared under different sample sizes.
We consider five mutually independent covariates, namely $X\sim N(0,1)$, $T\sim U(0,1)$, $Z_1\sim N(1,5)$, $Z_2\sim B(0.5,1)$ and $Z_3\sim N(0,1)$. The response variable is generated by the model $Y=X^{T}\theta(T)+Z^{T}\beta+\varepsilon$, where $\theta(t)=\sin(\pi t)$ and $\beta=(1,1,2)^{T}$. In addition, we consider three auxiliary variables, namely $S^{(1)}=1+Z^{(1)}-Z^{(2)}+\varepsilon_1$, $S^{(2)}=I\{S^{(1)}+0.4\varepsilon_2>2.8\}$ and $S^{(3)}=\exp[\{S^{(1)}/9\}^{2}]+\varepsilon_3$, where $I(\cdot)$ is an indicator function and $(\varepsilon,\varepsilon_1,\varepsilon_2,\varepsilon_3)^{T}\sim N(0,\Sigma)$. The diagonal elements of $\Sigma$ are $1,0.5,1,2$, the elements at positions $(1,2)$ and $(2,1)$ are $0.5$, and the remaining elements are all $0$. The selection probability is $\mathrm{logit}\{\pi(V,S)\}=3.5-5S^{(2)}$, under which approximately 34% of the subjects have missing $Y$. The correctly specified models for $\pi(V,S)$ and $E(Y\mid V,T,S)$ are $\mathrm{logit}\{\pi^{1}(\alpha^{1})\}=\alpha_1^{1}+\alpha_2^{1}S^{(2)}$ and $a^{1}(\gamma^{1})=X^{T}\theta(T)+\gamma_1^{1}Z_1+\gamma_2^{1}Z_2+\gamma_3^{1}Z_3+\gamma_4^{1}S^{(3)}$, respectively. In addition, we also use two incorrect models in the simulation, namely $\mathrm{logit}\{\pi^{2}(\alpha^{2})\}=\alpha_1^{2}+\alpha_2^{2}Z_1+\alpha_3^{2}Z_2+\alpha_4^{2}Z_3$ and $a^{2}(\gamma^{2})=X^{T}(-4T^{2}+4T)+\gamma_1^{2}Z_1+\gamma_2^{2}Z_2+\gamma_3^{2}Z_3+\gamma_4^{2}S^{(3)}$. For simplicity, we use the rule-of-thumb bandwidth when estimating the nonparametric functions, that is, $h=1.06\,\{\min(qr,sig)\}\,n^{-1/5}$, where $sig$ is the standard deviation of the covariate $T$, $qr=(Q_3-Q_1)/1.34$, and $Q_1$ and $Q_3$ are the first and third quartiles, respectively. In the simulation, we generate random samples of size $n=200$ and $n=500$, and repeat the process 500 times to compute the average biases, mean squared errors (MSEs), root mean squared errors (RMSEs) and median absolute errors (MAEs).
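The data-generating mechanism above can be reproduced with the short script below (our own illustration, not the authors' code); where the description is ambiguous, our reading is stated in the comments, e.g. $N(1,5)$ is taken as mean 1 and standard deviation 5, $B(0.5,1)$ as Bernoulli(0.5), and $Z^{(1)},Z^{(2)}$ as $Z_1,Z_2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(n):
    """One simulated data set following the design of this section."""
    X = rng.normal(0.0, 1.0, (n, 1))
    T = rng.uniform(0.0, 1.0, n)
    Z1 = rng.normal(1.0, 5.0, n)                  # N(1,5) read as mean 1, sd 5 (assumption)
    Z2 = rng.binomial(1, 0.5, n).astype(float)    # B(0.5,1) read as Bernoulli(0.5) (assumption)
    Z3 = rng.normal(0.0, 1.0, n)
    Z = np.column_stack([Z1, Z2, Z3])
    Sigma = np.diag([1.0, 0.5, 1.0, 2.0])
    Sigma[0, 1] = Sigma[1, 0] = 0.5
    eps = rng.multivariate_normal(np.zeros(4), Sigma, n)   # (eps, eps1, eps2, eps3)
    beta = np.array([1.0, 1.0, 2.0])
    Y = X[:, 0] * np.sin(np.pi * T) + Z @ beta + eps[:, 0]
    S1 = 1.0 + Z1 - Z2 + eps[:, 1]                # Z^(1), Z^(2) read as Z1, Z2 (assumption)
    S2 = (S1 + 0.4 * eps[:, 2] > 2.8).astype(float)
    S3 = np.exp((S1 / 9.0) ** 2) + eps[:, 3]
    pi = 1.0 / (1.0 + np.exp(-(3.5 - 5.0 * S2)))  # logit pi(V,S) = 3.5 - 5 S^(2)
    R = rng.binomial(1, pi)                       # roughly 34% of the Y_i missing
    return X, T, Z, Y, np.column_stack([S1, S2, S3]), R

def rule_of_thumb_bandwidth(T):
    """h = 1.06 * min(qr, sig) * n^(-1/5), with qr = (Q3 - Q1) / 1.34."""
    q1, q3 = np.quantile(T, [0.25, 0.75])
    return 1.06 * min((q3 - q1) / 1.34, np.std(T)) * len(T) ** (-0.2)
```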
To verify the superiority of the multiple robust estimation method, we report the computed indices of the estimates of $\beta$ under different estimation methods, namely the inverse probability weighted estimates $\hat{\beta}_{IPW}$, the augmented inverse probability weighted estimates $\hat{\beta}_{AIPW}$ and the multiple robust estimates $\hat{\beta}_{MR}$. To distinguish the estimators constructed from different methods and models, each estimator is assigned a name of the form "Method-0000", where each digit of the four-digit number, from left to right, indicates whether $\pi^{1}(\alpha^{1})$, $\pi^{2}(\alpha^{2})$, $a^{1}(\gamma^{1})$ or $a^{2}(\gamma^{2})$ is used in the construction (1 means yes, 0 means no). The simulation results are reported in Tables 1 and 2 for sample sizes $n=200$ and $n=500$, respectively.
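The naming convention can be made explicit with a one-line helper (illustrative only):

```python
def estimator_label(method, use_pi1, use_pi2, use_a1, use_a2):
    """Build a 'Method-0000' label: the four digits flag, in order, whether
    pi^1(alpha^1), pi^2(alpha^2), a^1(gamma^1), a^2(gamma^2) are used."""
    flags = (use_pi1, use_pi2, use_a1, use_a2)
    return method + "-" + "".join(str(int(f)) for f in flags)

# e.g. estimator_label("AIPW", False, True, False, True) returns "AIPW-0101"
```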
It can be seen from the two tables that, regardless of the estimation method, the larger the sample size, the better the estimation performance. When the models for the selection probability and the conditional expectation are all specified correctly, the results obtained by the multiple robust, inverse probability weighted and augmented inverse probability weighted methods differ little, but the multiple robust estimator performs better in terms of MSE. When all the models for the selection probability and the conditional expectation are specified incorrectly, AIPW-0101 performs unsatisfactorily and the resulting estimators have larger biases, whereas our proposed MRE-0101, despite using two incorrect models, still produces better estimates. The interesting observation that $\hat{\beta}_{MR}$ seems to provide a reasonable (or at least not too poor) estimate of $\beta$ even when no model is correctly specified is similar to that in Han [22]. In a word, our proposed multiple robust estimation method is clearly better than the two competitors.
4. Proofs
Before giving the proof of Theorem 2.1, we first present some notation and auxiliary results.
Let $\Phi(t)=E[RXZ^{T}\mid T=t]$ and $\Psi(t)=E[RXX^{T}\mid T=t]$; then
Substituting (4.1) into (2.1), we obtain
where $\check{Y}_i=Y_i-X_i^{T}g(T_i)$ and $\check{Z}_i=Z_i-\mu^{T}(T_i)X_i$, with $g(T_i)=\{\Psi(T_i)\}^{-1}E[R_iX_iY_i\mid T_i]$ and $\mu(T_i)=\{\Psi(T_i)\}^{-1}\Phi(T_i)$. For model (4.2), using the complete data, the CC estimator of $\beta$ can be obtained by solving the following estimating equation
$$\sum_{i=1}^{n}R_i\xi_i(\beta)=0,$$
where $\xi_i(\beta)=\check{Z}_i(\check{Y}_i-\check{Z}_i^{T}\beta)=\{Z_i-\mu^{T}(T_i)X_i\}[Y_i-X_i^{T}g(T_i)-\{Z_i-\mu^{T}(T_i)X_i\}^{T}\beta]$, and $E[\xi_i(\beta)]=0$.
Throughout, let $C$ denote a generic positive constant that may take different values at different places, and assume that the following conditions C1–C5 hold.
C1. The bandwidth $h$ satisfies $h=Cn^{-1/5}$, so that $h\to 0$ and $nh\to\infty$ as $n\to\infty$.
C2. The kernel function $k(\cdot)$ is a symmetric probability density function with $\int t^{2}k(t)\,dt\neq 0$ and $\int t^{4}k(t)\,dt<\infty$.
C3. For each $t\in(0,1)$, $f(t)$, $\Phi(t)$, $\Psi(t)$ and $\theta(t)$ are twice continuously differentiable at $t$, where $f(t)$ is the density function of $T$.
C4. $\sup_{0\le t\le 1}E[\varepsilon_i^{4}\mid T_i=t]<\infty$ and $\sup_{0\le t\le 1}E[X_{ir}^{4}\mid T_i=t]<\infty$, and both are continuous in $t$, where $X_{ir}$ is the $r$-th component of $X_i$, $i=1,\cdots,n$, $r=1,\cdots,q$.
C5. For each given $t$, $\Psi(t)$ is a positive definite matrix.
Next, a lemma is needed in the proof of Theorem 2.1; its proof can be found in Zhao [4].
Lemma 4.1. Suppose conditions C1–C5 hold. Then we have
where $C_n=h^{2}+\{\log(1/h)/(nh)\}^{1/2}$.
Proof of Theorem 2.1: First assume that $\mathcal{P}$ contains a correctly specified model for $\pi(V,S)$. Without loss of generality, let $\pi^{1}(\alpha^{1})$ be this model and let $\alpha_0^{1}$ denote the true value of $\alpha^{1}$, that is, $\pi^{1}(\alpha_0^{1})=\pi(V,S)$. We now use the theory of empirical likelihood to prove that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
Following the method in Han [22], we establish the relationship between the weights $\hat{\omega}_i$ and the empirical likelihood on the biased sample. Let $p_i$ denote the conditional empirical probability on the biased sample $\{(Y_i,X_i,Z_i,T_i,S_i): R_i=1, i=1,\cdots,m\}$. Based on (2.8), (2.9) and $\omega(V,S)=1/\pi^{1}(\alpha_0^{1})$, a more reasonable value of $p_i$ can be given by the following constrained optimization problem:
Using the Lagrange multiplier method again, we get
where $\hat{\lambda}^{T}=(\hat{\lambda}_1,\cdots,\hat{\lambda}_{J+pK})$ is the $(J+pK)$-dimensional Lagrange multiplier and satisfies
Due to the non-negativity of $\hat{p}_i$, $\hat{\lambda}$ satisfies $1+\hat{\lambda}^{T}\hat{g}_i(\hat{\alpha},\hat{\beta},\hat{\gamma})/\pi_i^{1}(\hat{\alpha}^{1})>0$, $i=1,\cdots,m$. Since
the solution $\hat{\rho}$ of (2.11) can be written as $\hat{\rho}_1=(\hat{\lambda}_1+1)/\nu^{1}(\hat{\alpha}^{1})$ and $\hat{\rho}_l=\hat{\lambda}_l/\nu^{1}(\hat{\alpha}^{1})$, $l=2,\cdots,J+pK$. Therefore,
Similar to White [24], let $\alpha_*^{j}$, $\beta_*^{k}$ and $\gamma_*^{k}$ be the minimizers of the corresponding Kullback-Leibler distances, respectively. Then $\hat{\alpha}^{j}\stackrel{P}{\longrightarrow}\alpha_*^{j}$, $\hat{\beta}^{k}\stackrel{P}{\longrightarrow}\beta_*^{k}$, $\hat{\gamma}^{k}\stackrel{P}{\longrightarrow}\gamma_*^{k}$, and $n^{1/2}(\hat{\alpha}^{j}-\alpha_*^{j})$, $n^{1/2}(\hat{\beta}^{k}-\beta_*^{k})$ and $n^{1/2}(\hat{\gamma}^{k}-\gamma_*^{k})$ are bounded in probability. At the same time, $\nu^{j}(\hat{\alpha}^{j})\stackrel{P}{\longrightarrow}\nu_*^{j}$ and $\eta^{k}(\hat{\beta}^{k},\hat{\gamma}^{k})\stackrel{P}{\longrightarrow}\eta_*^{k}$, where $\nu_*^{j}=E[\pi^{j}(\alpha_*^{j})]$ and $\eta_*^{k}=E[U^{k}(\beta_*^{k},\gamma_*^{k})]$. Generally speaking, when the model $\pi^{j}(\alpha^{j})$ for $\pi(V,S)$ is correctly specified, we have $\pi^{j}(\alpha_*^{j})=\pi(V,S)$, and when the model $a^{k}(\gamma^{k})$ for $E(Y\mid V,T,S)$ is correctly specified, we have $a^{k}(\gamma_*^{k})=E(Y\mid V,T,S)$. Let $\alpha_*^{T}=\{(\alpha_*^{1})^{T},\cdots,(\alpha_*^{J})^{T}\}$, $\beta_*^{T}=\{(\beta_*^{1})^{T},\cdots,(\beta_*^{K})^{T}\}$, $\gamma_*^{T}=\{(\gamma_*^{1})^{T},\cdots,(\gamma_*^{K})^{T}\}$, and suppose $\hat{\rho}\stackrel{P}{\longrightarrow}\rho_*$.
By empirical likelihood theory, $\hat{\lambda}\stackrel{P}{\longrightarrow}0$, and according to the appendix of Han [22], $\hat{\lambda}=O_p(n^{-1/2})$. Since the model $\pi^{1}(\alpha^{1})$ is correctly specified, we have $m/n\stackrel{P}{\longrightarrow}\nu_*^{1}$, and
Referring to Zhao [4], since
and $E[X_i\varepsilon_i]=0$, $E[\{Z_i-\mu^{T}(T_i)X_i\}X_i^{T}]=0$, we have
Combining conditions C1, C4, C5 and Lemma 4.1, we have
That is, $\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi_i^{1}(\alpha_*^{1})}\hat{\xi}_i(\beta)\stackrel{P}{\longrightarrow}\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi_i^{1}(\alpha_*^{1})}\xi_i(\beta)$. Then we have
Therefore, as $n\to\infty$, $\beta$ solves Eq (2.13) in the limit, which shows that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
Next, suppose that $\mathcal{F}$ contains a correctly specified model for $E(Y\mid V,T,S)$. Without loss of generality, let $a^{1}(\gamma^{1})$ be this model and $\gamma_0^{1}$ be the true value of $\gamma^{1}$, that is, $a^{1}(\gamma_0^{1})=E(Y\mid V,T,S)$, and hence $\gamma_*^{1}=\gamma_0^{1}$. One of the previous constraints is actually
and $\hat{\beta}^{1}\stackrel{P}{\longrightarrow}\beta_*^{1}=\beta$, so we obtain $\frac{1}{n}\sum_{i=1}^{n}\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\stackrel{P}{\longrightarrow}0$.
Let $g(\alpha_*,\beta_*,\gamma_*)^{T}=[\pi^{1}(\alpha_*^{1})-\nu_*^{1},\cdots,\pi^{J}(\alpha_*^{J})-\nu_*^{J},\{U^{1}(\beta_*^{1},\gamma_*^{1})-\eta_*^{1}\}^{T},\cdots,\{U^{K}(\beta_*^{K},\gamma_*^{K})-\eta_*^{K}\}^{T}]$. Since $\frac{1}{n}\sum_{i=1}^{n}\hat{U}_i^{1}(\hat{\beta}^{1},\hat{\gamma}^{1})\stackrel{P}{\longrightarrow}\frac{1}{n}\sum_{i=1}^{n}U_i^{1}(\beta,\gamma_0^{1})$, $\big\|\frac{1}{n}\sum_{i=1}^{n}[\hat{\xi}_i(\beta)-\xi_i(\beta)]\big\|\stackrel{P}{\longrightarrow}0$, and $E[U^{1}(\beta,\gamma_0^{1})]=0$, we have
This shows that $\hat{\beta}_{MR}$ is a consistent estimator of $\beta$.
This completes the proof of Theorem 2.1.
5. Conclusions
In this article, we have proposed multiple robust estimators for the parameters of the varying-coefficient partially linear model with responses missing at random, and the multiple robustness of our proposal has been established theoretically under some regularity conditions. Our simulation studies, summarized in Tables 1 and 2, demonstrate the superiority of the multiple robust estimation method. Finally, we point out some problems for future research. First, we have only discussed the multiple robust estimation of the parametric part; the estimation of the nonparametric coefficient functions can be explored in future studies. Second, based on the model in this article, how to obtain robust estimators of the parameters when the missing mechanism is nonignorable is also worth studying.
Acknowledgments
This research is supported by the Natural Science Foundation projects (ZR2021MA077 and ZR2019MA016) of Shandong Province of China.
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.