1.
Introduction
The varying-coefficient partially linear model, an important class of semiparametric models, has attracted much attention because it combines the interpretability of parametric models with the flexibility of nonparametric models. In this model, the response is assumed to depend linearly on the covariates in the parametric part. However, this assumption may be inappropriate in some practical applications. For example, empirical studies of a real ecological data set [1] suggest that it is more reasonable to assume a nonlinear relationship between net ecosystem CO2 exchange (NEE) and the amount of photosynthetically active radiation (PAR). Likewise, the scatterplot of viral load against CD4 cell count shows no strict linearity between them [2].
To describe such potentially complicated nonlinear relationships between the response and certain covariates, Li and Mei [3] proposed the varying-coefficient partially nonlinear model, which takes the form
where Y is the response and X∈Rp, Z∈Rr and U are the associated covariates. g(⋅,⋅) is a pre-specified nonlinear function with parameter vector β=(β1,⋯,βq)T, α(⋅)=(α1(⋅),⋯,αp(⋅))T is a vector of coefficient functions, and ε is a model error with mean zero and variance σ2. The dimensions of Z and β need not be equal.
Model (1.1) is clearly more flexible, as it covers the varying-coefficient partially linear model as a special case; consequently, it has been explored extensively since it was proposed. Li and Mei [3] proposed a profile nonlinear least squares estimation procedure and established the asymptotic properties of the resulting estimators. Zhou et al. [4] investigated empirical likelihood inference and showed the corresponding Wilks phenomenon. Furthermore, Yang and Yang [5] developed a two-stage estimation procedure based on the orthogonality-projection method and smooth-threshold estimating equations. In addition, Xiao and Chen [6] developed a local bias-corrected empirical likelihood procedure for model (1.1) with measurement errors in the nonparametric part. Tang et al. [7] proposed an adjusted empirical likelihood inference procedure for model (1.1) with endogenous covariates.
The research above is based on either the least squares method or the empirical likelihood method, both of which are tied to mean regression. As is well known, although these methods perform satisfactorily for normally distributed data, they are sensitive and may produce relatively large bias when the data follow a heavy-tailed distribution or contain outliers. Therefore, a number of robust inference procedures have been developed. For model (1.1), Jiang et al. [8] proposed a robust estimation procedure based on the exponential squared loss function, and Yang et al. [9] developed quantile regression for robust inference. Recently, Xiao and Liang [10] proposed a robust two-stage estimation procedure for model (1.1) based on modal regression.
Missing data are frequently encountered in research fields such as biomedicine, economics and clinical medicine. Common missing mechanisms include missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) (Little and Rubin [11]). Under the MAR assumption, the missing probability, called the propensity score function, depends only on the fully observed variables. Some research has explored statistical inference for model (1.1) under the MAR assumption. For example, with covariates missing at random, Wang et al. [12] proposed a profile nonlinear least squares estimation procedure based on the inverse probability weighted method. Xia et al. [13] developed an empirical likelihood inference based on a weighted imputation method when the response was missing at random.
Nevertheless, the propensity score may depend on the value of the unobserved variable itself. For example, in income surveys, individuals with high income are usually less willing to report their specific income, so nonresponse rates tend to be related to the values of the response. In this situation, the MNAR assumption becomes reasonable. Compared with MAR, handling MNAR data is a great challenge, since some parameters are not identifiable without further restrictions on the propensity score. Over the past few years, MNAR data analysis has received much attention in the literature. Kim and Yu [14] first defined the exponential tilting model for the nonignorable missing mechanism and proposed a semiparametric estimation procedure for mean functions. Tang et al. [15] developed modified estimating equations by imputing missing data through a kernel regression method under the exponential tilting model. Wang et al. [16] used an instrumental variable, which is related to the study variable but not involved in the missing-data propensity, to construct estimating equations. To our knowledge, there is no literature studying inference for the varying-coefficient partially nonlinear model with nonignorable missing responses, which motivates this work. First, as an extension of the varying-coefficient partially linear model, the model combines the interpretability of parametric models with the flexibility of nonparametric models, and it overcomes the limitation of the linear function in traditional semiparametric models. Second, compared with MAR, MNAR is much more common in practical applications and more challenging to study.
Furthermore, as a robust estimation procedure, modal regression not only performs well in the presence of outliers or heavy-tailed distributions, but also attains full asymptotic efficiency under normal error distributions.
In this paper, we propose a robust estimation procedure for model (1.1) with nonignorable missing responses. For the nonignorable nonresponse propensity, we assume a semiparametric model, because fully parametric approaches are very sensitive to misspecification of the assumed parametric form. An instrumental variable not involved in the propensity has been studied by Shao and Wang [17] to avoid the identifiability issue. The nonparametric component of the propensity is replaced by a kernel-type estimator, and the parameter is then estimated by profiled estimating equations and the generalized method of moments. We then propose robust estimators in the varying-coefficient partially nonlinear model based on the inverse probability weighting technique and modal regression. On the one hand, estimation procedures using only complete-case data may yield seriously biased results, especially when the missing probability is large; the inverse probability weighting technique reduces this bias by using the inverse of the propensity as weights. On the other hand, modal regression, introduced by Yao et al. [18], has been applied extensively to other semiparametric models, such as [19,20]. Modal regression has several advantages over traditional mean regression. First, it is easy to implement, involving only a tuning parameter that can be selected automatically. Second, it is robust in the presence of outliers or heavy-tailed errors, as has been verified in an extensive literature. Meanwhile, the resulting estimator achieves full asymptotic efficiency under the normal error distribution. This encourages us to apply inverse probability weighting and modal regression to the varying-coefficient partially nonlinear model.
The rest of this paper is organized as follows. Section 2 introduces a robust estimation procedure based on modal regression and inverse probability weighting. Section 3 establishes its theoretical properties. In Section 4, we discuss the selection of the bandwidth and the specific estimation algorithm. Simulation studies and a real data analysis are then conducted in Sections 5 and 6 to evaluate the performance of the proposed estimation procedure. We make concluding remarks in Section 7 and leave the proofs of the main theorems to the Appendix.
2.
Estimation methodology
2.1. Modal regression procedure with nonignorable missing response
Suppose that {Yi,Xi,Zi,Ui}, i=1,⋯,n, is a random sample from model (1.1); that is,
where covariates Xi,Zi,Ui are completely observed and response Yi has nonignorable missing values. Let δi be a binary response indicator for Yi, where δi=1 if Yi is observed, and δi=0 otherwise. Furthermore, we define the propensity of missing data as follows
where Ti=(XTi,ZTi,Ui)T. Under the MNAR assumption, the propensity of missing data depends on Yi itself, regardless of whether Yi is observed or missing.
To simplify model (2.1), we apply B-spline functions to approximate the nonparametric function α(u), instead of using local polynomial estimation. This choice is motivated by the fact that the B-spline approximation has bounded support and is numerically stable. In addition, it avoids the high computational complexity associated with local polynomial estimation, as elaborated in [21]. More specifically, let B(u)=(B1(u),…,BL(u))T be the B-spline basis functions of order M, where L=K+M and K is the number of interior knots. The nonparametric function αk(u) can then be approximated by
where γk=(γk1,γk2,⋯,γkL)T. Then, we substitute (2.2) into model (2.1) to obtain
where γ=(γT1,γT2,⋯,γTp)T and Wi=Ip⊗B(Ui)⋅Xi, with Ip being the p-dimensional identity matrix.
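As an illustration of the approximation above, the B-spline basis B(u) can be evaluated with the Cox-de Boor recursion. The sketch below is our own (the function name and knot handling are not from the paper): it builds the L = K + M basis functions of order M on [0, 1] for a given set of interior knots, using clamped boundary knots.

```python
import numpy as np

def bspline_basis(u, interior_knots, order=4):
    """Evaluate the L = K + M B-spline basis functions of order M
    (degree M - 1) at points u in [0, 1] via the Cox-de Boor recursion."""
    M = order
    K = len(interior_knots)
    # clamped knot sequence: M boundary knots repeated at each end
    knots = np.concatenate([np.zeros(M), np.sort(interior_knots), np.ones(M)])
    u = np.atleast_1d(np.asarray(u, dtype=float))
    n0 = len(knots) - 1                       # number of degree-0 pieces
    B = np.zeros((len(u), n0))
    for j in range(n0):                       # indicators of the knot spans
        B[:, j] = (knots[j] <= u) & (u < knots[j + 1])
    B[u == 1.0, K + M - 1] = 1.0              # close the last span at u = 1
    for m in range(2, M + 1):                 # raise the order from 2 up to M
        nb = n0 - (m - 1)
        Bn = np.zeros((len(u), nb))
        for j in range(nb):
            d1 = knots[j + m - 1] - knots[j]
            d2 = knots[j + m] - knots[j + 1]
            left = (u - knots[j]) / d1 * B[:, j] if d1 > 0 else 0.0
            right = (knots[j + m] - u) / d2 * B[:, j + 1] if d2 > 0 else 0.0
            Bn[:, j] = left + right
        B = Bn
    return B                                  # shape (len(u), K + M)
```

With K = 3 interior knots and order M = 4 this yields L = 7 basis functions, which sum to one at every u (the partition-of-unity property underlying the approximation of αk(u)).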
For fixed β, model (2.3) can be reexpressed as
where Y=(Y1,Y2,⋯,Yn)T, g(Z,β)=(g(Z1,β),g(Z2,β),⋯,g(Zn,β))T, W=(W1,W2,⋯,Wn)T and ε=(ε1,ε2,⋯,εn)T.
Clearly, model (2.4) is a standard linear model. We can obtain the initial estimators of γ with the ordinary least squares method as follows
We replace γ in (2.4) by ˆγ(1), and model (2.4) becomes
where P=In−W(WTW)−1WT, with In being an n-dimensional identity matrix.
Then, the inverse probability weighting technique is adopted to reduce the estimation bias caused by missing Yi, and modal regression is employed to attain robust estimators. Motivated by the idea of Yao et al. [18], the modal estimator of β is constructed by maximizing the following objective function
where Δi=δi/π(Yi,Ti), ϕh(t)=ϕ(t/h)/h with kernel density function ϕ(⋅) and bandwidth h. Pi denotes the ith row of the matrix P. According to Yao et al. [18], we adopt the Gaussian kernel density function for ϕ(⋅) throughout the paper. The specific selection of h is described in Section 4.1.
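The objective above can be written down directly. The following Python sketch (our own notation; `g`, `P` and `pi_hat` are supplied by the caller) evaluates the inverse-probability-weighted modal objective with the Gaussian kernel, as a minimal illustration rather than the paper's implementation.

```python
import numpy as np

def gaussian_kernel(t, h):
    """phi_h(t) = phi(t / h) / h with the standard normal density phi."""
    return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def ipw_modal_objective(beta, Y, Z, P, delta, pi_hat, g, h):
    """Q(beta) = (1/n) sum_i Delta_i * phi_h( P_i (Y - g(Z, beta)) ),
    where Delta_i = delta_i / pi_hat_i are the inverse probability weights."""
    resid = P @ (Y - g(Z, beta))      # projected residuals P_i (Y - g(Z, beta))
    weights = delta / pi_hat          # Delta_i
    return np.mean(weights * gaussian_kernel(resid, h))
```

Since phi_h peaks at zero, the objective is largest when the weighted residuals concentrate around zero, which is why its maximizer targets the conditional mode rather than the mean.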
However, the propensity score π(Yi,Ti) is usually unknown in practice, and it is common to use the estimator ˆπ(Yi,Ti) to replace π(Yi,Ti). The estimation procedure of the propensity π(Yi,Ti) is given in the next subsection.
2.2. Estimation for semiparametric nonignorable propensity
According to Shao and Wang [17], we assume the following semiparametric model for the propensity
where q(Yi,ζ) is a known function of ζ, which is a d-dimensional unknown parameter, and ψ(⋅) is an unknown function. Several common models are special cases of the semiparametric model (2.7). For example, (2.7) reduces to the model for ignorable missing data when q(Yi,ζ) does not depend on Yi. When q(Yi,ζ)=exp(ζYi), (2.7) simplifies to the exponential tilting model for nonignorable missing data, as described in Kim and Yu [14].
Without further assumptions, Shao and Wang [17] showed that ψ(⋅) and ζ are not identifiable in the propensity score function (2.7). To overcome this difficulty, we assume that Ti can be decomposed into two parts, Ti=(VTi,STi)T, where Si can be excluded from the propensity model; then (2.7) reduces to
where ψ(⋅) is an unspecified function of Vi. The covariate Si is referred to as the instrumental variable by Wang et al. [16], and it aids in identifying and estimating all unknown parameters.
To estimate the propensity defined by (2.8), for any fixed ζ, we have
Thus, a nonparametric kernel estimator of ψζ(V) is given by
where Lb(⋅)=b−1L(⋅/b), L(⋅) is a kernel function and b is a bandwidth. Note that ˆψζ(V) is not an estimator of ψ(V) as ζ is an unknown parameter value. However, ˆψζ(V) is useful in the following estimating equations for estimating unknown ζ. Define
where π(Y,T,ψζ,ζ)={1+ψ(V)q(Y,ζ)}−1 and h(S) is a known vector-valued function with dimension R≥d+1. As in Wang et al. [16], we recommend using h(S)=(1,VT,ST)T throughout this paper. It is easy to verify that E[f(Y,T,δ,ψζ0,ζ0)]=0 at the true parameter value ζ0. The estimating equations are over-identified when R>d, so we employ the two-step generalized method of moments (GMM).
Let Fn(ˆψζ,ζ)=1nn∑i=1f(Yi,Ti,δi,ˆψζ,ζ) and ˆζ(1)=argminζFn(ˆψζ,ζ)TFn(ˆψζ,ζ). The two-step GMM estimator of ζ is
where ˆWn=1nn∑i=1f(Yi,Ti,δi,ˆψˆζ(1),ˆζ(1))f(Yi,Ti,δi,ˆψˆζ(1),ˆζ(1))T. Eventually, the propensity model can be consistently estimated by
Then, the modal estimator ˆβ for β is constructed by maximizing the following objective function
where ˆΔi=δi/ˆπ(Yi,Ti).
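To make the propensity estimation of Subsection 2.2 concrete, the sketch below is our own illustration under the working choices q(Y,ζ)=exp(ζY) and a scalar ζ searched over a grid (all function names are ours, not the paper's). It implements the kernel estimator of ψζ in the spirit of Shao and Wang [17], the moment functions f, and the two-step GMM.

```python
import numpy as np

def kernel_L(t, b):
    """Product Gaussian kernel L_b(t) = prod_j phi(t_j / b) / b."""
    t = np.atleast_2d(t)
    return np.prod(np.exp(-0.5 * (t / b) ** 2) / (b * np.sqrt(2 * np.pi)), axis=1)

def psi_hat(v, zeta, Y, V, delta, q, b):
    """Kernel estimator of psi_zeta(v):
    sum_i (1 - delta_i) L_b(v - V_i) / sum_i delta_i q(Y_i, zeta) L_b(v - V_i).
    Missing Y_i are multiplied by delta_i = 0, so they never enter."""
    w = kernel_L(v - V, b)
    return np.sum((1 - delta) * w) / np.sum(delta * q(Y, zeta) * w)

def moments(zeta, Y, V, S, delta, q, b, hfun):
    """Moments f_i = h(S_i) * {delta_i (1 + psi_hat(V_i) q(Y_i, zeta)) - 1},
    i.e. h(S)(delta / pi - 1), which have mean zero at the true zeta."""
    Yfill = np.where(delta == 1, Y, 0.0)        # placeholder for missing Y
    psis = np.array([psi_hat(V[i], zeta, Yfill, V, delta, q, b)
                     for i in range(len(Yfill))])
    resid = delta * (1.0 + psis * q(Yfill, zeta)) - 1.0
    return hfun(S) * resid[:, None]             # n x R moment matrix

def two_step_gmm(zeta_grid, Y, V, S, delta, q, b, hfun):
    """Two-step GMM over a scalar grid: step 1 uses the identity weight,
    step 2 reweights by the inverse of the estimated moment covariance."""
    Fn = lambda z: moments(z, Y, V, S, delta, q, b, hfun).mean(axis=0)
    z1 = min(zeta_grid, key=lambda z: Fn(z) @ Fn(z))
    f1 = moments(z1, Y, V, S, delta, q, b, hfun)
    Winv = np.linalg.pinv(f1.T @ f1 / len(V))
    return min(zeta_grid, key=lambda z: Fn(z) @ Winv @ Fn(z))
```

Note the sanity check this construction admits: when no responses are missing, the numerator of psi_hat vanishes, the estimated propensity is one, and every moment function is exactly zero.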
Once modal estimator ˆβ is obtained, model (2.3) can be transformed to
where Y∗i=Yi−g(Zi,ˆβ).
Similarly, the modal estimators of γ are obtained by maximizing the following objective function
Therefore, the estimators of the coefficient functions are given by ˆαk(u)=B(u)Tˆγk, k=1,⋯,p.
3.
Theoretical properties
In this section, we discuss the asymptotic properties of the proposed modal regression estimators for both the parametric and nonparametric components. Denote the true values of β and αk(⋅) in model (2.1) as β0 and α0k(⋅), respectively. Correspondingly, γ0k is the optimal approximation coefficient of γk in (2.2). Let F(x, z, u, h) = \mathrm{E}(\phi_h''(\varepsilon)|{X} = x, {Z} = z, U = u) and G(x, z, u, h) = \mathrm{E}(\phi_h'(\varepsilon)^2|{{X} = x, {Z} = z}, U = u) . To facilitate the presentation, we denote by g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}) = \partial g(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}) / \partial \mathit{\boldsymbol{\beta}} the q\times 1 vector of first-order derivatives of g(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}) with respect to \mathit{\boldsymbol{\beta}} .
The assumptions required for the main results are listed below. These assumptions are common and can be easily satisfied.
A1: The random variable U has a bounded support \mathcal{U} . Its density function f(\cdot) is Lipschitz continuous and bounded away from zero and infinity on its support.
A2: The true coefficient functions {\alpha _{0k}}(u), k = 1, \cdots, p are r -times continuously differentiable on the interval (0, 1) with r \ge 2 .
A3: For any \mathit{\boldsymbol{z}} , g(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}) is a continuous function of \mathit{\boldsymbol{\beta}} , and the second derivative of g(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}) with respect to \mathit{\boldsymbol{\beta}} is continuous.
A4: There exists an s > 0 such that \mathrm{E}||\textbf{X}||^{2s} < \infty , \mathrm{E}||g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}})||^{2s} < \infty .
A5: The bandwidth h satisfies n{h^{2r}} \to \infty and n{h}/\log n \to \infty as n \to \infty .
A6: Suppose that {{\mathit{\boldsymbol{\zeta}}} _0} is the unique solution to E[{f({Y}, \mathit{\boldsymbol{T}}, {\delta}, {\psi _{\mathit{\boldsymbol{\zeta}}} }, \mathit{\boldsymbol{\zeta}}) }] = 0 , \Gamma = E[\partial f({Y}, \mathit{\boldsymbol{T}}, {\delta}, {\psi _{\mathit{\boldsymbol{\zeta}}_0} }, \mathit{\boldsymbol{\zeta_0}})/\partial \mathit{\boldsymbol{\zeta}}] is of full rank and D = E[{ f {{({Y}, \mathit{\boldsymbol{T}}, {\delta}, {\psi _{\mathit{\boldsymbol{\zeta}}_0} }, \mathit{\boldsymbol{\zeta_0}})}^{ \otimes 2}}}] is positive definite.
A7: Let c_1, \ldots, c_K be the interior knots on [0, 1] . Set c_0 = 0, \; c_{K+1} = 1 and h_i = c_i-c_{i-1} . Then, there exists a constant C_0 such that
A8: F(x, z, u, h) and G(x, z, u, h) are continuous with respect to (x, z, u) . In addition, F(x, z, u, h) < 0 for any h > 0 .
A9: \mathrm{E}(\phi_h'(\varepsilon)|{x}, z, u) = 0 and \mathrm{E}(\phi_h''(\varepsilon)^2|{x}, z, u) , \mathrm{E}(\phi_h'(\varepsilon)^3|{x}, z, u) and \mathrm{E}(\phi_h'''(\varepsilon)|{x}, z, u) are continuous with respect to (x, z, u) .
A10: The matrix A and \Sigma defined in Theorem 1 are positive definite.
These assumptions are commonly adopted in the semiparametric literature. Assumptions A1–A4 are generally required for the varying-coefficient partially nonlinear model. Assumption A5 is a standard condition on the kernel bandwidth. Assumption A6 is necessary for proving the asymptotic properties of the two-step GMM estimator \hat{{\mathit{\boldsymbol{\zeta}}}} . Assumption A7 indicates that c_1, \ldots, c_K is a quasi-uniform sequence of partitions on [0, 1] . Assumptions A8 and A9 are commonly used in modal regression. Assumption A10 ensures the existence of the asymptotic variance of the estimator \mathit{\boldsymbol{\hat\beta}} .
The following Theorem 1 gives the asymptotic normality for the parameter estimator \mathit{\boldsymbol{\hat\beta}} .
Theorem 1. Suppose that assumptions A1-A10 hold and the number of interior knots is K = O(n^{1/(2r+1)}) . As n\rightarrow \infty , we have
where C = \Sigma A^{-1}\Sigma ^T , \Sigma = E({F(x, z, u, h)g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}){{g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}})^T}}}) , A = B+H {\rm{\Pi}} H^T , B = E[{{\pi ({Y}, \mathit{\boldsymbol{T}}, {\mathit{\boldsymbol{\zeta}}_0})}}^{-1} G(x, z, u, h)g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}){{ g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}})^T}}] , {\rm{\Pi}} is the asymptotic variance of \hat{\mathit{\boldsymbol{\zeta}}} defined in remark 1, H = \lim\limits_{n\to\infty}\frac{1}{n}\sum_{i = 1}^nE[{{\pi ({Y}, \mathit{\boldsymbol{T}}, {\mathit{\boldsymbol{\zeta}}_0})}}^{-1} \phi_h'(\varepsilon)(\mathit{\boldsymbol{P}}_ig'(\mathit{\boldsymbol{\beta}}))^T\{\partial {{\pi ({Y}, \mathit{\boldsymbol{T}}, {\mathit{\boldsymbol{\zeta}}_0})}}/ \partial {\mathit{\boldsymbol{\zeta}}}\}^T] and \mathit{\boldsymbol{P}}_i is the i th row of \mathit{\boldsymbol{P}} = {\mathit{\boldsymbol{I}}_n} - {\mathit{\boldsymbol{W}}{{({{\mathit{\boldsymbol{W}}^T}\mathit{\boldsymbol{W}}})}^{ - 1}}{\mathit{\boldsymbol{W}}^T}} , g'(\mathit{\boldsymbol{\beta}}) = {(g'({\mathit{\boldsymbol{Z}}_1}, \mathit{\boldsymbol{\beta}}), \cdots, g'({\mathit{\boldsymbol{Z}}_n}, \mathit{\boldsymbol{\beta}}))^T} .
Remark 1. As discussed in [22], \hat {\mathit{\boldsymbol{\zeta}}} is a consistent estimator of {\mathit{\boldsymbol{\zeta}}} , and
where {\rm{\Pi }} = {({{{\rm{\Xi }}^ T }{D^{-1}}{\rm{\Xi }}})^{-1}}{{\rm{\Xi }}^T }{D^{-1}}{\rm{\Omega }}{D^{-1}}{\rm{\Xi }}{({{{\rm{\Xi }}^T }{D^{ - 1}}{\rm{\Xi }}})^{ - 1}} , {\rm{\Xi }} = E\{{(1 - \delta)[{h(\mathit{\boldsymbol{S}})-{m_s}(\mathit{\boldsymbol{V}})}][{\eta ({Y}, {{\mathit{\boldsymbol{\zeta}}}_0}})-{m_\eta}(\mathit{\boldsymbol{V}})}]\} , {m_\eta }(\mathit{\boldsymbol{V}}) = {{E\{ {{\delta \eta ({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} |\mathit{\boldsymbol{V}}} \}}/ {E\{ { {\delta q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} |\mathit{\boldsymbol{V}}} \}}} , {m_s}(\mathit{\boldsymbol{V}}) = {{E\{ {{\delta h(\mathit{\boldsymbol{S}})q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} |\mathit{\boldsymbol{V}}} \}} / {E\{ {{\delta q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} |\mathit{\boldsymbol{V}}} \}}} , \eta ({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}}) = {\{ {q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} \}^{ - 1}}\partial q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})/\partial {{\mathit{\boldsymbol{\zeta}}} ^ \top } , {\Omega } is the covariance matrix of {\rm{\Lambda }} = \{ {h(\mathit{\boldsymbol{S}}) - {m_s}(\mathit{\boldsymbol{V}})} \} [{\delta \{ {1 + \psi_{\zeta _0}(\mathit{\boldsymbol{V}})q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})} \} -1}] + {\cal M} , {\cal M} = {({{{\cal M}_1}, \ldots, {{\cal M}_R}})^ \top } , {{\cal M}_r} = vec (\Phi_{r})^{T} \ell(\mathit{\boldsymbol{T}}, {Y}, \delta), r = 1, \ldots, R , \ell(\mathit{\boldsymbol{T}}, {Y}, \delta) is an influence function, R is the dimension of h(\mathit{\boldsymbol{S}}) , \Phi_{r} = E\{ {{{\psi_{\zeta _0}(\mathit{\boldsymbol{V}})[{\nabla {{\cal C}_r}(\mathit{\boldsymbol{V}}){f_v}(\mathit{\boldsymbol{V}}) + {{\cal C}_r }(\mathit{\boldsymbol{V}})\nabla {f_v}(\mathit{\boldsymbol{V}})}]} / {{f_v}(\mathit{\boldsymbol{V}})}}} \} , {{\cal C}_r}(\mathit{\boldsymbol{V}}) = {\mathop{\rm Cov}} ({\mathit{\boldsymbol{T}}, {{\tilde {\cal W}}_r }\mid 
\mathit{\boldsymbol{V}}}), \nabla {{\cal C}_r}(\mathit{\boldsymbol{V}}) = \partial {{\cal C}_r}(\mathit{\boldsymbol{V}}) /\partial {\mathit{\boldsymbol{V}}^ \top }, \nabla {f_v}(\mathit{\boldsymbol{V}}) = \partial {f_v}(\mathit{\boldsymbol{V}})/\partial {\mathit{\boldsymbol{V}}^ \top } , {f_v}(\mathit{\boldsymbol{V}}) is the density of \mathit{\boldsymbol{V}} and {{\tilde {\cal W}}_r} is the r th coordinate of \delta q({{Y}, {{\mathit{\boldsymbol{\zeta}}} _0}})[{h(\mathit{\boldsymbol{S}}) - {m_s}(\mathit{\boldsymbol{V}})}] .
The following Theorem 2 establishes the consistency of the estimators {\hat \alpha _k}(\cdot), k = 1, \cdots, p .
Theorem 2. Suppose that assumptions A1–A10 hold, and the number of interior knots is K = O(n^{1/(2r+1)}) . As n\rightarrow \infty , we have
4.
Bandwidth selection and estimation algorithm
In this section, we discuss the selection of the bandwidth and estimation procedure based on the modal expectation maximization (MEM) algorithm proposed by Li et al. [23].
4.1. Bandwidth selection
To obtain the modal estimators \hat{\mathit{\boldsymbol{\beta}}} and \hat{\mathit{\boldsymbol{\gamma}}} of the parameters \mathit{\boldsymbol{\beta}} and \mathit{\boldsymbol{\gamma}} , it is necessary to select an appropriate bandwidth h . For simplicity, we assume that the error variable is independent of \mathit{\boldsymbol{X}} , \mathit{\boldsymbol{Z}} and U . Following Yao et al. [18], the ratio of the asymptotic variance of our proposed estimator to that of the least squares estimator is given by
where \sigma_1^2 = E(\varepsilon^2) , F(h_1) = \mathrm{E}[\phi_{h_1}''(\varepsilon)] and G(h_1) = \mathrm{E}[\phi_{h_1}'(\varepsilon)^2] . In practice, the error distribution is unknown, so F(h_1) and G(h_1) cannot be obtained directly. A feasible approach is to estimate F(h_1) and G(h_1) by \hat F({h_1}) = \frac{1}{n}\sum\limits_{i = 1}^n {{\phi''_{{h_1}}}\left({{{\hat \varepsilon }_{1i}}} \right)} and \hat G({h_1}) = \frac{1}{n}\sum\limits_{i = 1}^n [\phi'_{{h_1}}(\hat\varepsilon_{1i})]^2 , respectively.
Then, the estimator for R({h_1}) is given by
where {\hat \varepsilon _{1i}} = \hat{\Delta} _i{\mathit{\boldsymbol{P}}_i}[\mathit{\boldsymbol{Y}} - g(\mathit{\boldsymbol{Z}}, {\mathit{\boldsymbol{\hat \beta}} ^{{\mathop{\rm int}} }})] , \hat \sigma _1^2 = \frac{1}{n}\sum\limits_{i = 1}^n \hat\varepsilon_{1i}^2 and \hat{\mathit{\boldsymbol{\beta}}}^{\rm int} is the pilot estimator.
Similarly, to obtain the modal estimator {\hat{\gamma}} , the estimator for the ratio of the asymptotic variance of our proposed estimator to that of the least square estimator is given by
where \hat F({h_2}) = \frac{1}{n}\sum\limits_{i = 1}^n {{\phi''_{{h_2}}}\left({{{\hat \varepsilon }_{2i}}} \right)}, \hat G({h_2}) = \frac{1}{n}\sum\limits_{i = 1}^n [\phi'_{{h_2}}(\hat\varepsilon_{2i})]^2 , {\hat \varepsilon _{2i}} = {{\hat{\Delta }}_i}({{Y_i} - g({{\mathit{\boldsymbol{Z}}_i}, \mathit{\boldsymbol{\hat \beta}} }) - \mathit{\boldsymbol{X}}_i^T{{\mathit{\boldsymbol{\hat \alpha}} }^{{\mathop{\rm int}} }}({{U_i}})}) and \hat \sigma _2^2 = \frac{1}{n}\sum\limits_{i = 1}^n \hat\varepsilon_{2i}^2 , where \hat{\mathit{\boldsymbol{\beta}}} is the modal estimator and { \mathit{\boldsymbol{\hat \alpha}} ^{{\mathop{\rm int}} }}({{\cdot}}) are the pilot estimators.
Furthermore, we use a grid search to choose appropriate bandwidths: {h_{opt1}} and {h_{opt2}} are obtained by minimizing (4.2) and (4.3), respectively. The optimal choices of h_{1} and h_{2} are
Following Yao et al. [18], the possible grid points for bandwidth can be {h_1} = 0.5{\hat \sigma _1} \times {1.02^j} , {h_2} = 0.5{\hat \sigma _2} \times {1.02^j}, j = 0, 1, \cdots, 100 .
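A hypothetical implementation of this grid search could look as follows (our own notation: we write the estimated variance ratio as \hat R(h) = \hat G(h)/(\hat\sigma^2 \hat F(h)^2) and skip grid points violating \hat F(h) < 0 , cf. assumption A8).

```python
import numpy as np

def phi_d1(t, h):
    """First derivative of the Gaussian kernel phi_h(t) = phi(t / h) / h."""
    base = np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return -t / h ** 2 * base

def phi_d2(t, h):
    """Second derivative of phi_h."""
    base = np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return (t ** 2 / h ** 2 - 1.0) / h ** 2 * base

def select_bandwidth(resid, sigma2_hat, n_grid=101):
    """Grid search over h = 0.5 * sigma_hat * 1.02**j, j = 0, ..., n_grid - 1,
    minimizing R_hat(h) = G_hat(h) / (sigma2_hat * F_hat(h)**2), the estimated
    ratio of the asymptotic variances (modal versus least squares)."""
    sig = np.sqrt(sigma2_hat)
    best_h, best_r = None, np.inf
    for j in range(n_grid):
        h = 0.5 * sig * 1.02 ** j
        F = np.mean(phi_d2(resid, h))
        G = np.mean(phi_d1(resid, h) ** 2)
        if F >= 0:                 # F(h) < 0 is required for a valid ratio
            continue
        r = G / (sigma2_hat * F ** 2)
        if r < best_r:
            best_h, best_r = h, r
    return best_h, best_r
```

In practice `resid` would be the pilot residuals \hat\varepsilon_{1i} (or \hat\varepsilon_{2i} ) and `sigma2_hat` the corresponding \hat\sigma_1^2 (or \hat\sigma_2^2 ).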
4.2. The MEM algorithm of parametric components
In this subsection, we adopt the MEM algorithm of Li et al. [23] to obtain the estimator of the parameter \mathit{\boldsymbol{\beta}} by maximizing (2.13). Let \mathit{\boldsymbol{\beta}}^{(0)} be the initial estimator of \mathit{\boldsymbol{\beta}} and set m = 0 .
\textbf{Step 1 (E-step):} We update \pi(i|\mathit{\boldsymbol{\beta}}^{(m)}) by
\textbf{Step 2 (M-step):} We update \mathit{\boldsymbol{\beta}}^{(m+1)} by
where \mathit{\boldsymbol{\hat{Y}}} = (\hat{Y}_{1}, \hat{Y}_{2}, \ldots, \hat{Y}_{n})^{T} , {\hat Y_i} = {{\hat{\Delta }}_i}{\mathit{\boldsymbol{P}}_i} [{\mathit{\boldsymbol{Y}} - g({\mathit{\boldsymbol{Z}}, {{\mathit{\boldsymbol{\beta}}} ^{(m)}}}) + {g^\prime }({\mathit{\boldsymbol{Z}}, {{\mathit{\boldsymbol{\beta}}} ^{(m)}}}){{\mathit{\boldsymbol{\beta}}} ^{(m)}}}] , \mathit{\boldsymbol{G}} = {({{\hat\Delta _1}{\mathit{\boldsymbol{P}}_1}{g^\prime }({\mathit{\boldsymbol{Z}}, {{\mathit{\boldsymbol{\beta}}} ^{(m)}}}), {\hat\Delta _2}{\mathit{\boldsymbol{P}}_2}{g^\prime }({\mathit{\boldsymbol{Z}}, {{\mathit{\boldsymbol{\beta}}} ^{(m)}}}), \cdots, {\hat\Delta _n}{\mathit{\boldsymbol{P}}_n}{g^\prime }({\mathit{\boldsymbol{Z}}, {{\mathit{\boldsymbol{\beta}}} ^{(m)}}})})^T} , \mathit{\boldsymbol{D}} is n\times n diagonal matrix with \mathit{\boldsymbol{D}} = \mathrm{diag} ({\pi ({\left. 1 \right|{{\mathit{\boldsymbol{\beta}}} ^{(m)}}}), \pi ({\left. 2 \right|{{\mathit{\boldsymbol{\beta}}} ^{(m)}}}), \cdots, \pi ({\left. n \right|{{\mathit{\boldsymbol{\beta}}} ^{(m)}}})}) .
\textbf{Step 3:} Let m = m + 1 and iterate the E-step and M-step until the algorithm converges. Denote the final estimator of \mathit{\boldsymbol{\beta}} as \mathit{\boldsymbol{\hat{\beta}}} .
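The E/M iteration above can be sketched for the simplest reduction in which g is linear, so the weighted M-step is exact weighted least squares; this is an illustrative simplification of Section 4.2, not the paper's full algorithm (which linearizes a general g at \mathit{\boldsymbol{\beta}}^{(m)} ). A known property of the MEM iteration is that the modal objective never decreases between iterations.

```python
import numpy as np

def mem_modal(X, y, w, h, beta0, n_iter=50, tol=1e-8):
    """MEM algorithm for weighted linear modal regression.
    E-step: pi_i proportional to w_i * phi_h(y_i - x_i' beta);
    M-step: weighted least squares with weights pi_i."""
    beta = np.asarray(beta0, dtype=float)

    def phi_h(t):
        return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2 * np.pi))

    def Q(b):                              # modal objective sum w_i phi_h(r_i)
        return np.sum(w * phi_h(y - X @ b))

    history = [Q(beta)]
    for _ in range(n_iter):
        r = y - X @ beta
        pi = w * phi_h(r)
        pi = pi / pi.sum()                 # E-step weights
        D = np.diag(pi)
        beta_new = np.linalg.solve(X.T @ D @ X, X.T @ D @ y)   # M-step (WLS)
        history.append(Q(beta_new))
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, history
```

Because the Gaussian-kernel weights phi_h of gross outliers are essentially zero, the weighted least squares step effectively discards them, which is the source of the robustness discussed in Section 1.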
4.3. The MEM algorithm of nonparametric components
Based on the MEM algorithm, we can obtain the estimator of \mathit{\boldsymbol{\gamma}} by maximizing (2.15). Let \mathit{\boldsymbol{\gamma}}^{(0)} be the initial estimator of \mathit{\boldsymbol{\gamma}} and set m = 0 .
\textbf{Step 1 (E-step):} We update \pi(i|\mathit{\boldsymbol{\gamma}}^{(m)}) by
\textbf{Step 2 (M-step):} We update \mathit{\boldsymbol{\gamma}}^{(m+1)} by
where \tilde {\mathit{\boldsymbol{Y}}} = {{\mathit{\boldsymbol{\Delta}} }}{\mathit{\boldsymbol{Y}}^ * } , \tilde {\mathit{\boldsymbol{W}}} = {{\mathit{\boldsymbol{\Delta}} }}\mathit{\boldsymbol{W}} , \tilde {\mathit{\boldsymbol{D}}} = \mathrm{diag}({\pi ({\left. 1 \right|{{\mathit{\boldsymbol{\gamma}}} ^{(m)}}}), \cdots, \pi ({\left. n \right|{{\mathit{\boldsymbol{\gamma}}} ^{(m)}}})}) and {{\mathit{\boldsymbol{\Delta}} }} = \mathrm{diag}({{{\hat{{\Delta} }}_1}, \cdots, {{\hat{{\Delta} }}_n}}) .
\textbf{Step 3:} Let m = m + 1 and iterate the E-step and M-step until the algorithm converges. Denote the final estimator of \mathit{\boldsymbol{\gamma}} as \mathit{\boldsymbol{\hat{\gamma}}} . The corresponding estimator of nonparametric function is {\hat \alpha _k}(u) = \mathit{\boldsymbol{B}}{(u)^T}{\hat {\mathit{\boldsymbol{\gamma}}} _k}, k = 1, \cdots, p .
5.
Simulation studies
In this section, we conduct some simulations to evaluate the finite sample performances of the proposed estimation procedures. We generate the data from the varying-coefficient partially nonlinear model described as follows:
where the nonlinear function g(\mathit{\boldsymbol{Z}}, \mathit{\boldsymbol{\beta}}) = \exp \left({{Z_1}{\beta _1} + {Z_2}{\beta _2}} \right) with parameter vector \mathit{\boldsymbol{\beta}} = {\left({{\beta _1}, {\beta _2}} \right)^T} = {\left({1, 1.5} \right)^T} , and the coefficient functions are \mathit{\boldsymbol{\alpha}} (U) = {\left({{\alpha _1}(U), {\alpha _2}(U)} \right)^T} with {\alpha _1}(U) = \sin (2\pi U) and {\alpha _2}(U) = 3.5\{\exp [{-{{(4U-1)}^2}}] + \exp [{-{{(4U-3)}^2}}] - 1.5\}. The variable U is generated from the uniform distribution U(0, 1) . The variable \mathit{\boldsymbol{X}} = {\left({{X_1}, {X_2}} \right)^T} follows N\left({\bf{0}}, \Sigma \right) with {\Sigma _{ij}} = {0.5^{\left| {\left. {i - j} \right|} \right.}}, 1 \le i, j \le 2 , and \mathit{\boldsymbol{Z}} = {\left({{Z_1}, {Z_2}} \right)^T} is generated in the same way as \mathit{\boldsymbol{X}} . To illustrate the robustness of the proposed method, the following three distributions of the model error are considered: (1) the normal distribution, \varepsilon\sim N(0, 1) ; (2) the t distribution, \varepsilon\sim t(3) ; (3) the mixed normal distribution (MN), \varepsilon\sim 0.9N(0, 1^{2})+0.1N(0, 9^{2}) . The sample sizes n are 200 and 400, respectively, and the simulations are based on 500 replications. Furthermore, we generate {\delta _i} from the Bernoulli distribution with probability \pi ({Y_i}, {\mathit{\boldsymbol{V}}_i}) = \{{1 + \psi ({\mathit{\boldsymbol{V}}_i})\exp \{ \zeta {Y_i}\} }\}^{-1} and consider four choices of the function \psi (\cdot) and parameter \zeta :
Case1. \psi \left({{\mathit{\boldsymbol{V}}_i}} \right) = \exp \left\{ { - 0.1 - 1.5{X_1}_i - 1.5{Z_{1i}} - 1.5{U_i}} \right\} and \zeta = 0 ;
Case2. \psi \left({{\mathit{\boldsymbol{V}}_i}} \right) = \exp \left\{ { - 0.1 + 0.5{X_1}_i + 0.5{Z_{1i}} + 0.5{U_i}} \right\} and \zeta = - 0.8 ;
Case3. \psi \left({{\mathit{\boldsymbol{V}}_i}} \right) = \exp \left\{ { - 0.1 + 0.5\sin ({X_{1i}}) + 0.5\sin ({Z_{1i}}) + 0.5\sin ({U_i})} \right\} and \zeta = - 0.8 ;
Case4. \psi \left({{\mathit{\boldsymbol{V}}_i}} \right) = \exp \left\{ {0.6 - 0.3\exp ({X_1}_i) - 0.3\exp ({Z_{1i}}) - 0.3\exp ({U_i})} \right\} and \zeta = - 0.1 .
While Case 1 is an ignorable missing mechanism, Cases 2–4 represent three different kinds of nonignorable missing mechanisms. The coefficients in the above four probability models are chosen so that the average proportions of missing data lie between 30 % and 40 % . It can be seen that {\mathit{\boldsymbol{V}}_i} = ({{X_{1i}}, {Z_{1i}}, {U_i}})^{T} and the instrumental variable is {\mathit{\boldsymbol{S}}_i} = ({{X_{2i}}, {Z_{2i}}})^{T} . As in Shao and Wang [17], we select the Gaussian kernel function L(\cdot) = \frac{1}{\sqrt{2\pi}}{\rm exp}({-(\cdot)^{2}}/{2}) to compute the nonparametric kernel estimator with L\left({{X_{1i}}, {Z_{1i}}, {U_i}} \right) = L\left({{X_{1i}}} \right)L\left({{Z_{1i}}} \right)L\left({{U_i}} \right) , and choose the bandwidths {b_1} = 1.5{\hat \sigma _{{X_1}}}{n^{{{ - 1} / 3}}} , {b_2} = 1.5{\hat \sigma _{{Z_1}}}{n^{{{ - 1} / 3}}} and {b_3} = 1.5{\hat \sigma _U}{n^{{{ - 1} / 3}}} , where {\hat \sigma _{{X_1}}}, {\hat \sigma _{{Z_1}}}, {\hat \sigma _U} are the standard deviations of \left\{ {{X_{1i}}} \right\}, \left\{ {{Z_{1i}}} \right\}, \left\{ {{U_i}} \right\}, i = 1, 2, \cdots, n .
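For concreteness, the data-generating mechanism of Case 2 can be reproduced as follows. This is a sketch under our reading of the design, assuming the response follows model (1.1) with Y = \mathit{\boldsymbol{X}}^T\mathit{\boldsymbol{\alpha}}(U) + g(\mathit{\boldsymbol{Z}}, \mathit{\boldsymbol{\beta}}) + \varepsilon and normal errors; the function name is ours.

```python
import numpy as np

def simulate_case2(n, rng):
    """Generate (Y, X, Z, U, delta) following the Section 5 design, Case 2:
    pi(Y_i, V_i) = 1 / (1 + psi(V_i) * exp(zeta * Y_i)) with
    psi(V) = exp(-0.1 + 0.5 X1 + 0.5 Z1 + 0.5 U) and zeta = -0.8."""
    U = rng.uniform(size=n)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])       # Sigma_ij = 0.5**|i-j|
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    Z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    beta = np.array([1.0, 1.5])
    eps = rng.standard_normal(n)                   # normal error scenario
    alpha1 = np.sin(2 * np.pi * U)
    alpha2 = 3.5 * (np.exp(-(4 * U - 1) ** 2) + np.exp(-(4 * U - 3) ** 2) - 1.5)
    Y = X[:, 0] * alpha1 + X[:, 1] * alpha2 + np.exp(Z @ beta) + eps
    psi = np.exp(-0.1 + 0.5 * X[:, 0] + 0.5 * Z[:, 0] + 0.5 * U)
    pi = 1.0 / (1.0 + psi * np.exp(-0.8 * Y))      # nonignorable propensity
    delta = (rng.uniform(size=n) < pi).astype(int) # response indicator
    return Y, X, Z, U, delta, pi
```

Here V_i = (X_{1i}, Z_{1i}, U_i)^T enters the propensity while the instrumental variable S_i = (X_{2i}, Z_{2i})^T does not, matching the decomposition of Subsection 2.2.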
To evaluate the efficiencies of the proposed estimation procedure, four estimators of \mathit{\boldsymbol{\beta}} are considered as follows:
(ⅰ) The proposed modal regression (MR) estimator is based on (2.13), which is denoted as MNAR-MR.
(ⅱ) The MR estimator is based on (2.13) in which \hat\pi is estimated with MAR assumption, which is denoted as MAR-MR.
(ⅲ) The MR estimator is based on complete-case (CC) data, which is denoted as CC-MR.
(ⅳ) The MR estimator is based on full sample data, which is denoted as FULL-MR.
Further, in order to assess the robustness of the proposed procedure, we also compare with the profile least squares (PLS) method in four scenarios: (ⅰ) the PLS estimator based on (2.13), denoted as MNAR-PLS; (ⅱ) the PLS estimator based on (2.13) in which \hat\pi is estimated under the MAR assumption, denoted as MAR-PLS; (ⅲ) the PLS estimator based on complete-case data, denoted as CC-PLS; and (ⅳ) the PLS estimator based on full sample data, denoted as FULL-PLS. The absolute bias (Bias) and standard deviation (SD) for the proposed MR method and the PLS method are reported in Tables 1–4, respectively.
The simulation results in Tables 1–4 lead to the following conclusions: (1) According to Tables 1 and 2, the proposed MNAR-MR estimation procedure performs better than CC-MR and MAR-MR, giving smaller Bias and SD in most of the considered scenarios, even under the ignorable missing Case 1. The CC-MR method uses only the fully observed samples and ignores the information in the covariates corresponding to the missing responses, which leads to biased estimators. As expected, the MAR-MR estimators perform well under Case 1, where the missing mechanism is ignorable, but have large biases under Cases 2–4 with nonignorable missing mechanisms. (2) Comparing the proposed MR procedure with the PLS method, the Bias and SD of the estimators obtained by the PLS method are smaller than those of the proposed MR procedure when the error distribution is normal. However, when the model error follows a t distribution or a mixed normal distribution, the estimators obtained by the proposed MR procedure are more accurate than those obtained by the PLS method. This indicates that the proposed MR estimation procedure is less sensitive to heavy-tailed error distributions or distributions containing outliers. (3) As the sample size increases, the performance of all estimation procedures improves in most of the considered scenarios.
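The robustness noted in conclusion (2) can be illustrated with a generic example, not the paper's estimator: a simple linear modal regression fitted by the usual iteratively reweighted least-squares (modal EM) scheme, in which the Gaussian kernel weight exp(-r^2/(2h^2)) drives the weights of outlying residuals toward zero. The data, the bandwidth h and all names here are illustrative:

```python
import numpy as np

def modal_linear_fit(x, y, h=0.5, n_iter=20):
    """Modal linear regression via iteratively reweighted least squares:
    maximize sum_i phi_h(y_i - a - b*x_i) with a Gaussian kernel phi_h."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-r**2 / (2 * h**2))             # kernel weights: gross outliers get ~0
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta

# The majority of points follow y = 1 + 2x exactly; four gross outliers are added.
x = np.linspace(0, 1, 24)
y = 1 + 2 * x
y[:4] += 8                                         # contaminate 4 observations
beta = modal_linear_fit(x, y)
```

The least-squares start is strongly biased by the contaminated points, but the reweighting iterations recover the intercept and slope of the uncontaminated majority, which is the behavior the tables attribute to the MR estimators under contaminated errors.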
Furthermore, to evaluate the performance of the proposed estimation procedure for the nonparametric functions, we present the estimated curves of {\hat \alpha _1}(\cdot) and {\hat \alpha _2}(\cdot) obtained with the different methods. Specifically, Figure 1 shows the estimated curves of {\hat \alpha _1}(\cdot) for the proposed MR procedure and the PLS method when the sample size n is 200 and the missing mechanism is Case 2. Similarly, Figure 2 displays the estimated curves of {\hat \alpha _2}(\cdot) . Meanwhile, we consider the root of average square error (RASE) of the estimates of the nonparametric functions, where RASE = \sqrt{\frac{1}{M}\sum_{j = 1}^{2}\sum_{i = 1}^{M}[\hat{\alpha}_{j}(u_{i})-\alpha_{j}(u_{i})]^{2}} , M = 200 , and the u_{i} are equally spaced points on the interval (0, 1) . Figure 3 shows the boxplots of the RASE for the four estimators based on the proposed MR procedure when the sample size n is 200 and 400, with the model error \varepsilon\sim t(3) .
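The RASE criterion above translates directly into code. A minimal sketch, with placeholder coefficient functions standing in for the \alpha_j of the simulation design defined earlier in the section:

```python
import numpy as np

def rase(alpha_hat, alpha_true, M=200):
    """Root of average square error over M equally spaced grid points in (0, 1):
    RASE = sqrt((1/M) * sum_j sum_i [alpha_hat_j(u_i) - alpha_j(u_i)]^2)."""
    u = np.linspace(0, 1, M + 2)[1:-1]             # M interior equally spaced points
    sq = sum(np.sum((aj_hat(u) - aj(u)) ** 2)
             for aj_hat, aj in zip(alpha_hat, alpha_true))
    return np.sqrt(sq / M)

# Placeholder example: true functions and slightly perturbed "estimates".
alpha_true = [np.sin, np.cos]
alpha_hat = [lambda u: np.sin(u) + 0.01, lambda u: np.cos(u) - 0.01]
```

With a constant offset of 0.01 on each of the two functions, each grid point contributes 2 x 10^{-4} to the average, so the RASE equals 0.01*sqrt(2).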
According to Figures 1–3, a few conclusions can be drawn: (1) The curve-fitting results of MNAR-MR are more effective than those of MNAR-PLS, being much closer to the true curve, regardless of whether the model error follows a normal or a mixed normal distribution. This indicates that the advantage of modal regression is apparent when the data contain outliers. (2) It can be seen from Figure 3 that the RASE value of MAR-MR is closest to that of FULL-MR under Case 1, while it has a large bias under the nonignorable missing mechanisms. In contrast, the RASE value of MNAR-MR is closest to that of FULL-MR under the nonignorable missing mechanisms, which is evidence that the proposed procedure also performs well for the nonparametric part. (3) The RASE values of the four estimators based on the proposed MR procedure decrease as the sample size increases.
6.
A real data analysis
In this section, we illustrate the proposed method using the HIV-CD4 data, which were collected on 2139 HIV-positive patients enrolled in AIDS Clinical Trials Group Protocol 175 (ACTG 175) (Hammer et al. [24]). In this study, the patients were randomized into four groups receiving their respective antiretroviral treatment regimens: (1) zidovudine (ZDV) with 532 subjects, (2) didanosine (ddI) with 522 subjects, (3) ZDV + ddI with 524 subjects and (4) ZDV + zalcitabine with 561 subjects. Let the response Y be the CD4 count at 96 \pm 5 weeks under each antiretroviral treatment regimen. There are six continuous baseline covariates: age ( U ), weight ( Z_{1} ), CD4 cell counts at baseline and at 20 \pm 5 weeks ( Z_{2} and Z_{3} ) and CD8 cell counts at baseline and at 20 \pm 5 weeks ( Z_{4} and Z_{5} ). Due to adverse events, death, dropout and other reasons, Y has missing values, but the covariates are fully observed. Specifically, the proportions of missing data in Y under the four regimens are about 39.66%, 36.21%, 35.69% and 37.43%, respectively.
Typically, a sharp decline in the CD4 cell count is indicative of disease progression, and patients with low CD4 cell counts are more likely to drop out of the scheduled study visits than patients with normal counts (Yuan and Yin [25]). Therefore, the missingness of the CD4 count Y at 96 \pm 5 weeks is likely related to the CD4 count itself, and Y has nonignorable missing values. Following Zhang and Wang [26], given the six baseline covariates and the CD4 count at 96 \pm 5 weeks, the missing data propensity does not depend on age and weight. Accordingly, to apply the proposed method, we use age and weight as the instrument variable \mathit{\boldsymbol{S}} .
The scatterplot in Figure 4 indicates that there is no obvious linear relationship between Y and the covariates. According to [24], it is generally believed that the influence of CD4 cell counts and CD8 cell counts on Y may not be immediate. There is a lagging effect, which is similar to the impact of acceleration on displacement. Based on the above discussions, we establish the following varying-coefficient partially nonlinear model:
The estimators of the parameters based on the MNAR-MR procedure and the fitted curves for the nonparametric function are given in Table 5 and Figure 5 (from left to right, top to bottom: regimen (1) to regimen (4)).
It can be seen that: (1) For all the regimens, the estimated coefficients of Z_{4} and Z_{5} are close to zero, while the estimated coefficients of Z_{2} and Z_{3} are relatively large. This may indicate that the CD8 cell counts at baseline and at 20 \pm 5 weeks have no significant impact on the CD4 count at 96 \pm 5 weeks, whereas the CD4 cell counts at baseline and at 20 \pm 5 weeks appear to affect it. (2) Among the four regimens, the coefficients of Z_{2} and Z_{3} under regimen (3) are greater than those under the other regimens, which may indicate that regimen (3) gives smaller relative hazard ratios than the other regimens. (3) The estimated curves for the nonparametric function differ under the MNAR, MAR and CC assumptions, which suggests that the assumption of a nonignorable missing propensity is reasonable, and also that the effect of age on the response differs across the four scenarios. In future studies, the methods of distributed fault estimation based on the fault estimation observer for multi-agent systems [27,28,29] may be applied to improve the estimation performance.
7.
Conclusions
This article studied statistical inference for the varying-coefficient partially nonlinear model with a nonignorable missing response. A robust estimation procedure was proposed to estimate the parametric and nonparametric parts separately based on modal regression. To address the identifiability of the nonresponse propensity, we considered a semiparametric propensity model and applied the GMM approach, making use of an instrument variable to identify the unknown parameters in the propensity. Further, the coefficient functions were approximated by B-spline functions to simplify the model structure, and the inverse probability weighting technique and modal regression were applied to construct the efficient MNAR-MR estimators. Compared to mean regression, modal regression is robust against outliers and heavy-tailed error distributions, and it performs no worse than the least-squares-based method under a normal error distribution. Compared with the complete-case procedure, in which subjects with missing data are excluded, the inverse probability weighting technique improves the estimation efficiency. Under some mild conditions, the asymptotic properties of the proposed procedure were established. Simulation studies and a real example were carried out to demonstrate that the proposed procedure performs well in finite samples. In our assumptions, the nonlinear function g(.) is pre-specified. A more flexible model would allow g(.) to be unknown, which would be an interesting topic for future research.
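The inverse probability weighting idea summarized above can be illustrated by a generic Horvitz–Thompson-style example, not the paper's full estimator: when the response propensity depends on the outcome itself, the complete-case average is biased, while weighting each observed response by the inverse of its propensity restores an unbiased estimate of the mean. For illustration the propensity is treated as known; in the paper it is estimated via GMM with an instrument variable:

```python
import numpy as np

# Simulated outcome with true mean 2; the response propensity pi_i depends on
# the outcome itself, so the missingness is nonignorable.
rng = np.random.default_rng(1)
n = 5000
y = rng.normal(loc=2.0, size=n)
pi = 1.0 / (1.0 + np.exp(-(1.0 + y)))      # P(observed | y): larger y, more likely observed
delta = rng.uniform(size=n) < pi           # delta_i = 1 if Y_i is observed

cc_mean = y[delta].mean()                  # complete-case mean (biased upward here)
ipw_mean = np.mean(delta * y / pi)         # inverse probability weighted mean
```

Since E[delta_i | y_i] = pi_i, each weighted term delta_i * y_i / pi_i has expectation E[y_i], which is why the IPW average targets the true mean even under nonignorable missingness, provided the propensity model is correct.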
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China (2022JM-027).
Conflict of interest
The authors declare no conflict of interest.
Appendix: Proofs of theorems
To facilitate the presentation, we denote by
then the modal estimator \mathit{\boldsymbol{\hat \beta}} satisfies \sum_{i = 1}^n \eta_i(\hat{\mathit{\boldsymbol{\beta}}}) = 0 , which is obtained by maximizing (2.13).
Lemma 1. Suppose Assumptions (A1)–(A10) hold. As n\rightarrow \infty , we have
Proof. Let {R_j}(U) = {\alpha _j}(U) - \mathit{\boldsymbol{B}}^T{(U)}{\mathit{\boldsymbol{\gamma}}_{0j}} , \mathit{\boldsymbol{R}}(U) = {({R_1}(U), {R_2}(U), \cdots, {R_p}(U))^T} . Model (2.4) can be reexpressed as
For fixed \mathit{\boldsymbol{\beta}} , model (7.2) becomes
Notice that
By simple calculation,
For I_{1i} , by (7.4) and the fact that \left\| \mathit{\boldsymbol{R}}(U) \right\| = O({K^{-r}}) , it can be shown that E(I_{1i}) = 0 and \mathrm{Var}(I_{1i}) = E(I_{1i}^2) = E [{{\pi}({Y_i}, {\mathit{\boldsymbol{T}}_i}, {\mathit{\boldsymbol{\zeta}}_0})^{-1}} {\phi }'_h(\varepsilon_i)^2 (P_ig'(\mathit{\boldsymbol{\beta}}_0))^T P_ig'(\mathit{\boldsymbol{\beta}}_0)] .
Therefore,
is obtained, where B = E[{{\pi ({Y}, \mathit{\boldsymbol{T}}, {\mathit{\boldsymbol{\zeta}}_0})}}^{-1} G(x, z, u, h)g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}){{ g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}})^T}}].
In terms of I_{2i} , noticing that
it is easy to show that
where H is defined in Theorem 1.
According to Wang et al. [22], we have \hat{\mathit{\boldsymbol{\zeta}}}-\mathit{\boldsymbol{\zeta}}_0 = O_p(n^{-1/2}) and n^{1/2}(\hat{\mathit{\boldsymbol{\zeta}}}-\mathit{\boldsymbol{\zeta}}_0)\stackrel{d}\longrightarrow N(0, \Pi) . Then, we have
By (7.5), (7.6) and (7.8), the result is proved. □
Lemma 2. Suppose Assumptions (A1)–(A10) hold. As n\rightarrow \infty , we have
Proof.
it is easy to obtain E(T_1) = E\Big({\phi }''_h(\varepsilon_i)(\mathit{\boldsymbol{P}}_ig'(\mathit{\boldsymbol{\beta}}_0))^T \mathit{\boldsymbol{P}}_ig'(\mathit{\boldsymbol{\beta}}_0)\Big). By the law of large numbers, we have T_1\stackrel{p}\longrightarrow \Sigma , where \Sigma = E\left({F(x, z, u, h)g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}}){{g'(\mathit{\boldsymbol{z}}, \mathit{\boldsymbol{\beta}})^T}}} \right) .
For T_2 , applying the Taylor expansion, we get
By the law of large numbers, we have
As \hat{\mathit{\boldsymbol{\zeta}}} is a consistent estimator of \mathit{\boldsymbol{\zeta}}_0 , we conclude that T_2 = o_p(1) . This yields the result that
□
Proof of Theorem 1. Since the modal estimator \mathit{\boldsymbol{\hat {{\beta}}}} satisfies \sum_{i = 1}^n {{\eta _i}(\hat{\mathit{\boldsymbol{\beta}}})} = 0 , by the Taylor expansion, we have
From Lemmas 1, 2 and (7.12), we have \sqrt n (\hat {\mathit{\boldsymbol{\beta}}} - \mathit{\boldsymbol{\beta_0}})\stackrel{d}\longrightarrow N(0, {\rm{ }}{\Sigma ^{ - 1}} A {\Sigma ^{ - 1}}) . □
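The sandwich-form asymptotic covariance {\Sigma ^{ - 1}} A {\Sigma ^{ - 1}} in Theorem 1 is typically estimated by plugging in sample analogues. A generic numerical sketch, with hand-picked score vectors and a Hessian standing in for estimates of A and \Sigma (the names and toy values are illustrative, not the paper's quantities):

```python
import numpy as np

def sandwich_cov(scores, hessian):
    """Plug-in sandwich covariance Sigma^{-1} A Sigma^{-1}:
    A is the average outer product of the score vectors s_i,
    Sigma is (an estimate of) the expected negative Hessian of the objective."""
    n = scores.shape[0]
    A = scores.T @ scores / n           # A = (1/n) sum_i s_i s_i^T
    Sigma_inv = np.linalg.inv(hessian)
    return Sigma_inv @ A @ Sigma_inv

# Toy check: four hand-picked 2-d scores and an identity "Hessian",
# so A = 0.5 * I and the sandwich covariance equals 0.5 * I.
scores = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
cov = sandwich_cov(scores, np.eye(2))
```

When the objective is correctly specified so that A = \Sigma , the sandwich collapses to \Sigma^{-1} ; the general form is what the theorem's heteroscedastic setting requires.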
Proof of Theorem 2. Let {\delta _n} = {n^{ - r/(2r + 1)}} and v = {\rm{ }}{(v_1^T, \cdots, {\rm{ }}v_p^T)^T} of dimension pL . Define \mathit{\boldsymbol{\gamma}} = {{\mathit{\boldsymbol{\gamma}}} _0} + {\delta _n}v .
We show that, for any given \varepsilon > 0 , there exists a large enough constant C such that
where \hat{Q}(\mathit{\boldsymbol{\gamma}}) is defined in (2.15). Let \varphi (\mathit{\boldsymbol{\gamma}}) = \hat Q(\mathit{\boldsymbol{\gamma}}) - \hat Q({\mathit{\boldsymbol{\gamma}} _0}) . By Taylor expansion we have
where {\xi _i} lies between Y_i^* - \mathit{\boldsymbol{W}}_i^T{\mathit{\boldsymbol{\gamma}} _0} and Y_i^* - \mathit{\boldsymbol{W}}_i^T\mathit{\boldsymbol{\gamma}} . Then, for {I_1} , using the Taylor expansion, we obtain that
where {\zeta _i} lies between {\varepsilon _i} and {\varepsilon _i} + \mathit{\boldsymbol{X}}_i^TR({U_i}) + g'({\mathit{\boldsymbol{Z}}_i}, \mathit{\boldsymbol{\beta}})(\mathit{\boldsymbol{\beta}} - \hat {\mathit{\boldsymbol{\beta}}}) .
By Assumption (A6), (A8) and some calculation results, we have {I_1} = {O_p}(n{K^{ - r}}{\delta _n}\left\| v \right\|) = {O_p}(n\delta _n^2\left\| v \right\|) .
Similarly, we can prove that {I_2} = {O_p}(n\delta _n^2{\left\| v \right\|^2}) and {I_3} = {O_p}(n\delta _n^3{\left\| v \right\|^3}) . Hence, by choosing a sufficiently large C , {I_2} dominates {I_1} uniformly on \left\| v \right\| = C . Since {\delta _n} \to 0 , it follows that {\delta _n}\left\| v \right\| \to 0 with \left\| v \right\| = C , which leads to {I_3} = {O_p}({I_2}) . Hence, (7.13) holds. Therefore, there exists a local maximizer \hat {\mathit{\boldsymbol{\gamma}}} , and we have
Similar to the proof of Theorem 2.1(b) in [30], we can obtain that
This completes the proof of Theorem 2. □