1.
Introduction
In recent years, regression analysis is widely used in various fields; for example, logistic regression was used to implement distributed classification of large data sets in Wang, Xu and Wu [1]. Traditional regression analysis is based on the mean, which is easy to calculate and is straightforward to interpret. But mean regression may fail for heavy-tailed error distributions, so a series of new regression methods were proposed. Rank regression and quantile regression are robust estimation methods which are widely used. There are many applications in which several response variables are predicted with a common set of predictors. Zhao, Lian and Ma [2] took the possible correlations among the responses into account, and introduced robust reduced-rank estimator via rank regression. Zhang et al. [3] applied rank regression to the varying-coefficient model and proposed a robust multivariate varying-coefficient model based on rank loss that models the relationships among different responses via reduced-rank regression and penalized variable selection. The above two methods are often used to multivariate regression model. Gong, Xu and Chen [4] proposed a penalized modal regression method for additive models in high dimensional. Quantile regression (QR), as introduced by Koenker and Bassett [5], is also a robust regression and can describe the entire conditional distribution of the response variable given the covariates. Because of these significant advantages, QR has become an effective method for statistical research. It is well known that different quantiles may contain different information of error distributions. Therefore, combining different quantile information could appropriately be a feasible way to improve efficiency. With this idea, Zou and Yuan [6] defined a new loss function which is simply an average of the loss function based on different quantiles, and named the new method as composite quantile regression (CQR). CQR could be considered as a useful extension of the quantile regression. Zhao and Xiao [7] pointed out that simple average (using equal weights) is not an efficient way of using distributional information from different quantile regressions. Koenker [8,9] proposed a more general approach, which assigns different weights to different quantiles. Jiang et al. [10] extended the research on robust and efficient estimation and model selection in high dimensions to nonlinear models. Unfortunately, when the number of quantiles is large, the calculation is very demanding. Therefore, Bloznelis et al. [11] considered a model-averaged quantile estimator with a computationally cheaper alternative and compared its performance to the composite quantile estimator in both low and high dimensional cases.
Classical regression analysis and related theories are based on completely observed data, while missing data are frequently encountered in almost all research areas, such as psychological sciences and medical studies. In cases of missing data, classical statistical methods such as maximum likelihood estimation (MLE) cannot be applied directly to the corresponding statistical analysis. We know that the complete-case (CC) method, which only uses the fully observed data, can lead to seriously biased parameter estimations when the covariate is not missing completely at random. Yates [12] introduced an imputation method which is widely used to handle missing responses. This method aims to find an appropriate value that to be filled in for each missing data. Then the data with the filled in values can be treated as fully observed data that can be analyzed by classical methods. Xia [13] employ the profile nonlinear least squares estimation based on the weighted imputation method to estimate the unknown parameter and nonparametric function and consider empirical likelihood inferences based on the weighted imputation method for the varying coefficient partially nonlinear model with missing responses. The inverse probability weighted (IPW) method is another frequently used method dated back to Horvitz and Thompson [14] that can be applied to the case of missing covariates. In this method, the inverse of the selection probability is chosen to be the weight assigned to the fully observed data. The missing at random (MAR) assumption, in the sense of Rubin [15], is a common assumption for statistical analysis with missing data. Under the MAR assumption, many approaches for mean regression with missing values were developed to obtain efficient estimators, such as the imputation method proposed by Little and Rubin [16], the IPW method introduced by Robins et al. [17], and likelihood-based methods given by Ibrahim et al. [18]. For a comprehensive review, readers are referred to Qin, Shao and Zhang [19]. It is worth mentioning that IPW method is unbiased under MAR assumption.
However, most of the above methods are built on least squares (LS) estimator which is not robust against outliers. Recently, Sherwood, Wang and Zhou [20]considered a linear QR approach based on IPW with a parametric model for the selection probability when covariates are missing at random, and investigated the variable selection problem with the proposed method. Chen, Wan and Zhou [21] proposed three estimation methods for a linear quantile regression when observations are missing at random, one of which is to use nonparametric IPW. The above three references focused on a given individual quantile. Due to the effectiveness and robustness of the CQR method, Yang and Liu [22] investigated the CQR estimation of linear models with missing covariates by using IPW method. It is worth pointing out that, they used equal weights at different quantiles to construct their CQR estimator for a linear model. Recently, Wang, Song and Zhang [23] proposed an optimal weighted quantile average estimation for parameters in additive partially linear models with missing covariates, and their simulation results verified that the proposed method is an efficient and reliable alternative of both the weighted least squares (WLS) method and the weighted CQR (WCQR) method. So in this paper, applying the idea of Jiang et al. [24] and Wang, Song and Zhang [23], we consider two types of WCQRs for nonlinear models with missing covariates and the proposed methods are demonstrated superior via simulation studies and a real data example.
The rest of this paper is organized as follows. The proposed estimation technique and its theoretical properties are presented in Section 2. Numerical simulation studies are conducted in Section 3 in order to examine the performance of the proposed methods and to justify the derived theoretical results in Section 2. A real data analysis is given in Section 4 to illustrate the implementation of the proposed methods. The regularity conditions and the proofs of those theoretical results are given in Appendix.
2.
Methodology
Zhao and Lian [25] studied two weighting schemes to further improve the efficiency of CQR for linear models. And they showed that the two weighting schemes are asymptotically equivalent to each other and always result in more efficient estimators compared with CQR in theory. Now, In order to get a more general approach, we generalize the linear models to the nonlinear models and consider the covariates missing at random. Consider the nonlinear model
where Yi is an observable response, Xi=(UTi,VTi)T∈Rq+s is the vector of covariates, β is the p-dimensional vector of unknown parameters, and εi is the random error independent of X. Let K be the number of quantiles, for the equally spaced quantiles τk=kK+1,k=1,2,…,K. Jiang et al. [24] proposed the weighted composite quantile estimator for β by minimizing
over β and b=(bτ1,bτ2,…,bτk)T, where ρτ(t)=t(τ−I(t<0)), and ωk is the weight which controls the amount of contribution of the τk-th quantile regression satisfying ∑Kk=1ωkg(bτk)>0 with g(⋅) being the density of ε.
Here we assume some covariates are missing. More specifically, we assume Ui's are all observed while some Vi's are missing. Let δi=0 if Vi is missing, and δi=1 if Vi is observed. Throughout this paper, following Wang, Song and Zhang [23], we assume the following missing mechanism
where π(⋅) is called the selection probability function or the propensity score.
When the selection probability function π(⋅) is known, the IPW estimator of β under missing covariates is defined as
where Ln(π(U),β,b)=∑Kk=1ωk∑ni=1δiπ(Ui)ρτk(Yi−f(Xi,β)−bτk). However, in reality the selection probability function π(⋅) is usually unknown and needs to be estimated. Next we follow Wang, Song and Zhang [23] and consider estimating π(Ui) using both parametric and nonparametric models.
2.1. Estimation of propensity scores
To estimate the propensity scores nonparametrically, we apply nonparametric smoothing techniques. Particularly, we use the Nadaraya-Watson estimator of π(Ui) which is defined as
where Kh(⋅)=K(⋅/h)/hq is a q-variate kernel function, h is the bandwidth.
When the dimension of U is high, a fully nonparametric estimation is encountered with the curse of dimensionality. In this case, a parametric approach might be more feasible for the estimation of π(Ui) given in (2.2). A commonly used model for (2.2) is the logistic regression given by
where Γi=(1,UTi)T and γ=(γ0,γT1)T∈Θ is an unknown parameter vector with Θ⊂Rq+1. Here γ can be estimated by maximizing the log-likelihood function
Let ˆγ be the MLE of γ, then the parametric estimator of π(Ui) is denoted by π(Ui,ˆγ). If the specified parametric model (2.5) of the selection probability function π(⋅) is valid, then the IPW method is applicable.
2.2. WCQR estimation of regression parameters
In this subsection, we propose two weighting schemes for the WCQR estimation. The first one is based on weighting the quantile loss and the second one is weighting the quantile regression estimator at different levels with details given below. For convenience, we use ˆπ(Ui) for the estimator of π(Ui) by either the parametric or nonparametric method.
As in Jiang et al. [24], we let τk=kK+1, k=1,2,…,K for some K. By weighting the different loss functions in CQR with the IPW method, our first WCQR estimator is defined as
where Ln(ˆπ(U),β,b)=∑Kk=1ωk∑ni=1δiˆπ(Ui)ρτk(Yi−f(Xi,β)−bτk), the weight ωk's are allowed to be negative and satisfy ∑Kk=1ωkg(bτk)>0, where g(⋅) is the density function of the error term ε.
The following theorem presents the asymptotic distribution of ˆβWCQR1. We first introduce some notations. Let β∗ be the true value of β, b∗τk be the τk-th quantile of ε and b∗=(b∗τ1,b∗τ2,…,b∗τK)T. Denote f∗i=f(Xi,β∗), ∇f∗i=∂f(Xi,β)∂β|β=β∗, Σ1=E[∇f∗1(∇f∗1)T], Σ2=E[∇f∗1(∇f∗1)Tπ(U)], g=(g(b∗τ1), g(b∗τ2),…,g(b∗τK))T, Ω={min(τk,τk′)(1−max(τk,τk′))}1≤k,k′≤K, and H=(min(τk,τk′)(1−max(τk,τk′))g(b∗τk)g(b∗τk′))1≤k,k′≤K.
Theorem 2.1. Suppose that the conditions C1−C6 in Appendix hold and β∗ is the true value. Then we have
Similar to Jiang et al. [24] and Zhao et al. [26], we can derive the optimal weights by minimizing ωTΩωωTggTω in the asymptotic variance given in Theorem 2.1.
Corollary 2.1. The optimal weight vector ω∗=(ω∗1,ω∗2,⋯,ω∗K)T for ˆβWCQR1 is
Note that the optimal weight depends on the density function of ε. Based on estimated residuals ˆεi, the usual nonparametric density estimation methods can provide a consistent estimator ˆg(⋅) of g(⋅). Then the estimated optimal weight vector is ˆω∗=(ˆgTΩ−2ˆg)−1/2Ω−1ˆg. With the optimal weight vector ˆω∗ obtained in hand, the first optimal WCQR estimator of β is defined as
Corollary 2.2. The optimal weighted compositive quantile estimators ˆβOWCQ1 of β has the optimal asymptotic variance 1n(gTΩ−1g)−1Σ−11Σ2Σ−11.
Next, we present the second weighting schemes. Our method is inspired by Wang, Song and Zhang [23]. Let
then the second WCQR estimator is defined as
where ωk's satisfy ∑Kk=1ωk=1. The asymptotic distribution of ˆβWCQR2 is summarized in the following theorem.
Theorem 2.2. Suppose that the conditions C1−C6 in Appendix hold and β∗ be is true parameter value. Then we have
Similarly, we can obtain the optimal weight by minimizing ωTHω in the asymptotic covariance given in Theorem 2.2. As a result, the second optimal WCQR of β can be correspondingly defined as ˆβOWCQ2 with the associated optimal asymptotic variance derived in the following corollary.
Corollary 2.3. The optimal weight vector ω∗=(ω∗1,ω∗2,…,ω∗K)T of WCQR2 is
where 1 is a K×1 vector with all elements 1. With this optimal weight vector, the optimal WCQR estimator ˆβOWCQ2=∑Kk=1ω∗kˆβτk has the optimal asymptotic variance
Remark 1. The optimal weight of OWCQ2 is essentially the same as Zhao and Lian [25], but with different representation. And from the above results for the two weighting methods we observe that if we use the optimal weight vectors, the optimal WCQR estimators achieve the same optimal asymptotic variance 1n(gTΩ−1g)−1Σ−11Σ2Σ−11.
3.
Simulation studies
In this section, we use simulation studies to examine the finite sample performance of our proposed methods and compare it with the inverse probability weighted CQR (IWCQ) method which uses the same weight for different QR models, and the inverse probability WLS estimator. Referring to Zou and Yuan [6], the estimator of the proposed methodology is nearly efficient as the oracle maximum likelihood (OML) estimator for K≥9 in various error distributions. Therefore, we take K=10, τk=k/11, k=1,2,…,10, and consider the exponential regression models
where β1=0.5, β2=1, β3=1 and (X1,X2,X3) follows multivariate normal distribution with covariances always 0.5 and variances always 1. The model error ε and X=(X1,X2,X3)T are independent. Then, using the method described in Section 4 in Wang, Chen and Lin [27], we set the data in X3 to be missing at random while X1,X2,Y are fully observed. And we consider two selection probability functions
Their corresponding average missing rates are 15% and 35% respectively. In our simulation, four different distributions of model error ε are considered:
(Case 1) The standard normal distribution N(0,1).
(Case 2) The centralized t distribution with four degrees of freedom.
(Case 3) The mixture of normal distribution 0.6N(0,1)+0.4N(2,1).
(Case 4) The centralized χ2 distribution with four degrees of freedom.
In the simulation, samples of size n=200 and n=600 are generated independently. Four estimation methods, OWCQ1, OWCQ2, WLS and IWCQ are used to estimate β1, β2 and β3 under the above selection probability functions and error distributions. Then the root of mean squared errors (RMSEs) can be calculated. To evaluate the different estimators, we repeat the process 1000 times and calculate the average RMSEs. The simulation results are reported in Tables 1–4 for cases that the selection probability function π(⋅) is known (denoted as T), estimated nonparametrically (denoted as N) and parametrically (denoted as P). When the selection probability is estimated nonparametrically, we use the Gaussian kernel K(x)=1√2πexp(−x22) to construct the multiplicative kernel L(x1,x2)=K(x1)K(x2), and use the bandwidth proposed by Ruppert, Sheather and Wand [28]. When π(⋅) is estimated by parametric method, we apply model (2.5) to estimate it. Meanwhile, similar to Jiang et al. [24], our proposed estimator involves a weighting scheme and the density of error is known in simulations, so we took the optimal weight ω∗ (see Section 2.2) for all simulations.
From Tables 1–4 we observe that when the model error ε follows the standard normal distribution N(0,1), WLS performs the best among the four estimators considered, while OWCQ1, OWCQ2 and IWCQ behave very similarly. For all other non-normal distributions considered, WLS always performs the worst. The performance of the other three methods are very similar when the model error follows the centralized t distribution with four degrees of freedom. It is further noted that when the missing rate is high or the sample size is large, our proposed methods are superior to IWCQ. Particularly, when the model error follows chi-square distribution with four degrees of freedom, the superiority of both OWCQ1 and OWCQ2 are even more obvious. We also find that for OWCQ1 and IWCQ methods a better result can be obtained by estimating the selection probability function with a nonparametric method. At the same time, IWCQ also performs much better than WLS.
When sample size is large, it can be seen from Tables 3 and 4 that the performance of the four estimators are significantly improved compared with that when the sample case is small.And our proposed estimators have more obvious advantages over WLS and IWCQ. We observe that both OWCQ1 and OWCQ2 always have a high accuracy under any of the four error distributions, and OWCQ2 performs slightly better than OWCQ1 except when the model error ε follows the standard normal distribution. We also find that the RMSEs are not sensitive to missing rate. In addition, the calculation speed of OWCQ1 is faster than OWCQ2 when the optimal weight obtained from the known error distribution is used. For example, when we simulated case1 at n = 200 and π = 0.15, we found that OWCQ1 was about 20% faster than OWCQ2. For other cases, the difference between OWCQ1 and OWCQ2 in computing speed is similar.
4.
A real data example
In this section, we will illustrate our proposed methods using a real data originally presented by Baum [29] to investigate how age, marriage state, number of children and education background affect whether a women works or not. For each women there are five variables:
● Work (y): 1 = Yes, 0 = Not;
● Age (x1): the age of the women;
● Children(x2): the number of the children the women raises;
● Education (x3): the years that the women has passed in school;
● Married (x4): 1 = Yes, 0 = Not.
Note that the response y is the average estimated probability of work. A logistic model with all of covariates given by
is suitable for modeling the relationship between the choice of work and all possible factors. In order to use the data set to illustrate our methods, artificial missing data were created by using the selection probability π(X)=exp(γ0+γ1x1+γ2x2)1+exp(γ0+γ1x1+γ2x2). The missing proportion is about 18.65% with γ0=2,γ1=0.15,γ2=0.25, and, following Li and Ding [30], the quantile vector is taken as τ=(0.2,0.4,0.6,0.8)T with K=4.
From (2.7) and (2.10), we know that the optimal weights depend on g(b∗τ) and b∗τ, both of which are unknown here and need to be estimated. Motivated by Sun and Sun [31] and Zhao and Xiao [7], we propose the following procedure under the case when the selection probability is known.
(1) Use the uniform weight ω=(1/K,1/K,…,1/K)T to obtain the preliminary estimator ˆβ of β as follows:
(2) Let m=∑ni=1δi. Without loss of generality, we assume the first m observations are complete. Then, based on the complete data, the pseudo residuals ˆεi with δi=1 are computed as ˆεi=δiπ(Ui)(Yi−f(Xi,ˆβ)), i=1,2,…,m.
(3) Use the nonparametric kernel density estimator to estimate g(t):
where K(⋅) is a non-negative kernel function and the bandwidth b is selected by
where SD and IQR stand for the sample standard deviation and sample interquantile range, respectively.
(4) Estimate g(b∗τk) by ˆg(ˆbτk) and then substitute it into (2.7) or (2.10), from which the optimal weight vector can be obtained, where ˆbτk denotes the sample τk-quantile of ˆε1,ˆε2,…,ˆεm.
It is obvious that when a women has a work, the response yi will take a larger value. Because there are only 32.85% of women in the data set does not work, we could believe that a woman has a job if the corresponding response ^yi is bigger than the 0.3285 quantile of the fitted values ˆy. In order to compare the performance of our proposed methods with IWCQ and the composite quantile estimator which only uses the fully observed data (denoted by CQR-CCA), we calculate the fitted values ˆy with all the 2000 data of the above four methods respectively, and predict whether a women works or not. The prediction accuracy is reported in Table 5. From Table 5 we observe that IWCQ method can obviously improve the efficiency of estimation in the case of missing data, and CQR-CCA estimator has the lowest accuracy. It is obvious that our proposed methods are more accurate compared with IWCQ method.
5.
Discussion
In this article, we have proposed two types of weighted quantile estimators for nonlinear models with missing covariates. The asymptotic properties of our proposals have been obtained under certain conditions. Our simulation studies reveal that our proposed method has better advantages than the existing methods. Finally, we propose some future directions. First, We only consider the estimates of unknown parameters in this paper, and future studies can start from variable selection. Second, the logistic model for the selection probability function is assumed in our article. When the selection probability function is misspecified, how to derive a robust estimation of the selection probability could be a direction for further study. Third, our method could be used in Altun et al. [32] to obtain the unknown model parameters of new extended gamma distribution. At last, how to generalize our method to optimal reinsurance problems of Fang, Cheng and Qu [33] is also an interesting topic.
Acknowledgments
The research is supported by NSF projects (ZR2021MA077, ZR2021MA048 and ZR2019MA016) of Shandong Province of China.
Conflict of interest
All authors declare that there is no conflict of interest.