1. Introduction
As one of the most important semiparametric regression models, the varying coefficient partially linear model (VCPLM) integrates the varying-coefficient model and the classic linear model. It can be described as follows:
$$Y = X^{T}\alpha(U) + Z^{T}\beta + \varepsilon, \tag{1.1}$$
where $Y\in\mathbb{R}$ is the response variable, $X=(X_1,X_2,\cdots,X_q)^{T}$, $Z=(Z_1,Z_2,\cdots,Z_p)^{T}$ and $U\in\mathbb{R}$ are the associated covariates, $\varepsilon$ is the model error, $\beta=(\beta_1,\cdots,\beta_p)^{T}$ is an unknown $p$-dimensional parameter vector, and $\alpha(\cdot)=(\alpha_1(\cdot),\cdots,\alpha_q(\cdot))^{T}$ is an unknown $q$-dimensional vector of varying coefficient functions. Model (1.1) has been extensively studied because it combines the interpretability of a parametric structure with the flexibility of a nonparametric structure.
In model (1.1), the unknown parameter $\beta$ enters only linearly, through the term $Z^{T}\beta$. In practice, a strictly linear relationship may be inappropriate. To capture the relationship between $Y$ and certain covariates more accurately, Li and Mei [1] introduced the varying coefficient partially nonlinear model (VCPNLM), which takes the following form:
$$Y = X^{T}\alpha(U) + g(Z,\beta) + \varepsilon. \tag{1.2}$$
The difference between these two models is that model (1.2) replaces $Z^{T}\beta$ in (1.1) with $g(Z,\beta)$, where $g(\cdot,\cdot)$ is a known function. Note that the dimensions of the parameter vector $\beta$ and the covariate $Z$ in $g(Z,\beta)$ need not coincide. Taking the generalized linear model as an example, we can write $\exp(a+Z^{T}\beta)$ as $g(Z,\tilde\beta)$ with $g(\cdot)=\exp(\cdot)$ and $\tilde\beta=(a,\beta^{T})^{T}$, where both $a$ and $\beta$ are parameters.
Compared with the VCPLM (1.1), the VCPNLM (1.2) is more adaptable. Therefore, statistical studies of model (1.2) are of great significance, and the model has been extensively researched since its introduction. Li and Mei [1] presented profile nonlinear least squares estimators for both the unknown $\beta$ and $\alpha(\cdot)$. Zhou et al. [2] constructed confidence regions for the unknown quantities by employing the empirical likelihood technique. Jiang et al. [3] put forward a robust estimation approach based on a novel loss function related to the exponential squared loss for the case where the variables are measured with error. Xiao and Chen [4] proposed local bias-corrected empirical likelihood procedures to deal with additive errors in the nonparametric component. Dai and Huang [5] treated distorted measurement errors in both the response and the covariates. Qian and Huang [6] proposed a corrected profile least squares estimation procedure for measurement errors in the nonparametric part. For model (1.2) with missing data, Wang et al. [7] used the inverse probability weighted profile nonlinear least squares approach and the empirical likelihood technique to deal with missing covariates, and Xia et al. [8] developed statistical inference with missing responses. Furthermore, Xiao and Liang [9] proposed a robust two-stage estimation method based on modal regression. Xiao and Shi [10] studied robust estimation for model (1.2) with a nonignorable missing response. Zhou and Zhao [11] studied model (1.2) in the framework of quantile regression with a censored response variable and a missing censoring indicator.
All the aforementioned studies were conducted under the assumption that the errors $\varepsilon_i$, $i=1,\cdots,n$, have equal variances, that is, $E\varepsilon_i=0$ and $\mathrm{Var}(\varepsilon_i)=\sigma^{2}$ for $i=1,\cdots,n$. However, applying procedures designed for homoscedastic models when the errors are heteroscedastic may lead to a loss of efficiency. Therefore, it is important to check for heteroscedasticity before performing statistical inference. The diagnosis of heteroscedasticity has received considerable attention; see [12,13,14] for partially linear models and [15,16,17] for VCPLMs with measurement errors. Methods based on the empirical likelihood technique ([18,19,20]) have many advantages, a significant one being that no variance estimation is needed. Recently, the technique has also been shown to work well for testing heteroscedastic errors; see, for example, [12,13,14]. To our knowledge, the diagnosis of heteroscedasticity for the VCPNLM (1.2) by means of the empirical likelihood technique has not yet been studied.
In view of the above, in this paper we propose a diagnostic method for heteroscedasticity in model (1.2) based on the empirical likelihood technique. Assuming that the variance of $\varepsilon_i$ satisfies $\mathrm{Var}(\varepsilon_i)=\sigma_i^{2}$, the hypothesis testing problem can be defined as follows:
$$H_0: \sigma_i^{2}=\sigma^{2},\ i=1,\cdots,n \quad \text{versus} \quad H_1: \sigma_i^{2}\neq\sigma^{2}\ \text{for some } i, \tag{1.3}$$
where $\sigma$ denotes a constant. We are concerned with constructing a test for heteroscedasticity by invoking the empirical likelihood technique, which does not require specifying the distribution of the errors. Under several regularity conditions, we derive the corresponding Wilks' theorem. Finally, we verify the feasibility of the proposed method through simulation studies.
The remainder of this paper is organized as follows: the methodology and the main results of the empirical likelihood based diagnostic method are introduced in Section 2; simulation studies are carried out in Section 3 to examine the finite sample performance of the proposed test statistic; the Boston housing price data are used to illustrate the proposed method in Section 4; and the conclusions and ongoing work are presented in Section 5. The proofs of the main results are presented in the Appendix.
2. Methodology
Let $\{(Y_i,Z_i,X_i,U_i), i=1,\ldots,n\}$ be i.i.d. copies of $(Y,Z,X,U)$; then, the individual form of the VCPNLM is as follows:
$$Y_i = X_i^{T}\alpha(U_i) + g(Z_i,\beta) + \varepsilon_i,\quad i=1,\ldots,n. \tag{2.1}$$
Suppose that $\beta$ is known beforehand; then, (2.1) can be re-expressed as the following varying coefficient model:
$$Y_i - g(Z_i,\beta) = X_i^{T}\alpha(U_i) + \varepsilon_i,\quad i=1,\ldots,n. \tag{2.2}$$
First, we employ the classic local linear smoothing technique to estimate $\{\alpha_j(\cdot), j=1,\cdots,q\}$ in model (2.2). By Taylor's expansion, for $u$ in a small neighborhood of $u_0$, $\alpha_j(u)$ can be locally approximated by the following linear form:
$$\alpha_j(u) \approx \alpha_j(u_0) + \alpha_j'(u_0)(u-u_0),\quad j=1,\cdots,q. \tag{2.3}$$
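According to (2.3), the estimator of $\{(\alpha_j(u_0),\alpha_j'(u_0)), j=1,\ldots,q\}$ can be obtained by minimizing, with respect to $\alpha(u_0)$ and $\alpha'(u_0)=(\alpha_1'(u_0),\cdots,\alpha_q'(u_0))^{T}$, the following weighted local least squares criterion:
$$\sum_{i=1}^{n}\Big\{Y_i - g(Z_i,\beta) - X_i^{T}\big[\alpha(u_0)+\alpha'(u_0)(U_i-u_0)\big]\Big\}^{2}K_h(U_i-u_0),$$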
where $K(\cdot)$ is the kernel function, $K_h(\cdot)=K(\cdot/h)/h$, and $h$ is the bandwidth, which can be chosen by standard methods.
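For the scalar case $q=1$, the local linear fit at a point $u_0$ reduces to a two-parameter weighted least squares problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: $\beta$ is treated as known, so the input `y_star` stands for $Y_i-g(Z_i,\beta)$; the Epanechnikov kernel is the one used later in the simulation study; and the function names `epanechnikov` and `local_linear_alpha` are hypothetical.

```python
def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def local_linear_alpha(u0, y_star, x, u, h):
    """Local linear estimate of alpha(u0) for scalar X (assumption: q = 1).

    y_star[i] stands for Y_i - g(Z_i, beta) with beta treated as known.
    Minimizes sum_i (y_star[i] - (a + b * (u[i] - u0)) * x[i])^2 * K_h(u[i] - u0)
    over (a, b) and returns a, the estimate of alpha(u0).
    """
    s00 = s01 = s11 = t0 = t1 = 0.0
    for yi, xi, ui in zip(y_star, x, u):
        d = ui - u0
        w = epanechnikov(d / h) / h  # kernel weight K_h(U_i - u0)
        # Design row is (x_i, x_i * d); accumulate D^T W D and D^T W y.
        s00 += w * xi * xi
        s01 += w * xi * xi * d
        s11 += w * xi * xi * d * d
        t0 += w * xi * yi
        t1 += w * xi * d * yi
    det = s00 * s11 - s01 * s01
    if abs(det) < 1e-12:
        raise ValueError("too few observations near u0 for this bandwidth")
    # First component of the solution of the 2x2 normal equations.
    return (s11 * t0 - s01 * t1) / det
```

When $\alpha(\cdot)$ is exactly linear and the data are noiseless, the fit reproduces $\alpha(u_0)$ up to rounding, which makes a convenient sanity check.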
We introduce the following matrix notations for simplicity in description. Let
and
Under the above matrix representations, the estimator of H(u0) can be derived by the following:
Then, the estimator of α(⋅) at u0 can be obtained by taking only the first part, that is,
where $I_q$ denotes the $q\times q$ identity matrix and $0_q$ denotes the $q\times q$ zero matrix.
To test for heteroscedasticity in model (2.1), we first rewrite the error variance as follows:
$$\sigma_i^{2} = \sigma^{2}m_i,\quad i=1,\ldots,n,$$
where $m_i>0$. Following Liu et al. [16], we assume that $m_i$ has the following structure:
$$m_i = m(U_i,\gamma).$$
Here, $m_i$ is assumed to depend on the known covariate $U_i$ and an unknown $q\times 1$ vector $\gamma$. Note that the form of the function $m(\cdot,\cdot)$ is usually known in advance. In addition, we assume that $m(\cdot,\cdot)$ is differentiable with respect to $\gamma$ and that there exists a unique $\gamma^{*}$ satisfying $m(U_i,\gamma^{*})=1$ for all $U_i$. Thus, (1.3) is converted to the following hypothesis testing problem:
$$H_0: \gamma=\gamma^{*} \quad \text{versus} \quad H_1: \gamma\neq\gamma^{*}.$$
In order to utilize the empirical likelihood technique, we first consider the following estimation function:
Let $\eta_i=(\dot m_i^{T},1)^{T}$, where $\dot m_i$ denotes the derivative of $m_i$ with respect to $\gamma$ evaluated under the null hypothesis $H_0$. Write $h_i=(h_{1i}^{T},h_{2i}^{T})^{T}\in\mathbb{R}^{p+q+1}$; then $E(h_i)=0$ under $H_0$. Intuitively, the above heteroscedasticity testing problem is thus converted to testing whether $E(h_i)=0$, which can be accomplished by means of the empirical likelihood technique.
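As an illustration of how $\eta_i$ is obtained, consider the variance function used later in the simulation study, $m(u,\gamma)=\exp(\gamma u)$ with a scalar $\gamma$. The following worked step is a sketch under that assumed choice only:

$$\dot m(U_i,\gamma)=\frac{\partial}{\partial\gamma}\exp(\gamma U_i)=U_i\exp(\gamma U_i),\qquad m(U_i,\gamma^{*})=1\ \Rightarrow\ \gamma^{*}=0,\qquad \dot m_i=\dot m(U_i,\gamma^{*})=U_i,\qquad \eta_i=(U_i,1)^{T}.$$

In this case $\eta_i$ is two-dimensional and depends on the data only through the observed covariate $U_i$.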
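Let $p_1,p_2,\ldots,p_n$ be nonnegative numbers that sum to one, that is, $\sum_{i=1}^{n}p_i=1$. Under the null hypothesis $H_0$, we can construct the profile empirical likelihood ratio for $\gamma,\sigma^{2},\beta$ as follows:
$$R_0(\gamma;\sigma^{2},\beta)=\max\Big\{\prod_{i=1}^{n}np_i:\ p_i\geq 0,\ \sum_{i=1}^{n}p_i=1,\ \sum_{i=1}^{n}p_ih_i=0\Big\}.$$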
Here, $\beta$ and $\sigma^{2}$ are nuisance parameters. Note that $R_0(\gamma;\sigma^{2},\beta)$ cannot be used directly to construct a test, because it contains the unknown $\gamma$, $\beta$, $\sigma^{2}$, and $\alpha(\cdot)$. One way to deal with this issue is to substitute $\hat\alpha(\cdot)$ for $\alpha(\cdot)$. Accordingly, we define $\hat h_{1i}$ and $\hat h_{2i}$ as follows:
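Let $\hat h_i=(\hat h_{1i}^{T},\hat h_{2i}^{T})^{T}$. The estimated profile empirical likelihood ratio is then given by the following:
$$R(\gamma;\sigma^{2},\beta)=\max\Big\{\prod_{i=1}^{n}np_i:\ p_i\geq 0,\ \sum_{i=1}^{n}p_i=1,\ \sum_{i=1}^{n}p_i\hat h_i=0\Big\}. \tag{2.12}$$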
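Using the Lagrange multiplier method, we can obtain the optimal value of $p_i$ as follows:
$$p_i=\frac{1}{n}\cdot\frac{1}{1+\lambda^{T}\hat h_i},\quad i=1,\ldots,n, \tag{2.13}$$
and $\lambda$ satisfies the following equation:
$$\frac{1}{n}\sum_{i=1}^{n}\frac{\hat h_i}{1+\lambda^{T}\hat h_i}=0. \tag{2.14}$$
Substituting (2.13) into (2.12), we have the following:
$$-2\log R(\gamma;\sigma^{2},\beta)=2\sum_{i=1}^{n}\log\big(1+\lambda^{T}\hat h_i\big). \tag{2.15}$$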
To establish the nonparametric Wilks' theorem for $-2\log R(\gamma;\sigma^{2},\beta)$, the following regularity conditions are needed. Conditions C1–C7 follow Zhou et al. [2], and Condition C8 is needed for the proof.
C1 The density function $f(u)$ of $U$ is Lipschitz continuous and bounded away from zero on its bounded support U.
C2 $\Gamma(u)$ is a $q\times q$ nonsingular matrix for each $u$ in the support, and $\Gamma(u)$, $\Gamma(u)^{-1}$, and $\Phi(u)$ are all Lipschitz continuous.
C3 There exists an $s>2$ such that $E\|X\|^{2s}<\infty$, $E\|g'(Z,\beta)\|^{2s}<\infty$, $E\|\varepsilon\|^{2s}<\infty$, and $E\|U\|^{2s}<\infty$, where $\|\cdot\|$ denotes the Euclidean norm. Meanwhile, for some $0<\delta<2-s^{-1}$, $n^{2\delta-1}h\rightarrow\infty$ holds.
C4 The functions $\{\alpha_{j}''(u), j=1,\ldots,q\}$ are continuous in $u\in\Omega$.
C5 The kernel function $K(\cdot)$ is a univariate symmetric density function that satisfies the Lipschitz condition. The functions $u^{3}K(u)$ and $u^{3}K'(u)$ are bounded, and $\int u^{4}K(u)du<\infty$.
C6 $nh^{8}\rightarrow 0$ and $nh^{2}/(\log n)^{2}\rightarrow\infty$ hold.
C7 $g(z,\beta)$ is continuous with respect to $\beta$ for any $z$, and the derivatives $g''(z,\beta)$ with respect to $\beta$ are all continuous, where $\beta\in\mathcal{B}$ and $\mathcal{B}$ is a compact set.
C8
and
Theorem 1 below describes the asymptotic behavior of $-2\log R(\gamma;\sigma^{2},\beta)$.
Theorem 1. Suppose that Conditions C1–C8 hold. Under the null hypothesis, we have the following:
$$-2\log R(\gamma;\sigma^{2},\beta)\stackrel{\mathcal{L}}{\longrightarrow}\chi_{p+q+1}^{2},$$
where "$\stackrel{\mathcal{L}}{\longrightarrow}$" represents convergence in distribution and $\chi_{p+q+1}^{2}$ represents the chi-square distribution with $p+q+1$ degrees of freedom.
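Theorem 1 suggests a simple decision rule: reject $H_0$ when $-2\log R$ exceeds the upper quantile of the limiting chi-square distribution. The sketch below illustrates how such a statistic can be computed, under simplifying assumptions that are not from the paper: the estimating-function values are scalar (so the limit would have one degree of freedom rather than $p+q+1$), the Lagrange multiplier is found by bisection, and the function name `neg2_log_el_ratio` is hypothetical.

```python
import math


def neg2_log_el_ratio(h):
    """Return -2 log R for scalar estimating-function values h_1, ..., h_n.

    Solves sum_i h_i / (1 + lam * h_i) = 0 for the Lagrange multiplier lam
    by bisection, then evaluates 2 * sum_i log(1 + lam * h_i).
    """
    h_min, h_max = min(h), max(h)
    if h_min >= 0.0 or h_max <= 0.0:
        # Zero is not inside the convex hull of the h_i: the ratio is zero.
        return math.inf
    # All weights p_i stay positive only for lam in (-1/h_max, -1/h_min).
    lo = (-1.0 / h_max) * (1.0 - 1e-10)
    hi = (-1.0 / h_min) * (1.0 - 1e-10)

    def score(lam):
        return sum(v / (1.0 + lam * v) for v in h)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # score is strictly decreasing in lam on the admissible interval.
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * v) for v in h)
```

For a vector-valued $\hat h_i$, the same equations apply with a vector $\lambda$, and a Newton-type solver would typically replace the bisection.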
To deal with the so-called nuisance parameters \beta and \sigma^{2} , under the null hypothesis H_0 , we address
that is, maximizing (2.16) with respect to \beta and \sigma^{2} . Then, R(\gamma) has the following asymptotic result:
3. Simulation study
In this section, we assess the finite sample performance of the proposed method through simulation studies. The data are generated from the following VCPNLM:
$$Y = X\alpha(U) + g(Z,\beta) + \varepsilon. \tag{3.1}$$
Specifically, $g(Z,\beta)=\exp\{Z\beta\}$ with $\beta=2$ and $\alpha(U)=\sin(2\pi U)$. The model error $\varepsilon$ is drawn from the normal distribution (Case 1) or the uniform distribution (Case 2), with $E(\varepsilon|X,Z,U)=0$ and $\mathrm{Var}(\varepsilon|X,Z,U)=\sigma^{2}m(U,\gamma)$, where $\sigma^{2}=1$ and $m(U,\gamma)=\exp(\gamma U)$. Clearly, $\gamma=0$ corresponds to the null hypothesis, and $\gamma\neq 0$ corresponds to the alternative hypothesis. The covariates are generated as $Z\sim N(0,1)$, $X\sim N(0,1)$ and $U\sim U(0,1)$, and $Y$ is generated according to model (3.1). Throughout the simulation studies, we use the Epanechnikov kernel $K(u)=\frac{3}{4}(1-u^{2})_{+}$, and the bandwidth is taken as $h=cn^{-1/5}$, where the constant $c$ is the standard deviation of the covariate $U$.
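The data-generating design described above can be sketched as follows. This is an illustrative sketch rather than the authors' simulation code: it covers only the normal-error case (Case 1), takes $c$ to be the sample standard deviation of the simulated $U$ values, and the names `simulate_vcpnlm` and `bandwidth` are hypothetical.

```python
import math
import random
import statistics


def simulate_vcpnlm(n, gamma, seed=0):
    """Draw n observations from Y = X*sin(2*pi*U) + exp(2*Z) + eps (Case 1).

    Returns a list of tuples (y, z, x, u, eps).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        u = rng.random()
        # Var(eps | U) = sigma^2 * exp(gamma * U) with sigma^2 = 1.
        eps = math.sqrt(math.exp(gamma * u)) * rng.gauss(0.0, 1.0)
        y = x * math.sin(2.0 * math.pi * u) + math.exp(2.0 * z) + eps
        data.append((y, z, x, u, eps))
    return data


def bandwidth(u_values):
    """Bandwidth h = c * n^(-1/5), with c the sample standard deviation of U."""
    n = len(u_values)
    return statistics.stdev(u_values) * n ** (-0.2)
```

Setting `gamma=0.0` produces homoscedastic errors (the null hypothesis), while any nonzero `gamma` makes the error variance depend on $U$.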
To evaluate the performance of the proposed method, the sample size is taken as $n=200$, $400$, and $600$, and the nominal level is 0.05. Each setting is replicated 1000 times. Based on these replications, the power of the proposed empirical likelihood ratio test is displayed in Table 1 and Figure 1, from which we make the following observations:
(ⅰ) The empirical rejection rate declines rapidly as $\gamma$ approaches 0, and when the null hypothesis holds it converges to the nominal level as the sample size increases. This indicates that the proposed test controls the probability of a Type Ⅰ error.
(ⅱ) For any given $n$, the results under the different error distributions are very similar. This indicates that the proposed test performs well under different model error distributions.
(ⅲ) When the alternative hypothesis holds, the power quickly tends to 1 as the sample size increases. This also demonstrates that the proposed heteroscedasticity test for the VCPNLM is effective.
Next, we compare the proposed empirical likelihood ratio (ELR) test with the profile likelihood ratio (PLR) test used in [21]. In this simulation, the nominal level is $\alpha=0.05$, the sample size is $n=400$, and the experiment is repeated 1000 times for each case. The results are shown in Figure 2, where the dashed line is the empirical power function of the ELR test proposed in this paper, and the dotted line is the empirical power function of the PLR test.
From Figure 2, we can see that the empirical power functions of both the ELR test and the PLR test increase rapidly as the value of $\gamma$ increases. In addition, the empirical power of the ELR test is higher than that of the PLR test.
4. Application to Boston housing price data
In this section, we analyse the Boston housing price data to illustrate the testing procedure proposed in this paper. The data set contains information on 506 different houses from a variety of locations in the Boston Standard Metropolitan Statistical Area in 1970. Many researchers have analyzed this data set using the partially linear additive model, the partially linear additive spatial autoregressive model, the partially linear single-index model, and other semiparametric models (see [22,23,24]). The objective of these studies is to evaluate the factors that influence the price of owner-occupied homes, such as the per capita crime rate by town, the weighted distances to five Boston employment centres, the average number of rooms per dwelling, and other factors. For the purpose of our demonstration, we consider the following variables: the pupil-teacher ratio by town (denoted by PTRATIO), the index of accessibility to radial highways (denoted by RAD), the percentage of lower status of the population (denoted by LSTAT), the per capita crime rate by town (denoted by CRIME), and the median value of owner-occupied homes in USD 1000's (denoted by MEDV).
In addition, [25] pointed out that the covariate CRIME has a nonlinear effect on the response. Hence, we fit this data set using the following model:
where $Y_{i}$ is the response MEDV, $U_{i}$ is the covariate CRIME, $X_{i}$ is the covariate log(LSTAT), and $Z_{i1}$ and $Z_{i2}$ are the covariates RAD and PTRATIO, respectively. The logarithmic transformation of LSTAT is used to mitigate the effect of large gaps in its domain.
Here, we consider the null hypothesis $H_{0}: \mathrm{Var}(\varepsilon_{i}|Z_{i1},Z_{i2},X_{i},U_{i})\equiv\sigma^{2}$. Using the ELR testing procedure proposed in this paper, we find that the p-value of this test is 0.3484. This means that the null hypothesis cannot be rejected at the nominal level 0.05, which suggests that there is no significant evidence of heteroscedasticity in the model error $\varepsilon$.
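A p-value of this kind is the upper-tail probability of the limiting chi-square distribution evaluated at the observed statistic. The sketch below shows one way to compute such a tail probability for integer degrees of freedom using only the standard library. The helper name `chi2_sf` is hypothetical, and the degrees of freedom to use for the Boston data are not restated here, so the caller must supply them.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0.0:
        return 1.0
    half = 0.5 * x
    if df % 2 == 0:
        tail, a = math.exp(-half), 1.0  # closed form for df = 2
    else:
        tail, a = math.erfc(math.sqrt(half)), 0.5  # closed form for df = 1
    # Recurrence Q(a + 1, t) = Q(a, t) + t^a * exp(-t) / Gamma(a + 1),
    # applied with t = x / 2 until a reaches df / 2.
    while a < 0.5 * df - 1e-9:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail
```

The null hypothesis is rejected at level 0.05 when the returned value is below 0.05; for example, `chi2_sf(7.814727903251179, 3)` is approximately 0.05.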
5. Conclusions
In this paper, we were concerned with statistical inference for the VCPNLM. Using the empirical likelihood method, we proposed a diagnostic technique for heteroscedasticity in semiparametric varying-coefficient partially nonlinear models. Under some mild conditions, a nonparametric version of Wilks' theorem was derived and proven. Furthermore, simulation studies were performed to illustrate the performance of the proposed method. Missing data are common in many fields, and ignoring them reduces the effective information. Therefore, our forthcoming work is to develop statistical inference for the VCPNLM in the case of missing data.
Author contributions
Cuiping Wang: Writing-original draft, Conceptualization, Formal analysis, Methodology; Xiaoshuang Zhou: Funding acquisition, Validation and data analysis, Methodology, Supervision, Writing-review and editing; Peixin Zhao: Funding acquisition, Software, Supervision, Writing-review and editing. All authors have read and approved the final version of the manuscript for publication.
Use of Generative-AI tools declaration
The authors declare that they have used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
Xiaoshuang Zhou's research was supported by the Natural Science Foundation of Shandong Province (Grant Nos. ZR2020MA021 and ZR2022MA065); Peixin Zhao's research was supported by the National Social Science Foundation of China (Grant No. 24BTJ062).
Conflict of interest
The authors declare no conflicts of interest in this paper.
Appendix
Several lemmas are needed to prove the main result.
Lemma 1. Suppose that Conditions C1–C8 hold. Then the following conclusions hold:
and d_{n} = h^{2}+(\log n/nh)^{1/2} . If h = dn^{-1/5} with a constant d , then we have
Proof. The proof can be found in [2]. □
Lemma 2. Let $B=\left(\begin{array}{ll}B_{11} & B_{12}\\ B_{21} & B_{22}\end{array}\right)$ be a real symmetric matrix and, when $B_{22}>0$, write $B_{11.2}\triangleq B_{11}-B_{12}B_{22}^{-1}B_{21}$. Then we have
(a) $B>0\Leftrightarrow B_{22}>0,\ B_{11.2}>0$.
(b) If $B_{22}>0$, then $B\geq 0\Leftrightarrow B_{11.2}\geq 0$.
Proof. The proof can be found in [26]. □
Lemma 3. Let \theta_{i}, i = 1, \cdots, n be i.i.d. random variables with E\left(\theta_{i}\right) = 0 and \operatorname{Var}\left(\theta_{i}\right) = \sigma^{2} < \infty , then for any permutation \left(l_{1}, l_{2}, \cdots, l_{n}\right) of (1, 2, \cdots, n) , we have
Proof. The proof of Lemma 3 can be found in [27]. □
Lemma 4. Suppose that Conditions C1–C8 and $H_0$ hold. Then we have
Proof. We first prove
(1). $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \hat{h}_{1 i} = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} h_{1 i}+o_{p}(1)$,
(2). $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \hat{h}_{2 i} = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} h_{2 i}+o_{p}(1)$.
First, consider the component $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n}\hat{h}_{1i}$:
where R_{1} = \frac{2}{\sqrt{n}} \sum_{i = 1}^{n} \eta_{i} \varepsilon_{i}[X_i^T \hat{\alpha}(U_i) -X_i^T \alpha(U_i)] and R_{2} = \frac{1}{\sqrt{n}}\sum_{i = 1}^{n}\eta_{i}[X_i^T \hat{\alpha}(U_i)-X_i^T \alpha(U_i)]^{2} . Therefore, we can derive that
Similar to the discussion of R_{1} , R_{2} = o_{p}(1) holds. Then
Next, we consider the component \frac{1}{\sqrt{n}}\sum_{i = 1}^{n} \hat{h}_{2 i} ,
where
So \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \hat{h}_{2i} = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \tilde g'(Z_{i}, \beta) \varepsilon_{i}+o_{p}(1). □
Lemma 5. Suppose that Conditions C1–C8 and $H_0$ hold. Then
Proof. Let $b=(b_{1}^{T}, b_{2}^{T})^{T}$, where $b_1\in R^{q+1}$ and $b_2\in R^{p}$, so that $b$ is a $(p+q+1)$-dimensional vector.
Denote \mu_{i} = E\varepsilon^{i} , then we have \mu_{1} = E \varepsilon = 0 and \mu_{2} = E\varepsilon^{2} = \sigma^2,
According to Condition C8, the matrix $\Sigma=\left(\begin{array}{ll}B_{11} & B_{12}\\ B_{21} & B_{22}\end{array}\right)$ is nonnegative definite.
We know that $B_{22}>0$ and $B_{11}-B_{12}B_{22}^{-1}B_{21}\geq 0$ from Lemma 2(b); this, together with the Cauchy-Schwarz inequality, shows that
Since the relationship between the \varepsilon_{i} and \varepsilon_{i}^{2}-\sigma^{2} is nonlinear, the above inequality holds strictly. Then,
Naturally, we get $B_{11}\left(\mu_{4}-\mu_{2}^{2}\right)-B_{12}B_{22}^{-1}B_{21}\mu_{3}^{2}/\mu_{2}>0$ and $B_{22}\mu_{2}>0$. It follows from Lemma 2 that
is a positive definite matrix. This indicates that the Lindeberg condition is met. Hence, by the Lindeberg-Feller central limit theorem, we obtain
Combining this with the Cramér-Wold device, we deduce that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\hat{h}_{i}\stackrel{\mathcal{L}}{\rightarrow}N\left(\mathbf{0},\Sigma^{\prime}\right)$. The proof is finished. □
Lemma 6. Under the null hypothesis and Conditions C1–C6, we have
Proof. It is easy to obtain that
Using Condition C6 and Lemma 1, we get
Using a similar derivation, we arrive at the following conclusion:
Moreover, we have
Invoking the law of large numbers, it is easy to get
□
Lemma 7. Let $\hat{h}_{max}=\max\{\hat{h}_{1},\cdots,\hat{h}_{n}\}$; then under the null hypothesis and Conditions C1–C8, it holds that
Proof. The proof follows arguments similar to those in [18]. □
Lemma 8. The Lagrange multiplier $\lambda$ satisfies the following:
Proof. This follows from [18]; we omit the details here. □
Proof of Theorem 1. Based on the above Lemmas 7 and 8 and the Taylor expansion of (2.15), we deduce that
By Lemmas 5–8, we have
Similar to [18], -2 \log R\left(\gamma; \sigma^{2}, \beta\right) \stackrel{\mathcal{L}}{\rightarrow } \chi_{p+q+1}^{2} can be derived. Theorem 1 follows clearly. □