Research article

Adaptive estimation for spatially varying coefficient models

  • Received: 04 February 2023 Revised: 13 March 2023 Accepted: 17 March 2023 Published: 13 April 2023
  • MSC : 62G05

  • In this paper, a new adaptive estimation approach is proposed for spatially varying coefficient models with unknown error distribution. Unlike geographically weighted regression (GWR) and local linear geographically weighted regression (LL), this method can adapt to different error distributions. A generalized Modal EM algorithm is presented to implement the estimation, and the asymptotic properties of the estimator are established. Simulation and real data results show that the gain of the new adaptive method over the GWR and LL estimators is considerable when the errors are non-Gaussian.

    Citation: Heng Liu, Xia Cui. Adaptive estimation for spatially varying coefficient models[J]. AIMS Mathematics, 2023, 8(6): 13923-13942. doi: 10.3934/math.2023713




    In spatial data analysis, a common problem is determining the nature of the relationship between variables. In many cases, a simple global model fails to explain the relationships between certain sets of variables, because those relationships may change with location; this is known as spatial heterogeneity. To deal with this heterogeneity, the model needs to reflect the structure of spatial variation in the data. Suppose that spatial data at $n$ positions are randomly selected in a spatial region $D \subset \mathbb{R}^2$. Let $u_i=(u_{i1},u_{i2})^\top \in D$ be the position of point $i$, $i=1,\dots,n$, let $y_i$ be the response variable, and let $x_i=(x_{i1},x_{i2},\dots,x_{ip})^\top$ be the explanatory variables with $x_{i1}\equiv 1$, allowing a varying intercept in the model. The observations $\{y_i,x_i,u_i\}$ satisfy the following spatially varying coefficient model (SVCM) [1,2,3]:

    $$y_i = x_i^\top \beta(u_i) + \varepsilon_i = \sum_{k=1}^{p} x_{ik}\,\beta_k(u_i) + \varepsilon_i,\qquad i=1,2,\dots,n, \tag{1.1}$$

    where $\beta(u_i)=(\beta_1(u_i),\beta_2(u_i),\dots,\beta_p(u_i))^\top$ is a vector of $p$ unknown spatially varying coefficient functions defined on $D$, and the $\varepsilon_i$ are independent and identically distributed random errors with $E(\varepsilon_i)=0$ and $\mathrm{var}(\varepsilon_i)=\sigma^2$, independent of $x_i$. Over the past few decades, the SVCM has been widely used in geography [4], econometrics [5], meteorology [6], and environmental science [7]. When $\beta_k(\cdot)$ is a univariate function, model (1.1) is a varying coefficient model and has been extensively studied [8,9]. In this study, $\beta_k(\cdot)$ is a bivariate function of location, and our main goal is to estimate $\beta=(\beta_1,\beta_2,\dots,\beta_p)^\top$ and explore the spatial heterogeneity of the regression relationship based on the given observations $\{(y_i,x_i,u_i)\}_{i=1}^n$.

    In the rich literature on how to estimate the regression coefficients of the SVCM, the Bayesian approach and the smoothing approach are two competing methods. First, the Bayesian approach is an important spatial modeling method that assumes the regression coefficients obey a certain prior distribution and computes their posterior distribution for estimation and inference. For example, Gelfand et al. [10] developed a Bayesian hierarchical framework for point-referenced spatial data by formulating a Gaussian process for spatially varying coefficients, and Assuncao [11] introduced the Bayesian space-varying coefficient model (BVCM) for areal data. Recently, Kim and Lee [12] extended the BVCM to handle mixed data with both point-referenced and areal data. Luo et al. [13] built the Bayesian spatially clustered coefficient (BSCC) model from the spanning trees of a graph. However, Bayesian methods require careful selection of prior distributions and face high computational costs. Second, the smoothing method is a traditional framework for regression, divided into kernel smoothing and smoothing splines. For example, Fotheringham et al. [1] adopted a locally weighted least squares method that constructs weights from spatial kernel functions, namely geographically weighted regression (GWR), which is essentially a local constant kernel smoother. Mu et al. [14] used bivariate splines over triangulation to estimate the regression coefficients, which avoids inappropriate smoothing across complex regional boundaries and processes large data sets quickly and effectively. Yet the kernel-based method must solve an optimization problem at each sample position, which is computationally intensive, and inference for spatially varying coefficients under the smoothing-spline method relies on the bootstrap.

    Currently, there are also numerous studies on variable selection in the SVCM. Shin et al. [15] proposed penalized quasi-likelihood methods with spatial dependence. Wang and Sun [16] represented the space-varying coefficients as combinations of local polynomials at anchor points and applied least squares with an additive form of lasso and fused-lasso penalties. Li and Sang [17] proposed a spanning tree graph fused-lasso-based spatially clustered coefficient regression (SCC) model under the assumption of spatial clusters, and the regularization term of the SCC model was generalized by a chain-graph-guided fusion penalty plus a group lasso penalty [18]. However, each of these methods estimates the space-varying coefficients by the least squares criterion, which corresponds to the likelihood only when the error term is normally distributed. In practice, the error density is unknown, so least squares is not always appropriate and can lose efficiency; the adaptive estimation method provides an alternative.

    The adaptive estimation method was first studied for the problem of estimating and inferring an infinite dimensional parameter [19]. This method replaces the Gaussian density with a nonparametric estimate when forming the score function of the log-likelihood, and efficiency gains have been established both for varying coefficient models [20] and for varying coefficient models with non-stationary covariates [21]. In this study, we propose an adaptive estimation method for spatially varying coefficients. Different from the least squares criterion, the logarithm in the new adaptive objective contains an inner sum resembling the likelihood of a mixture density, so no explicit solution exists, and we use the generalized Modal EM (GMEM) algorithm [22] to compute the estimates. Simulation results show that when the error distribution deviates from the normal distribution, the new estimator is more efficient than the existing least-squares-based GWR estimator. In addition, the new method remains comparable with existing GWR methods when the error is exactly normal. Finally, we illustrate the effectiveness of the proposed adaptive estimation method through two real data examples.

    The rest of this study is organized as follows. In Section 2, the adaptive estimation of spatially varying coefficient models and the generalized Modal EM algorithm are introduced. In Section 3, the proposed method is compared with the GWR method under several different error densities through simulation studies. In Section 4, the new method is applied to two real-world data examples. A brief discussion is given in Section 5. All technical conditions and proofs are given in Appendix A.

    For any given $u_0$, we approximate the spatially varying coefficients by a Taylor expansion:

    $$\beta_k(u_i) \approx \beta_k(u_0) + \dot{\beta}_k(u_0)^\top (u_i - u_0) = b_k + c_k^\top (u_i - u_0),\qquad k=1,\dots,p, \tag{2.1}$$

    where $u_i$ is in a neighborhood of $u_0$ and $\dot{\beta}_k(u_0)=\{\partial\beta_k(u)/\partial u_1,\ \partial\beta_k(u)/\partial u_2\}^\top\big|_{u=u_0}$. Using this approximation, we have the following objective function for estimating $(b_1,\dots,b_p)$ and $(c_1,\dots,c_p)$:

    $$\sum_{i=1}^{n}\Big[y_i - \sum_{k=1}^{p}\{b_k + c_k^\top (u_i-u_0)\}x_{ik}\Big]^2 K_h(\|u_i-u_0\|), \tag{2.2}$$

    where $K_h(\cdot)=K(\cdot/h)/h^2$, $K(\cdot)$ is a kernel function, $h$ is a bandwidth, and $\|s\|=(s^\top s)^{1/2}$ for a vector $s$. Throughout this study, a Gaussian kernel is used for $K(\cdot)$. Because (2.2) is a least squares criterion, the resulting estimate may lose efficiency when the error distribution is not normal. Therefore, we develop an adaptive estimation procedure that can adapt to different error distributions.
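    Since (2.2) is quadratic in $(b_1,\dots,b_p,c_1,\dots,c_p)$, the local fit at each location $u_0$ has a closed-form weighted least squares solution. The following Python sketch is our own illustration (function and variable names are ours, not from the paper) and assumes a Gaussian kernel:

```python
import numpy as np

def local_linear_fit(y, X, U, u0, h):
    """Minimize the kernel-weighted least squares criterion (2.2) at u0.

    y: (n,) responses; X: (n, p) covariates; U: (n, 2) locations.
    Returns theta = (b_1..b_p, c_11..c_p1, c_12..c_p2), shape (3p,).
    """
    d = U - u0                                             # (n, 2) offsets
    w = np.exp(-0.5 * np.sum(d**2, axis=1) / h**2) / h**2  # Gaussian K_h
    # local design: x_ik, x_ik*(u_i1 - u_01), x_ik*(u_i2 - u_02)
    Z = np.hstack([X, X * d[:, [0]], X * d[:, [1]]])       # (n, 3p)
    A = Z.T @ (w[:, None] * Z)
    b = Z.T @ (w * y)
    return np.linalg.solve(A, b)
```

    The first $p$ entries of the returned vector are the local coefficient estimates $\hat\beta_k(u_0)=\hat b_k$; evaluating the function over a grid of locations traces out the LL coefficient surfaces, and dropping the two slope blocks of `Z` recovers the local constant (GWR) fit.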

    Let f(ε) be the density function of ε. If f(ε) were known, it would be natural to estimate the parameters in (2.1) by maximizing the following log-likelihood function

    $$\sum_{i=1}^{n}\log f\Big[y_i - \sum_{k=1}^{p}\{b_k + c_k^\top (u_i-u_0)\}x_{ik}\Big] K_h(\|u_i-u_0\|). \tag{2.3}$$

    However, in practice, f(ε) is generally unknown but can be replaced by a leave-one-out kernel density estimator

    $$\tilde f(\varepsilon_i)=\frac{1}{n}\sum_{j\neq i} K_{h_0}(\varepsilon_i - \tilde\varepsilon_j), \tag{2.4}$$

    where $\tilde\varepsilon_j=y_j-\sum_{k=1}^{p} x_{jk}\tilde\beta_k(u_j)$ is a preliminary estimate of $\varepsilon_j$ based on the initial estimator $\tilde\beta_k=\tilde b_k$, obtained from the local linear regression (2.2). Let $\theta=(b_1,\dots,b_p,c_1^\top,\dots,c_p^\top)^\top$. Then our proposed adaptive estimate of the parameter $\theta$ is

    $$\hat\theta=\arg\max_\theta Q(\theta), \tag{2.5}$$

    where

    $$Q(\theta)=\sum_{i=1}^{n}\log\Big(\frac{1}{n}\sum_{j\neq i} K_{h_0}\Big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top (u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\Big]\Big) K_h(\|u_i-u_0\|). \tag{2.6}$$

    Because the logarithm in (2.6) contains an inner sum, the objective resembles the log-likelihood of a random sample from a mixture density and admits no explicit solution. In the following, we use the generalized Modal EM algorithm proposed in Yao [22] to compute the parameters.

    Generalized Modal EM algorithm (GMEM): The GMEM algorithm is a generalization of the Modal EM (MEM) algorithm [23], which finds the modes of a mixture density and performs nonparametric clustering. The MEM algorithm comprises two steps similar to the expectation and maximization steps of the EM algorithm, which maximizes the likelihood function for finite mixture models containing unobserved latent variables. Specifically, suppose an $m$-component finite mixture density is

    $$f(x)=\sum_{j=1}^{m}\pi_j f_j(x),$$

    where $\pi_j$ is the mixing proportion of mixture component $j$ and $f_j(x)$ is the density of component $j$. Given any initial value $x^{(0)}$, the $(l+1)$th step of the MEM algorithm finds a local maximum of the mixture density by the following two steps:

    1) let

    $$p_j=\frac{\pi_j f_j(x^{(l)})}{f(x^{(l)})},\qquad j=1,\dots,m,$$

    2) update

    $$x^{(l+1)}=\arg\max_x \sum_{j=1}^{m} p_j \log f_j(x).$$
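    As a toy illustration of these two steps (our own sketch; the mixture parameters below are made up and not from the paper), the iteration for a two-component Gaussian mixture ascends to a local mode of the density:

```python
import numpy as np

# hypothetical mixture: 0.4*N(-2, 1) + 0.6*N(3, 1)
pi = np.array([0.4, 0.6])
mu = np.array([-2.0, 3.0])
sd = np.array([1.0, 1.0])

def phi(x, m, s):
    """Normal density N(m, s^2) evaluated at x."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

def mem_mode(x0, n_iter=100):
    """Modal EM: iterate the two steps until x settles at a local mode."""
    x = x0
    for _ in range(n_iter):
        p = pi * phi(x, mu, sd)      # step 1: component probabilities p_j
        p /= p.sum()
        # step 2: for Gaussian f_j, the argmax of sum_j p_j log f_j(x)
        # is the precision-weighted mean of the component centers
        x = np.sum(p * mu / sd**2) / np.sum(p / sd**2)
    return x
```

    Starting from $x^{(0)}=2.5$ the iteration climbs to the mode near $3$; starting from $x^{(0)}=-1.5$ it climbs to the mode near $-2$.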

    The first step is the "Expectation" step, where the probability of each mixture component $j$, $1\le j\le m$, at the current point $x^{(l)}$ is computed. The second step is the "Maximization" step, which, as in the EM algorithm, is usually much easier to solve than the original objective function. For detailed properties of the MEM algorithm, see Li et al. [23]. Yao [22] proves that the MEM algorithm can be extended to maximize a general mixture-type objective function

    $$f(x)=\sum_{j=1}^{m} w_j\Big[\log\Big\{\sum_{k=1}^{K} a_{jk} f_{jk}(x)\Big\}\Big], \tag{2.7}$$

    where the $w_j$ and $a_{jk}$ are known positive constants and each $f_{jk}(x)$ is a positive known function. When $m=1$, the objective function (2.7) simplifies to

    $$f(x)=w_1\log\Big\{\sum_{k=1}^{K} a_{1k} f_{1k}(x)\Big\},$$

    which has the same maximizers as $\sum_{k=1}^{K} a_{1k} f_{1k}(x)$. Therefore, if $\sum_{k=1}^{K} a_{1k}=1$ and the $f_{1k}(x)$ are density functions, the MEM algorithm is a special case of the generalized Modal EM (GMEM) algorithm for (2.7). Specifically, given the initial value $x^{(0)}$, the $(l+1)$th step of the GMEM algorithm is as follows:

    E-step: let

    $$p_{jk}^{(l+1)}=\frac{a_{jk} f_{jk}(x^{(l)})}{\sum_{k'=1}^{K} a_{jk'} f_{jk'}(x^{(l)})},\qquad j=1,\dots,m,\ k=1,\dots,K,$$

    M-step: update

    $$x^{(l+1)}=\arg\max_x \sum_{j=1}^{m}\sum_{k=1}^{K}\big\{w_j\, p_{jk}^{(l+1)} \log f_{jk}(x)\big\}.$$

    In this study, we note that the objective function $Q(\theta)$ of (2.6) has the mixture form of (2.7). Specifically, $K_h(\|u_i-u_0\|)$, $\frac{1}{n}$, and $K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]$ in (2.6) correspond to $w_j$, $a_{jk}$, and $f_{jk}(x)$ in (2.7), respectively. Therefore, the GMEM algorithm can be applied directly to estimate the parameters $b_k$, $c_k$ in (2.6). Let $\theta^{(0)}$ be the initial estimator obtained by minimizing (2.2), let $\theta^{(l)}=(b_1^{(l)},\dots,b_p^{(l)},c_1^{(l)\top},\dots,c_p^{(l)\top})^\top$ be the estimator at the $l$th iteration, let $\tilde\varepsilon_j$ be the preliminary estimate of $\varepsilon_j$ (which does not need to be updated), and let $z_i=\{x_i^\top,(x_i\otimes(u_i-u_0))^\top\}^\top$. At the $(l+1)$th iteration, the E and M steps are as follows:

    E-step: calculate the classification probabilities $p_{ij}^{(l+1)}$,

    $$p_{ij}^{(l+1)}=\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j'\neq i} K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_{j'}\big]}. \tag{2.8}$$

    M-step: update $\theta^{(l+1)}$,

    $$\begin{aligned}
    \theta^{(l+1)}&=\arg\max_\theta \sum_{i=1}^{n}\sum_{j\neq i}\Big\{p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\log\Big(K_{h_0}\Big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\Big]\Big)\Big\}\\
    &=\arg\min_\theta \sum_{i=1}^{n}\sum_{j\neq i}\Big\{p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\big[y_i-\tilde\varepsilon_j-z_i^\top\theta\big]^2\Big\}\\
    &=\Big(\sum_{i=1}^{n}\sum_{j\neq i} p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\, z_i z_i^\top\Big)^{-1}\sum_{i=1}^{n}\sum_{j\neq i} p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\,(y_i-\tilde\varepsilon_j)\, z_i\\
    &=(Z^\top W Z)^{-1} Z^\top W Y,
    \end{aligned} \tag{2.9}$$

    where $Z=(Z_{1,n-1},\dots,Z_{n,n-1})^\top$ with $Z_{i,n-1}$ the $3p\times(n-1)$ matrix whose $n-1$ columns all equal $z_i$, that is,

    $$Z_{i,n-1}=\begin{pmatrix} x_{i1}&\cdots&x_{i1}\\ \vdots& &\vdots\\ x_{ip}&\cdots&x_{ip}\\ (u_i-u_0)x_{i1}&\cdots&(u_i-u_0)x_{i1}\\ \vdots& &\vdots\\ (u_i-u_0)x_{ip}&\cdots&(u_i-u_0)x_{ip}\end{pmatrix}_{3p\times(n-1)},$$

    $W=\mathrm{diag}\big(p_{12}^{(l+1)}K_h(\|u_1-u_0\|),\dots,p_{1n}^{(l+1)}K_h(\|u_1-u_0\|),\dots,p_{n,n-1}^{(l+1)}K_h(\|u_n-u_0\|)\big)$, $Y=(y_1-\tilde\varepsilon_2,\dots,y_1-\tilde\varepsilon_n,\dots,y_n-\tilde\varepsilon_{n-1})^\top$, and the second equality follows from the use of the Gaussian kernel. If $\|\theta^{(l+1)}-\theta^{(l)}\|\le 10^{-5}$, the algorithm stops; otherwise, the E and M steps continue to iterate.
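    Putting (2.8) and (2.9) together, the whole iteration at a single location $u_0$ can be sketched as follows (our own Python illustration with hypothetical names, assuming Gaussian kernels for both $K_h$ and $K_{h_0}$; because $\sum_{j\neq i}p_{ij}^{(l+1)}=1$, the double sum in (2.9) collapses to a single weighted least squares problem with working response $y_i-\sum_{j\neq i}p_{ij}^{(l+1)}\tilde\varepsilon_j$):

```python
import numpy as np

def gmem_adaptive(y, X, U, u0, h, h0, resid, tol=1e-5, max_iter=100):
    """Adaptive estimate of theta at u0 via the GMEM iteration (2.8)-(2.9).

    resid: preliminary residuals eps~_j (aligned with the n observations)
    from the initial local linear fit (2.2); they are not updated.
    """
    d = U - u0
    Z = np.hstack([X, X * d[:, [0]], X * d[:, [1]]])        # rows are z_i
    kh = np.exp(-0.5 * np.sum(d**2, axis=1) / h**2) / h**2  # K_h weights
    G = Z.T @ (kh[:, None] * Z)
    theta = np.linalg.solve(G, Z.T @ (kh * y))              # initial (2.2) fit
    for _ in range(max_iter):
        r = y - Z @ theta
        # E-step (2.8): classification probabilities over j != i
        A = np.exp(-0.5 * ((r[:, None] - resid[None, :]) / h0) ** 2)
        np.fill_diagonal(A, 0.0)
        P = A / A.sum(axis=1, keepdims=True)
        # M-step (2.9): weighted least squares with the working response
        t = y - P @ resid
        new = np.linalg.solve(G, Z.T @ (kh * t))
        if np.linalg.norm(new - theta) <= tol:
            return new
        theta = new
    return theta
```

    Note that the Gram matrix of the M-step does not change across iterations (the classification probabilities sum to one over $j$), so it is factored once outside the loop.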

    Proposition 2.1. Each iteration of the above E and M steps monotonically increases $Q(\theta)$ in Eq (2.6), i.e., for any $l$,

    $$Q(\theta^{(l+1)})\ \ge\ Q(\theta^{(l)}).$$

    The consistency and asymptotic normality of $\hat\theta$ are established as follows. Let $H=\mathrm{diag}(1,h,h)\otimes I_p$, where $\otimes$ is the Kronecker product and $I_p$ is the $p\times p$ identity matrix. For $i,j=0,1,2$ and $k=1,2$, denote $\gamma_{ij}=\int u_k^i K^j(u)\,du$ with $u=(u_1,u_2)^\top$, and let $q(\cdot)$ be the marginal density function of $u$.

    Theorem 2.1. Under the regularity conditions in Appendix A, there exists a consistent maximizer $\hat\theta=(\hat b_1,\dots,\hat b_p,\hat c_1^\top,\dots,\hat c_p^\top)^\top$ of (2.6) with probability approaching 1 such that

    $$H(\hat\theta-\theta)=O_p\{(nh^2)^{-1/2}+h^2\}.$$

    Based on Theorem 2.1, the proposed adaptive estimator of $\theta$ is consistent; its proof is provided in the Appendix. Next, we provide the asymptotic distribution of the proposed estimator.

    Theorem 2.2. Suppose that the regularity conditions in Appendix A hold. Then $\hat\theta$, given in Theorem 2.1, has the following asymptotic distribution:

    $$\sqrt{nh^2}\,\Big\{H(\hat\theta-\theta)-S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\,\psi_k\,(1+o_p(1))\Big\}\ \xrightarrow{D}\ N\Big(0_{3p\times1},\ \big[E\{\rho'(\varepsilon)^2\}\big]^{-1} q(u_0)^{-1} S^{-1}\Lambda S^{-1}\Big),$$

    where $\rho(\cdot)=\log f(\cdot)$, $S=\mathrm{diag}(\gamma_{01},\gamma_{21},\gamma_{21})\otimes\Gamma(u_0)$, $\Gamma(u_0)=\{\Gamma_{kj}(u_0)\}_{1\le k,j\le p}$ with $\Gamma_{kj}(u_0)=E(x_{ik}x_{ij}\mid u_0)$, $\Lambda=\mathrm{diag}(\gamma_{02},\gamma_{22},\gamma_{22})\otimes\Gamma(u_0)$, and $\psi_k=(\gamma_{21},0_{2\times1}^\top)^\top\otimes\big(\Gamma_{kj}(u_0)\big)_{1\le j\le p}$.

    This section evaluates the proposed adaptive estimation method by simulation and compares it with local linear geographically weighted regression (LL) [24] and geographically weighted regression (GWR). In the numerical experiments, the following four designs of the error structure are considered:

    1) $\varepsilon\sim N(0,1)$;

    2) $\varepsilon\sim t_3$;

    3) $\varepsilon\sim 0.5\,N(-1,0.5^2)+0.5\,N(1,0.5^2)$;

    4) $\varepsilon\sim e^{T}-E(e^{T})$, where $T\sim N(0,1)$.

    The first is the standard normal distribution, as a benchmark for comparison, and the second is the $t$ distribution with 3 degrees of freedom. The third distribution is bimodal, and the last one has a long right tail. For the above error distributions, the population positions are located at the $N=25\times25$ regular grid in the square region $D=[0,1]^2$, and the distance between any two adjacent points in the horizontal and vertical directions is equal. At each location, the response variables $y_1,\dots,y_n$ are generated by $y_i=\beta_1(u_i)x_{i1}+\beta_2(u_i)x_{i2}+\varepsilon_i$, where $x_1$ and $x_2$ follow $N(0,1)$ with correlation coefficient $\rho=1/2$, and the regression coefficient functions are as follows:

    $$\beta_1(u)=1+\frac{25}{12}(u_1+u_2),\qquad \beta_2(u)=1+\frac{1}{324}\Big[36-\Big(6-\frac{25u_1}{2}\Big)^2\Big]\Big[36-\Big(6-\frac{25u_2}{2}\Big)^2\Big],$$

    The true coefficient function contour plots of $\beta_1(u)$ and $\beta_2(u)$ are shown in Figure 1. We randomly sample $n=200$ and $400$ points from the $25\times25$ grid points in each of the 100 Monte Carlo experiments.
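    For reference, one Monte Carlo replicate of this design can be generated as follows (our own sketch; the coefficient functions are as reconstructed above, and for error design 4 the lognormal is centered by its mean $e^{1/2}$):

```python
import numpy as np

rng = np.random.default_rng(0)

def beta1(u):
    return 1 + 25 / 12 * (u[:, 0] + u[:, 1])

def beta2(u):
    a = 36 - (6 - 25 * u[:, 0] / 2) ** 2
    b = 36 - (6 - 25 * u[:, 1] / 2) ** 2
    return 1 + a * b / 324

def simulate(n, error="normal"):
    """One replicate: n locations sampled from the 25x25 grid in [0,1]^2."""
    g = np.linspace(0, 1, 25)
    grid = np.array([(s, t) for s in g for t in g])
    U = grid[rng.choice(len(grid), size=n, replace=False)]
    # (x1, x2) standard normal with correlation 1/2 via a Cholesky factor
    C = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
    X = rng.standard_normal((n, 2)) @ C.T
    eps = {"normal": rng.standard_normal(n),
           "t3": rng.standard_t(3, n),
           "mixture": 0.5 * rng.standard_normal(n) + rng.choice([-1.0, 1.0], n),
           "lognormal": np.exp(rng.standard_normal(n)) - np.exp(0.5)}[error]
    y = beta1(U) * X[:, 0] + beta2(U) * X[:, 1] + eps
    return y, X, U
```

    The `"mixture"` branch draws $\pm1$ centers with equal probability and adds $N(0,0.5^2)$ noise, which reproduces the two-component normal mixture of design 3.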

    Figure 1.  True coefficient functions contour plots of β1 (left) and β2 (right).

    There are two bandwidths $h$ and $h_0$ in the estimation. We use the leave-one-out cross-validation method to select $h$, and the choice $h_0=h/\log(n)$ follows Linton and Xiao [25]. The performance of the estimator $\hat\beta(\cdot)$ is evaluated by the square root of the average squared errors (RASE), which is calculated as follows:

    $$\mathrm{RASE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\sum_{p=1}^{2}\big[\hat\beta_p(u_i)-\beta_p(u_i)\big]^2}.$$
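    In code, the RASE criterion is simply (our own helper, with `beta_hat` and `beta_true` holding the two coefficient surfaces evaluated at the $n$ sample locations):

```python
import numpy as np

def rase(beta_hat, beta_true):
    """Square root of the average (over locations) summed squared errors."""
    diff = np.asarray(beta_hat, float) - np.asarray(beta_true, float)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```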

    The simulation results are summarized in Table 1. It can be clearly seen that when the error is non-normal, the proposed adaptive estimation outperforms LL and GWR, and the improvement in estimation efficiency can be considerable. When the error is exactly normally distributed, our method is still comparable to the LL and GWR methods.

    Table 1.  Comparison of RASE with its standard error in brackets.
    ε n=200 n=400
    GWR LL Adaptive GWR LL Adaptive
    1 0.838(0.101) 0.787(0.099) 0.978(0.097) 0.677(0.051) 0.561(0.051) 0.790(0.048)
    2 0.964(0.152) 0.857(0.123) 0.854(0.110) 0.737(0.090) 0.655(0.067) 0.538(0.028)
    3 1.104(0.109) 1.009(0.107) 0.940(0.081) 0.796(0.061) 0.685(0.052) 0.632(0.031)
    4 0.869(0.175) 0.837(0.127) 0.653(0.084) 0.692(0.158) 0.621(0.131) 0.405(0.060)


    Figure 2 visualizes the estimated surfaces of $\beta_1(\cdot)$ and $\beta_2(\cdot)$ using the adaptive estimation method, LL and GWR based on sample size $n=400$ when the error distribution of $\varepsilon$ is case 3. These results highlight that the adaptive estimation method captures the spatial pattern more accurately than the LL and GWR methods.

    Figure 2.  Estimated surface via adaptive method, LL and GWR based on sample size n=400 when the error distribution in the case 3.

    Example 1. (Dublin Voter Turnout Data) This section applies the proposed methodology to the Dublin voter data. This dataset includes the proportion of the voting population in 322 areas, as well as several variables that may explain the change in the proportion of the voting population. Specifically, we explore how the unemployment rate (Unempl), the proportion aged 25 to 44 (Age25_44) and the proportion with no formal education (LowEduc) affect the proportion of the voting population in each region (GenEl2004). Figure 3 shows the spatial distribution of the dependent variable and the three independent variables.

    Figure 3.  Response and independent variables for voter turnout data in Dublin.

    The dependent variable GenEl2004 and the independent variables Unempl, Age25_44 and LowEduc are denoted by $y,x_2,x_3,x_4$, respectively, with $x_1\equiv1$ as the intercept term. We use the spatially varying coefficient model to fit the data as follows:

    $$y_i=\beta_1(u_i)+\sum_{k=2}^{4}\beta_k(u_i)x_{ik}+\varepsilon_i.$$

    Figure 4 summarizes the estimated coefficient functions using the adaptive method, LL, and GWR, respectively, which vary considerably in space. Figure 7(a) shows a residual QQ-plot of the Dublin voter turnout data via the adaptive method. From the plot, we can see that the distribution of the residuals is very close to normal.

    Figure 4.  Estimated coefficient functions for voter turnout data in Dublin using adaptive method (top), LL (middle) and GWR (bottom).

    To evaluate the prediction accuracy of the adaptive method, we set aside 50 observations for comparing the mean squared prediction error (MSPE) of the adaptive method, LL, and GWR. The MSPE is computed as follows:

    $$\mathrm{MSPE}=\frac{1}{m}\sum_{j=1}^{m}(y_j-\hat y_j)^2,$$

    where $m=50$ and $\hat y_j=\hat\beta_1(u_j)+\sum_{k=2}^{4}\hat\beta_k(u_j)x_{jk}$. The MSPE values of the three methods are comparable: 0.018, 0.015 and 0.017, respectively. The residual QQ-plot in Figure 7(a) is close to the normal distribution, which explains why the MSPE of the adaptive method is very close to the MSPE of the GWR and the LL.
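    The hold-out comparison can be organized as below (our own sketch; `fit_at` stands for any of the three estimators, returning the coefficient vector $\hat\beta(u_0)$ from the training data, and the interface is hypothetical):

```python
import numpy as np

def holdout_mspe(y, X, U, fit_at, m=50, seed=1):
    """Set aside m observations and compute the MSPE of a fitted SVCM.

    fit_at(u0, y_tr, X_tr, U_tr) -> coefficient vector of length p,
    evaluated at the held-out location u0.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, train = idx[:m], idx[m:]
    preds = np.array([X[i] @ fit_at(U[i], y[train], X[train], U[train])
                      for i in test])
    return np.mean((y[test] - preds) ** 2)
```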

    Example 2. (England and Wales House Price Data) The England and Wales house price data are publicly available in the R package GWmodel. The dataset includes 10 variables, namely: house sale price (PurPrice), BldIntWr, BldPostW, bld60, bld70, bld80, TypDetch, TypSemiD, TypFlat and floor area (FlrArea). With the exception of the floor area (FlrArea), all independent variables are indicator variables (1 or 0). Figure 5 shows the spatial distribution of PurPrice and FlrArea.

    Figure 5.  Response and independent variables for house price data in England and Wales.

    We take the house sale price ($y$) as the dependent variable and FlrArea ($x_2$) as the independent variable, with $x_1\equiv1$ as the intercept term; the spatially varying coefficient model fitted to the data is:

    $$y_i=\beta_1(u_i)+\beta_2(u_i)x_{i2}+\varepsilon_i.$$

    The estimated coefficient functions are shown in Figure 6, and Figure 7(b) shows a residual QQ-plot via the adaptive method for the England and Wales house price data. Similar to the analysis in Example 1, we set aside 50 observations as the test set. The MSPE of the adaptive approach, LL and GWR are 0.302, 0.339 and 0.548, respectively. The QQ-plot of residuals from the above fit shows a clear deviation from normality, which explains why the MSPE of the adaptive approach is smaller than those of LL and GWR.

    Figure 6.  Estimated coefficient functions for house price data in England and Wales using adaptive method (top), LL (middle) and GWR (bottom).
    Figure 7.  Residual QQ-plot for two data examples: (a) Dublin voter turnout data; (b) England and Wales housing data.

    In this article, we proposed an adaptive estimation method for spatially varying coefficient models. The new estimation procedure can adapt to different error distributions and achieves higher estimation efficiency than the LL and GWR methods. Simulation studies and two real data applications confirmed our theoretical findings.

    The proposed method in this article can be easily extended to semiparametric varying-coefficient partially linear models, where some coefficients in the model are assumed to be constant and the remaining coefficients are allowed to spatially vary across the studied region. Another interesting future work is the spatiotemporal extension to analyze data collected across time and space.

    This work was supported by the National Natural Science Foundation of China (Grant No. 11871173), the National Statistical Science Research Project (Grant No. 2020LZ09).

    The authors declare that there is no conflict of interest.

    This section gives the proofs of Proposition 2.1, Theorem 2.1 and Theorem 2.2, with the required regularity conditions as follows:

    1) $K(\cdot)$ is bounded, symmetric, and has bounded support and bounded derivatives;

    2) $\{x_i\}_{i=1}^n$, $\{u_i\}_{i=1}^n$, $\{\varepsilon_i\}_{i=1}^n$ are independent and identically distributed, and $\{\varepsilon_i\}_{i=1}^n$ is independent of $\{x_i\}_{i=1}^n$ and $\{u_i\}_{i=1}^n$. In addition, the independent variable $x$ has bounded support;

    3) The probability density function $f(\varepsilon)$ of $\varepsilon$ has a bounded continuous fourth-order derivative. Assume $E[\rho'(\varepsilon)]=0$, $E[\rho''(\varepsilon)]<\infty$, $E[\rho'(\varepsilon)^2]<\infty$ and $\rho'''(\cdot)$ is bounded;

    4) The marginal density $q(u)$ of $u$ has a continuous second derivative in some neighborhood of $u_0$ and $q(u_0)\neq0$;

    5) $h\to0$ and $nh^2\to\infty$ as $n\to\infty$, with $h_0=h/\log(n)$;

    6) $\beta_k(\cdot)$, $k=1,\dots,p$, has bounded and continuous third derivatives.

    Proof of Proposition 2.1: Note that

    $$\begin{aligned}
    Q(\theta^{(l+1)})-Q(\theta^{(l)})&=\sum_{i=1}^{n}K_h(\|u_i-u_0\|)\log\Bigg\{\frac{\sum_{j\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\}\\
    &=\sum_{i=1}^{n}K_h(\|u_i-u_0\|)\log\Bigg\{\sum_{j\neq i}p_{ij}^{(l+1)}\,\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\},
    \end{aligned}$$

    where

    $$p_{ij}^{(l+1)}=\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j'\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_{j'}\big]}.$$

    From Jensen's inequality,

    $$Q(\theta^{(l+1)})-Q(\theta^{(l)})\ \ge\ \sum_{i=1}^{n}K_h(\|u_i-u_0\|)\sum_{j\neq i}p_{ij}^{(l+1)}\log\Bigg\{\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\}.$$

    By the maximization property of the M-step in (2.9), the right-hand side is nonnegative, which proves that $Q(\theta^{(l+1)})-Q(\theta^{(l)})\ge0$.

    Proof of Theorem 2.1: According to the result of Linton and Xiao [25], the asymptotic behaviour of $\hat\theta$ in (2.6) is the same as that obtained from (2.3). Therefore, we prove the asymptotic properties of $\hat\theta$ based on (2.3).

    Denote $\theta^*=H\theta$, $x_i^*=\big(x_{i1},\dots,x_{ip},\big(\frac{u_i-u_0}{h}\big)^\top x_{i1},\dots,\big(\frac{u_i-u_0}{h}\big)^\top x_{ip}\big)^\top$, $K_i=K_h(\|u_i-u_0\|)$, $R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}$, and $a_n=(nh^2)^{-1/2}+h^2$. Let $\rho(\cdot)=\log f(\cdot)$; the objective function (2.3) is written as

    $$L(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\,\rho(y_i-\theta^{*\top}x_i^*).$$

    Based on the definition of $\theta^*$, it is sufficient to show that, for any given $\eta>0$, there exists a large constant $c$ such that

    $$P\Big\{\sup_{\|\mu\|=c}L(\theta^*+a_n\mu)<L(\theta^*)\Big\}\ \ge\ 1-\eta,$$

    where $\mu$ has the same dimension as $\theta^*$ and $a_n$ is the convergence rate. By using a Taylor expansion, it follows that

    $$\begin{aligned}
    L(\theta^*+a_n\mu)-L(\theta^*)&=\frac{1}{n}\sum_{i=1}^{n}K_i\big\{\rho(\varepsilon_i+R(u_i,x_i)-a_n\mu^\top x_i^*)-\rho(\varepsilon_i+R(u_i,x_i))\big\}\\
    &=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))\,a_n\mu^\top x_i^*+\frac{1}{2n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))\,a_n^2(\mu^\top x_i^*)^2\\
    &\quad-\frac{1}{6n}\sum_{i=1}^{n}K_i\rho'''(z_i)\,a_n^3(\mu^\top x_i^*)^3\\
    &\equiv I_1+I_2+I_3,
    \end{aligned}$$

    where $z_i$ is a value between $\varepsilon_i+R(u_i,x_i)-a_n\mu^\top x_i^*$ and $\varepsilon_i+R(u_i,x_i)$.

    For $I_1=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*$, let $\delta_1=E[\rho''(\varepsilon_i)]$. Since $R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}=O_p(h^2)=o_p(1)$ and $E[\rho'(\varepsilon)]=0$, we have

    $$E(I_1)=-E\big(K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*\big)\approx-a_nE\big\{K_i\rho''(\varepsilon_i)R(u_i,x_i)\mu^\top x_i^*\big\}=-a_n\delta_1E\big[K_iR(u_i,x_i)\mu^\top x_i^*\big]=-a_n\delta_1E\big\{E[R(u_i,x_i)\mu^\top x_i^*\mid u_i]K_i\big\}.$$

    By using $\mu^\top x_i^*\le\|\mu\|\,\|x_i^*\|$, we have $E(I_1)=O(a_nh^2)$.

    $$\mathrm{var}(I_1)=\frac{1}{n}\mathrm{var}\big\{K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*\big\}=\frac{1}{n}\big\{E(A^2)-[E(A)]^2\big\},$$

    where $A=K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*$. Let $\delta_2=E[\rho'(\varepsilon_i)^2]$; then

    $$E(A^2)=E\big\{K_i^2\rho'(\varepsilon_i+R(u_i,x_i))^2a_n^2(\mu^\top x_i^*)^2\big\}\approx a_n^2E\big\{K_i^2\rho'(\varepsilon_i)^2(\mu^\top x_i^*)^2\big\}=a_n^2\delta_2E\big\{E[\mu^\top x_i^*x_i^{*\top}\mu\mid u_i]K_i^2\big\}=a_n^2\delta_2\,\mu^\top E\big\{E[x_i^*x_i^{*\top}\mid u_i]K_i^2\big\}\mu.$$

    Note that $x_i^*x_i^{*\top}=\Big(x_{ij}x_{ik}\big(\frac{u_i-u_0}{h}\big)^{l}\big(\big(\frac{u_i-u_0}{h}\big)^{l'}\big)^\top\Big)_{1\le j,k\le p;\ l,l'=0,1}$, $\Gamma_{jk}(u_i)=E(x_{ij}x_{ik}\mid u_i)$, and that the odd kernel moments $\int_{\mathbb{R}^2}uK(u)\,du$, $\int_{\mathbb{R}^2}u\,u^\top u\,K(u)\,du$ and $\int_{\mathbb{R}^2}u_1u_2K(u)\,du$ all vanish. For $1\le j,k\le p$, then

    $$E\Big[E(x_{ij}x_{ik}\mid u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K_i^2\Big]=E\Big[\Gamma_{jk}(u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K_i^2\Big]=\frac{1}{h^4}\int\Gamma_{jk}(u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K^2\Big(\frac{u_i-u_0}{h}\Big)q(u_i)\,du_i=\frac{1}{h^2}q(u_0)\Gamma_{jk}(u_0)\int t^{l}(t^{l'})^\top K^2(t)\,dt\,(1+o(1)). \tag{A.1}$$

    The last equality follows from a Taylor expansion and the assumption that $\varepsilon$ is independent of $u$ and $x$. Then $E\{E[x_i^*x_i^{*\top}\mid u_i]K_i^2\}=\frac{1}{h^2}q(u_0)\Lambda(1+o(1))$, where $\Lambda=\mathrm{diag}(\gamma_{02},\gamma_{22},\gamma_{22})\otimes\Gamma(u_0)$ is a $3p\times3p$ matrix. Thus,

    $$E(A^2)=a_n^2\delta_2\frac{1}{h^2}q(u_0)\mu^\top\Lambda\mu\,(1+o(1))=O\Big(a_n^2\frac{1}{h^2}\Big).$$

    Note that $[E(A)]^2=[E(I_1)]^2=[O(a_nh^2)]^2$ is of smaller order than $E(A^2)$, so $\mathrm{var}(I_1)\le\frac{1}{n}E(A^2)=O\big(a_n^2\frac{1}{nh^2}\big)$. Hence,

    $$I_1=E(I_1)+O_p\big(\sqrt{\mathrm{var}(I_1)}\big)=O_p(a_nh^2)+O_p\Big(a_n\frac{1}{\sqrt{nh^2}}\Big)=O_p(a_n^2).$$

    Similarly,

    $$I_2=\frac{1}{2n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))a_n^2(\mu^\top x_i^*)^2=O_p(a_n^2),$$

    and

    $$I_3=-\frac{1}{6n}\sum_{i=1}^{n}K_i\rho'''(z_i)a_n^3(\mu^\top x_i^*)^3=O_p(a_n^3).$$

    Since $\delta_1=E[\rho''(\varepsilon)]<0$, the quadratic term $I_2$ is negative in the limit and grows quadratically in $c$, while $I_1$ grows only linearly in $c$ and $I_3$ is of smaller order; hence we can choose $c$ large enough such that $I_1+I_2+I_3<0$ with probability at least $1-\eta$. Thus $P\{\sup_{\|\mu\|=c}L(\theta^*+a_n\mu)<L(\theta^*)\}\ge1-\eta$.

    Proof of Theorem 2.2: Since $\hat\theta^*$ maximizes $L(\theta^*)$, we have $L'(\hat\theta^*)=0$. By a Taylor expansion,

    $$0=L'(\hat\theta^*)=L'(\theta^*)+L''(\theta^*)(\hat\theta^*-\theta^*)+o_p(1),$$

    thus

    $$\hat\theta^*-\theta^*=-[L''(\theta^*)]^{-1}L'(\theta^*)(1+o_p(1)).$$

    For $L''(\theta^*)$: since $L(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\rho(y_i-\theta^{*\top}x_i^*)$ and $y_i-\theta^{*\top}x_i^*=\varepsilon_i+R(u_i,x_i)$, we have $L''(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}$, and the expectation is

    $$E[L''(\theta^*)]=E\big\{K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}\big\}\approx E\big\{K_i\rho''(\varepsilon_i)x_i^*x_i^{*\top}\big\}=\delta_1E\big\{E[x_i^*x_i^{*\top}\mid u_i]K_i\big\}=\delta_1q(u_0)S(1+o(1)),$$

    where $S=\mathrm{diag}(\gamma_{01},\gamma_{21},\gamma_{21})\otimes\Gamma(u_0)$; the last equality follows analogously to (A.1). Considering the element-wise variance of the matrix, we have

    $$\mathrm{var}[L''(\theta^*)]=\frac{1}{n}\mathrm{var}\big\{K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}\big\}=O_p\Big(\frac{1}{nh^2}\Big).$$

    Based on the result $L''(\theta^*)=E[L''(\theta^*)]+O_p\big(\sqrt{\mathrm{var}[L''(\theta^*)]}\big)$ and the assumption $nh^2\to\infty$, it follows that $L''(\theta^*)=\delta_1q(u_0)S(1+o_p(1))$.

    For $L'(\theta^*)$,

    $$L'(\theta^*)=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))x_i^*=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i)x_i^*-\frac{1}{n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\,(1+o_p(1))\ \equiv\ -w_n-\nu_n.$$

    The asymptotic distribution is determined by $w_n$. Next, we calculate the order of $\nu_n$:

    $$E(\nu_n)=E\big[K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\big]=\delta_1E\big\{E[R(u_i,x_i)x_i^*\mid u_i]K_i\big\}.$$

    For $R(u_i,x_i)x_i^*$: since the coefficient functions $\beta_k(\cdot)$ have bounded third derivatives, we have

    $$R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}=\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}\,(1+o_p(1)),$$

    where $H_{\beta_k}$ is the Hessian matrix of $\beta_k$. By $x_i^*=\big(x_{i1},\dots,x_{ip},\big(\frac{u_i-u_0}{h}\big)^\top x_{i1},\dots,\big(\frac{u_i-u_0}{h}\big)^\top x_{ip}\big)^\top$,

    $$R(u_i,x_i)x_i^*=\Bigg\{\Big(\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}x_{ij}\Big)_{1\le j\le p},\ \Big(\sum_{k=1}^{p}\frac{1}{2h}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)(u_i-u_0)^\top x_{ik}x_{ij}\Big)_{1\le j\le p}\Bigg\}^\top_{3p\times1}.$$

    The expectation of the first block is

    $$E\Big\{E\Big[\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}x_{ij}\,\Big|\,u_i\Big]K_i\Big\}=E\Big\{\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)\Gamma_{kj}(u_i)K_i\Big\}=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\Gamma_{kj}(u_0)\int t^\top H_{\beta_k}t\,K(t)\,dt=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\Gamma_{kj}(u_0)\gamma_{21},$$

    and the expectation of the second block is

    $$E\Big\{E\Big[\sum_{k=1}^{p}\frac{1}{2h}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)(u_i-u_0)^\top x_{ik}x_{ij}\,\Big|\,u_i\Big]K_i\Big\}=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\Gamma_{kj}(u_0)\int t^\top H_{\beta_k}t\; t\,K(t)\,dt=0_{2\times1},$$

    then

    $$E(\nu_n)=\delta_1\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o(1)),$$

    where $\psi_k=(\gamma_{21},0_{2\times1}^\top)^\top\otimes\big(\Gamma_{kj}(u_0)\big)_{1\le j\le p}$ is a $3p\times1$ vector. Since $\mathrm{var}(\nu_n)=\frac{1}{n}\mathrm{var}\{K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\}=O(h^2/n)$, then based on the result $\nu_n=E(\nu_n)+O_p\big(\sqrt{\mathrm{var}(\nu_n)}\big)$ and the assumption $nh^2\to\infty$, it follows that

    $$\nu_n=\delta_1\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1)).$$

    Then

    $$\hat\theta^*-\theta^*=-[L''(\theta^*)]^{-1}L'(\theta^*)(1+o_p(1))=\frac{S^{-1}w_n}{\delta_1q(u_0)}(1+o_p(1))+S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1)).$$

    For $w_n$, based on the assumption $E[\rho'(\varepsilon_i)]=0$, we easily get $E(w_n)=0$, and

    $$\mathrm{var}(w_n)=\frac{1}{n}\mathrm{var}\big\{K_i\rho'(\varepsilon_i)x_i^*\big\}=\frac{1}{n}E\big\{K_i^2\rho'(\varepsilon_i)^2x_i^*x_i^{*\top}\big\}=\frac{1}{nh^2}\delta_2q(u_0)\Lambda\,(1+o(1)).$$

    Based on the Lyapunov central limit theorem, we have the following result:

    $$\sqrt{nh^2}\,\Big\{\hat\theta^*-\theta^*-S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1))\Big\}\ \xrightarrow{D}\ N\big(0_{3p\times1},\ \delta_1^{-2}\delta_2\,q(u_0)^{-1}S^{-1}\Lambda S^{-1}\big).$$

    Since $\delta_1=-\delta_2$ by the information identity $E[\rho''(\varepsilon)]=-E[\rho'(\varepsilon)^2]$, we have $\delta_1^{-2}\delta_2=\delta_2^{-1}=[E\{\rho'(\varepsilon)^2\}]^{-1}$, and the theorem is proved.



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
