Research article

Adaptive estimation for spatially varying coefficient models

  • Received: 04 February 2023 Revised: 13 March 2023 Accepted: 17 March 2023 Published: 13 April 2023
  • MSC : 62G05

  • In this paper, a new adaptive estimation approach is proposed for spatially varying coefficient models with unknown error distribution. Unlike geographically weighted regression (GWR) and local linear geographically weighted regression (LL), this method can adapt to different error distributions. A generalized Modal EM algorithm is presented to implement the estimation, and the asymptotic properties of the estimator are established. Simulation and real data results show that the gain of the new adaptive method over the GWR and LL estimators is considerable when the errors are non-Gaussian.

    Citation: Heng Liu, Xia Cui. Adaptive estimation for spatially varying coefficient models[J]. AIMS Mathematics, 2023, 8(6): 13923-13942. doi: 10.3934/math.2023713




    In spatial data analysis, a common problem is determining the nature of the relationship between variables. In many cases, a simple global model fails to explain the relationships between certain sets of variables, because those relationships may change with location; this is known as spatial heterogeneity. To deal with this heterogeneity, the model needs to reflect the structure of spatial variation in the data. Suppose that spatial data at $n$ positions are randomly selected in a spatial region $D \subset \mathbb{R}^2$. Let $u_i=(u_{i1},u_{i2})^\top \in D$ be the position of point $i$, $i=1,\dots,n$, let $y_i$ be the response variable, and let $x_i=(x_{i1},x_{i2},\dots,x_{ip})^\top$ be the explanatory variables with $x_{i1}\equiv 1$, allowing a varying intercept in the model. The observations $\{y_i,x_i,u_i\}$ satisfy the following spatially varying coefficient model (SVCM) [1,2,3]:

    $$y_i = x_i^\top \beta(u_i) + \varepsilon_i = \sum_{k=1}^{p} x_{ik}\,\beta_k(u_i) + \varepsilon_i,\qquad i=1,2,\dots,n, \tag{1.1}$$

    where $\beta(u_i)=(\beta_1(u_i),\beta_2(u_i),\dots,\beta_p(u_i))^\top$ is a vector of $p$ unknown spatially varying coefficient functions defined on $D$, and the $\varepsilon_i$ are independent and identically distributed random errors with $E(\varepsilon_i)=0$ and $\mathrm{var}(\varepsilon_i)=\sigma^2$, independent of $x_i$. Over the past few decades, the SVCM has been widely used in geography [4], econometrics [5], meteorology [6], and environmental science [7]. When $\beta_k(\cdot)$ is a univariate function, model (1.1) is a varying coefficient model and has been extensively studied [8,9]. In this study, $\beta_k(\cdot)$ is a bivariate function of location, and our main goal is to estimate $\beta=(\beta_1,\beta_2,\dots,\beta_p)^\top$ and explore the spatial heterogeneity of the regression relationship based on the given observations $\{(y_i,x_i,u_i)\}_{i=1}^n$.

    In the rich literature on how to estimate the regression coefficients of the SVCM, the Bayesian approach and the smoothing approach are two competing methods. First, the Bayesian approach is an important spatial modeling method that assumes the regression coefficients obey a certain prior distribution and computes their posterior distribution for estimation and inference. For example, Gelfand et al. [10] developed a Bayesian hierarchical framework for point-referenced spatial data by formulating a Gaussian process for spatially varying coefficients, and Assuncao [11] introduced the Bayesian space-varying coefficient model (BVCM) for areal data. Recently, Kim and Lee [12] extended the BVCM to handle mixed data with both point-referenced and areal data. Luo et al. [13] built the Bayesian spatially clustered coefficient (BSCC) model from the spanning trees of a graph. However, Bayesian methods require careful selection of prior distributions and face high computational costs. Second, the smoothing method is a traditional framework for regression, divided into kernel smoothing and smoothing splines. For example, Fotheringham et al. [1] adopted a locally weighted least squares method that constructs weights from spatial kernel functions, namely geographically weighted regression (GWR), which is essentially a local constant kernel smoother. Mu et al. [14] used bivariate splines over triangulation to estimate the regression coefficients, which avoids inappropriate smoothing across complex regional boundaries and processes large data sets quickly and effectively. Yet the kernel-based method must solve an optimization problem at each sample position, which is computationally intensive, and inference for spatially varying coefficients under the smoothing-spline method relies on the bootstrap.

    Currently, there are also numerous studies on variable selection in the SVCM. Shin et al. [15] proposed penalized quasi-likelihood methods with spatial dependence. Wang and Sun [16] represented the space-varying coefficients as combinations of local polynomials at anchor points and applied least squares with an additive form of lasso and fused-lasso penalties. Li and Sang [17] proposed a spanning tree graph fused-lasso-based spatially clustered coefficient regression (SCC) model under the assumption of spatial clusters, and the regularization term of the SCC model was generalized by a chain-graph-guided fusion penalty plus a group lasso penalty [18]. However, each of these methods estimates the space-varying coefficients by the least squares criterion, which corresponds to the likelihood only when the error term is normally distributed. In practice, the error density is unknown, so least squares is not always appropriate and can lose efficiency; the adaptive estimation method provides an alternative.

    The adaptive estimation method was first studied for the problem of estimating and inferring an infinite dimensional parameter [19]. This method replaces the Gaussian density with a nonparametric estimate when forming the score function of the log-likelihood, and efficiency gains have been established both for varying coefficient models [20] and for varying coefficient models with non-stationary covariates [21]. In this study, we propose an adaptive estimation method for spatially varying coefficients. Different from the least squares criterion, the logarithm in the new adaptive objective contains an inner sum resembling the likelihood of a mixture density, so no explicit solution exists, and we use the generalized Modal EM (GMEM) algorithm [22] to compute the estimates. Simulation results show that when the error distribution deviates from the normal distribution, the new estimator is more efficient than the existing least-squares-based GWR estimator. In addition, the new method remains comparable with existing GWR methods when the error is exactly normal. Finally, we illustrate the effectiveness of the proposed adaptive estimation method through two real data examples.

    The rest of this study is organized as follows. In Section 2, the adaptive estimation of spatially varying coefficient models and the generalized Modal EM algorithm are introduced. In Section 3, the proposed method is compared with the GWR method under several different error densities through simulation studies. In Section 4, the new method is applied to two real-world data examples. A brief discussion is given in Section 5. All technical conditions and proofs are given in Appendix A.

    For any given $u_0$, we approximate the spatially varying coefficients by a Taylor expansion:

    $$\beta_k(u_i) \approx \beta_k(u_0) + \dot{\beta}_k(u_0)^\top (u_i - u_0) = b_k + c_k^\top (u_i - u_0),\qquad k=1,\dots,p, \tag{2.1}$$

    where $u_i$ is in a neighborhood of $u_0$ and $\dot{\beta}_k(u_0)=\{\partial\beta_k(u)/\partial u_1,\ \partial\beta_k(u)/\partial u_2\}^\top\big|_{u=u_0}$. Using this approximation, we have the following objective function for estimating $(b_1,\dots,b_p)$ and $(c_1,\dots,c_p)$:

    $$\sum_{i=1}^{n}\Big[y_i - \sum_{k=1}^{p}\{b_k + c_k^\top (u_i-u_0)\}x_{ik}\Big]^2 K_h(\|u_i-u_0\|), \tag{2.2}$$

    where $K_h(\cdot)=K(\cdot/h)/h^2$, $K(\cdot)$ is a kernel function, $h$ is a bandwidth, and $\|s\|=(s^\top s)^{1/2}$ for a vector $s$. Throughout this study, a Gaussian kernel is used for $K(\cdot)$. Because (2.2) is a least squares criterion, the resulting estimate may lose efficiency when the error distribution is not normal. Therefore, we develop an adaptive estimation procedure that can adapt to different error distributions.
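    Since (2.2) is quadratic in $(b_1,\dots,b_p,c_1,\dots,c_p)$, the local fit at each location $u_0$ has a closed-form weighted least squares solution. The following Python sketch is our own illustration (function and variable names are ours, not from the paper) and assumes a Gaussian kernel:

```python
import numpy as np

def local_linear_fit(y, X, U, u0, h):
    """Minimize the kernel-weighted least squares criterion (2.2) at u0.

    y: (n,) responses; X: (n, p) covariates; U: (n, 2) locations.
    Returns theta = (b_1..b_p, c_11..c_p1, c_12..c_p2), shape (3p,).
    """
    d = U - u0                                             # (n, 2) offsets
    w = np.exp(-0.5 * np.sum(d**2, axis=1) / h**2) / h**2  # Gaussian K_h
    # local design: x_ik, x_ik*(u_i1 - u_01), x_ik*(u_i2 - u_02)
    Z = np.hstack([X, X * d[:, [0]], X * d[:, [1]]])       # (n, 3p)
    A = Z.T @ (w[:, None] * Z)
    b = Z.T @ (w * y)
    return np.linalg.solve(A, b)
```

    The first $p$ entries of the returned vector are the local coefficient estimates $\hat\beta_k(u_0)=\hat b_k$; evaluating the function over a grid of locations traces out the LL coefficient surfaces, and dropping the two slope blocks of `Z` recovers the local constant (GWR) fit.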

    Let f(ε) be the density function of ε. If f(ε) were known, it would be natural to estimate the parameters in (2.1) by maximizing the following log-likelihood function

    $$\sum_{i=1}^{n}\log f\Big[y_i - \sum_{k=1}^{p}\{b_k + c_k^\top (u_i-u_0)\}x_{ik}\Big] K_h(\|u_i-u_0\|). \tag{2.3}$$

    However, in practice, f(ε) is generally unknown but can be replaced by a leave-one-out kernel density estimator

    $$\tilde f(\varepsilon_i)=\frac{1}{n}\sum_{j\neq i} K_{h_0}(\varepsilon_i - \tilde\varepsilon_j), \tag{2.4}$$

    where $\tilde\varepsilon_j=y_j-\sum_{k=1}^{p} x_{jk}\tilde\beta_k(u_j)$ is a preliminary estimate of $\varepsilon_j$ based on the initial estimator $\tilde\beta_k=\tilde b_k$, obtained from the local linear regression (2.2). Let $\theta=(b_1,\dots,b_p,c_1^\top,\dots,c_p^\top)^\top$. Then our proposed adaptive estimate of the parameter $\theta$ is

    $$\hat\theta=\arg\max_\theta Q(\theta), \tag{2.5}$$

    where

    $$Q(\theta)=\sum_{i=1}^{n}\log\Big(\frac{1}{n}\sum_{j\neq i} K_{h_0}\Big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top (u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\Big]\Big) K_h(\|u_i-u_0\|). \tag{2.6}$$

    Because the logarithm in (2.6) contains an inner sum, the objective resembles the log-likelihood of a random sample from a mixture density and admits no explicit solution. In the following, we use the generalized Modal EM algorithm proposed in Yao [22] to compute the parameters.

    Generalized Modal EM algorithm (GMEM): The GMEM algorithm is a generalization of the Modal EM (MEM) algorithm [23], which finds the modes of a mixture density and performs nonparametric clustering. The MEM algorithm comprises two steps similar to the expectation and maximization steps of the EM algorithm, which maximizes the likelihood function for finite mixture models containing unobserved latent variables. Specifically, suppose an $m$-component finite mixture density is

    $$f(x)=\sum_{j=1}^{m}\pi_j f_j(x),$$

    where $\pi_j$ is the mixing proportion of mixture component $j$ and $f_j(x)$ is the density of component $j$. Given any initial value $x^{(0)}$, the $(l+1)$th step of the MEM algorithm finds a local maximum of the mixture density by the following two steps:

    1) let

    $$p_j=\frac{\pi_j f_j(x^{(l)})}{f(x^{(l)})},\qquad j=1,\dots,m,$$

    2) update

    $$x^{(l+1)}=\arg\max_x \sum_{j=1}^{m} p_j \log f_j(x).$$
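    As a toy illustration of these two steps (our own sketch; the mixture parameters below are made up and not from the paper), the iteration for a two-component Gaussian mixture ascends to a local mode of the density:

```python
import numpy as np

# hypothetical mixture: 0.4*N(-2, 1) + 0.6*N(3, 1)
pi = np.array([0.4, 0.6])
mu = np.array([-2.0, 3.0])
sd = np.array([1.0, 1.0])

def phi(x, m, s):
    """Normal density N(m, s^2) evaluated at x."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

def mem_mode(x0, n_iter=100):
    """Modal EM: iterate the two steps until x settles at a local mode."""
    x = x0
    for _ in range(n_iter):
        p = pi * phi(x, mu, sd)      # step 1: component probabilities p_j
        p /= p.sum()
        # step 2: for Gaussian f_j, the argmax of sum_j p_j log f_j(x)
        # is the precision-weighted mean of the component centers
        x = np.sum(p * mu / sd**2) / np.sum(p / sd**2)
    return x
```

    Starting from $x^{(0)}=2.5$ the iteration climbs to the mode near $3$; starting from $x^{(0)}=-1.5$ it climbs to the mode near $-2$.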

    The first step is the "Expectation" step, where the probability of each mixture component $j$, $1\le j\le m$, at the current point $x^{(l)}$ is computed. The second step is the "Maximization" step, which, as in the EM algorithm, is usually much easier to solve than the original objective function. For detailed properties of the MEM algorithm, see Li et al. [23]. Yao [22] proves that the MEM algorithm can be extended to maximize a general mixture-type objective function

    $$f(x)=\sum_{j=1}^{m} w_j\Big[\log\Big\{\sum_{k=1}^{K} a_{jk} f_{jk}(x)\Big\}\Big], \tag{2.7}$$

    where the $w_j$ and $a_{jk}$ are known positive constants and each $f_{jk}(x)$ is a positive known function. When $m=1$, the objective function (2.7) simplifies to

    $$f(x)=w_1\log\Big\{\sum_{k=1}^{K} a_{1k} f_{1k}(x)\Big\},$$

    which has the same maximizers as $\sum_{k=1}^{K} a_{1k} f_{1k}(x)$. Therefore, if $\sum_{k=1}^{K} a_{1k}=1$ and the $f_{1k}(x)$ are density functions, the MEM algorithm is a special case of the generalized Modal EM (GMEM) algorithm for (2.7). Specifically, given the initial value $x^{(0)}$, the $(l+1)$th step of the GMEM algorithm is as follows:

    E-step: let

    $$p_{jk}^{(l+1)}=\frac{a_{jk} f_{jk}(x^{(l)})}{\sum_{k'=1}^{K} a_{jk'} f_{jk'}(x^{(l)})},\qquad j=1,\dots,m,\ k=1,\dots,K,$$

    M-step: update

    $$x^{(l+1)}=\arg\max_x \sum_{j=1}^{m}\sum_{k=1}^{K}\big\{w_j\, p_{jk}^{(l+1)} \log f_{jk}(x)\big\}.$$

    In this study, we note that the objective function $Q(\theta)$ of (2.6) has the mixture form of (2.7). Specifically, $K_h(\|u_i-u_0\|)$, $\frac{1}{n}$, and $K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]$ in (2.6) correspond to $w_j$, $a_{jk}$, and $f_{jk}(x)$ in (2.7), respectively. Therefore, the GMEM algorithm can be applied directly to estimate the parameters $b_k$, $c_k$ in (2.6). Let $\theta^{(0)}$ be the initial estimator obtained by minimizing (2.2), let $\theta^{(l)}=(b_1^{(l)},\dots,b_p^{(l)},c_1^{(l)\top},\dots,c_p^{(l)\top})^\top$ be the estimator at the $l$th iteration, let $\tilde\varepsilon_j$ be the preliminary estimate of $\varepsilon_j$ (which does not need to be updated), and let $z_i=\{x_i^\top,(x_i\otimes(u_i-u_0))^\top\}^\top$. At the $(l+1)$th iteration, the E and M steps are as follows:

    E-step: calculate the classification probabilities $p_{ij}^{(l+1)}$,

    $$p_{ij}^{(l+1)}=\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j'\neq i} K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_{j'}\big]}. \tag{2.8}$$

    M-step: update $\theta^{(l+1)}$,

    $$\begin{aligned}
    \theta^{(l+1)}&=\arg\max_\theta \sum_{i=1}^{n}\sum_{j\neq i}\Big\{p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\log\Big(K_{h_0}\Big[y_i-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\Big]\Big)\Big\}\\
    &=\arg\min_\theta \sum_{i=1}^{n}\sum_{j\neq i}\Big\{p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\big[y_i-\tilde\varepsilon_j-z_i^\top\theta\big]^2\Big\}\\
    &=\Big(\sum_{i=1}^{n}\sum_{j\neq i} p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\, z_i z_i^\top\Big)^{-1}\sum_{i=1}^{n}\sum_{j\neq i} p_{ij}^{(l+1)} K_h(\|u_i-u_0\|)\,(y_i-\tilde\varepsilon_j)\, z_i\\
    &=(Z^\top W Z)^{-1} Z^\top W Y,
    \end{aligned} \tag{2.9}$$

    where $Z=(Z_{1,n-1},\dots,Z_{n,n-1})^\top$ with $Z_{i,n-1}$ the $3p\times(n-1)$ matrix whose $n-1$ columns all equal $z_i$, that is,

    $$Z_{i,n-1}=\begin{pmatrix} x_{i1}&\cdots&x_{i1}\\ \vdots& &\vdots\\ x_{ip}&\cdots&x_{ip}\\ (u_i-u_0)x_{i1}&\cdots&(u_i-u_0)x_{i1}\\ \vdots& &\vdots\\ (u_i-u_0)x_{ip}&\cdots&(u_i-u_0)x_{ip}\end{pmatrix}_{3p\times(n-1)},$$

    $W=\mathrm{diag}\big(p_{12}^{(l+1)}K_h(\|u_1-u_0\|),\dots,p_{1n}^{(l+1)}K_h(\|u_1-u_0\|),\dots,p_{n,n-1}^{(l+1)}K_h(\|u_n-u_0\|)\big)$, $Y=(y_1-\tilde\varepsilon_2,\dots,y_1-\tilde\varepsilon_n,\dots,y_n-\tilde\varepsilon_{n-1})^\top$, and the second equality follows from the use of the Gaussian kernel. If $\|\theta^{(l+1)}-\theta^{(l)}\|\le 10^{-5}$, the algorithm stops; otherwise, the E and M steps continue to iterate.
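    Putting (2.8) and (2.9) together, the whole iteration at a single location $u_0$ can be sketched as follows (our own Python illustration with hypothetical names, assuming Gaussian kernels for both $K_h$ and $K_{h_0}$; because $\sum_{j\neq i}p_{ij}^{(l+1)}=1$, the double sum in (2.9) collapses to a single weighted least squares problem with working response $y_i-\sum_{j\neq i}p_{ij}^{(l+1)}\tilde\varepsilon_j$):

```python
import numpy as np

def gmem_adaptive(y, X, U, u0, h, h0, resid, tol=1e-5, max_iter=100):
    """Adaptive estimate of theta at u0 via the GMEM iteration (2.8)-(2.9).

    resid: preliminary residuals eps~_j (aligned with the n observations)
    from the initial local linear fit (2.2); they are not updated.
    """
    d = U - u0
    Z = np.hstack([X, X * d[:, [0]], X * d[:, [1]]])        # rows are z_i
    kh = np.exp(-0.5 * np.sum(d**2, axis=1) / h**2) / h**2  # K_h weights
    G = Z.T @ (kh[:, None] * Z)
    theta = np.linalg.solve(G, Z.T @ (kh * y))              # initial (2.2) fit
    for _ in range(max_iter):
        r = y - Z @ theta
        # E-step (2.8): classification probabilities over j != i
        A = np.exp(-0.5 * ((r[:, None] - resid[None, :]) / h0) ** 2)
        np.fill_diagonal(A, 0.0)
        P = A / A.sum(axis=1, keepdims=True)
        # M-step (2.9): weighted least squares with the working response
        t = y - P @ resid
        new = np.linalg.solve(G, Z.T @ (kh * t))
        if np.linalg.norm(new - theta) <= tol:
            return new
        theta = new
    return theta
```

    Note that the Gram matrix of the M-step does not change across iterations (the classification probabilities sum to one over $j$), so it is factored once outside the loop.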

    Proposition 2.1. Each iteration of the above E and M steps monotonically increases $Q(\theta)$ in Eq (2.6), i.e., for any $l$,

    $$Q(\theta^{(l+1)})\ \ge\ Q(\theta^{(l)}).$$

    The consistency and asymptotic normality of $\hat\theta$ are established as follows. Let $H=\mathrm{diag}(1,h,h)\otimes I_p$, where $\otimes$ is the Kronecker product and $I_p$ is the $p\times p$ identity matrix. For $i,j=0,1,2$ and $k=1,2$, denote $\gamma_{ij}=\int u_k^i K^j(u)\,du$ with $u=(u_1,u_2)^\top$, and let $q(\cdot)$ be the marginal density function of $u$.

    Theorem 2.1. Under the regularity conditions in Appendix A, there exists a consistent maximizer $\hat\theta=(\hat b_1,\dots,\hat b_p,\hat c_1^\top,\dots,\hat c_p^\top)^\top$ of (2.6) with probability approaching 1 such that

    $$H(\hat\theta-\theta)=O_p\{(nh^2)^{-1/2}+h^2\}.$$

    Based on Theorem 2.1, the proposed adaptive estimator of $\theta$ is consistent; its proof is provided in the Appendix. Next, we provide the asymptotic distribution of the proposed estimator.

    Theorem 2.2. Suppose that the regularity conditions in Appendix A hold. Then $\hat\theta$, given in Theorem 2.1, has the following asymptotic distribution:

    $$\sqrt{nh^2}\,\Big\{H(\hat\theta-\theta)-S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\,\psi_k\,(1+o_p(1))\Big\}\ \xrightarrow{D}\ N\Big(0_{3p\times1},\ \big[E\{\rho'(\varepsilon)^2\}\big]^{-1} q(u_0)^{-1} S^{-1}\Lambda S^{-1}\Big),$$

    where $\rho(\cdot)=\log f(\cdot)$, $S=\mathrm{diag}(\gamma_{01},\gamma_{21},\gamma_{21})\otimes\Gamma(u_0)$, $\Gamma(u_0)=\{\Gamma_{kj}(u_0)\}_{1\le k,j\le p}$ with $\Gamma_{kj}(u_0)=E(x_{ik}x_{ij}\mid u_0)$, $\Lambda=\mathrm{diag}(\gamma_{02},\gamma_{22},\gamma_{22})\otimes\Gamma(u_0)$, and $\psi_k=(\gamma_{21},0_{2\times1}^\top)^\top\otimes\big(\Gamma_{kj}(u_0)\big)_{1\le j\le p}$.

    This section evaluates the proposed adaptive estimation method by simulation and compares it with local linear geographically weighted regression (LL) [24] and geographically weighted regression (GWR). In the numerical experiments, the following four designs of the error structure are considered:

    1) $\varepsilon\sim N(0,1)$;

    2) $\varepsilon\sim t_3$;

    3) $\varepsilon\sim 0.5\,N(-1,0.5^2)+0.5\,N(1,0.5^2)$;

    4) $\varepsilon\sim e^{T}-E(e^{T})$, where $T\sim N(0,1)$.

    The first is the standard normal distribution, as a benchmark for comparison, and the second is the $t$ distribution with 3 degrees of freedom. The third distribution is bimodal, and the last one has a long right tail. For the above error distributions, the population positions are located at the $N=25\times25$ regular grid in the square region $D=[0,1]^2$, and the distance between any two adjacent points in the horizontal and vertical directions is equal. At each location, the response variables $y_1,\dots,y_n$ are generated by $y_i=\beta_1(u_i)x_{i1}+\beta_2(u_i)x_{i2}+\varepsilon_i$, where $x_1$ and $x_2$ follow $N(0,1)$ with correlation coefficient $\rho=1/2$, and the regression coefficient functions are as follows:

    $$\beta_1(u)=1+\frac{25}{12}(u_1+u_2),\qquad \beta_2(u)=1+\frac{1}{324}\Big[36-\Big(6-\frac{25u_1}{2}\Big)^2\Big]\Big[36-\Big(6-\frac{25u_2}{2}\Big)^2\Big],$$

    The true coefficient function contour plots of $\beta_1(u)$ and $\beta_2(u)$ are shown in Figure 1. We randomly sample $n=200$ and $400$ points from the $25\times25$ grid points in each of the 100 Monte Carlo experiments.
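    For reference, one Monte Carlo replicate of this design can be generated as follows (our own sketch; the coefficient functions are as reconstructed above, and for error design 4 the lognormal is centered by its mean $e^{1/2}$):

```python
import numpy as np

rng = np.random.default_rng(0)

def beta1(u):
    return 1 + 25 / 12 * (u[:, 0] + u[:, 1])

def beta2(u):
    a = 36 - (6 - 25 * u[:, 0] / 2) ** 2
    b = 36 - (6 - 25 * u[:, 1] / 2) ** 2
    return 1 + a * b / 324

def simulate(n, error="normal"):
    """One replicate: n locations sampled from the 25x25 grid in [0,1]^2."""
    g = np.linspace(0, 1, 25)
    grid = np.array([(s, t) for s in g for t in g])
    U = grid[rng.choice(len(grid), size=n, replace=False)]
    # (x1, x2) standard normal with correlation 1/2 via a Cholesky factor
    C = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
    X = rng.standard_normal((n, 2)) @ C.T
    eps = {"normal": rng.standard_normal(n),
           "t3": rng.standard_t(3, n),
           "mixture": 0.5 * rng.standard_normal(n) + rng.choice([-1.0, 1.0], n),
           "lognormal": np.exp(rng.standard_normal(n)) - np.exp(0.5)}[error]
    y = beta1(U) * X[:, 0] + beta2(U) * X[:, 1] + eps
    return y, X, U
```

    The `"mixture"` branch draws $\pm1$ centers with equal probability and adds $N(0,0.5^2)$ noise, which reproduces the two-component normal mixture of design 3.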

    Figure 1.  True coefficient functions contour plots of β1 (left) and β2 (right).

    There are two bandwidths $h$ and $h_0$ in the estimation. We use the leave-one-out cross-validation method to select $h$, and the choice $h_0=h/\log(n)$ follows Linton and Xiao [25]. The performance of the estimator $\hat\beta(\cdot)$ is evaluated by the square root of the average squared errors (RASE), which is calculated as follows:

    $$\mathrm{RASE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\sum_{p=1}^{2}\big[\hat\beta_p(u_i)-\beta_p(u_i)\big]^2}.$$
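    In code, the RASE criterion is simply (our own helper, with `beta_hat` and `beta_true` holding the two coefficient surfaces evaluated at the $n$ sample locations):

```python
import numpy as np

def rase(beta_hat, beta_true):
    """Square root of the average (over locations) summed squared errors."""
    diff = np.asarray(beta_hat, float) - np.asarray(beta_true, float)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```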

    The simulation results are summarized in Table 1. It can be clearly seen that when the error is non-normal, the proposed adaptive estimation outperforms LL and GWR, and the improvement in estimation efficiency can be considerable. When the error is exactly normally distributed, our method is still comparable to the LL and GWR methods.

    Table 1.  Comparison of RASE with its standard error in brackets.
    ε n=200 n=400
    GWR LL Adaptive GWR LL Adaptive
    1 0.838(0.101) 0.787(0.099) 0.978(0.097) 0.677(0.051) 0.561(0.051) 0.790(0.048)
    2 0.964(0.152) 0.857(0.123) 0.854(0.110) 0.737(0.090) 0.655(0.067) 0.538(0.028)
    3 1.104(0.109) 1.009(0.107) 0.940(0.081) 0.796(0.061) 0.685(0.052) 0.632(0.031)
    4 0.869(0.175) 0.837(0.127) 0.653(0.084) 0.692(0.158) 0.621(0.131) 0.405(0.060)


    Figure 2 visualizes the estimated surfaces of $\beta_1(\cdot)$ and $\beta_2(\cdot)$ using the adaptive estimation method, LL and GWR based on sample size $n=400$ when the error distribution of $\varepsilon$ is case 3. These results highlight that the adaptive estimation method captures the spatial pattern more accurately than the LL and GWR methods.

    Figure 2.  Estimated surface via adaptive method, LL and GWR based on sample size n=400 when the error distribution in the case 3.

    Example 1. (Dublin Voter Turnout Data) This section applies the proposed methodology to the Dublin voter data. This dataset includes the proportion of the voting population in 322 areas, as well as several variables that may explain the change in the proportion of the voting population. Specifically, we explore how the unemployment rate (Unempl), the proportion aged 25 to 44 (Age25_44) and the proportion with no formal education (LowEduc) affect the proportion of the voting population in each region (GenEl2004). Figure 3 shows the spatial distribution of the dependent variable and the three independent variables.

    Figure 3.  Response and independent variables for voter turnout data in Dublin.

    The dependent variable GenEl2004 and the independent variables Unempl, Age25_44 and LowEduc are denoted by $y,x_2,x_3,x_4$, respectively, with $x_1\equiv1$ as the intercept term. We use the spatially varying coefficient model to fit the data as follows:

    $$y_i=\beta_1(u_i)+\sum_{k=2}^{4}\beta_k(u_i)x_{ik}+\varepsilon_i.$$

    Figure 4 summarizes the estimated coefficient functions using the adaptive method, LL, and GWR, respectively, which vary considerably in space. Figure 7(a) shows a residual QQ-plot of the Dublin voter turnout data via the adaptive method. From the plot, we can see that the distribution of the residuals is very close to normal.

    Figure 4.  Estimated coefficient functions for voter turnout data in Dublin using adaptive method (top), LL (middle) and GWR (bottom).

    To evaluate the prediction accuracy of the adaptive method, we set aside 50 observations for comparing the mean squared prediction error (MSPE) of the adaptive method, LL, and GWR. The MSPE is computed as follows:

    $$\mathrm{MSPE}=\frac{1}{m}\sum_{j=1}^{m}(y_j-\hat y_j)^2,$$

    where $m=50$ and $\hat y_j=\hat\beta_1(u_j)+\sum_{k=2}^{4}\hat\beta_k(u_j)x_{jk}$. The MSPE values of the three methods are comparable: 0.018, 0.015 and 0.017, respectively. The residual QQ-plot in Figure 7(a) is close to the normal distribution, which explains why the MSPE of the adaptive method is very close to the MSPE of the GWR and the LL.
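    The hold-out comparison can be organized as below (our own sketch; `fit_at` stands for any of the three estimators, returning the coefficient vector $\hat\beta(u_0)$ from the training data, and the interface is hypothetical):

```python
import numpy as np

def holdout_mspe(y, X, U, fit_at, m=50, seed=1):
    """Set aside m observations and compute the MSPE of a fitted SVCM.

    fit_at(u0, y_tr, X_tr, U_tr) -> coefficient vector of length p,
    evaluated at the held-out location u0.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, train = idx[:m], idx[m:]
    preds = np.array([X[i] @ fit_at(U[i], y[train], X[train], U[train])
                      for i in test])
    return np.mean((y[test] - preds) ** 2)
```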

    Example 2. (England and Wales House Price Data) The England and Wales house price data are publicly available in the R package GWmodel. The dataset includes 10 variables, namely: house sale price (PurPrice), BldIntWr, BldPostW, bld60, bld70, bld80, TypDetch, TypSemiD, TypFlat and floor area (FlrArea). With the exception of the floor area (FlrArea), all independent variables are indicator variables (1 or 0). Figure 5 shows the spatial distribution of PurPrice and FlrArea.

    Figure 5.  Response and independent variables for house price data in England and Wales.

    We take the house sale price ($y$) as the dependent variable and FlrArea ($x_2$) as the independent variable, with $x_1\equiv1$ as the intercept term; the spatially varying coefficient model fitted to the data is:

    $$y_i=\beta_1(u_i)+\beta_2(u_i)x_{i2}+\varepsilon_i.$$

    The estimated coefficient functions are shown in Figure 6, and Figure 7(b) shows a residual QQ-plot via the adaptive method for the England and Wales house price data. Similar to the analysis in Example 1, we set aside 50 observations as the test set. The MSPE of the adaptive approach, LL and GWR are 0.302, 0.339 and 0.548, respectively. The QQ-plot of residuals from the above fit shows a clear deviation from normality, which explains why the MSPE of the adaptive approach is smaller than those of LL and GWR.

    Figure 6.  Estimated coefficient functions for house price data in England and Wales using adaptive method (top), LL (middle) and GWR (bottom).
    Figure 7.  Residual QQ-plot for two data examples: (a) Dublin voter turnout data; (b) England and Wales housing data.

    In this article, we proposed an adaptive estimation method for spatially varying coefficient models. The new estimation procedure can adapt to different error distributions and achieves higher estimation efficiency than the LL and GWR methods. Simulation studies and two real data applications confirmed our theoretical findings.

    The proposed method in this article can be easily extended to semiparametric varying-coefficient partially linear models, where some coefficients in the model are assumed to be constant and the remaining coefficients are allowed to spatially vary across the studied region. Another interesting future work is the spatiotemporal extension to analyze data collected across time and space.

    This work was supported by the National Natural Science Foundation of China (Grant No. 11871173), the National Statistical Science Research Project (Grant No. 2020LZ09).

    The authors declare that there is no conflict of interest.

    This section gives the proofs of Proposition 2.1, Theorem 2.1 and Theorem 2.2, with the required regularity conditions as follows:

    1) $K(\cdot)$ is bounded, symmetric, and has bounded support and bounded derivatives;

    2) $\{x_i\}_{i=1}^n$, $\{u_i\}_{i=1}^n$, $\{\varepsilon_i\}_{i=1}^n$ are independent and identically distributed, and $\{\varepsilon_i\}_{i=1}^n$ is independent of $\{x_i\}_{i=1}^n$ and $\{u_i\}_{i=1}^n$. In addition, the independent variable $x$ has bounded support;

    3) The probability density function $f(\varepsilon)$ of $\varepsilon$ has a bounded continuous fourth-order derivative. Assume $E[\rho'(\varepsilon)]=0$, $E[\rho''(\varepsilon)]<\infty$, $E[\rho'(\varepsilon)^2]<\infty$ and $\rho'''(\cdot)$ is bounded;

    4) The marginal density $q(u)$ of $u$ has a continuous second derivative in some neighborhood of $u_0$ and $q(u_0)\neq0$;

    5) $h\to0$ and $nh^2\to\infty$ as $n\to\infty$, with $h_0=h/\log(n)$;

    6) $\beta_k(\cdot)$, $k=1,\dots,p$, has bounded and continuous third derivatives.

    Proof of Proposition 2.1: Note that

    $$\begin{aligned}
    Q(\theta^{(l+1)})-Q(\theta^{(l)})&=\sum_{i=1}^{n}K_h(\|u_i-u_0\|)\log\Bigg\{\frac{\sum_{j\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\}\\
    &=\sum_{i=1}^{n}K_h(\|u_i-u_0\|)\log\Bigg\{\sum_{j\neq i}p_{ij}^{(l+1)}\,\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\},
    \end{aligned}$$

    where

    $$p_{ij}^{(l+1)}=\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{\sum_{j'\neq i}K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_{j'}\big]}.$$

    From Jensen's inequality,

    $$Q(\theta^{(l+1)})-Q(\theta^{(l)})\ \ge\ \sum_{i=1}^{n}K_h(\|u_i-u_0\|)\sum_{j\neq i}p_{ij}^{(l+1)}\log\Bigg\{\frac{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l+1)}+c_k^{(l+1)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}{K_{h_0}\big[y_i-\sum_{k=1}^{p}\{b_k^{(l)}+c_k^{(l)\top}(u_i-u_0)\}x_{ik}-\tilde\varepsilon_j\big]}\Bigg\}.$$

    By the maximization property of the M-step in (2.9), the right-hand side is nonnegative, which proves that $Q(\theta^{(l+1)})-Q(\theta^{(l)})\ge0$.

    Proof of Theorem 2.1: According to the result of Linton and Xiao [25], the asymptotic behaviour of $\hat\theta$ in (2.6) is the same as that obtained from (2.3). Therefore, we prove the asymptotic properties of $\hat\theta$ based on (2.3).

    Denote $\theta^*=H\theta$, $x_i^*=\big(x_{i1},\dots,x_{ip},\big(\frac{u_i-u_0}{h}\big)^\top x_{i1},\dots,\big(\frac{u_i-u_0}{h}\big)^\top x_{ip}\big)^\top$, $K_i=K_h(\|u_i-u_0\|)$, $R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}$, and $a_n=(nh^2)^{-1/2}+h^2$. Let $\rho(\cdot)=\log f(\cdot)$; the objective function (2.3) is written as

    $$L(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\,\rho(y_i-\theta^{*\top}x_i^*).$$

    Based on the definition of $\theta^*$, it is sufficient to show that, for any given $\eta>0$, there exists a large constant $c$ such that

    $$P\Big\{\sup_{\|\mu\|=c}L(\theta^*+a_n\mu)<L(\theta^*)\Big\}\ \ge\ 1-\eta,$$

    where $\mu$ has the same dimension as $\theta^*$ and $a_n$ is the convergence rate. By using a Taylor expansion, it follows that

    $$\begin{aligned}
    L(\theta^*+a_n\mu)-L(\theta^*)&=\frac{1}{n}\sum_{i=1}^{n}K_i\big\{\rho(\varepsilon_i+R(u_i,x_i)-a_n\mu^\top x_i^*)-\rho(\varepsilon_i+R(u_i,x_i))\big\}\\
    &=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))\,a_n\mu^\top x_i^*+\frac{1}{2n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))\,a_n^2(\mu^\top x_i^*)^2\\
    &\quad-\frac{1}{6n}\sum_{i=1}^{n}K_i\rho'''(z_i)\,a_n^3(\mu^\top x_i^*)^3\\
    &\equiv I_1+I_2+I_3,
    \end{aligned}$$

    where $z_i$ is a value between $\varepsilon_i+R(u_i,x_i)-a_n\mu^\top x_i^*$ and $\varepsilon_i+R(u_i,x_i)$.

    For $I_1=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*$, let $\delta_1=E[\rho''(\varepsilon_i)]$. Since $R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}=O_p(h^2)=o_p(1)$ and $E[\rho'(\varepsilon)]=0$, we have

    $$E(I_1)=-E\big(K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*\big)\approx-a_nE\big\{K_i\rho''(\varepsilon_i)R(u_i,x_i)\mu^\top x_i^*\big\}=-a_n\delta_1E\big[K_iR(u_i,x_i)\mu^\top x_i^*\big]=-a_n\delta_1E\big\{E[R(u_i,x_i)\mu^\top x_i^*\mid u_i]K_i\big\}.$$

    By using $\mu^\top x_i^*\le\|\mu\|\,\|x_i^*\|$, we have $E(I_1)=O(a_nh^2)$.

    $$\mathrm{var}(I_1)=\frac{1}{n}\mathrm{var}\big\{K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*\big\}=\frac{1}{n}\big\{E(A^2)-[E(A)]^2\big\},$$

    where $A=K_i\rho'(\varepsilon_i+R(u_i,x_i))a_n\mu^\top x_i^*$. Let $\delta_2=E[\rho'(\varepsilon_i)^2]$; then

    $$E(A^2)=E\big\{K_i^2\rho'(\varepsilon_i+R(u_i,x_i))^2a_n^2(\mu^\top x_i^*)^2\big\}\approx a_n^2E\big\{K_i^2\rho'(\varepsilon_i)^2(\mu^\top x_i^*)^2\big\}=a_n^2\delta_2E\big\{E[\mu^\top x_i^*x_i^{*\top}\mu\mid u_i]K_i^2\big\}=a_n^2\delta_2\,\mu^\top E\big\{E[x_i^*x_i^{*\top}\mid u_i]K_i^2\big\}\mu.$$

    Note that $x_i^*x_i^{*\top}=\Big(x_{ij}x_{ik}\big(\frac{u_i-u_0}{h}\big)^{l}\big(\big(\frac{u_i-u_0}{h}\big)^{l'}\big)^\top\Big)_{1\le j,k\le p;\ l,l'=0,1}$, $\Gamma_{jk}(u_i)=E(x_{ij}x_{ik}\mid u_i)$, and that the odd kernel moments $\int_{\mathbb{R}^2}uK(u)\,du$, $\int_{\mathbb{R}^2}u\,u^\top u\,K(u)\,du$ and $\int_{\mathbb{R}^2}u_1u_2K(u)\,du$ all vanish. For $1\le j,k\le p$, then

    $$E\Big[E(x_{ij}x_{ik}\mid u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K_i^2\Big]=E\Big[\Gamma_{jk}(u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K_i^2\Big]=\frac{1}{h^4}\int\Gamma_{jk}(u_i)\Big(\frac{u_i-u_0}{h}\Big)^{l}\Big(\Big(\frac{u_i-u_0}{h}\Big)^{l'}\Big)^\top K^2\Big(\frac{u_i-u_0}{h}\Big)q(u_i)\,du_i=\frac{1}{h^2}q(u_0)\Gamma_{jk}(u_0)\int t^{l}(t^{l'})^\top K^2(t)\,dt\,(1+o(1)). \tag{A.1}$$

    The last equality follows from a Taylor expansion and the assumption that $\varepsilon$ is independent of $u$ and $x$. Then $E\{E[x_i^*x_i^{*\top}\mid u_i]K_i^2\}=\frac{1}{h^2}q(u_0)\Lambda(1+o(1))$, where $\Lambda=\mathrm{diag}(\gamma_{02},\gamma_{22},\gamma_{22})\otimes\Gamma(u_0)$ is a $3p\times3p$ matrix. Thus,

    $$E(A^2)=a_n^2\delta_2\frac{1}{h^2}q(u_0)\mu^\top\Lambda\mu\,(1+o(1))=O\Big(a_n^2\frac{1}{h^2}\Big).$$

    Note that $[E(A)]^2=[E(I_1)]^2=[O(a_nh^2)]^2$ is of smaller order than $E(A^2)$, so $\mathrm{var}(I_1)\le\frac{1}{n}E(A^2)=O\big(a_n^2\frac{1}{nh^2}\big)$. Hence,

    $$I_1=E(I_1)+O_p\big(\sqrt{\mathrm{var}(I_1)}\big)=O_p(a_nh^2)+O_p\Big(a_n\frac{1}{\sqrt{nh^2}}\Big)=O_p(a_n^2).$$

    Similarly,

    $$I_2=\frac{1}{2n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))a_n^2(\mu^\top x_i^*)^2=O_p(a_n^2),$$

    and

    $$I_3=-\frac{1}{6n}\sum_{i=1}^{n}K_i\rho'''(z_i)a_n^3(\mu^\top x_i^*)^3=O_p(a_n^3).$$

    Since $\delta_1=E[\rho''(\varepsilon)]<0$, the quadratic term $I_2$ is negative in the limit and grows quadratically in $c$, while $I_1$ grows only linearly in $c$ and $I_3$ is of smaller order; hence we can choose $c$ large enough such that $I_1+I_2+I_3<0$ with probability at least $1-\eta$. Thus $P\{\sup_{\|\mu\|=c}L(\theta^*+a_n\mu)<L(\theta^*)\}\ge1-\eta$.

    Proof of Theorem 2.2: Since $\hat\theta^*$ maximizes $L(\theta^*)$, we have $L'(\hat\theta^*)=0$. By a Taylor expansion,

    $$0=L'(\hat\theta^*)=L'(\theta^*)+L''(\theta^*)(\hat\theta^*-\theta^*)+o_p(1),$$

    thus

    $$\hat\theta^*-\theta^*=-[L''(\theta^*)]^{-1}L'(\theta^*)(1+o_p(1)).$$

    For $L''(\theta^*)$: since $L(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\rho(y_i-\theta^{*\top}x_i^*)$ and $y_i-\theta^{*\top}x_i^*=\varepsilon_i+R(u_i,x_i)$, we have $L''(\theta^*)=\frac{1}{n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}$, and the expectation is

    $$E[L''(\theta^*)]=E\big\{K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}\big\}\approx E\big\{K_i\rho''(\varepsilon_i)x_i^*x_i^{*\top}\big\}=\delta_1E\big\{E[x_i^*x_i^{*\top}\mid u_i]K_i\big\}=\delta_1q(u_0)S(1+o(1)),$$

    where $S=\mathrm{diag}(\gamma_{01},\gamma_{21},\gamma_{21})\otimes\Gamma(u_0)$; the last equality follows analogously to (A.1). Considering the element-wise variance of the matrix, we have

    $$\mathrm{var}[L''(\theta^*)]=\frac{1}{n}\mathrm{var}\big\{K_i\rho''(\varepsilon_i+R(u_i,x_i))x_i^*x_i^{*\top}\big\}=O_p\Big(\frac{1}{nh^2}\Big).$$

    Based on the result $L''(\theta^*)=E[L''(\theta^*)]+O_p\big(\sqrt{\mathrm{var}[L''(\theta^*)]}\big)$ and the assumption $nh^2\to\infty$, it follows that $L''(\theta^*)=\delta_1q(u_0)S(1+o_p(1))$.

    For $L'(\theta^*)$,

    $$L'(\theta^*)=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i+R(u_i,x_i))x_i^*=-\frac{1}{n}\sum_{i=1}^{n}K_i\rho'(\varepsilon_i)x_i^*-\frac{1}{n}\sum_{i=1}^{n}K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\,(1+o_p(1))\ \equiv\ -w_n-\nu_n.$$

    The asymptotic distribution is determined by $w_n$. Next, we calculate the order of $\nu_n$:

    $$E(\nu_n)=E\big[K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\big]=\delta_1E\big\{E[R(u_i,x_i)x_i^*\mid u_i]K_i\big\}.$$

    For $R(u_i,x_i)x_i^*$: since the coefficient functions $\beta_k(\cdot)$ have bounded third derivatives, we have

    $$R(u_i,x_i)=\sum_{k=1}^{p}\beta_k(u_i)x_{ik}-\sum_{k=1}^{p}\{b_k+c_k^\top(u_i-u_0)\}x_{ik}=\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}\,(1+o_p(1)),$$

    where $H_{\beta_k}$ is the Hessian matrix of $\beta_k$. By $x_i^*=\big(x_{i1},\dots,x_{ip},\big(\frac{u_i-u_0}{h}\big)^\top x_{i1},\dots,\big(\frac{u_i-u_0}{h}\big)^\top x_{ip}\big)^\top$,

    $$R(u_i,x_i)x_i^*=\Bigg\{\Big(\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}x_{ij}\Big)_{1\le j\le p},\ \Big(\sum_{k=1}^{p}\frac{1}{2h}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)(u_i-u_0)^\top x_{ik}x_{ij}\Big)_{1\le j\le p}\Bigg\}^\top_{3p\times1}.$$

    The expectation of the first block is

    $$E\Big\{E\Big[\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)x_{ik}x_{ij}\,\Big|\,u_i\Big]K_i\Big\}=E\Big\{\sum_{k=1}^{p}\frac{1}{2}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)\Gamma_{kj}(u_i)K_i\Big\}=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\Gamma_{kj}(u_0)\int t^\top H_{\beta_k}t\,K(t)\,dt=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\Gamma_{kj}(u_0)\gamma_{21},$$

    and the expectation of the second block is

    $$E\Big\{E\Big[\sum_{k=1}^{p}\frac{1}{2h}(u_i-u_0)^\top H_{\beta_k}(u_0)(u_i-u_0)(u_i-u_0)^\top x_{ik}x_{ij}\,\Big|\,u_i\Big]K_i\Big\}=\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\Gamma_{kj}(u_0)\int t^\top H_{\beta_k}t\; t\,K(t)\,dt=0_{2\times1},$$

    then

    $$E(\nu_n)=\delta_1\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o(1)),$$

    where $\psi_k=(\gamma_{21},0_{2\times1}^\top)^\top\otimes\big(\Gamma_{kj}(u_0)\big)_{1\le j\le p}$ is a $3p\times1$ vector. Since $\mathrm{var}(\nu_n)=\frac{1}{n}\mathrm{var}\{K_i\rho''(\varepsilon_i)R(u_i,x_i)x_i^*\}=O(h^2/n)$, then based on the result $\nu_n=E(\nu_n)+O_p\big(\sqrt{\mathrm{var}(\nu_n)}\big)$ and the assumption $nh^2\to\infty$, it follows that

    $$\nu_n=\delta_1\frac{h^2}{2}q(u_0)\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1)).$$

    Then

    $$\hat\theta^*-\theta^*=-[L''(\theta^*)]^{-1}L'(\theta^*)(1+o_p(1))=\frac{S^{-1}w_n}{\delta_1q(u_0)}(1+o_p(1))+S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1)).$$

    For $w_n$, based on the assumption $E[\rho'(\varepsilon_i)]=0$, we easily get $E(w_n)=0$, and

    $$\mathrm{var}(w_n)=\frac{1}{n}\mathrm{var}\big\{K_i\rho'(\varepsilon_i)x_i^*\big\}=\frac{1}{n}E\big\{K_i^2\rho'(\varepsilon_i)^2x_i^*x_i^{*\top}\big\}=\frac{1}{nh^2}\delta_2q(u_0)\Lambda\,(1+o(1)).$$

    Based on the Lyapunov central limit theorem, we have the following result:

    $$\sqrt{nh^2}\,\Big\{\hat\theta^*-\theta^*-S^{-1}\frac{h^2}{2}\sum_{k=1}^{p}\mathrm{tr}(H_{\beta_k})\psi_k\,(1+o_p(1))\Big\}\ \xrightarrow{D}\ N\big(0_{3p\times1},\ \delta_1^{-2}\delta_2\,q(u_0)^{-1}S^{-1}\Lambda S^{-1}\big).$$

    Since $\delta_1=-\delta_2$ by the information identity $E[\rho''(\varepsilon)]=-E[\rho'(\varepsilon)^2]$, we have $\delta_1^{-2}\delta_2=\delta_2^{-1}=[E\{\rho'(\varepsilon)^2\}]^{-1}$, and the theorem is proved.



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
