1. Introduction
Nonlinear phenomena, such as nonnormality, asymmetric cycles, and nonlinear relationships between lagged variables, have been widely observed in classical data sets such as the sunspot, Canadian lynx, and Australian blowfly data. However, linear ARMA models cannot adequately capture these nonlinear phenomena [1]. Consequently, nonparametric regression models have found important applications in modeling nonlinear time series [2,3]. Yet, in the multivariate setting with more than two variables, it is difficult to estimate the regression function with reasonable accuracy due to the "curse of dimensionality". To alleviate this problem, various semiparametric models have been studied. For example, Stone [4] proposed the additive model, and Gao [5] investigated a class of partially linear models. In addition, one of the most popular semiparametric models is the varying coefficient model [6], in which the regression function depends linearly on some regressors, with coefficients that vary with a threshold variable.
In this study, we consider the varying coefficient model of the following form:
$$Y_t=\sum_{j=1}^{p}\beta_j(U_t)X_{tj}+\epsilon_t=\mathbf{X}_t^\top\boldsymbol{\beta}(U_t)+\epsilon_t,\qquad t=1,\dots,n, \tag{1.1}$$
where $\mathbf{X}_t=(X_{t1},\dots,X_{tp})^\top$ and $\boldsymbol{\beta}(U_t)=(\beta_1(U_t),\dots,\beta_p(U_t))^\top$. The functions $\beta_j(\cdot)$, $j=1,\dots,p$, are assumed to be unknown but smooth, and $\top$ denotes the transpose of a vector or matrix. $U_t$ is a univariate random variable called the threshold variable. Both $\mathbf{X}_t$ and $U_t$ may consist of exogenous variables or lagged values of $Y_t$. $\epsilon_t$ is the error term, which satisfies $E(\epsilon_t|\mathbf{X}_t,U_t)=0$, and $\boldsymbol{\epsilon}=(\epsilon_1,\dots,\epsilon_n)^\top$. As a generalization of the linear model, model (1.1) has attracted a great deal of attention over the past two decades. When $\mathbf{X}_t$ and $U_t$ are lagged variables of $Y_t$, Chen and Tsay [7] proposed an arranged local regression procedure for the specification of model (1.1) and provided a consistency result and a recursive algorithm. Cai et al. [1] applied a local linear technique to estimate the time-varying coefficients and investigated the asymptotic properties of the kernel estimators under the $\alpha$-mixing condition. Nevertheless, as pointed out in [8,9], the local smoothing method is computationally expensive because it requires re-fitting at every point where the fitted function is evaluated. In contrast, the B-spline estimation method is computationally efficient, although it is difficult to establish the asymptotic normality of spline estimators [10,11]. For varying coefficient models, Lai et al. [12] used B-splines to estimate the time-varying coefficients.
The aforementioned work relies on the assumption of independent errors. However, model misspecification, such as omitting relevant variables or choosing a wrong functional form, may result in correlated errors, as mentioned in [13]. From this perspective, the assumption of independent errors can be inappropriate. Many authors have studied non-/semi-parametric regression with correlated errors. Under the assumption that the model errors follow an invertible linear process, Xiao et al. [14] proposed a modification of local polynomial estimation for nonparametric regression. Su and Ullah [13] considered nonparametric regression with an error process of a nonparametric autocorrelated form. Lei et al. [15] investigated a semiparametric autoregressive model with ARMA errors. In this study, we propose a global smoothing method based on B-splines for the estimation of the time-varying coefficients. Similar to [16] and [17], we relax the independence assumption on the model errors and allow them to be correlated. That is, when the model errors are independent, the covariance matrix of the errors is $E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\top)=\sigma^2 I$; here we assume instead that $E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\top|\mathbf{X}_t,U_t)=V$. In particular, if the model errors follow an AR(1) process, $\epsilon_t=\rho\epsilon_{t-1}+e_t$, $|\rho|<1$, $e_t\overset{\text{i.i.d.}}{\sim}(0,\sigma^2)$, then
$$V=\frac{\sigma^2}{1-\rho^2}\begin{pmatrix}1&\rho&\cdots&\rho^{n-1}\\ \rho&1&\cdots&\rho^{n-2}\\ \vdots&\vdots&\ddots&\vdots\\ \rho^{n-1}&\rho^{n-2}&\cdots&1\end{pmatrix}.$$
Our study only assumes that the covariance matrix of the model errors is positive definite, without requiring a specific autoregressive form, which further broadens the scope of application. In particular, we extend Theorem 1 of [12] to the case of correlated errors.
The remainder of this paper is organized as follows. In Section 2, the estimation procedure for the varying coefficient model with correlated errors is introduced. Section 3 presents the consistency and convergence rates of the spline estimators and provides an estimation algorithm. In Section 4, we present numerical examples to illustrate the performance of the proposed estimation method. In Section 5, we compare the results of the proposed spline method with those of the local linear method of Cai et al. [1]. The conclusions are presented in Section 6, and the proofs of the main results are given in the Appendix.
2. Estimation method
Let $a=\xi_0<\xi_1<\cdots<\xi_{M+1}=b$ partition the interval $[a,b]$ into subintervals $[\xi_k,\xi_{k+1})$, $k=0,\dots,M$, with $M$ internal knots. A polynomial B-spline of order $r$ is a function whose restriction to each subinterval is a polynomial of degree $r-1$ and which is globally $r-2$ times continuously differentiable. The linear space $S_{r,M}$ of such splines with a fixed knot sequence has a normalized B-spline basis $\{B_1(u),\dots,B_K(u)\}$ with $K=M+r$. As in [18], the basis satisfies (i) $B_k(u)\geq 0$, $k=1,\dots,K$; (ii) $\sum_{k=1}^K B_k(u)\equiv 1$; and (iii) each $B_k(u)$ is supported inside an interval of length $r/K$, and at most $r$ of the basis functions are nonzero at any given $u$.
Suppose that each coefficient function $\beta_j(u)$ in model (1.1) is smooth; then it can be well approximated by a B-spline function $\beta_j^*(u)\in S_{r,M}$ [19]. Thus, there is a set of constants $b^*_{js}$, $s=1,\dots,K$, such that $\beta_j(u)\approx\beta_j^*(u)=\sum_{s=1}^K b^*_{js}B_s(u)$. In principle, different coefficients could be approximated by B-splines with different numbers of knots, but for simplicity we assume the same basis for all coefficients. Let $\mathbf{b}=(\mathbf{b}_1^\top,\dots,\mathbf{b}_p^\top)^\top=(b_{11},\dots,b_{1K},\dots,b_{p1},\dots,b_{pK})^\top$, $\mathbf{Z}_t=(X_{t1}B_1(U_t),\dots,X_{t1}B_K(U_t),\dots,X_{tp}B_1(U_t),\dots,X_{tp}B_K(U_t))^\top$, $Z=(\mathbf{Z}_1,\dots,\mathbf{Z}_n)^\top$, and $Y=(Y_1,\dots,Y_n)^\top$. Let $E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\top|\mathbf{X}_t,U_t)=V$, and suppose for now that $V$ is known. We can then write the weighted least squares criterion as
$$\ell(\mathbf{b})=(Y-Z\mathbf{b})^\top V^{-1}(Y-Z\mathbf{b}). \tag{2.1}$$
Denoting the minimizer of (2.1) by $\hat{\mathbf{b}}$, we estimate $\beta_j(u)$ by $\hat{\beta}_j(u)=\sum_{k=1}^K\hat{b}_{jk}B_k(u)$.
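As a concrete illustration, the following minimal R sketch builds the design matrix $Z$ with the bs function from the splines package (also used in Section 4) and computes the weighted least squares solution $\hat{\mathbf{b}}=(Z^\top V^{-1}Z)^{-1}Z^\top V^{-1}Y$. The helper names build_Z and wls_fit are ours, and $V$ is assumed known here.

```r
library(splines)

## Design matrix Z: row t is
## (X_t1 B_1(U_t), ..., X_t1 B_K(U_t), ..., X_tp B_1(U_t), ..., X_tp B_K(U_t)).
build_Z <- function(X, U, n_knots = 3, degree = 3) {
  ## K = n_knots + degree + 1 basis functions (spline order r = degree + 1)
  B <- bs(U, df = n_knots + degree + 1, degree = degree, intercept = TRUE)
  do.call(cbind, lapply(seq_len(ncol(X)), function(j) X[, j] * B))
}

## Weighted least squares: minimizes (Y - Zb)' V^{-1} (Y - Zb), cf. (2.1).
wls_fit <- function(Z, Y, V_inv) {
  solve(t(Z) %*% V_inv %*% Z, t(Z) %*% V_inv %*% Y)
}
```

The estimate of $\beta_j(u)$ is then obtained by multiplying the basis evaluated at $u$ with the $j$-th block of $K$ entries of $\hat{\mathbf{b}}$.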
3. Asymptotic property
We impose the following conditions. (C1) and (C2) are mild regularity conditions. (C3) is necessary for the identification of the coefficient functions in varying coefficient models, as mentioned in Huang and Shen [20]. (C4), (C6), and (C7) are the same as those used in [1] for stationary mixing data. (C5) imposes smoothness conditions on the coefficient functions.
3.1. Assumptions
(C1) The eigenvalues of V are bounded away from zero and infinity.
(C2) The smoothing variable Ut has a bounded density supported on [a,b].
(C3) The eigenvalues of the matrix $E(\mathbf{X}_t\mathbf{X}_t^\top\mid U_t=u)$ are uniformly bounded away from zero and infinity for all $u\in[a,b]$.
(C4) The conditional density of $(U_1,U_{l+1})$ given $(\mathbf{X}_1,\mathbf{X}_{l+1})$ is uniformly bounded on the support of $(\mathbf{X}_1,\mathbf{X}_{l+1})$. The conditional density of $U_1$ given $\mathbf{X}_1$ is uniformly bounded on the support of $\mathbf{X}_1$.
(C5) For $g=\beta_j$, $1\leq j\leq p$, $g$ satisfies a Lipschitz condition of order $d>1/2$: $|g^{(\lfloor d\rfloor)}(t)-g^{(\lfloor d\rfloor)}(s)|\leq C|s-t|^{d-\lfloor d\rfloor}$, where $\lfloor d\rfloor$ is the largest integer strictly smaller than $d$ and $g^{(\lfloor d\rfloor)}$ is the $\lfloor d\rfloor$-th derivative of $g$. The order $r$ of the B-spline used satisfies $r\geq d+1/2$.
(C6) The process $\{Y_t,\mathbf{X}_t,U_t\}_{t\in\mathbb{Z}}$ is jointly strictly stationary, with $U_t$ taking values in $\mathbb{R}$ and $\mathbf{X}_t$ taking values in $\mathbb{R}^p$. The $\alpha$-mixing coefficient $\alpha(l)$ of $\{Y_t,\mathbf{X}_t,U_t\}_{t\in\mathbb{Z}}$ satisfies $\sum_{l=1}^\infty l^c\alpha(l)^{1-2/\delta}<\infty$ for some $\delta>2$ and $c>1-2/\delta$.
(C7) $E(|X_{tj}|^{2\delta})<\infty$, $j=1,\dots,p$, where $\delta$ is given in condition (C6).
3.2. Theoretical results
Theorem 3.1: Assume (C1)–(C7) and that $K\to\infty$, $K^3/n\to 0$. Then we have
$$\sum_{j=1}^{p}\|\hat{\beta}_j-\beta_j\|_2^2=O_p\left(\frac{K}{n}+\frac{1}{K^{2d}}\right).$$
As noted in Lai et al. [12], choosing $K\asymp n^{1/(2d+1)}$ yields the well-known optimal convergence rate $O_p(n^{-2d/(2d+1)})$.
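To see why this choice is optimal, it suffices to balance the two terms in the rate of Theorem 3.1:
$$\frac{K}{n}\asymp\frac{1}{K^{2d}}\iff K^{2d+1}\asymp n\iff K\asymp n^{1/(2d+1)},\qquad\text{giving}\quad \frac{K}{n}+\frac{1}{K^{2d}}\asymp n^{-2d/(2d+1)}.$$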
If $V$ is unknown and $\hat{V}$ is an estimator of $V$, then an application of the substitution principle leads to
$$\bar{\ell}(\mathbf{b})=(Y-Z\mathbf{b})^\top\hat{V}^{-1}(Y-Z\mathbf{b}). \tag{3.1}$$
A two-stage estimator $\bar{\mathbf{b}}$ can be obtained by minimizing (3.1); then $\bar{\beta}_j(u)=\sum_{k=1}^K\bar{b}_{jk}B_k(u)$.
Theorem 3.2: Assume (C1)–(C7), $K\to\infty$, $K^3/n\to 0$, and that $\hat{V}$ is a consistent estimator of $V$. Then we have
$$\sum_{j=1}^{p}\|\bar{\beta}_j-\beta_j\|_2^2=O_p\left(\frac{K}{n}+\frac{1}{K^{2d}}\right).$$
Remark 1. In practice, $V$ is usually unknown. It is customary to replace $V$ in (2.1) with a consistent estimator $\hat{V}$ [17,21]; $\bar{\mathbf{b}}$ can then be derived through a two-stage estimation. If the sample size is sufficiently large, we can split the data set into training and test parts and obtain a consistent estimator of $V$ from the residuals of an ordinary least squares (OLS) fit of (1.1) on the training set.
3.3. Computational aspects
The method in Remark 1 is not feasible when the sample size is small. Montoril et al. [17] presented an iterative procedure for estimating $V$, but it is computationally expensive, so they also provided a more efficient method. In general, if the model errors follow an autoregressive process, we can estimate the $\beta_j(\cdot)$ by adopting some ideas of [17]. The estimation algorithm is as follows, with an illustrative sketch given after the steps.
Step 1. Estimate the coefficient vector $\mathbf{b}$ by OLS, and denote the estimate by $\mathbf{b}^{(0)}$.
Step 2. Compute the residuals $\epsilon_t^{(0)}=Y_t-\mathbf{Z}_t^\top\mathbf{b}^{(0)}$ and fit an autoregressive model to them, i.e.,
$$\epsilon_t^{(0)}=\varphi_1\epsilon_{t-1}^{(0)}+\cdots+\varphi_p\epsilon_{t-p}^{(0)}+e_t,$$
yielding the estimates $(\varphi_1^{(0)},\dots,\varphi_p^{(0)})$.
Step 3. Taking $\mathbf{b}^{(0)}$ and $(\varphi_1^{(0)},\dots,\varphi_p^{(0)})$ from Steps 1 and 2 as initial values, estimate $\mathbf{b}$ by numerically minimizing
$$\sum_{t}\left\{\varphi_p(L)\left(Y_t-\mathbf{Z}_t^\top\mathbf{b}\right)\right\}^2,$$
where $\varphi_p(L)=1-\varphi_1 L-\cdots-\varphi_p L^p$, with the backshift operator satisfying $L^k v_t=v_{t-k}$, $k>0$.
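The R sketch below implements Steps 1–3 under the assumption of AR($q$) errors (we write $q$ for the autoregressive order to avoid clashing with the number of regressors); it reuses build_Z/wls_fit from Section 2, fits the residual autoregression with stats::ar, and performs the numerical minimization with optim. It is a sketch of the idea under these assumptions, not the authors' exact implementation.

```r
fit_vc_ar <- function(Z, Y, q = 1) {
  n <- length(Y)
  ## Step 1: OLS initial estimate b^(0).
  b0 <- solve(t(Z) %*% Z, t(Z) %*% Y)
  ## Step 2: AR(q) fit to the OLS residuals (Yule-Walker).
  res  <- as.numeric(Y - Z %*% b0)
  phi0 <- ar(res, order.max = q, aic = FALSE)$ar
  ## Step 3: minimize sum_t { phi_q(L)(Y_t - Z_t' b) }^2 jointly over (b, phi).
  obj <- function(theta) {
    b   <- theta[seq_len(ncol(Z))]
    phi <- theta[-seq_len(ncol(Z))]
    u   <- as.numeric(Y - Z %*% b)
    v   <- u[(q + 1):n]                      # phi_q(L) applied to u_t
    for (k in seq_len(q)) v <- v - phi[k] * u[(q + 1 - k):(n - k)]
    sum(v^2)
  }
  opt <- optim(c(as.numeric(b0), phi0), obj, method = "BFGS")
  list(b = opt$par[seq_len(ncol(Z))], phi = opt$par[-seq_len(ncol(Z))])
}
```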
4. Numerical examples
In this section, two simulated examples are considered: in the first, the threshold variable $U_t$ is an exogenous variable, and in the second it is a lagged variable of $Y_t$. For a given data set, we used equally spaced knots, with the number of internal knots ranging over the candidate values one to five. The optimal number of internal knots is selected using the Bayesian information criterion (BIC) (Schwarz [22]). The BIC criterion function is defined as
$$\mathrm{BIC}=\log(\mathrm{RMS})+\frac{\log n}{n}\,p,$$
where $n$ denotes the sample size, $\mathrm{RMS}$ denotes the residual mean square, and $p$ equals the number of autoregressive coefficients assumed for the errors plus the number of B-spline basis functions. The B-spline basis $\{B_1(U_t),\dots,B_K(U_t)\}$ can be obtained from the function bs in the R package splines [3]. We consider time series of lengths $n=200,400$, and $600$, and replicate the simulation 200 times in each case. For each replication, a total of $1000+n$ observations were generated, and only the last $n$ observations were used, to ensure approximate stationarity. The performance of the estimators $\{\hat{\beta}_j(\cdot)\}$ is measured by the square root of the average squared errors (RASE):
$$\mathrm{RASE}^2=\sum_{j=1}^{p}\mathrm{RASE}_j^2,$$
with
$$\mathrm{RASE}_j=\left\{n_{\mathrm{grid}}^{-1}\sum_{k=1}^{n_{\mathrm{grid}}}\left[\hat{\beta}_j(u_k)-\beta_j(u_k)\right]^2\right\}^{1/2},$$
where $\{u_k,k=1,\dots,n_{\mathrm{grid}}\}$ are grid points on an interval over which the functions are evaluated. Because the range of the time series varies from simulation to simulation, we select a common interval on which to compare the RASE values: $[-0.45,0.45]$ for Example 4.1 and $[-2,2]$ for Example 4.2.
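Under the RASE definition above, the computation can be coded directly in R; the helper names and the grid size below are illustrative:

```r
## RASE_j for a single coefficient function on a fixed grid.
rase_j <- function(beta_hat, beta_true, grid) {
  sqrt(mean((beta_hat(grid) - beta_true(grid))^2))
}

## Overall RASE aggregates over the p coefficient functions.
rase <- function(hat_list, true_list, grid) {
  sqrt(sum(mapply(function(f, g) rase_j(f, g, grid)^2, hat_list, true_list)))
}

## e.g. grid <- seq(-0.45, 0.45, length.out = 200) for Example 4.1
```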
Example 4.1. Consider the following data generating process:
$$Y_t=\beta_1(U_t)Y_{t-1}+\beta_2(U_t)Y_{t-2}+\epsilon_t, \tag{4.1}$$
with $\beta_1(u)=0.9\sin(\pi u)$ and $\beta_2(u)=0.85\cos(\pi u)$. We study two autoregressive error processes, AR(1): $\epsilon_t=0.8\epsilon_{t-1}+e_t$, and AR(2): $\epsilon_t=0.5\epsilon_{t-1}+0.45\epsilon_{t-2}+e_t$, where $e_t\overset{\text{i.i.d.}}{\sim}N(0,0.2^2)$ and $U_t\overset{\text{i.i.d.}}{\sim}U[-0.5,0.5]$.
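A minimal R simulation of this process, using the model form (4.1) as written above, the AR(1) error, and the 1000-observation burn-in, could look as follows (Example 4.2 is analogous):

```r
simulate_ex41 <- function(n, rho = 0.8, sigma = 0.2, burn = 1000) {
  N   <- n + burn
  U   <- runif(N, -0.5, 0.5)
  e   <- rnorm(N, sd = sigma)
  eps <- as.numeric(stats::filter(e, rho, method = "recursive"))  # AR(1) errors
  Y   <- numeric(N)
  for (t in 3:N) {
    Y[t] <- 0.9 * sin(pi * U[t]) * Y[t - 1] +
            0.85 * cos(pi * U[t]) * Y[t - 2] + eps[t]
  }
  keep <- (burn + 1):N
  list(Y = Y[keep], U = U[keep])   # discard burn-in observations
}
```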
Example 4.2. We now discuss an exponential autoregressive (EXPAR) model:
$$Y_t=\beta_1(Y_{t-1})Y_{t-1}+\beta_2(Y_{t-1})Y_{t-2}+\epsilon_t, \tag{4.2}$$
with $\beta_1(u)=0.6+0.9e^{-u^2}$ and $\beta_2(u)=-0.4-1.2e^{-u^2}$. In this case, the autoregressive errors we study are AR(1): $\epsilon_t=0.9\epsilon_{t-1}+e_t$, and AR(2): $\epsilon_t=0.6\epsilon_{t-1}+0.35\epsilon_{t-2}+e_t$, where $e_t\overset{\text{i.i.d.}}{\sim}N(0,0.2^2)$.
5. Numerical results and discussion
In this section, the numerical results for Examples 4.1 and 4.2 are presented. We compare the performance of the local linear (Local) estimators proposed by Cai et al. [1], the spline estimators under the assumption of independent errors (Spl.ind), and the proposed spline estimators (Spl.cor). Tables 1 and 2 show the mean and standard deviation (in parentheses) of the RASEs for $\hat{\beta}_1(\cdot)$ and $\hat{\beta}_2(\cdot)$ with linear ($k=1$) and cubic ($k=3$) splines under different AR($p$) errors. The standard deviation of the RASEs in the Spl.cor columns decreases as the sample size $n$ increases, and the results for linear splines are similar to those for cubic splines. Moreover, the proposed approach performs better than the methods that ignore the correlated errors. Based on cubic splines, Tables 3 and 4 provide the resulting estimates for different AR(1) errors $\epsilon_t=\theta\epsilon_{t-1}+e_t$ with $\theta=0.3,0.6,0.9,0.95$ when $n=600$. As the correlation level $\theta$ increases, the performance of Local and Spl.ind worsens, while Spl.cor remains relatively stable. Furthermore, Spl.cor is always better than Spl.ind and Local, which underscores the importance of accounting for correlated errors. The computational times for Example 4.2 are reported in Table 5. The local linear method is computationally expensive when the sample size is large. On the other hand, when the correlated errors are taken into account, the algorithm must search for the optimal solution iteratively, so the computation time of Spl.cor is longer than that of Spl.ind.
The estimated $\beta_1(\cdot)$ and $\beta_2(\cdot)$ from a typical sample, using cubic splines for Examples 4.1 and 4.2 with AR(1) error structures, are plotted in Figures 1 to 4. The solid curve is the true curve, and the dotted curve is the typical estimate. The typical sample is the one whose RASE value equals the median over the 200 simulations [23]. Clearly, the proposed estimators capture the main features of the true functions well. Although not shown here, the examples with AR(2) error structures give similar results.
6. Conclusions
This study considered B-spline estimation for varying coefficient models with correlated errors using a weighted least squares method. On the theoretical side, convergence rates were derived under the $\alpha$-mixing condition. Simulation studies illustrated the performance of the proposed estimation method, which performs better than alternatives that ignore the error correlation when fitting varying coefficient models.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61973096 and by GDUPS (2019).
Conflict of interest
The authors declare that they have no competing interests.
Appendix
A. Proofs of the main results
Notation: For two positive sequences $a_n$ and $b_n$, $a_n\lesssim b_n$ means that $a_n/b_n$ is uniformly bounded, and $a_n\asymp b_n$ means $a_n\lesssim b_n$ and $b_n\lesssim a_n$. Let $\|g\|_2=\{\int_{[a,b]}g^2(x)\,dx\}^{1/2}$ be the $L_2$-norm of a square integrable function $g(\cdot)$ on $[a,b]$. $\mathbb{R}$ denotes the set of real numbers and $\mathbb{Z}$ the set of integers. $|\cdot|$ denotes the Euclidean norm of a vector, and $I$ is an identity matrix. $e_j$ denotes the unit vector whose $j$-th entry is 1 and whose other entries are 0. We denote by $\lambda_{\min}(\cdot)$ ($\lambda_{\max}(\cdot)$) the smallest (largest) eigenvalue of a matrix. Let $C$ denote a generic constant that may take different values at different places. Given random variables $V_n$, $n\geq 1$, $V_n=O_p(b_n)$ means that the random variables $V_n/b_n$, $n\geq 1$, are bounded in probability, that is,
$$\lim_{c\to\infty}\limsup_{n\to\infty}P(|V_n/b_n|>c)=0,$$
and $V_n=o_p(b_n)$ means that $V_n/b_n$, $n\geq 1$, converge to zero in probability, namely
$$\lim_{n\to\infty}P(|V_n/b_n|>c)=0\quad\text{for every }c>0.$$
To prove the theorems, we need the following lemmas.
Lemma 1. (Lemma 2 of Lai et al. [12]) Under assumptions (C2) and (C3), there are constants $0<b_1<b_2<\infty$ such that the eigenvalues of $E(\mathbf{Z}_1\mathbf{Z}_1^\top)$ fall in $[b_1/K,\,b_2/K]$; under the additional conditions (C4)–(C7), there are constants $0<b_3<b_4<\infty$ such that the eigenvalues of $\sum_{t=1}^n\mathbf{Z}_t\mathbf{Z}_t^\top/n$ fall in $[b_3/K,\,b_4/K]$ with probability approaching 1 as $n\to\infty$.
Lemma 2. Under conditions (C1)–(C7), we have $\boldsymbol{\epsilon}^\top V^{-1}Z(Z^\top V^{-1}Z)^{-1}Z^\top V^{-1}\boldsymbol{\epsilon}=O_p(K)$.
Proof. Note that $E(Z^\top Z)=E\left(\sum_{t=1}^n\mathbf{Z}_t\mathbf{Z}_t^\top\right)=nE(\mathbf{Z}_1\mathbf{Z}_1^\top)$. Hence, by Lemma 1 we obtain
$$E\|Z^\top V^{-1}\boldsymbol{\epsilon}\|^2=E\{\mathrm{tr}(Z^\top V^{-1}VV^{-1}Z)\}\leq\lambda_{\max}(V^{-1})\,E\{\mathrm{tr}(Z^\top Z)\}\leq Cn. \tag{A.1}$$
Using the Markov inequality, we can derive
$$\|Z^\top V^{-1}\boldsymbol{\epsilon}\|^2=O_p(n). \tag{A.2}$$
On the other hand, by Lemma 1 and (C1),
$$\lambda_{\min}(Z^\top V^{-1}Z)\geq\lambda_{\min}(V^{-1})\,\lambda_{\min}(Z^\top Z)\geq Cn/K$$
with probability approaching 1. Consequently,
$$\lambda_{\max}\{(Z^\top V^{-1}Z)^{-1}\}=O_p(K/n). \tag{A.3}$$
Then, it follows from (A.2) and (A.3) that
$$\boldsymbol{\epsilon}^\top V^{-1}Z(Z^\top V^{-1}Z)^{-1}Z^\top V^{-1}\boldsymbol{\epsilon}\leq\lambda_{\max}\{(Z^\top V^{-1}Z)^{-1}\}\,\|Z^\top V^{-1}\boldsymbol{\epsilon}\|^2=O_p(K/n)\,O_p(n)=O_p(K).$$
This completes the proof of Lemma 2.
Lemma 3. Suppose $\beta_j^0(u)=\sum_{k=1}^K b_{jk}^0 B_k(u)$ is the best approximating B-spline for $\beta_j(u)$, with $\|\beta_j^0-\beta_j\|_\infty=O(K^{-d})$ (this approximation property is well known for B-splines under the smoothness assumption (C5); see, for example, [19]). Let $\mathbf{b}^0=(b_{11}^0,\dots,b_{1K}^0,\dots,b_{p1}^0,\dots,b_{pK}^0)^\top$ and $\boldsymbol{\eta}=P_{V^{-1/2}Z}V^{-1/2}(Y-Z\mathbf{b}^0)$, where $P_{V^{-1/2}Z}=V^{-1/2}Z(Z^\top V^{-1}Z)^{-1}Z^\top V^{-1/2}$ is a projection matrix. Then, under (C1)–(C7), we have
$$\|\boldsymbol{\eta}\|^2=O_p\left(K+\frac{n}{K^{2d}}\right).$$
Proof. Denote $r_t=\sum_{j=1}^p X_{tj}\beta_j(U_t)$ and $\mathbf{r}=(r_1,\dots,r_n)^\top$, so that $Y=\mathbf{r}+\boldsymbol{\epsilon}$. We have
$$\|\boldsymbol{\eta}\|^2=\big\|P_{V^{-1/2}Z}V^{-1/2}\{\boldsymbol{\epsilon}+(\mathbf{r}-Z\mathbf{b}^0)\}\big\|^2\leq 2\|P_{V^{-1/2}Z}V^{-1/2}\boldsymbol{\epsilon}\|^2+2\|P_{V^{-1/2}Z}V^{-1/2}(\mathbf{r}-Z\mathbf{b}^0)\|^2=:2(S_1+S_2).$$
Noting that $P_{V^{-1/2}Z}$ is an idempotent matrix, by Lemma 2 we obtain
$$S_1=\boldsymbol{\epsilon}^\top V^{-1}Z(Z^\top V^{-1}Z)^{-1}Z^\top V^{-1}\boldsymbol{\epsilon}=O_p(K).$$
By the approximation property of B-splines we have $\|\mathbf{r}-Z\mathbf{b}^0\|^2=O_p(nK^{-2d})$; hence, by (C1),
$$S_2\leq\lambda_{\max}(V^{-1})\,\|\mathbf{r}-Z\mathbf{b}^0\|^2=O_p(nK^{-2d}).$$
Thus $\|\boldsymbol{\eta}\|^2=O_p(K+nK^{-2d})$, and the proof is complete.
Proof of Theorem 3.1.
Proof. By the definition of $\hat{\mathbf{b}}$, we get
$$(Y-Z\hat{\mathbf{b}})^\top V^{-1}(Y-Z\hat{\mathbf{b}})\leq(Y-Z\mathbf{b}^0)^\top V^{-1}(Y-Z\mathbf{b}^0),$$
which, after rearranging, yields
$$\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|^2\leq 2\,\boldsymbol{\eta}^\top V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0). \tag{A.4}$$
Applying the Cauchy–Schwarz inequality, the inequality (A.4) can be continued as
$$\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|^2\leq 2\|\boldsymbol{\eta}\|\,\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|,\quad\text{so that}\quad\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|\leq 2\|\boldsymbol{\eta}\|.$$
Hence, by Lemma 3 we obtain
$$\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|^2=O_p\left(K+\frac{n}{K^{2d}}\right).$$
Then, it follows from Lemma 1 and (C1) that
$$\frac{n}{K}\,\|\mathbf{b}^0-\hat{\mathbf{b}}\|^2\lesssim\|V^{-1/2}Z(\hat{\mathbf{b}}-\mathbf{b}^0)\|^2=O_p\left(K+\frac{n}{K^{2d}}\right),$$
which means that $\|\mathbf{b}^0-\hat{\mathbf{b}}\|^2=O_p\left(\frac{K^2}{n}+\frac{1}{K^{2d-1}}\right)$. Then, by the property of B-splines (De Boor [19]),
$$\frac{b_5}{K}\sum_{k=1}^K b_{jk}^2\leq\Big\|\sum_{k=1}^K b_{jk}B_k\Big\|_2^2\leq\frac{b_6}{K}\sum_{k=1}^K b_{jk}^2$$
for some constants $b_5,b_6>0$, we can derive
$$\sum_{j=1}^p\|\hat{\beta}_j-\beta_j\|_2^2\leq 2\sum_{j=1}^p\left(\|\hat{\beta}_j-\beta_j^0\|_2^2+\|\beta_j^0-\beta_j\|_2^2\right)=O_p\left(\frac{\|\hat{\mathbf{b}}-\mathbf{b}^0\|^2}{K}\right)+O\left(\frac{1}{K^{2d}}\right)=O_p\left(\frac{K}{n}+\frac{1}{K^{2d}}\right).$$
Proof of Theorem 3.2.
Proof. The argument is similar to that for Theorem 2.2 in [17].
Let $R=Z^\top V^{-1}Z$ and $\hat{R}=Z^\top\hat{V}^{-1}Z$. Since $\hat{V}$ is consistent, Theorem 1.4.2 in [24] gives
$$\hat{V}^{-1}-V^{-1}=o_p(1).$$
This implies that $\hat{R}$ is a consistent estimator of $R$, namely $\hat{R}^{-1}R-I=o_p(1)$. It follows from (2.1) that
$$\hat{\mathbf{b}}=R^{-1}Z^\top V^{-1}Y. \tag{A.5}$$
Therefore, by (3.1) we can derive that
$$\bar{\mathbf{b}}=\hat{R}^{-1}Z^\top\hat{V}^{-1}Y. \tag{A.6}$$
This ensures that
$$\|\bar{\mathbf{b}}-\hat{\mathbf{b}}\|=o_p\left(\|\hat{\mathbf{b}}-\mathbf{b}^0\|\right).$$
By (A.5), (A.6) and Theorem 3.1, we conclude that
$$\sum_{j=1}^p\|\bar{\beta}_j-\beta_j\|_2^2=O_p\left(\frac{K}{n}+\frac{1}{K^{2d}}\right).$$
This completes the proof of Theorem 3.2.