
Limit theorems for local polynomial estimation of regression for functional dependent data

• Local polynomial fitting exhibits numerous compelling statistical properties, particularly within the intricate realm of multivariate analysis. However, as functional data analysis gains prominence as a dynamic and pertinent field in data science, the need arises to formulate a specialized theory tailored to local polynomial fitting. We explored the intricate task of estimating the regression function operator and its partial derivatives for stationary mixing random processes $(Y_i, X_i)$ using local higher-order polynomial fitting. Our key contributions include establishing the joint asymptotic normality of the estimates for both the regression function and its partial derivatives, specifically in the context of strongly mixing processes. Additionally, we provide explicit expressions for the bias and the variance-covariance matrix of the asymptotic distribution. We demonstrated uniform strong consistency over compact subsets, along with the rates of convergence, for both the regression function and its partial derivatives. Importantly, these findings are rooted in reasonably broad conditions that underpin the underlying models. To demonstrate practical applicability, we leveraged our results to compute pointwise confidence regions. Finally, we extended our ideas to the nonparametric conditional distribution and obtained its limiting distribution.

    Citation: Oussama Bouanani, Salim Bouzebda. Limit theorems for local polynomial estimation of regression for functional dependent data[J]. AIMS Mathematics, 2024, 9(9): 23651-23691. doi: 10.3934/math.20241150




In recent decades, there has been growing interest in the statistical study of functional random variables, that is, random variables taking values in infinite-dimensional spaces. Simply put, functional data analysis (FDA) typically deals with statistical problems where the data consist of a sample of $n$ functions $x_1 = x_1(t), \ldots, x_n = x_n(t)$ defined on a compact interval of the real line, say $[0, 1]$. FDA also covers statistical issues, often referred to as "inference in stochastic processes", where the sample information is a partial trajectory $(x(t),\, t \in [0, T])$ of a stochastic process $\{X(t),\, t \ge 0\}$; in this context, the duration $T$ of the observation interval plays the role of the sample size $n$. This increasing interest has brought to light several statistical challenges associated with functional random variables. The drive to explore this field has been fueled by the increasing availability of high-resolution temporal and spatial data, a trend particularly notable in fields such as meteorology, medicine, satellite imaging, and various other scientific disciplines. Consequently, the statistical modeling of such data, viewed as stochastic functions, has led to numerous complex theoretical and computational research questions.

To gain a comprehensive understanding of both the theoretical and practical components of functional data analysis, the reader may consult the monographs of [13] for linear models for random variables taking values in a Hilbert space, and [74] for scalar-on-function and function-on-function linear models, as well as functional principal component analysis and parametric discriminant analysis. Ferraty and Vieu [38] primarily concentrated on nonparametric techniques, particularly kernel-based estimation, for scalar-on-function nonlinear regression models; these methods were further expanded to encompass classification and discrimination analysis. In their study, [48] examined the extension of various statistical concepts, including goodness-of-fit tests, portmanteau tests, and change detection, to the functional data framework. Zhang [91] studied the analysis of variance for functional data, while [71] primarily investigated regression analysis for Gaussian processes. Various semi-parametric models have been explored in the literature, such as functional single index models [44], projection pursuit models [27], partial linear models [5], and functional sliced inverse regression [43]. Additional studies on functional data modeling and analysis are documented in [45,49,57,67] and, for recent references, [1,2,15,17,18,19,20,21,22,66,79].

One of the most widely used estimators is the Nadaraya-Watson estimator [68,85]. From a function approximation perspective, the Nadaraya-Watson estimator employs local constant approximations. As highlighted by the numerical analyst [80], "Through all of scientific computing runs this common theme: Increase the accuracy at least to second order. What this means is: Get the linear term right". In other words, local constant approximations are often inadequate, and local linear fits are preferable. Local polynomial fitting thus emerges as an attractive method from both theoretical and practical standpoints. It offers several advantages, as noted in [37]: it adapts to various design types, including random and fixed designs, and highly clustered as well as nearly uniform designs.
Moreover, boundary effects are absent: the bias at the boundary remains of the same order as in the interior, without requiring specific boundary kernels. This is a significant departure from other existing methods. With local polynomial fitting, no boundary modifications are necessary, which is particularly beneficial in multidimensional situations where the boundary can involve a substantial number of data points (see [28,78]); boundary modifications in higher dimensions pose a considerable challenge. The pivotal method proposed by [60] involved utilizing local higher-order polynomial fitting to estimate the function $m$ and its partial derivatives. The author demonstrated mean-square convergence and joint asymptotic normality (see also [61]). This research holds significant implications for estimating widely used ARCH time series. Masry [59] established the strong consistency of the regression estimator and its partial derivatives up to a fixed total order $p$, obtaining sharp rates of almost sure convergence that are uniform over compact sets. In [63], the local polynomial estimation of regression functions and their derivatives is examined, establishing the joint asymptotic normality of the relevant estimates for both strongly mixing and $\rho$-mixing processes. Masry and Mielniczuk [64] considered the nonparametric estimation of a multivariate regression function and its derivatives for a regression model with long-range dependent errors; adopting a local linear fitting approach, they established the joint asymptotic distributions for the estimators of the regression function and its derivatives.

Much of the functional-data literature on local fitting is centered on local linear estimation. Rachdi et al. [73] innovatively combined the $k$-nearest neighbors method with the local linear estimation approach to formulate a novel estimator of the regression operator, tailored for situations where the regressor is functional and the scalar response is subject to random missing observations. In a parallel vein, [7] estimated the conditional density function through the local linear methodology, and Chikr-Elmezouar et al. [29] studied the estimation of the conditional density and mode with functional covariates. In the realm of local linear regression, [8] presented a novel local linear regression estimator and conducted a comprehensive study of its asymptotic behavior. The works [9] and [11] introduced a general framework elucidating the local behavior of the regression operator; see [3] for recent references. The foundational work of [92] established mean-squared convergence and asymptotic normality for the local linear estimator. Extending the scope to instances where both the response and the explanatory variable are functional, [32] devised a nonparametric local linear estimator of the regression function. The principal objective of the current investigation is to establish a comprehensive framework for local polynomial fitting: the central aim of this paper is to furnish the first comprehensive theoretical validation of the local polynomial kernel estimator in the functional setting, by determining the uniform consistency rate and deriving the asymptotic distribution. Nonetheless, addressing this issue transcends a mere combination of ideas from disparate domains: intricate mathematical derivations are required to contend with the challenges posed by the estimators in our specific context, necessitating the adept application of techniques rooted in large sample theory.

    The article is structured as follows. Section 2 provides the necessary notation and definitions. The main results are outlined in Section 3. The asymptotic distribution is discussed in Section 3.1, and its application to the confidence regions is presented in Section 3.3. The establishment of uniform almost complete convergence is detailed in Section 3.4. Local linear estimators for the conditional distribution and their asymptotic normality are presented in Section 4. In Section 5, we summarize the findings and highlight remaining open issues. All proofs are deferred to Section 6, with a focus on the most crucial arguments due to the lengthiness of the proofs. Additionally, relevant technical results are provided in the appendix.

We begin by introducing the necessary notation and definitions for the forthcoming sections. Specifically, we recall the definition of the strong mixing property. Let $\mathcal{F}_i^k(Z)$ denote the $\sigma$-algebra generated by $\{Z_j : i \le j \le k\}$. The following definition of the strong mixing coefficient is attributed to [76]. For further details, refer to [24,75,82,83].

Definition 2.1. Let $Z = \{Z_i,\ i = 1, 2, \ldots\}$ be a strictly stationary sequence of random variables. Given a positive integer $n$, define

$$\alpha(n) = \sup\left\{\left|P(A \cap B) - P(A)P(B)\right| : A \in \mathcal{F}_1^k(Z) \text{ and } B \in \mathcal{F}_{k+n}^{\infty}(Z),\ k \in \mathbb{N}\right\}.$$

The sequence $Z$ is said to be $\alpha$-mixing (strong mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$.

    The α-mixing condition is the weakest among the commonly used mixing conditions and has numerous practical applications, particularly in economics and finance. For example, several time series models, such as ARCH, ARMA, GARMA, and stochastic volatility models, satisfy the α-mixing condition.

Remark 2.2. Asymptotic independence, or "mixing", of the data-generating process is a common assumption in statistical learning for time series. For many typical processes, the general forms of various mixing rates are often assumed known, but specific coefficients are not. These mixing assumptions are rarely tested, and there are no established methods for estimating mixing coefficients from data. In [65], an estimator of $\beta$-mixing coefficients based on a single stationary sample path is introduced. Khaleghi and Lugosi [51] propose strongly consistent estimators of the $\ell_1$ norm of the sequence of $\alpha$-mixing and $\beta$-mixing coefficients of a stationary ergodic process; these estimators are subsequently used to develop strongly consistent goodness-of-fit hypothesis tests. Specifically, Khaleghi and Lugosi [51] develop hypothesis tests to determine whether, under the same summability assumption, the $\alpha$-mixing or $\beta$-mixing coefficients of a process are upper bounded by a given rate function.

We consider a sequence $\{(X_i, Y_i) : i \ge 1\}$ of stationary* $\alpha$-mixing random copies of the random pair $(X, Y)$, where $X$ takes its values in some abstract space $\mathcal{F}$ and $Y$ in $\mathbb{R}$. Suppose that $\mathcal{F}$ is endowed with a semi-metric† $d(\cdot,\cdot)$ defining a topology that measures the proximity between two elements of $\mathcal{F}$, and whose choice is not tied to the definition of $X$, so as to avoid measurability problems. We consider, in particular, the conditional expectation of $\psi(Y)$ given $X = x$,

$$r(x) = r_{\psi}(x) = \mathbb{E}\left(\psi(Y) \mid X = x\right), \qquad (2.1)$$

*In the case of Hilbert space-valued elements, strict stationarity is not necessarily required; second-order stationarity suffices. A Hilbert space-valued sequence $\{X_t\}_{t \in \mathbb{Z}}$ is second-order (or weakly) stationary if $\mathbb{E}\|X_t\|^2 < \infty$, $\mathbb{E}X_t = \mu$, and $\mathbb{E}\left[(X_s - \mu) \otimes (X_t - \mu)\right] = \mathbb{E}\left[(X_{s-t} - \mu) \otimes (X_0 - \mu)\right]$ for all $s, t \in \mathbb{Z}$.

We say that $\{X_t\}_{t \in \mathbb{Z}}$ is strictly stationary if the joint distribution of $(X_{t_1}, \ldots, X_{t_n})$ is the same as the joint distribution of $(X_{t_1 + h}, \ldots, X_{t_n + h})$ for all $t_1, \ldots, t_n \in \mathbb{Z}$, $n \ge 1$, and $h \ge 1$.

†A semi-metric (sometimes called a pseudo-metric) $d(\cdot,\cdot)$ is a metric which allows $d(x_1, x_2) = 0$ for some $x_1 \ne x_2$.

whenever this regression function is meaningful. Here and elsewhere, $\psi(\cdot)$ denotes a specified measurable function, which is assumed to be bounded on each compact subinterval of $\mathbb{R}$. Note that we can write

$$\psi(Y) = r(X) + \epsilon, \quad \text{with} \quad \mathbb{E}(\epsilon \mid X) = 0, \quad \mathbb{E}(\epsilon^2 \mid X) = \sigma^2(X).$$
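Before proceeding, the following minimal Python sketch illustrates this data-generating setup: a stationary, $\alpha$-mixing sequence of curves $X_i$ and scalar responses obeying $\psi(Y) = r(X) + \epsilon$. The AR(1) score process, the sine curve family, the operator $r$, and the noise level are illustrative assumptions of this sketch, not the paper's model.

```python
# Minimal simulation sketch of psi(Y) = r(X) + eps with functional, dependent
# regressors. All concrete choices below are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 101                  # sample size, grid points per curve
t = np.linspace(0.0, 1.0, m)     # common design grid on [0, 1]

# Stationary Gaussian AR(1) scores: geometrically strong mixing, so the
# resulting sequence (X_i, Y_i) is alpha-mixing.
rho = 0.5
a = np.zeros(n)
for i in range(1, n):
    a[i] = rho * a[i - 1] + rng.normal(scale=np.sqrt(1.0 - rho**2))

# Functional covariates X_i(t) and a (hypothetical) nonlinear operator r.
X = a[:, None] * np.sin(2 * np.pi * t)[None, :] + 0.1 * rng.normal(size=(n, m))
r_true = lambda x: np.trapz(x**2, t)            # r(x) = int_0^1 x(t)^2 dt
Y = np.array([r_true(xi) for xi in X]) + 0.1 * rng.normal(size=n)
```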

In this work, we study nonparametric estimation of the regression function $r(\cdot)$ using, for the first time in the functional data setting, local polynomial fitting with a regressor belonging to an infinite-dimensional space. From now on, for ease of notation, we set $\psi(y) = y$. Recall that in functional statistics there are several ways to develop local linear methods. For example (see [9]), the linear approximation of $r(\cdot)$ for any $z$ in a neighborhood of $x$ is given by

$$r(z) = r(x) + b\,\beta(z, x) + o\left(\beta(z, x)\right),$$

where the quantities $a = r(x)$ and $b$ are estimated by minimizing the following criterion

$$\min_{(a, b) \in \mathbb{R}^2}\sum_{i=1}^{n}\left(Y_i - a - b\,\beta(X_i, x)\right)^2 K\!\left(h_K^{-1}\delta(x, X_i)\right), \qquad (2.2)$$

where the locating functions $\beta(\cdot,\cdot)$ and $\delta(\cdot,\cdot)$ are defined on $\mathcal{F} \times \mathcal{F}$ and map into $\mathbb{R}$, with $|\delta(\cdot,\cdot)| = d(\cdot,\cdot)$, where $d(\cdot,\cdot)$ is a metric. The function $\beta$ captures the local behavior of the model, while $K(\cdot)$ is a kernel function and $h_K = h_{K,n}$ is the bandwidth or smoothing parameter of the kernel $K(\cdot)$, controlling the size of the local neighborhood and hence the degree of smoothing. The performance of the estimate depends crucially on these two locating functions. To clarify, specific forms for $\delta(\cdot,\cdot)$ and $\beta(\cdot,\cdot)$ can be given. For example, if the functional data are 'smooth' curves, one might use the following family of locating functions:

$$\mathrm{loc}_a^{(q)}(x_1, x_2) = \int \theta(t)\left(x_1^{(q)}(t) - x_2^{(q)}(t)\right)dt = \left\langle \theta,\, x_1^{(q)} - x_2^{(q)} \right\rangle_{\mathcal{X}},$$

where $\theta$ is a function that can be adapted to the data, $\mathcal{X}$ is a Hilbert space, and $\langle\cdot,\cdot\rangle_{\mathcal{X}}$ denotes its inner product. Choosing $\beta(\cdot,\cdot)$ from such a family is motivated by its connection to the following minimization problem:

$$\hat{c}(x) = \arg\min_{c(x) \in \mathbb{R}^{p+1}}\sum_{i=1}^{n}\left(Y_i - \sum_{l=0}^{p} c_l(x)\left\langle \theta,\, X_i^{(q)} - x^{(q)}\right\rangle_{\mathcal{X}}^{l}\right)^2 K\!\left(h^{-1}\left|\delta(X_i, x)\right|\right),$$

    which can be viewed as a type of 'local polynomial' regression approach when considering a functional explanatory variable. Metrics, or more generally semi-metrics, based on derivatives could also be suitable for locating one curve relative to another. For example, one might define:

$$\mathrm{loc}_b^{(q)}(x_1, x_2) = \left(\int\left(x_1^{(q)}(t) - x_2^{(q)}(t)\right)^2 dt\right)^{1/2},$$

which is a semi-metric. This second family of locating functions is particularly well-suited for $\delta(\cdot,\cdot)$, as it measures the proximity between two elements of $\mathcal{X}$.
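The following short Python sketch shows how these two families can be computed on discretized curves; the grid-based derivative and the trapezoidal quadrature are the only (hypothetical) implementation assumptions.

```python
# Discretized versions of the two locating-function families: loc_a^{(q)} for
# the signed locator beta, loc_b^{(q)} for the semi-metric delta.
import numpy as np

def beta_loc(x1, x2, t, theta, q=1):
    """loc_a^{(q)}(x1, x2) = <theta, x1^{(q)} - x2^{(q)}> via trapezoidal rule."""
    d = x1 - x2
    for _ in range(q):               # q-th derivative on the grid
        d = np.gradient(d, t)
    return np.trapz(theta * d, t)

def delta_loc(x1, x2, t, q=0):
    """loc_b^{(q)}(x1, x2): L2 (semi-)metric between q-th derivatives."""
    d = x1 - x2
    for _ in range(q):
        d = np.gradient(d, t)
    return np.sqrt(np.trapz(d**2, t))
```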

Let $U$ be an open subset of a real Banach space $\mathcal{F}$. If $f : U \to \mathbb{R}$ is differentiable $p + 1$ times on $U$, it can be expanded using Taylor's formula:

$$f(x) = f(a) + Df(a)h + \frac{1}{2!}D^2 f(a)h^2 + \cdots + \frac{1}{p!}D^p f(a)h^p + R_p(x),$$

with the following expressions for the remainder term $R_p(x)$:

$$R_p(x) = \frac{1}{p!}D^{p+1}f(\eta)\,(x - \eta)^p\, h \quad \text{(Cauchy form of the remainder)},$$

$$R_p(x) = \frac{1}{(p+1)!}D^{p+1}f(\xi)\,h^{p+1} \quad \text{(Lagrange form of the remainder)},$$

$$R_p(x) = \frac{1}{p!}\int_0^1 D^{p+1}f(a + th)\,\left((1 - t)h\right)^p h\, dt \quad \text{(integral form of the remainder)}.$$

Here $a$ and $x$ must be points of $U$ such that the line segment between $a$ and $x$ lies inside $U$, $h$ is $x - a$, and the points $\xi$ and $\eta$ lie on that segment, strictly between $a$ and $x$. If we collect the equal mixed partials (assuming they are continuous), then

$$\frac{1}{k!}D^k f(a)\,h^k = \sum_{|J| = k}\frac{1}{J!}\,\frac{\partial^J f}{\partial x^J}\,h^J,$$

where $J$ is a multi-index with $m$ components; each component $J_i$ indicates how many times the derivative with respect to the $i$th coordinate is taken, and the exponent to which the $i$th coordinate of $h$ is raised in the monomial $h^J$. The multi-index $J$ runs through all combinations with $J_1 + \cdots + J_m = |J| = k$ in the sum, and $J! = J_1! \cdots J_m!$. Let us specify our setting as in [11]. Let $U$ be an open subset of $\mathcal{F}$. If $f : U \to \mathbb{R}$ is $p + 1$ times differentiable on $U$, the Taylor expansion of $f$ around $x \in U$ is

$$f(y) = \sum_{j=0}^{p}\frac{1}{j!}D^j f(x)\, d^j(x, y) + R_p(y),$$

where $D^j f(x)$ denotes the $j$th Fréchet derivative of $f$ at $x$, with $x$ and $y$ points of $U$ and $d(\cdot,\cdot)$ the metric of $\mathcal{F}$. The remainder is given by

$$R_p(y) = \frac{1}{p!}\int_0^1 D^{p+1}f\{x + t\,d(x, y)\}\, d^{p+1}(x, y)\,(1 - t)^p\, dt = o\left\{d^p(x, y)\right\},$$

as $d(x, y) \to 0$; see [26] and [90] for more details. Then, in our setting, following a reasoning similar to [9] and [11], the local polynomial estimator of order $p$ of the regression function, denoted $\hat{r}(x)$, is defined as the first component of the estimate $\hat{c}(x) = (\hat{c}_0(x), \ldots, \hat{c}_p(x))^{\top}$ obtained from the following minimization problem

$$\hat{c}(x) = \arg\min_{c(x) \in \mathbb{R}^{p+1}}\sum_{i=1}^{n}\left(Y_i - \sum_{l=0}^{p} c_l(x)\,\beta^l(X_i, x)\right)^2 K\!\left(h_K^{-1}\delta(x, X_i)\right). \qquad (2.3)$$

Furthermore, the estimator $\hat{r}_l(x) = l!\,\hat{c}_l(x)$, for $0 \le l \le p$, can similarly be regarded as an estimator of the $l$th order derivative of the regression function $r(\cdot)$, whenever it exists. The estimator $\hat{c}(x)$ can then be written as

$$\hat{c}(x) = \left(Q_\beta^{\top} W Q_\beta\right)^{-1} Q_\beta^{\top} W Y, \qquad (2.4)$$

where $Q_\beta$ is the matrix defined by

$$Q_\beta = \begin{pmatrix} 1 & \beta(X_1, x) & \cdots & \beta^p(X_1, x) \\ \vdots & \vdots & & \vdots \\ 1 & \beta(X_n, x) & \cdots & \beta^p(X_n, x) \end{pmatrix}, \quad Y = (Y_1, \ldots, Y_n)^{\top}, \quad \text{and} \quad W = \mathrm{diag}\left(K\!\left(h_K^{-1}\delta(X_i, x)\right)\right),$$

where $\top$ is the transpose symbol. Let us introduce

$$u_{n,j} = \frac{1}{n\,\mathbb{E}(K)}\sum_{i=1}^{n}\left(\frac{\beta(X_i, x)}{h_K}\right)^{j} K\!\left(h_K^{-1}\delta(X_i, x)\right), \qquad v_{n,j} = \frac{1}{n\,\mathbb{E}(K)}\sum_{i=1}^{n}\left(\frac{\beta(X_i, x)}{h_K}\right)^{j} K\!\left(h_K^{-1}\delta(X_i, x)\right) Y_i,$$

$$U_n = \begin{pmatrix} u_{n,0} & \cdots & u_{n,p} \\ \vdots & & \vdots \\ u_{n,p} & \cdots & u_{n,2p} \end{pmatrix}, \quad \text{and} \quad V_n = \left(v_{n,0}, \ldots, v_{n,p}\right)^{\top}.$$

    Keeping in mind this notation, we have the representation

$$\hat{c}(x) = \mathrm{diag}\left(1, h_K^{-1}, h_K^{-2}, \ldots, h_K^{-p}\right) U_n^{-1} V_n.$$

Interestingly, this class of estimators includes both the classical Nadaraya-Watson estimator, which minimizes (2.3) when $p = 0$, and the local linear kernel estimator, which corresponds to $p = 1$.
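In implementable form, (2.4) is a single weighted least-squares solve. The sketch below (reusing the simulation and the locating functions from the earlier sketches) is a minimal illustration; the kernel, the direction $\theta$, and the quantile-based bandwidth are assumptions of this sketch, not choices made by the paper.

```python
# Local polynomial fit (2.4): c_hat(x) = (Q'WQ)^{-1} Q'WY; c_hat[0] estimates
# r(x), and l! * c_hat[l] estimates the l-th derivative term.
import numpy as np

def K(u):
    """Decreasing kernel on [-1, 1] with K(1) = 1/2 > 0 and K' < 0, (H5)-style."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (3.0 - u) / 4.0, 0.0)

def local_poly_fit(x0, X, Y, t, theta, h, p=2, q=1):
    b = np.array([beta_loc(xi, x0, t, theta, q) for xi in X])
    d = np.array([delta_loc(xi, x0, t) for xi in X])
    w = K(d / h)                                   # kernel weights
    Q = np.vander(b, N=p + 1, increasing=True)     # columns 1, beta, ..., beta^p
    A = Q.T @ (w[:, None] * Q)
    return np.linalg.solve(A, Q.T @ (w * Y))

theta = np.sin(2 * np.pi * t)                      # hypothetical direction theta
d0 = np.array([delta_loc(xi, X[0], t) for xi in X])
c_hat = local_poly_fit(X[0], X, Y, t, theta, h=np.quantile(d0, 0.2), p=2)
r_hat = c_hat[0]                                   # estimate of r(X[0])
```

Setting `p=0` recovers the Nadaraya-Watson weighted average and `p=1` the local linear fit, as noted above.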

Let $x$ (resp. $y$) be a fixed point in $\mathcal{F}$ (resp. $\mathbb{R}$), and let $N_x$ (resp. $N_y$) denote a fixed neighborhood of $x$ (resp. $y$).

In the remainder of the paper, we denote the closed ball in $\mathcal{F}$ of center $x \in \mathcal{F}$ and radius $h$ by

$$B(x, h) := \left\{x_1 \in \mathcal{F} : \left|\delta(x, x_1)\right| \le h\right\}.$$

    We define

$$\phi_x(l_1, l_2) = P\left(l_2 \le \delta(X, x) \le l_1\right),$$

where $l_1$ and $l_2$ are two real numbers. To establish the asymptotic behavior of our local polynomial estimator, we need the following assumptions.

(H1) For any $u > 0$, $\phi_x(u) := \phi_x(u, -u) > 0$, with $\lim_{u \to 0}\phi_x(u) = 0$.

It is easy to see that

$$\phi_x(u) = P\left(X \in B(x, u)\right).$$

When the function $|\delta(\cdot,\cdot)|$ satisfies the conditions of a metric or semi-metric, the quantity $\phi_x(u)$ can be understood as the probability that $X$ lies in a ball of $\mathcal{F}$ centered at $x$ with radius $u$. When $u$ approaches zero, the term "small ball probabilities" is frequently employed in probability theory, where it has been extensively studied (refer to [54] for a comprehensive summary of this topic in the context of Gaussian processes). The function $\phi_x(\cdot)$, which extends the small ball probability concept, plays in the functional setting a role similar to that of the density function in the finite-dimensional context. In multivariate nonparametric analysis, it is customary to estimate a quantity at a given point using a sufficiently large number of observations available in its vicinity; one commonly ensures this by assuming that the density evaluated at this point is strictly positive. In infinite dimensions, the absence of a reference measure, such as the Lebesgue measure in finite dimensions, necessitates a similar assumption that does not rely on the concept of density. The objective of Hypothesis (H1) is to incorporate this functional aspect: it guarantees that there are sufficiently many observations around $x$, hence justifying the estimation of the regression operator at the point $x$. The following conditions are needed in our analysis.

(H2) $(X_i, Y_i)_{i \in \mathbb{N}}$ is an $\alpha$-mixing sequence and

(ⅰ) $\exists a > 0$, $\exists C > 0$: $\forall n \in \mathbb{N}$, $\alpha(n) \le C\, n^{-a}$;

(ⅱ) $\sup_{i \ne j} P\left((X_i, X_j) \in B(x, h_K) \times B(x, h_K)\right) \le \psi_x(h_K)$, where $\psi_x(h_K)$ is such that there exists $\epsilon \in (0, 1]$ for which $0 < \psi_x(h_K) = O\left(\phi_x^{1+\epsilon}(h_K)\right)$;

(H3) (ⅰ) $r(\cdot)$ and $\sigma(\cdot)$ are continuous in a neighborhood of $x$; that is, $r(\cdot)$ and $\sigma(\cdot)$ both belong to the set

$$\left\{f : \mathcal{F} \to \mathbb{R},\ \lim_{|\delta(x', x)| \to 0} f(x') = f(x)\right\},$$

and there exists $C > 0$ such that, almost surely,

$$\sup_{i \ne j}\mathbb{E}\left(\left|Y_i Y_j\right| \mid (X_i, X_j)\right) \le C < \infty;$$

(ⅱ) For any $k \in \{1, 2, \ldots, p, p + 1\}$, the quantities $\Psi^{(k)}(0)$ exist, where $\Psi^{(k)}(\cdot)$ denotes the $k$th derivative of $\Psi(\cdot)$ defined by $\Psi(s) = \mathbb{E}\left(r(X) - r(x) \mid \beta(X, x) = s\right)$;

(H4) The locating operators $\beta(\cdot,\cdot)$ and $\delta(\cdot,\cdot)$ satisfy the following conditions:

(ⅰ) $\forall z \in \mathcal{F}$, $C_1\left|\delta(x, z)\right| \le \left|\beta(x, z)\right| \le C_2\left|\delta(x, z)\right|$;

(ⅱ) $\sup_{v \in B(x, r)}\left|\beta(v, x) - \delta(x, v)\right| = o(r)$;

(H5) The kernel $K(\cdot)$ is a bounded and positive function supported within $[-1, 1]$, whose first derivative $K'(\cdot)$ satisfies: $K(1) > 0$ and $K'(t) < 0$ for $t \in [-1, 1]$;

(H6) There exists a positive integer $n_0$ for which

$$\frac{1}{\phi_x(h_K)}\int_{-1}^{1}\phi_x(z h_K, h_K)\,\frac{d}{dz}\left(z^2 K(z)\right)dz > C_3 > 0 \quad \text{for } n > n_0,$$

and

$$h_K^2\int_{B(x, h_K)}\int_{B(x, h_K)}\beta(u, x)\,\beta(t, x)\,dP_{(X_1, X_2)}(u, t) = o\left(\int_{B(x, h_K)}\int_{B(x, h_K)}\beta^2(t, x)\,\beta^2(u, x)\,dP_{(X_1, X_2)}\right),$$

where $dP_{(X_1, X_2)}$ is the joint distribution of $(X_1, X_2)$;

(H7) (ⅰ) For $a > 1$, the function $\phi_x(\cdot)$ satisfies

$$\exists \eta > 0, \qquad C_2\, n^{\frac{1}{1-a}} \le \phi_x(h_K) \le C_1\, n^{\frac{1}{3-2a} - \eta};$$

(ⅱ) The bandwidths $h_K$ satisfy

$$\lim_{n \to \infty} h_K = 0, \qquad \lim_{n \to \infty}\frac{\log n}{n\,\phi_x(h_K)} = 0, \quad \text{and} \quad \exists \beta > 0 \ \text{such that} \ \lim_{n \to \infty} n^{\beta}\, h_K = 0.$$

Before presenting our results, we need additional notation. In the sequel, we define certain quantities appearing in the dominant terms of the bias and variance in the asymptotic results. For $j \in \{1, 2\}$, let

$$M_j = K^j(1) - \int_{-1}^{1}\left(K^j(u)\right)'\chi_x(u)\,du, \qquad (3.1)$$

where $\chi_x(\cdot)$ is a function such that

$$\lim_{h_K \to 0}\frac{\phi_x(h_K, t h_K)}{\phi_x(h_K)} = \chi_x(t) \quad \text{for any } t \in [-1, 1].$$

For all $a > 0$ and $b > 0$, let

$$N(a, b) = K^a(1) - \int_{-1}^{1}\left(u^b K^a(u)\right)'\chi_x(u)\,du. \qquad (3.2)$$

Assumption (H2-ⅰ) requires that the mixing coefficients of the dependent case tend to zero at a suitably mild rate. Assumption (H2-ⅱ) describes the behavior of the joint distribution of the pair $(X_i, X_j)$ in relation to its marginal distributions, allowing us to present an explicit asymptotic variance term. Assumption (H3-ⅰ) imposes the usual moment condition on the responses and the covariance structure of the dependent sample, as detailed in [38]. Assumption (H3-ⅱ) specifies the necessary smoothness condition for the current setting. Assumption (H4) is a standard condition in nonparametric estimation. Condition (H5) is very usual in the nonparametric estimation literature devoted to the functional data context. Notice that, as pointed out in [70], a symmetric kernel is not adequate in this context since the random process $d(x, X_i)$ is positive; therefore we consider $K(\cdot)$ with support $[0, 1]$. This is a natural generalization of the assumption usually made on the kernel in the multivariate case, where $K(\cdot)$ is supposed to be a spherically symmetric density function. Assumption (H6) describes the behavior of the bandwidth $h_K$ in relation to the small ball probabilities and the kernel function $K(\cdot)$. Assumption (H7-ⅰ) is a technical condition relating the small ball probability to an arithmetically $\alpha$-mixing coefficient, as discussed in [42]. Assumption (H7-ⅱ) is satisfied when $h_K = n^{-\varrho}$ and $\phi_x(h_K) = h_K^{\vartheta}$, provided $\vartheta > 0$ and $\beta < \varrho < 1/\vartheta$.
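To make this last example fully explicit, the exponent checks (spelled out here for the reader's convenience) run as follows: with $h_K = n^{-\varrho}$ and $\phi_x(h_K) = h_K^{\vartheta} = n^{-\varrho\vartheta}$,

$$h_K \to 0, \qquad \frac{\log n}{n\,\phi_x(h_K)} = \frac{\log n}{n^{1 - \varrho\vartheta}} \to 0 \iff \varrho\vartheta < 1 \iff \varrho < \frac{1}{\vartheta},$$

$$n^{\beta} h_K = n^{\beta - \varrho} \to 0 \iff \varrho > \beta,$$

which is precisely the stated window $\beta < \varrho < 1/\vartheta$.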

In the following, we use the notation $Z \overset{\mathcal{D}}{=} N(\mu, \sigma^2)$ to indicate that the random variable $Z$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$. The symbol $\overset{\mathcal{D}}{\longrightarrow}$ denotes convergence in distribution, and $\overset{P}{\longrightarrow}$ denotes convergence in probability.

Remark 3.1. According to [40], our methodology depends heavily on the distribution function $\phi_x(\cdot)$. This dependency is evident in our conditions and in the convergence rates of our estimates (via the asymptotic behavior of the quantity $n\,\phi_x(h)$). More precisely, the behavior of $\phi_x(\cdot)$ around $0$ is of paramount importance; consequently, the small ball probabilities of the underlying functional variable $X$ are crucial. In probability theory, the calculation of $P(\|X - x\| < s)$ for "small" $s$ (i.e., for $s$ tending to zero) and for a fixed $x$ is known as the "small ball problem". Unfortunately, solutions exist for very few random variables (or processes) $X$, even when $x = 0$. In several functional spaces, taking $x \ne 0$ presents formidable obstacles that may be insurmountable. Typically, authors emphasize Gaussian random variables. We refer to [54] for a summary of the key findings regarding small ball probabilities. If $X$ is a Gaussian random element on a separable Banach space $E$ and $x$ belongs to the reproducing kernel Hilbert space associated with $X$, then the following well-known result holds:

$$P\left(\|X - x\| \le s\right) \sim C_x\, P\left(\|X\| \le s\right), \quad \text{as } s \to 0.$$

As far as we know, the results available in the published literature are essentially all of the form

$$P\left(\|X - x\| < s\right) \approx c_x\, s^{\alpha}\exp\left(-C/s^{\beta}\right),$$

where $\alpha$, $\beta$, $c_x$, and $C$ are positive constants, and $\|\cdot\|$ may be a supremum norm, an $L^p$ norm, or a Besov norm. The interested reader can refer to [12,16,20,38,39,40,79] for further discussion. Notably, the pioneering book by [38] extensively comments on the links between nonparametric functional statistics, small-ball probability theory, and the topological structure of the functional space $E$.

    In the following theorem, we present the limiting law.

Theorem 3.2. Assume that assumptions (H1)–(H7) are satisfied. Then, as $n \to \infty$,

$$\sqrt{n\,\phi_x(h_K)}\left\{\mathrm{diag}\left(1, h_K, \ldots, h_K^p\right)\begin{pmatrix} \hat{r}(x) - r(x) \\ \hat{\Psi}^{(1)}(0) - \Psi^{(1)}(0) \\ \vdots \\ \hat{\Psi}^{(p)}(0) - \Psi^{(p)}(0) \end{pmatrix} - \frac{h_K^{p+1}}{(p+1)!}\,\Psi^{(p+1)}(0)\, S^{-1}U\right\} \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ \sigma^2(x)\, S^{-1}VS^{-1}\right),$$

where

$$S = \begin{pmatrix} 1 & \cdots & \dfrac{N(1, p)}{M_1} \\ \vdots & & \vdots \\ \dfrac{N(1, p)}{M_1} & \cdots & \dfrac{N(1, 2p)}{M_1} \end{pmatrix}, \quad V = \mathrm{diag}\left(\frac{M_2}{M_1^2},\ \frac{N(2, 2)}{M_1^2},\ \ldots,\ \frac{N(2, 2p)}{M_1^2}\right), \quad U = \begin{pmatrix} \dfrac{N(1, p+1)}{M_1} \\ \vdots \\ \dfrac{N(1, 2p+1)}{M_1} \end{pmatrix},$$

and where $\hat{c}_p(x)$ is a consistent estimator of $\Psi^{(p)}(0)$. Henceforth, we denote $\hat{c}_p(x)$ by $\hat{\Psi}^{(p)}(0)$, and we let $\hat{r}(x) = \hat{c}_0(x)$.

    The proof of Theorem 3.2 is postponed until Section 6. To eliminate the bias term, we need the following additional assumption.

(H8) Assume that $\lim_{n \to \infty} n\, h_K^{2p+2}\,\phi_x(h_K) = 0$.

Corollary 3.3. Assume that conditions (H1)–(H8) are satisfied. Then, as $n \to \infty$,

$$\sqrt{n\,\phi_x(h_K)}\left\{\mathrm{diag}\left(1, h_K, \ldots, h_K^p\right)\begin{pmatrix} \hat{r}(x) - r(x) \\ \hat{\Psi}^{(1)}(0) - \Psi^{(1)}(0) \\ \vdots \\ \hat{\Psi}^{(p)}(0) - \Psi^{(p)}(0) \end{pmatrix}\right\} \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ \sigma^2(x)\, S^{-1}VS^{-1}\right).$$

Remark 3.4. To cancel the bias term, we need $n\, h_K^{2p+2}\,\phi_x(h_K) \to 0$ as $n \to \infty$. Consequently, this condition and the condition $n\,\phi_x(h_K) \to \infty$ are both satisfied as soon as $h_K = n^{-\xi}$, $0 < \xi < 1$, with $\phi_x(h_K) = h_K^{c}$ for $\frac{1}{\xi} - (2p + 2) < c < \frac{1}{\xi}$.
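For completeness, the two rate requirements behind this remark reduce to elementary exponent checks (a verification added here): with $h_K = n^{-\xi}$ and $\phi_x(h_K) = h_K^{c} = n^{-\xi c}$,

$$n\,\phi_x(h_K) = n^{1 - \xi c} \to \infty \iff c < \frac{1}{\xi}, \qquad n\, h_K^{2p+2}\,\phi_x(h_K) = n^{1 - \xi(2p+2) - \xi c} \to 0 \iff c > \frac{1}{\xi} - (2p + 2).$$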

A lower $\alpha$th quantile of the vector $V_n$ is any quantity $\tau_{n\alpha} \in \mathbb{R}^{p+1}$ satisfying $\tau_{n\alpha} = \inf\{\varepsilon : P(V_n \le \varepsilon) \ge \alpha\}$, where an $\varepsilon$ attains the infimum over the given set only if there does not exist $\varepsilon_1 < \varepsilon$ in $\mathbb{R}^{p+1}$ such that $P(V_n \le \varepsilon_1) \ge \alpha$. We can, without loss of generality, assume $P(V_n \le \tau_{n\alpha}) = \alpha$. Since the variance matrix $\Sigma(x) = \sigma^2(x)\, S^{-1}VS^{-1}$ is unknown and assumed non-singular (see Corollary 4 of Appendix 1 of [4]), there exists a non-singular matrix $C(x)$ such that

$$C(x)^{\top}\,\Sigma(x)^{-1}\,C(x) = I_{p+1}.$$

Let us first give a consistent estimate of $\Sigma(x)$. Making use of the definition of $\chi_x(t)$ appearing in (3.1), one may estimate $\chi_x(t)$ by

$$\hat{\chi}_x(t) = \frac{F_{x,n,1}(h_K, t h_K)}{F_{x,n,2}(h_K)},$$

where

$$F_{x,n,1}(t, u) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{\{t \le \delta(x, X_i) \le u\}} \quad \text{and} \quad F_{x,n,2}(t) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{\{|\delta(x, X_i)| \le t\}}.$$

    Hence we have the following estimates, for j=1,2,

$$\hat{M}_j = K^j(1) - \int_{-1}^{1}\left(K^j(u)\right)'\hat{\chi}_x(u)\,du. \qquad (3.3)$$

In a similar way, for all $a > 0$ and $b > 0$, we have

$$\hat{N}(a, b) = K^a(1) - \int_{-1}^{1}\left(u^b K^a(u)\right)'\hat{\chi}_x(u)\,du. \qquad (3.4)$$

Hence one can consistently estimate $\Sigma(x)$ by $\hat{\Sigma}(x) = \hat{\sigma}^2(x)\,\hat{S}^{-1}\hat{V}\hat{S}^{-1}$, where

$$\hat{S} = \begin{pmatrix} 1 & \cdots & \dfrac{\hat{N}(1, p)}{\hat{M}_1} \\ \vdots & & \vdots \\ \dfrac{\hat{N}(1, p)}{\hat{M}_1} & \cdots & \dfrac{\hat{N}(1, 2p)}{\hat{M}_1} \end{pmatrix}, \quad \hat{V} = \mathrm{diag}\left(\frac{\hat{M}_2}{\hat{M}_1^2},\ \frac{\hat{N}(2, 2)}{\hat{M}_1^2},\ \ldots,\ \frac{\hat{N}(2, 2p)}{\hat{M}_1^2}\right), \quad \hat{U} = \begin{pmatrix} \dfrac{\hat{N}(1, p+1)}{\hat{M}_1} \\ \vdots \\ \dfrac{\hat{N}(1, 2p+1)}{\hat{M}_1} \end{pmatrix}.$$

For $n$ large enough, we have

$$\hat{C}(x)^{\top}\,\hat{\Sigma}(x)^{-1}\,\hat{C}(x) = I_{p+1}.$$

An application of Slutsky's theorem together with Corollary 3.3 gives the following result.

Corollary 3.5. Assume that conditions (H1)–(H8) are satisfied. Then, as $n \to \infty$,

$$\sqrt{n\,\phi_x(h_K)}\;\hat{C}(x)\left\{\mathrm{diag}\left(1, h_K, \ldots, h_K^p\right)\begin{pmatrix} \hat{r}(x) - r(x) \\ \hat{\Psi}^{(1)}(0) - \Psi^{(1)}(0) \\ \vdots \\ \hat{\Psi}^{(p)}(0) - \Psi^{(p)}(0) \end{pmatrix}\right\} \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ I_{p+1}\right).$$

    This corollary can be applied directly to the construction of the confidence regions.
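As an illustration of how the corollaries can be operationalized, the following sketch computes a pointwise confidence interval for $r(x)$ in the simplest case $p = 0$, where the asymptotic variance reduces to $\sigma^2(x) M_2 / M_1^2$. The plug-in quantities follow (3.3) and the empirical small-ball estimate $F_{x,n,2}$; the residual-based $\hat\sigma^2(x)$ and the grid approximation of the integrals are illustrative choices of this sketch, not prescribed by the paper. It reuses `K`, `delta_loc`, and the simulated sample from the earlier sketches, and assumes scipy is available.

```python
# Plug-in 100(1 - alpha)% interval from Corollary 3.3 with p = 0:
#   r_hat(x) -+ z_{1-alpha/2} * sigma_hat * sqrt(M2_hat) / (M1_hat * sqrt(n * phi_hat)).
import numpy as np
from scipy.stats import norm

def plugin_ci(x0, X, Y, t, h, alpha=0.05):
    d = np.array([delta_loc(xi, x0, t) for xi in X])
    w = K(d / h)
    n = len(Y)
    phi_hat = np.mean(np.abs(d) <= h)                    # F_{x,n,2}(h)
    r_hat = np.sum(w * Y) / np.sum(w)                    # p = 0 local fit
    sig2_hat = np.sum(w * (Y - r_hat) ** 2) / np.sum(w)  # local residual variance
    # chi_hat(u) estimates P(u*h <= delta <= h) / phi_x(h), in the spirit of (3.3).
    u = np.linspace(-1.0, 1.0, 401)
    chi_hat = np.array([np.mean((ui * h <= d) & (d <= h)) for ui in u]) / phi_hat
    def M_hat(j):                                        # K^j(1) - int (K^j)' chi_hat
        Kj = K(u) ** j
        return float(K(1.0)) ** j - np.trapz(np.gradient(Kj, u) * chi_hat, u)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(sig2_hat * M_hat(2)) / (
        M_hat(1) * np.sqrt(n * phi_hat))
    return r_hat - half, r_hat + half

lo, hi = plugin_ci(X[0], X, Y, t, h=np.quantile(d0, 0.2))
```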

In this part, we establish the rate of uniform almost-complete convergence over a subset of $\mathcal{F}$. The following definition is needed.

Definition 3.6. Let $\epsilon > 0$ be a given number. A finite set of points $x_1, x_2, \ldots, x_N$ in $\mathcal{F}$ is called an $\epsilon$-net for $S$ if $S \subseteq \bigcup_{k=1}^{N} B(x_k, \epsilon)$. The quantity $\psi_S(\epsilon) = \log\left(N_\epsilon(S)\right)$, where $N_\epsilon(S)$ is the minimal number of open balls in $\mathcal{F}$ of radius $\epsilon$ needed to cover $S$, is called Kolmogorov's $\epsilon$-entropy of the set $S$.
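To fix ideas, here is a standard back-of-the-envelope computation, added for illustration (it is not one of the paper's examples): if $S$ is a bounded subset of a $d$-dimensional subspace of $\mathcal{F}$, say of radius $R$ for a projection semi-metric, then a classical covering argument gives

$$N_\epsilon(S) \le \left(\frac{CR}{\epsilon}\right)^{d}, \qquad \text{so} \qquad \psi_S(\epsilon) = \log N_\epsilon(S) \le d\,\log\frac{CR}{\epsilon},$$

and in particular $\psi_S\!\left(\frac{\log n}{n}\right) = O(\log n)$, which is exactly the order invoked in the Hilbert-space example with a projection semi-metric discussed after (U6) below.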

The concept of entropy, introduced by Kolmogorov in the mid-1950s (see [52]), serves as a measure of set complexity: high entropy means that a significant amount of information is required to describe an element within a given tolerance $\epsilon$. Consequently, the selection of the topological structure, specifically the choice of the semi-metric, plays a crucial role when examining asymptotic results that are uniform over a subset $S$ of $\mathcal{F}$. In particular, we subsequently observe that a well-chosen semi-metric can enhance the concentration of the probability measure for the functional variable $X$, while minimizing the $\epsilon$-entropy of the subset $S$. Ferraty and Vieu [38] emphasized this phenomenon of concentration of the probability measure for the functional variable by calculating small ball probabilities in different standard scenarios (refer to [41]). For readers interested in these concepts (entropy and small ball probabilities) and in the use of Kolmogorov's $\epsilon$-entropy in dimensionality reduction problems, we recommend [53] and [69], respectively. We establish the uniform almost-complete convergence of our estimator on a subset $S_{\mathcal{F}} \subseteq \mathcal{F}$ such that:

$$S_{\mathcal{F}} \subseteq \bigcup_{k=1}^{d_n} B(x_k, \iota_n),$$

where $x_k \in \mathcal{F}$, $\iota_n$ is a sequence of positive real numbers, and $d_n$ is a sequence of positive integers. To establish our result, we need the following hypotheses.

(U1) Hypothesis (H1) holds uniformly, in the following sense: there exists a differentiable function $\phi(\cdot)$ such that

$$\forall x \in \mathcal{F}, \quad 0 < C_1\,\phi(h_K) \le P\left(X \in B(x, h_K)\right) \le C_2\,\phi(h_K),$$

and there exists $\eta_0 > 0$ such that, for each $\eta < \eta_0$,

$$\phi'(\eta) < C,$$

where $\phi'(\cdot)$ denotes the first derivative of $\phi(\cdot)$ and $\phi(0) = 0$;

(U2) Hypothesis (H2) is satisfied uniformly in $x \in S_{\mathcal{F}}$;

(U3) The function $\beta(\cdot,\cdot)$ satisfies (H4) and the following Lipschitz condition: there exists a positive constant $C$ such that, for all $x_1, x_2$ in $\mathcal{F}$,

$$\left|\beta(x, x_1) - \beta(x, x_2)\right| \le C\, d(x_1, x_2);$$

(U4) The regression operator $r(\cdot)$ satisfies: there exist a positive constant $C$ and $s > 0$ such that, for each $x \in \mathcal{F}$ and $z \in B(x, h_K)$,

$$\left|r(x) - r(z)\right| \le C\, d^s(x, z);$$

(U5) The kernel function $K(\cdot)$ satisfies (H5) and is Lipschitz on $[-1, 1]$;

(U6) In addition to (H7), the Kolmogorov $\epsilon$-entropy of $S_{\mathcal{F}}$ satisfies:

(ⅰ) $\exists n_0$, $\forall n > n_0$:

$$\frac{(\log n)^2}{n\,\phi(h)} < \psi_{S_{\mathcal{F}}}\!\left(\frac{\log n}{n}\right) < \frac{n\,\phi(h)}{\log n};$$

(ⅱ) $\exists \lambda > 1$ such that

$$\sum_{n=1}^{\infty}\exp\left\{(1 - \lambda)\,\psi_{S_{\mathcal{F}}}\!\left(\frac{\log n}{n}\right)\right\} = \sum_{n=1}^{\infty} d_n^{1-\lambda} < \infty.$$

Conditions (U1) and (U2) are the uniform counterparts of (H1) and (H2), respectively. Assumption (U3), initially introduced and discussed by [9], plays a pivotal role in our methodology, particularly when computing the leading dominant terms in our asymptotic results. Moreover, (U4) is essential for evaluating the bias component of the uniform convergence rates, and (U5) is a classical requirement in functional estimation in finite- or infinite-dimensional spaces, with examples of kernel functions satisfying this condition available in [38]. The final condition regarding entropy, (U6), implies that $\psi_{S_{\mathcal{F}}}\left(\frac{\log n}{n}\right) = o\left(n\,\phi(h_K)\right)$ as $n$ tends to infinity. In certain "usual" scenarios, such as a Hilbert space with a projection semi-metric, one can take $\psi_{S_{\mathcal{F}}}\left(\frac{\log n}{n}\right) \approx C\log n$, and (U6-ⅰ) is fulfilled whenever $(\log n)^2 = O\left(n\,\phi(h)\right)$. For further insights, one may refer to Example 4 in [41].

    We are now equipped to state the main result of this section concerning the uniform almost complete convergence with rate.

Theorem 3.7. Assume that assumptions (U1)–(U6) are satisfied. Then, as $n \to \infty$,

$$\sup_{x \in S_{\mathcal{F}}}\left\|\hat{c}(x) - c(x)\right\| = O\left(h_K^{p+1}\right) + O_{a.co.}\left(\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\right),$$

where

$$\hat{c}(x) = \left(\hat{r}(x),\ h_K\hat{\Psi}^{(1)}(0),\ \ldots,\ h_K^p\hat{\Psi}^{(p)}(0)\right)^{\top}.$$

Let $(u_n)_{n \in \mathbb{N}}$ be a sequence of real random variables. We say that $(u_n)$ converges almost-completely (a.co.) toward zero if, and only if, for all $\epsilon > 0$, $\sum_{n=1}^{\infty} P(|u_n| > \epsilon) < \infty$. Moreover, we say that the rate of almost-complete convergence of $(u_n)$ toward zero is of order $v_n$ (with $v_n \to 0$), and we write $u_n = O_{a.co.}(v_n)$, if, and only if, there exists $\epsilon > 0$ such that $\sum_{n=1}^{\infty} P(|u_n| > \epsilon v_n) < \infty$. This type of convergence implies both almost sure convergence and convergence in probability.

In this section, we establish results analogous to Theorem 3.2 for the local polynomial estimator of order $p$ of the conditional cumulative distribution function, which we denote by $\hat{F}^x(\cdot)$. We note that the distribution function can be expressed as a regression, depending on the choice of the function $\psi(Y)$ (see Eq (2.1)): taking $\psi_y(Y) = \mathbf{1}_{\{Y \le y\}}$, the conditional distribution function is defined by

$$\forall y \in \mathbb{R} : \quad F^x(y) = P\left(Y \le y \mid X = x\right).$$

The functional local polynomial estimator of $F^x(y)$ is based on the minimization, with respect to $\hat{a} = (\hat{a}_0, \ldots, \hat{a}_p)^{\top}$, of

$$\hat{a} = \arg\min_{(a_0, \ldots, a_p) \in \mathbb{R}^{p+1}}\sum_{i=1}^{n}\left(L\!\left(h_L^{-1}(y - Y_i)\right) - \sum_{l=0}^{p} a_l(x)\,\beta^l(X_i, x)\right)^2 K\!\left(h_K^{-1}\delta(x, X_i)\right), \qquad (4.1)$$

where $L(\cdot)$ is a cumulative distribution function and $h_L = h_{L,n}$ is a sequence of positive real numbers. In the realm of functional statistics, estimation of the conditional cumulative distribution function holds significant importance both theoretically and practically, as is evident in applications such as reliability and survival analysis. Additionally, within nonparametric statistics, numerous prediction tools, including the conditional density, conditional quantiles, and conditional mode, play crucial roles. It is worth noting that these considerations apply in particular when $p$ is equal to $1$. The almost-complete convergence of the estimator $\hat{F}^x(\cdot)$ in the case $p = 1$ has been investigated by [31]. Furthermore, [14] explored the asymptotic normality of the local linear estimator of the conditional distribution function for independent observations. For the development of our estimator, we adopt an approach similar to the one used for the regression operator. Consequently, the local polynomial estimator of $F^x(y)$ can be explicitly expressed as follows:

$$\left(\hat{F}^x(y),\ \hat{a}_1,\ \ldots,\ \hat{a}_p\right)^{\top} = \mathrm{diag}\left(1, h_K^{-1}, h_K^{-2}, \ldots, h_K^{-p}\right) A_n^{-1} B_n,$$

    where

$$A_n = \begin{pmatrix} \Lambda_{n,0} & \cdots & \Lambda_{n,p} \\ \vdots & & \vdots \\ \Lambda_{n,p} & \cdots & \Lambda_{n,2p} \end{pmatrix}, \quad \text{and} \quad B_n = \left(\Upsilon_{n,0}, \ldots, \Upsilon_{n,p}\right)^{\top}.$$

    We denote

$$\Lambda_{n,j} = \frac{1}{n\,\mathbb{E}(K)}\sum_{i=1}^{n}\left(\frac{\beta(X_i, x)}{h_K}\right)^{j} K\!\left(h_K^{-1}\delta(X_i, x)\right), \qquad \Upsilon_{n,j} = \frac{1}{n\,\mathbb{E}(K)}\sum_{i=1}^{n}\left(\frac{\beta(X_i, x)}{h_K}\right)^{j} K\!\left(h_K^{-1}\delta(X_i, x)\right) L\!\left(h_L^{-1}(y - Y_i)\right).$$
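For concreteness, the estimator (4.1) is the same weighted least-squares computation as before, with the smoothed indicator $L(h_L^{-1}(y - Y_i))$ in place of $Y_i$. In the sketch below, $L$ is taken to be the standard Gaussian CDF (one admissible integrated kernel) and the bandwidths are again illustrative; it reuses `local_poly_fit` and the simulated sample from the earlier sketches.

```python
# Local polynomial conditional-CDF estimator of (4.1): the first component of
# the fitted coefficient vector estimates F^x(y) = P(Y <= y | X = x0).
import numpy as np
from scipy.stats import norm

def cond_cdf_fit(x0, y, X, Y, t, theta, h_K, h_L, p=1, q=1):
    smoothed = norm.cdf((y - Y) / h_L)       # L(h_L^{-1}(y - Y_i))
    return local_poly_fit(x0, X, smoothed, t, theta, h=h_K, p=p, q=q)

F_hat = cond_cdf_fit(X[0], np.median(Y), X, Y, t, theta,
                     h_K=np.quantile(d0, 0.2), h_L=0.2)[0]
```

Since nothing constrains the raw fit to $[0, 1]$, a monotone, clipped post-processing step (as in the quantile sketch further below) is advisable in practice.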

To establish the asymptotic convergence of $\hat{F}^x(y)$, we need some notation and assumptions. For any $l \in \{0, 2, \ldots, 2p\}$, we set

$$\varphi_l(x, y) = \frac{\partial^l F^x(y)}{\partial y^l} \quad \text{and} \quad \psi_l(s) = \mathbb{E}\left[\varphi_l(X, y) - \varphi_l(x, y) \mid \beta(X, x) = s\right].$$

(M1) For any $k \in \{1, \ldots, p + 1\}$, the quantities $\psi_l^{(k)}(0)$ exist, where $\psi_l^{(k)}(\cdot)$ denotes the $k$th order derivative of $\psi_l(\cdot)$.

(M2) The kernel $K(\cdot)$ satisfies assumption (H5) and

$$K^2(1) - \int_{-1}^{1}\left(K^2(u)\right)'\chi_x(u)\,du > 0.$$

(M3) The kernel function $L(\cdot)$ is positive, symmetric, bounded, integrable, and such that

$$\int_{\mathbb{R}} z^{2p} L(z)\,dz < \infty.$$

(M4) The coefficients of the $\alpha$-mixing sequence $(X_i, Y_i)_{i \in \mathbb{N}}$ satisfy (H2) and the following condition:

$$\sum_{l=1}^{+\infty} l^{\delta}\left(\alpha(l)\right)^{1/p} < \infty \quad \text{for some } p > 0 \text{ and } \delta > 1/p.$$

(M5) The function $\phi_x(\cdot)$ satisfies assumption (H1).

(M6) Let $(w_n)$ and $(q_n)$ be sequences of positive integers tending to infinity such that $w_n + q_n \le n$, and

(ⅰ) $q_n = o\left(\left(n\,\phi_x(h_K)\right)^{1/2}\right)$ and $\lim_{n \to +\infty}\left(n\,\phi_x(h_K)\right)^{1/2}\alpha(q_n) = 0$;

(ⅱ) $r_n q_n = o\left(\left(n\,\phi_x(h_K)\right)^{1/2}\right)$ and $\lim_{n \to +\infty} r_n\left(n\,\phi_x(h_K)\right)^{1/2}\alpha(q_n) = 0$, where $r_n$ is the largest integer such that $r_n(w_n + q_n) \le n$.

Theorem 4.1. Under assumptions (M1)–(M6), we have

$$\sqrt{n\,\phi_x(h_K)}\left\{\mathrm{diag}\left(1, h_K, \ldots, h_K^p\right)\begin{pmatrix} \hat{F}^x(y) - F^x(y) \\ \hat{a}_1 - \Psi_0^{(1)}(0) \\ \vdots \\ \hat{a}_p - \Psi_0^{(p)}(0) \end{pmatrix} - B_K^p(x, y) - B_L^p(x, y)\right\} \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ V^{LK}(x, y)\right),$$

where

$$B_L^p(x, y) = \sum_{j=0}^{p}\left[\frac{\mathbb{E}\left(\beta_1^j K_1\right)}{h_K^j\,\mathbb{E}(K_1)}\, I + \sum_{k=1}^{p}\sum_{a=1}^{p}\frac{h_L^{2k}}{(2k)!}\int t^{2k} L(t)\,dt\;\psi_k^{(a)}(0)\,\frac{\mathbb{E}\left(\beta_1^{j+a} K_1\right)}{h_K^j\,\mathbb{E}(K_1)}\right],$$

$$I = \left(\frac{h_L^2}{2}\,\varphi_2(x, y)\int t^2 L(t)\,dt,\ \ldots,\ \frac{h_L^{2p}}{(2p)!}\,\varphi_{2p}(x, y)\int t^{2p} L(t)\,dt\right)^{T},$$

$$B_K^p(x, y) = \frac{h_K^{p+1}}{(p+1)!}\,\Psi_0^{(p+1)}(0)\,S^{-1}U, \qquad V^{LK}(x, y) = F^x(y)\left(1 - F^x(y)\right)S^{-1}VS^{-1}.$$

Remark 4.2. Let $x \in \mathcal{F}$ be a fixed element and $y \in \mathbb{R}$. If we define $\psi_y(Y) = \mathbf{1}_{(-\infty, y]}(Y)$, then the operator $r_\psi(x) = \mathbb{E}\left(\psi_y(Y) \mid X = x\right)$ represents the conditional cumulative distribution function (CDF) of $Y$ given $X = x$, denoted $F(y \mid x) = P(Y \le y \mid X = x)$. This can be estimated as $\tilde{F}(y \mid x) := \hat{r}_{\psi,n}(x)$. For a given $\alpha \in (0, 1)$, the $\alpha$th order conditional quantile of the distribution of $Y$ given $X = x$ is defined as $q_\alpha(x) = \inf\{y \in \mathbb{R} : F(y \mid x) \ge \alpha\}$. If $F(\cdot \mid x)$ is strictly increasing and continuous in a neighborhood of $q_\alpha(x)$, then $F(\cdot \mid x)$ has a unique quantile of order $\alpha$ at $q_\alpha(x)$, i.e., $F(q_\alpha(x) \mid x) = \alpha$. In such cases:

$$q_\alpha(x) = F^{-1}(\alpha \mid x) = \inf\left\{y \in \mathbb{R} : F(y \mid x) \ge \alpha\right\},$$

which is naturally estimated by $\hat{q}_{n,\alpha}(x) = \tilde{F}^{-1}(\alpha \mid x)$. Conditional quantiles have been extensively studied when the predictor $X$ is of finite dimension. Since $F(q_\alpha(x) \mid x) = \alpha = \tilde{F}(\hat{q}_{n,\alpha}(x) \mid x)$, and $\tilde{F}(\cdot \mid x)$ is continuous and strictly increasing, for any $\epsilon > 0$ there exists $\eta(\epsilon) > 0$ such that, for all $y$:

$$\left|\tilde{F}(y \mid x) - \tilde{F}(q_\alpha(x) \mid x)\right| \le \eta(\epsilon) \implies \left|y - q_\alpha(x)\right| \le \epsilon.$$

This implies that, for any $\epsilon > 0$, there exists $\eta(\epsilon) > 0$ such that:

$$P\left(\left|\hat{q}_{n,\alpha}(x) - q_\alpha(x)\right| \ge \epsilon\right) \le P\left(\left|\tilde{F}(\hat{q}_{n,\alpha}(x) \mid x) - \tilde{F}(q_\alpha(x) \mid x)\right| \ge \eta(\epsilon)\right) = P\left(\left|F(q_\alpha(x) \mid x) - \tilde{F}(q_\alpha(x) \mid x)\right| \ge \eta(\epsilon)\right).$$

This result may be used immediately to establish the consistency of the quantile estimator; this topic will be investigated in future research.
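A hedged sketch of the plug-in quantile $\hat{q}_{n,\alpha}(x) = \tilde{F}^{-1}(\alpha \mid x)$ discussed above, computed by inverting the estimated conditional CDF on a grid; the grid, the monotonization step, and the bandwidths are implementation choices of this sketch, which reuses `cond_cdf_fit` from the previous one.

```python
# Plug-in conditional quantile: scan candidate y values, monotonize the
# estimated CDF, and return the first grid point where it reaches alpha.
import numpy as np

def cond_quantile(x0, alpha, X, Y, t, theta, h_K, h_L, grid_size=200):
    ys = np.linspace(Y.min(), Y.max(), grid_size)
    F_vals = np.array([cond_cdf_fit(x0, y, X, Y, t, theta, h_K, h_L)[0]
                       for y in ys])
    F_vals = np.clip(np.maximum.accumulate(F_vals), 0.0, 1.0)  # force a CDF shape
    idx = int(np.searchsorted(F_vals, alpha))
    return ys[min(idx, grid_size - 1)]

q_med = cond_quantile(X[0], 0.5, X, Y, t, theta,
                      h_K=np.quantile(d0, 0.2), h_L=0.2)
```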

Remark 4.3. Over time, it has become increasingly clear that the performance of kernel estimates deteriorates as the dimensionality of the problem increases. This decline is primarily due to the fact that, in high-dimensional spaces, local neighborhoods often lack sufficient sample observations unless the sample size is extraordinarily large. Consequently, in kernel estimation, computing local averages is not feasible unless the bandwidth is significantly wide. This overarching issue is known as the curse of dimensionality [10]. The paper [77] provides a comprehensive analysis of the feasibility and challenges of high-dimensional estimation, complete with examples and computations. For recent references, see [6,15,16,35]. To address the difficulties associated with high-dimensional kernel estimation, a wide range of dimension reduction techniques have been developed. These techniques aim to explain the main features of a set of sample curves using a small set of uncorrelated variables. One effective solution is to generalize principal component analysis (PCA) to the context of a continuous-time stochastic process [33]. The asymptotic properties of the estimators of functional principal component analysis (FPCA) have been extensively studied in the general context of functional variables [30]. Nonparametric methods have been developed to perform FPCA for cases involving a small number of irregularly spaced observations of each sample curve [50,88]. As in the multivariate case, interpreting principal component scores and loadings is a valuable tool for uncovering relationships among the variables in a functional data set. To avoid misinterpretation of PCA results, a new type of plot, named Structural and Variance Information plots, was recently introduced by [25]. Numerical results suggest that regularization using a smoothness measure often yields more satisfactory outcomes than the FPCA approach: the FPCA method assumes that the top eigenfunctions of the predictors, which are unrelated to the coefficient function or the response, provide a good representation of the coefficient function. While it is generally believed that neither approach will consistently outperform the other, the reproducing kernel Hilbert space (RKHS) method is an interesting alternative that merits attention [58,89]. Spline smoothing is one of the most popular and powerful techniques in nonparametric regression [36,46,47,84]. Penalized splines have gained widespread use in recent years due to their computational simplicity and their connection to mixed effects models [86]. These methods have also found significant success in functional data analysis, as evidenced by tools like the R package refund. However, there is a consensus that the asymptotic theory of penalized splines is difficult to derive, and many theoretical gaps remain, even in the standard nonparametric regression setting; to our knowledge, there are very few theoretical studies on penalized splines for FDA [87]. Most approaches to functional regression are based on minimizing some $L^2$-norm and are therefore sensitive to outliers. Finally, we can mention methods based on delta sequences [23] and wavelet methods [34].

Local polynomial fitting exhibits various statistically significant characteristics, particularly in the intricate domain of multivariate analysis. As functional data analysis gains traction in data science, there arises a need for a specialized theory focused on local polynomial fitting. This study employed local higher-order polynomial fitting to address the complex challenge of estimating the regression function operator and its partial derivatives for stationary mixing random processes $(Y_i, X_i)$. For the first time, we demonstrated the joint asymptotic normality of the estimates of the regression function and its partial derivatives for strongly mixing processes. Additionally, we derived precise formulas for the bias and the variance-covariance matrix of the asymptotic distribution. We established the uniform strong consistency of the regression function and its partial derivatives over compact subsets, providing a detailed analysis of their convergence rates. Importantly, these conclusions are grounded in general conditions that form the foundation of the underlying models. To illustrate practical utility, we used our findings to compute pointwise confidence regions. Furthermore, we extended our concepts to the nonparametric conditional distribution and obtained its limiting distribution. There are multiple avenues for further methodological development. As a prospective direction, relaxing the stationarity assumption and exploring similar uniform limit theorems for locally stationary functional ergodic processes would be fruitful. Additionally, considering the functional $k$NN local polynomial approach and expectile regression estimators in future investigations could yield alternative estimators benefiting from the advantages of both methods, as discussed in [20] and referenced in [19]. Investigating the extension of our framework to the censored data setting would also be of interest. In an upcoming investigation, obtaining a precise uniform-in-bandwidth limit law for the proposed estimators will be essential: this outcome would allow the bandwidth to vary across a comprehensive range, ensuring the estimator's consistency and serving as a valuable practical guideline for bandwidth selection in nonparametric functional data analysis. It is crucial to acknowledge that advancing research in this direction poses a more challenging task compared to previous efforts. The primary challenge lies in the need to develop new probabilistic results, such as inequalities and maximal moment inequalities specifically tailored for independent and identically distributed (i.i.d.) samples, as discussed in [81].

This section is devoted to the proof of our main results. The previously introduced notation continues to be used in what follows. Before studying the limit law of $\hat{c}(x)$, we need to center the vector $V_n$ as follows. Let

$$u_{n,j}^{*} = \frac{1}{n\,\mathbb{E}(K)}\sum_{i=1}^{n}\left(\frac{\beta_i}{h_K}\right)^{j} K_i\left(Y_i - r(x)\right), \qquad (6.1)$$

and set $V_n^{*} = \left(u_{n,0}^{*}, \ldots, u_{n,p}^{*}\right)^{\top}$. Making use of assumption (H4) and the Taylor expansion, we obtain

$$\mathbb{E}\left(r(X_i) \mid \beta_i\right) = r(x) + \mathbb{E}\left(r(X_i) - r(x) \mid \beta_i\right) = r(x) + \Psi(\beta_i) = r(x) + \sum_{k=1}^{p}\frac{1}{k!}\Psi^{(k)}(0)\,\beta_i^{k} + o\left(\beta_i^{p}\right).$$

Therefore, we have

$$\Gamma := \left(r(X_1), \ldots, r(X_n)\right)^{\top} = Q_\beta\begin{pmatrix} r(x) \\ \Psi^{(1)}(0) \\ \vdots \\ \Psi^{(p)}(0) \end{pmatrix} + \frac{1}{(p+1)!}\Psi^{(p+1)}(0)\left(\beta_1^{p+1}, \ldots, \beta_n^{p+1}\right)^{\top} + \left(o\left(\beta_1^{p+1}\right), \ldots, o\left(\beta_n^{p+1}\right)\right)^{\top}.$$

Making use of Eq (6.1), we have:

$$U_n^{-1}V_n^{*} = \mathrm{diag}\left(1, h_K, h_K^2, \ldots, h_K^p\right)\left(\hat{c}(x) - \left(Q_\beta^{\top}WQ_\beta\right)^{-1}Q_\beta^{\top}W\Gamma\right) = \mathrm{diag}\left(1, h_K, \ldots, h_K^p\right)\begin{pmatrix} \hat{r}(x) - r(x) \\ \hat{\Psi}^{(1)}(0) - \Psi^{(1)}(0) \\ \vdots \\ \hat{\Psi}^{(p)}(0) - \Psi^{(p)}(0) \end{pmatrix} - \frac{h_K^{p+1}}{(p+1)!}\,\Psi^{(p+1)}(0)\, U_n^{-1}\begin{pmatrix} u_{n,p+1} \\ \vdots \\ u_{n,2p+1} \end{pmatrix} - o\left(h_K^{p+1}\right) U_n^{-1}\begin{pmatrix} u_{n,p+1} \\ \vdots \\ u_{n,2p+1} \end{pmatrix}. \qquad (6.2)$$

    The rest of the proof for this theorem relies on the following lemmas, the proofs of which are provided in the appendix.

Lemma 6.1. Under assumptions (H1)–(H7), we have, as $n \to \infty$,

$$U_n \overset{P}{\longrightarrow} S \quad \text{and} \quad U_n^{-1}\begin{pmatrix} u_{n,p+1} \\ \vdots \\ u_{n,2p+1} \end{pmatrix} \overset{P}{\longrightarrow} S^{-1}U.$$

Hence, in view of Lemma 6.1, Theorem 3.2 will follow once the next lemma is established.

Lemma 6.2. Under the assumptions of Theorem 3.2, we have, as $n \to \infty$,

$$\sqrt{n\,\phi_x(h_K)}\; V_n^{*} \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ \sigma^2(x)\,V\right).$$

Oussama Bouanani: Conceptualization, formal analysis, investigation, methodology, validation, writing – original draft, review and editing; Salim Bouzebda: Conceptualization, formal analysis, methodology, validation, writing – original draft, review and editing. All authors have read and approved the final version of the manuscript for publication.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the three referees for their invaluable feedback. Their insightful comments have greatly refined and focused the original work, resulting in a markedly improved presentation.

The authors declare no conflict of interest. Salim Bouzebda is a Guest Editor of the special issue "Advances in Statistical Inference and Stochastic Processes: Theory and Applications" for AIMS Mathematics and was not involved in the editorial review or the decision to publish this article.

Lemma A.1. (See [92]) Under assumptions (H1), (H2-ⅱ), and (H5)–(H7-ⅰ), we have

(1) $\mathbb{E}\left(K_1^j\right) = M_j\,\phi_x(h_K)$, for $j = 1, 2$;

(2) $\mathbb{E}\left(K_1^a\,\beta_1\right) = o\left(h_K\,\phi_x(h_K)\right)$, for all $a > 0$;

(3) $\mathbb{E}\left(K_1^a\,\beta_1^b\right) = N(a, b)\,h_K^b\,\phi_x(h_K) + o\left(h_K^b\,\phi_x(h_K)\right)$, for all $a > 0$ and $b = 2, 4$;

(4) For all $(k, l) \in \mathbb{N} \times \mathbb{N}$, $\mathbb{E}\left(K_1^k\,|\beta_1|^l\right) \le C\,h_K^l\,\phi_x(h_K)$;

(5) For all $(a, c, l, s) \in \mathbb{N} \times \mathbb{N} \times \mathbb{N} \times \mathbb{N}$, $\mathbb{E}\left(K_1^a\,K_2^c\,|\beta_1|^l\,|\beta_2|^s\right) \le C\,h_K^{l+s}\,\psi_x(h_K)$.

Proof of Lemma 6.1. Since the entries of $U_n$ and of the vector in Lemma 6.1 are the $u_{n,j}$, it suffices to show that each expectation converges to the corresponding limit and that each variance vanishes; that is,

$$\mathbb{E}(u_{n,0}) = 1, \qquad \mathbb{E}(u_{n,1}) \to 0, \qquad \mathbb{E}(u_{n,p}) \to \frac{N(1, p)}{M_1}, \qquad \mathbb{E}(u_{n,2p+1}) \to \frac{N(1, 2p+1)}{M_1},$$

and

$$\mathrm{Var}(u_{n,j}) \to 0, \quad \text{for } j \in \{0, 1, p, 2p+1\}. \qquad (A.1)$$

Using Lemma A.1, we get

$$\mathbb{E}(u_{n,0}) = 1, \qquad \mathbb{E}(u_{n,1}) = \frac{\mathbb{E}\left(h_K^{-1}\beta_1 K_1\right)}{\mathbb{E}(K_1)} = o(1), \qquad \mathbb{E}(u_{n,l}) = \frac{\mathbb{E}\left(h_K^{-l}\beta_1^l K_1\right)}{\mathbb{E}(K_1)} \to \frac{N(1, l)}{M_1} \quad \text{for } l \in \{p, 2p+1\}.$$

Concerning Eq (A.1), we can write

$$\mathrm{Var}(u_{n,j}) = \frac{1}{n^2\,\mathbb{E}(K_1)^2}\left(\sum_{i=1}^{n}\mathrm{Var}\left(\beta_i^j h_K^{-j} K_i\right) + 2\sum_{1 \le i < l \le n}\mathrm{Cov}\left(\beta_i^j h_K^{-j} K_i,\ \beta_l^j h_K^{-j} K_l\right)\right) =: Q_{n1} + Q_{n2}.$$

For the term $Q_{n1}$, we have

$$Q_{n1} = \frac{1}{n^2\,\mathbb{E}(K_1)^2}\sum_{i=1}^{n}\mathrm{Var}\left(\beta_i^j h_K^{-j} K_i\right) \le \frac{1}{n\,\mathbb{E}(K_1)^2}\,\mathbb{E}\left(\beta_1^{2j} h_K^{-2j} K_1^2\right) \le \frac{C}{n\,\phi_x(h_K)}. \qquad (A.2)$$

Moreover, to handle the term $Q_{n2}$, we use the same technique as in [62]: we define the sets $W_1$ and $W_2$ as

$$W_1 = \left\{(i, l) \in \{1, \ldots, n\}^2 \text{ such that } 1 \le |i - l| \le u_n\right\},$$

and

$$W_2 = \left\{(i, l) \in \{1, \ldots, n\}^2 \text{ such that } u_n + 1 \le |i - l| \le n - 1\right\},$$

where the sequence $u_n$ is selected in such a way that $u_n \to +\infty$ as $n \to +\infty$. Based on this splitting, we obtain

$$Q_{1,n} = \frac{1}{n^2\,\mathbb{E}^2(K_1)}\sum_{W_1}\mathrm{Cov}\left(h_K^{-j}\beta_i^j K_i,\ h_K^{-j}\beta_l^j K_l\right), \qquad (A.3)$$

and

$$Q_{2,n} = \frac{1}{n^2\,\mathbb{E}^2(K_1)}\sum_{W_2}\mathrm{Cov}\left(h_K^{-j}\beta_i^j K_i,\ h_K^{-j}\beta_l^j K_l\right). \qquad (A.4)$$

For the sum $Q_{1,n}$, by assumptions (H2-ⅱ) and (H5) and by Lemma A.1 (see [92]), we have

$$\left|\mathrm{Cov}\left(h_K^{-j}\beta_i^j K_i,\ h_K^{-j}\beta_l^j K_l\right)\right| \le \mathbb{E}\left(h_K^{-2j}\beta_i^j\beta_l^j K_i K_l\right) + \mathbb{E}^2\left(h_K^{-j}\beta_1^j K_1\right) \le C\left(\mathbb{E}(K_i K_l) + \mathbb{E}^2(K_1)\right) \le C\left(\phi_x^{1+\epsilon}(h_K) + \phi_x^2(h_K)\right) \le C\,\phi_x^{1+\epsilon}(h_K).$$

Then, since $\mathbb{E}(K_1) \asymp \phi_x(h_K)$ and $\#W_1 \le n\,u_n$, we readily infer

$$Q_{1,n} \le \frac{1}{n^2\,\mathbb{E}^2(K_1)}\sum_{W_1} C\,\phi_x^{1+\epsilon}(h_K) \le C\,\frac{u_n}{n}\,\phi_x^{\epsilon - 1}(h_K).$$

On the other hand, to bound the sum of covariances $Q_{2,n}$, we use the covariance inequality for bounded mixing processes: for any $l \ne i$,

$$\left|\mathrm{Cov}\left(h_K^{-j}\beta_i^j K_i,\ h_K^{-j}\beta_l^j K_l\right)\right| \le C\,\alpha\left(|i - l|\right).$$

Next, using the inequality

$$\sum_{j \ge x+1} j^{-a} \le \int_{x}^{\infty} u^{-a}\,du,$$

we derive

$$\sum_{(i, l) \in W_2}\alpha\left(|i - l|\right) \le C\, n\,\frac{u_n^{1-a}}{a - 1}. \qquad (A.5)$$

Therefore, we readily obtain

$$\left|Q_{2,n}\right| \le C\,\frac{u_n^{1-a}}{n\,\phi_x^2(h_K)}.$$

Subsequently, choosing $u_n = \left(\phi_x^{1+\epsilon}(h_K)\right)^{-1/a}$ and using assumption (H7), we obtain

$$Q_{1,n} \to 0 \quad \text{and} \quad Q_{2,n} \to 0 \quad \text{as } n \to \infty.$$

Hence the proof is complete.

Proof of Lemma 6.2. Consider a fixed vector of real numbers $a = (\alpha_0, \ldots, \alpha_p)^{\top} \ne 0$; then

$$\sqrt{n\,\phi_x(h_K)}\; a^{\top}V_n^{*} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Z_i - \mathbb{E}(Z_i)\right) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbb{E}(Z_i), \qquad (A.6)$$

where

$$Z_i = \frac{\sqrt{\phi_x(h_K)}}{\mathbb{E}(K_1)}\left(\sum_{j=0}^{p}\alpha_j\, h_K^{-j}\beta_i^j\right) K_i\left(Y_i - r(x)\right).$$

By the Cramér-Wold device and Slutsky's theorem, to establish the limit law of (A.6) it suffices to prove the following two claims:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbb{E}(Z_i) = \sqrt{n\,\phi_x(h_K)}\;\frac{\Psi'(0)}{M_1}\left(\alpha_0\, o(h_K) + \sum_{j=1}^{p}\alpha_j\, h_K^{j+1}\, N(1, j+1)\right), \qquad (A.7)$$

and

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Z_i - \mathbb{E}(Z_i)\right) \overset{\mathcal{D}}{\longrightarrow} N\left(0,\ \sigma^2(x)\, a^{\top}Va\right). \qquad (A.8)$$

First, we prove Claim (A.7). We have

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbb{E}(Z_i) = \sqrt{n}\,\mathbb{E}(Z_1) = \frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\,\mathbb{E}\left(\left(\alpha_0 + \sum_{j=1}^{p}\alpha_j h_K^{-j}\beta_1^j\right)K_1\left(Y_1 - r(x)\right)\right) = \underbrace{\frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\,\alpha_0\,\mathbb{E}\left(K_1\left(Y_1 - r(x)\right)\right)}_{S_1} + \underbrace{\frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\sum_{j=1}^{p}\alpha_j\,\mathbb{E}\left(h_K^{-j}\beta_1^j K_1\left(Y_1 - r(x)\right)\right)}_{S_2}.$$

Next, by the same arguments as those used by [72] and [40], we get

$$S_1 = \frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\,\alpha_0\left[\Psi'(0)\,\mathbb{E}\left(K_1\beta_1\right) + o\left(\mathbb{E}\left(K_1\beta_1\right)\right)\right] = \frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\,\alpha_0\,\Psi'(0)\, o\left(h_K\,\phi_x(h_K)\right),$$

and

$$S_2 = \frac{\sqrt{n\,\phi_x(h_K)}}{\mathbb{E}(K_1)}\sum_{j=1}^{p}\alpha_j\,\Psi'(0)\, h_K^{j+1}\, N(1, j+1)\,\phi_x(h_K);$$

then, using Lemma A.1, we obtain

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbb{E}(Z_i) = \sqrt{n\,\phi_x(h_K)}\;\frac{\Psi'(0)}{M_1}\left(\alpha_0\, o(h_K) + \sum_{j=1}^{p}\alpha_j\, h_K^{j+1}\, N(1, j+1)\right). \qquad (A.9)$$

Now, we prove Claim (A.8). We use the CLT of [56] (see Corollary 2.2, page 196), which rests on the asymptotic behavior of the quantity

$$\lim_{n \to \infty}\sum_{i=1}^{n}\mathbb{E}\left[\Delta_i^2\right], \qquad (A.10)$$

where $\Delta_i = \frac{1}{\sqrt{n}}\left(Z_i - \mathbb{E}(Z_i)\right)$, together with the following assumptions:

Assumption 1. (A1) There exists a sequence $\tau_n = o(\sqrt{n})$ such that

(i) $\tau_n\left(\max_{i=1,\ldots,n} C_i\right) \le 1$, where $C_i = \mathrm{ess}\sup_{\omega \in \Omega}\left|\Delta_i\right|$;

(ii) for all $\epsilon > 0$, $\frac{n}{\tau_n}\,\alpha\left(\epsilon\,\tau_n\right) \to 0$.

Assumption 2. (A2) There exists a sequence $(m_n)$ of positive integers tending to $\infty$ such that

(i) $n\, m_n\,\gamma_n = o(1)$, where $\gamma_n := \max_{1 \le i \ne j \le n}\mathbb{E}\left|\Delta_i\Delta_j\right|$;

(ii) $\left(\sum_{j=m_n+1}^{\infty}\alpha(j)\right)\sum_{i=1}^{n} C_i \to 0$.

We start by computing the limit in (A.10). Observe that

$$\sum_{i=1}^{n}\mathbb{E}\left(\Delta_i^2\right) = \mathrm{Var}(Z_1) = \underbrace{\frac{\phi_x(h_K)}{\mathbb{E}^2(K_1)}\,\mathbb{E}\left(\left(\alpha_0 + \sum_{j=1}^{p}\alpha_j h_K^{-j}\beta_1^j\right)^2 K_1^2\left(Y_1 - r(x)\right)^2\right)}_{A_1} - \underbrace{\frac{\phi_x(h_K)}{\mathbb{E}^2(K_1)}\,\mathbb{E}^2\left(\left(\alpha_0 + \sum_{j=1}^{p}\alpha_j h_K^{-j}\beta_1^j\right)K_1\left(Y_1 - r(x)\right)\right)}_{A_2}.$$

Using the decomposition

$$A_1 = \frac{\phi_x(h_K)}{\mathbb{E}^2(K_1)}\left(\underbrace{\alpha_0^2\,\mathbb{E}\left(K_1^2\left(Y_1 - r(x)\right)^2\right)}_{I_1} + 2\alpha_0\sum_{j=1}^{p}\alpha_j\,\underbrace{\mathbb{E}\left(h_K^{-j}\beta_1^j K_1^2\left(Y_1 - r(x)\right)^2\right)}_{I_2} + \underbrace{\mathbb{E}\left(\left(\sum_{j=1}^{p}\alpha_j h_K^{-j}\beta_1^j\right)^2 K_1^2\left(Y_1 - r(x)\right)^2\right)}_{I_3}\right),$$

in combination with Lemma A.1, we readily find

$$I_1 = \alpha_0^2\,\sigma^2(x)\, M_2\,\phi_x(h_K)\left(1 + o(1)\right), \qquad I_2 = \sigma^2(x)\, o\left(\phi_x(h_K)\right),$$

and

$$I_3 = \sum_{j=1}^{p}\alpha_j^2\,\mathbb{E}\left(h_K^{-2j}\beta_1^{2j} K_1^2\left(Y_1 - r(X_1)\right)^2\right) + 2\sum_{1 \le i < j \le p}\alpha_i\alpha_j\,\mathbb{E}\left(h_K^{-(i+j)}\beta_1^{i+j} K_1^2\left(Y_1 - r(X_1)\right)^2\right) = \sigma^2(x)\sum_{j=1}^{p}\alpha_j^2\, N(2, 2j)\,\phi_x(h_K) + \sigma^2(x)\, o\left(\phi_x(h_K)\right).$$

Under assumption (H1), we deduce

$$\lim_{n \to +\infty} A_1 = \frac{\sigma^2(x)}{M_1^2}\left(\alpha_0^2\, M_2 + \sum_{j=1}^{p}\alpha_j^2\, N(2, 2j)\right) = \sigma^2(x)\, a^{\top}Va.$$

Subsequently, assertion (A.7) implies that $\mathbb{E}(Z_1) \to 0$ as $n \to \infty$; consequently, $A_2 \to 0$ as $n \to \infty$. Concerning Assumption (A1), the fact that the kernel $K(\cdot)$ is bounded allows us to infer that

$$C_i = O\left(\frac{1}{\sqrt{n\,\phi_x(h_K)}}\right).$$

Therefore, an appropriate choice is

$$\tau_n = \sqrt{\frac{n\,\phi_x(h_K)}{\log n}}.$$

Furthermore, this choice gives, for all $\epsilon > 0$,

$$\frac{n}{\tau_n}\,\alpha\left(\epsilon\,\tau_n\right) \le C\, n^{1 - (a+1)/2}\left(\phi_x(h_K)\right)^{-(a+1)/2}\left(\log n\right)^{(a+1)/2} \le C\, n^{1 - (a+1)/2 + (a+1)/(2(a-1))}\left(\log n\right)^{(a+1)/2} = C\, n^{(3a - a^2)/(2(a-1))}\left(\log n\right)^{(a+1)/2} \to 0, \quad \text{since } a > 3.$$

Concerning Assumption (A2), using assumptions (H2-ⅱ) and (H3), we have

$$\mathbb{E}\left|\Delta_i\Delta_j\right| \le \frac{\phi_x(h_K)}{n\,\mathbb{E}(K_1)^2}\sum_{0 \le i', j' \le p}\left|\alpha_{i'}\alpha_{j'}\right|\,\mathbb{E}\left(K_i K_j\right) \le \frac{C}{n\,\phi_x(h_K)}\,\sup_{i \ne j} P\left((X_i, X_j) \in B(x, h_K) \times B(x, h_K)\right) \le C\,\frac{\phi_x^{\epsilon}(h_K)}{n}.$$

Elsewhere, using the fact that

$$\sum_{j \ge x+1} j^{-a} \le \int_{x}^{\infty} u^{-a}\,du = \left[(a-1)\,x^{a-1}\right]^{-1}, \qquad (A.11)$$

we obtain

$$\sum_{j=m_n+1}^{\infty}\alpha(j) \le C\sum_{j \ge m_n+1} j^{-a} \le C\int_{m_n}^{\infty} t^{-a}\,dt = C\,\frac{m_n^{1-a}}{a-1};$$

thus,

$$\left(\sum_{j=m_n+1}^{\infty}\alpha(j)\right)\sum_{i=1}^{n} C_i = O\left(m_n^{1-a}\,\sqrt{\frac{n}{\phi_x(h_K)}}\right).$$

We select

$$m_n = \left\lfloor\left(\frac{\phi_x(h_K)}{n\,\log n}\right)^{\frac{1}{2(1-a)}}\right\rfloor,$$

where $\lfloor\cdot\rfloor$ denotes the integer part. It is evident that, under assumption (H7), $m_n \to \infty$. Furthermore, by substituting the expression for $m_n$, we readily get

$$\left(\sum_{j=m_n+1}^{\infty}\alpha(j)\right)\sum_{i=1}^{n} C_i = o(1).$$

Once more, using assumptions (H2-ⅰ) and (H7), we have

$$m_n\,\gamma_n \le C\, n^{-1 - \frac{1}{2(1-a)}}\left(\phi_x(h_K)\right)^{\epsilon + \frac{1}{2(1-a)}}\left(\log n\right)^{-\frac{1}{2(1-a)}} \le C\, n^{-1 + \eta\,\frac{2a-3}{2(1-a)}}\left(\log n\right)^{-\frac{1}{2(1-a)}} = o\left(n^{-1}\right).$$

Hence the proof is complete.

    Hence the proof is complete.

Proof of Theorem 3.7. According to formula (6.2), we deduce the following error decomposition:

$$\hat{c}(x) - c(x) = U_n^{-1}(x)\,V_n^{*}(x) + \frac{h_K^{p+1}}{(p+1)!}\,\Psi^{(p+1)}(0)\,U_n^{-1}(x)\,e_n(x) + O\left(h_K^{p+1}\right)U_n^{-1}(x)\,e_n(x),$$

where $e_n(x) = \left(u_{n,p+1}(x), \ldots, u_{n,2p+1}(x)\right)^{\top}$. Hence, Theorem 3.7 is a direct consequence of the following lemmas.

Lemma A.2. Assume that conditions (U1), (U3), (U5), and (U6) are fulfilled. We have, as $n \to \infty$,

$$\sup_{x \in S_{\mathcal{F}}}\left|u_{nj}(x) - \mathbb{E}\left(u_{nj}(x)\right)\right| = O_{a.co.}\left(\sqrt{\frac{\log d_n}{n\,\phi_x(h_K)}}\right).$$

Lemma A.3. Assume that conditions (U1)–(U6) are fulfilled. We have, as $n \to \infty$,

$$\sup_{x \in S_{\mathcal{F}}}\left|v_{nj}(x) - \mathbb{E}\left(v_{nj}(x)\right)\right| = O_{a.co.}\left(\sqrt{\frac{\log d_n}{n\,\phi_x(h_K)}}\right),$$

$$\sup_{x \in S_{\mathcal{F}}}\left|e_{nj}(x) - \mathbb{E}\left(e_{nj}(x)\right)\right| = O_{a.co.}\left(\sqrt{\frac{\log d_n}{n\,\phi_x(h_K)}}\right).$$

Lemma A.4. Assume that conditions (U1)–(U5) are fulfilled. We have, as $n \to \infty$,

$$\sup_{x \in S_{\mathcal{F}}}\left|\mathbb{E}\left(v_{nj}(x)\right)\right| = O\left(h_K^{p+1}\right), \qquad \sup_{x \in S_{\mathcal{F}}}\left|\mathbb{E}\left(e_{nj}(x)\right)\right| = O(1).$$

Proof of Lemma A.2. The proof relies on Proposition A.11-ⅰ in [38]. In order to apply the latter, we need to compute certain quantities. First, we evaluate $\mathbb{E}(u_{nj}(x))$: using Lemma A.1 and the fact that the observations are identically distributed, we have

$$\mathbb{E}\left(u_{nj}(x)\right) = \frac{1}{h_K^j\,\mathbb{E}(K_1)}\,\mathbb{E}\left(\beta_1^j K_1\right) \qquad (A.12)$$
$$= O(1). \qquad (A.13)$$

Moreover, for the covariance term, we have:

$$S_{n,\alpha,l}^2(x) = \sum_{i=1}^{n}\sum_{j=1}^{n}\left|\mathrm{Cov}\left(A_i^{\alpha}(x),\ A_j^{l}(x)\right)\right|, \qquad (A.14)$$

where

$$A_i^{l}(x) = \frac{1}{h_K^{l}}\left(\beta_i^{l}(x)\,K_i(x) - \mathbb{E}\left(\beta_i^{l}(x)\,K_i(x)\right)\right), \quad \text{for } l \in \{1, 2, \ldots, p\}. \qquad (A.15)$$

Under the dependence assumptions, we treat the term $S_{n,\alpha,l}^2(x)$ as follows:

$$S_{n,\alpha,l}^2(x) = R_{n,1}(x) + R_{n,2}(x) + n\,\mathrm{Var}\left(A_1^{l}(x)\right),$$

with

$$R_{n,1}(x) = \sum_{W_1}\mathrm{Cov}\left(A_j^{l}(x),\ A_i^{\alpha}(x)\right); \qquad W_1 = \left\{(i, j) \in \{1, 2, \ldots, n\}^2 \text{ such that } 1 \le |i - j| \le u_n\right\},$$

and

$$R_{n,2}(x) = \sum_{W_2}\mathrm{Cov}\left(A_j^{l}(x),\ A_i^{\alpha}(x)\right); \qquad W_2 = \left\{(i, j) \in \{1, 2, \ldots, n\}^2 \text{ such that } u_n + 1 \le |i - j| \le n - 1\right\}.$$

For the variance term, using Lemma A.1 we deduce that

$$n\,\mathrm{Var}\left(A_1^{l}(x)\right) = O\left(n\,\phi_x(h_K)\right). \qquad (A.16)$$

Concerning the covariance term $R_{n,1}(x)$, we use the same idea as in Eq (A.3), with the choice $u_n = \phi_x^{-\epsilon}(h_K)$, and get

$$R_{n,1}(x) = O\left(n\,\phi_x(h_K)\right), \qquad (A.17)$$

while for the term $R_{n,2}(x)$ we apply the Rio inequality; see Theorem B.3 (ⅱ). For this technique, we need the absolute moments of the random variable $A_j^{l}(x)$. Using the binomial expansion, we have

$$\mathbb{E}\left[\left|A_j^{l}(x)\right|^m\right] = h_K^{-lm}\,\mathbb{E}\left|\left(K_j(x)\,\beta_j^{l}(x) - \mathbb{E}\left[K_j(x)\,\beta_j^{l}(x)\right]\right)^m\right| = h_K^{-lm}\,\mathbb{E}\left|\sum_{i=0}^{m} C_m^{i}\left(K_j(x)\,\beta_j^{l}(x)\right)^{i}\left(\mathbb{E}\left[K_j(x)\,\beta_j^{l}(x)\right]\right)^{m-i}(-1)^{m-i}\right| \le h_K^{-lm}\sum_{i=0}^{m} C_m^{i}\,\mathbb{E}\left|K_j^{i}(x)\,\beta_j^{li}(x)\right|\left|\mathbb{E}\left[K_j(x)\,\beta_j^{l}(x)\right]\right|^{m-i},$$

where $C_m^{i} = \frac{m!}{i!(m-i)!}$. Next, applying Lemma A.1, we get

$$\mathbb{E}\left[\left|A_j^{l}\right|^m\right] = O\left(\phi_x(h_K)\right).$$

Following the same reasoning used to establish Eq (A.4), the choice $u_n = \phi_x^{-\epsilon}(h_K)$ permits us to get:

$$R_{n,2}(x) = O\left(n\,\phi_x(h_K)\right). \qquad (A.18)$$

Finally, by Eqs (A.16), (A.17), and (A.18), we deduce that

$$S_{n,\alpha,l}^2(x) = O\left(n\,\phi_x(h_K)\right). \qquad (A.19)$$

    On the other hand, we can write

$$u_{nj}(x)-E\big(u_{nj}(x)\big)=\frac{1}{n\,\phi_x(h_K)}\sum_{i=1}^{n}A_i^{l}(x),\tag{A.20}$$

    where $A_i^{l}(x)$ is defined in (A.15). Now, we apply Definition 3.6 to show the uniform convergence of our estimator on a subset $S_{\mathcal F}$ of $\mathcal F$. To this aim, we take

$$z(x)=\arg\min_{j\in\{1,\ldots,d_n\}}\big|\delta(x,x_j)\big|.$$

    Let us consider the following decomposition:

$$\sup_{x\in S_{\mathcal F}}\big|u_{nj}(x)-E(u_{nj}(x))\big|\le\underbrace{\sup_{x\in S_{\mathcal F}}\big|u_{nj}(x)-u_{nj}(x_{z(x)})\big|}_{T_1}+\underbrace{\sup_{x\in S_{\mathcal F}}\big|u_{nj}(x_{z(x)})-E\big(u_{nj}(x_{z(x)})\big)\big|}_{T_2}+\underbrace{\sup_{x\in S_{\mathcal F}}\big|E\big(u_{nj}(x_{z(x)})\big)-E\big(u_{nj}(x)\big)\big|}_{T_3}.$$
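    The map $x\mapsto x_{z(x)}$ simply sends each curve to its nearest point in the finite net $\{x_1,\ldots,x_{d_n}\}$. A minimal sketch (ours, assuming curves discretized on a common grid and an $L^2$-type semi-metric $\delta$):

```python
# Sketch (ours): z(x) = argmin_j delta(x, x_j) over a finite net of curves.
import numpy as np

def nearest_net_index(x, net):
    """Index z(x) of the net curve closest to x in an L2-type semi-metric."""
    d = np.sqrt(((net - x)**2).mean(axis=1))   # delta(x, x_j), j = 1, ..., d_n
    return int(np.argmin(d))
```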

    Concerning the term $T_2$, for all $\epsilon>0$, we have (writing $k$ for the net index to avoid confusion with the polynomial index $j$)

$$P(T_2>\epsilon)=P\Big(\max_{k\in\{1,\ldots,d_n\}}\big|u_{nj}(x_k)-E\big(u_{nj}(x_k)\big)\big|>\epsilon\Big)\le d_n\max_{k\in\{1,\ldots,d_n\}}P\Big(\big|u_{nj}(x_k)-E\big(u_{nj}(x_k)\big)\big|>\epsilon\Big)\le d_n\max_{k\in\{1,\ldots,d_n\}}P\Big(\big|A_{nj}(x_k)\big|>\epsilon\,n\,\phi(h_K)\Big).$$

    Next, we can apply Proposition A.11 of [38]: for any $m>2$, $\tau>0$, $\varpi\ge 1$ and for a certain $0<C<\infty$,

$$P(T_2>\tau)<C(A_1+A_2),$$

    where

$$A_1=d_n\Big(1+\frac{\tau^{2}\,n^{2}\,\phi(h_K)^{2}}{\varpi\,S_{n,\alpha,l}^{2}}\Big)^{-\varpi/2},\qquad A_2=d_n\,n\,\varpi^{-1}\Big(\frac{\varpi}{\tau\,n\,\phi(h_K)}\Big)^{(a+1)m/(a+m)}.\tag{A.21}$$

    Using Eq (A.19) and assumption (U1), we obtain

$$S_{n,\alpha,l}^{2}=\sup_{x\in S_{\mathcal F}}S_{n,\alpha,l}^{2}(x)=O\big(n\,\phi(h_K)\big).$$

    Now, for $\eta>0$, we choose

$$\tau=\eta\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\qquad\text{and}\qquad\varpi=\big(\log(d_n)\big)^{2}.$$
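    To see why this choice works, note the following short calculation (ours; it uses $S_{n,\alpha,l}^{2}=O(n\,\phi(h_K))$ and the elementary approximation $(1+u/\varpi)^{-\varpi/2}\approx e^{-u/2}$, valid here since $\varpi=(\log d_n)^{2}\to\infty$):

$$\frac{\tau^{2}\,n^{2}\,\phi(h_K)^{2}}{\varpi\,S_{n,\alpha,l}^{2}}\asymp\frac{\eta^{2}\log d_n}{C\,\varpi},\qquad\text{so}\qquad A_1\le d_n\Big(1+\frac{\eta^{2}\log d_n}{C\,\varpi}\Big)^{-\varpi/2}\approx d_n\,e^{-\eta^{2}\log d_n/(2C)}=d_n^{\,1-\eta^{2}/(2C)}.$$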

    Therefore, utilizing the fact that $\Psi_{S_{\mathcal F}}(\epsilon)=\log(d_n)$ and taking $C\eta^{2}=\lambda$, we obtain, under assumption (U6),

$$A_1=O\big(d_n^{1-\lambda}\big)\qquad\text{and}\qquad A_2=O\big(n^{1-\lambda'}\big),\qquad\text{for }\lambda,\lambda'>0.$$

    Finally, for $\eta$ large enough, we infer

$$P\Bigg(T_2>\eta\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\Bigg)<C\big(d_n^{1-\lambda}+n^{1-\lambda'}\big).$$

    Using the fact that

$$\sum_{n=1}^{\infty}d_n^{1-\lambda}<\infty\qquad\text{and}\qquad\sum_{n=1}^{\infty}\frac{1}{n^{1+\lambda'}}<\infty,$$

    we get

$$T_2=O_{a.co.}\Bigg(\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\Bigg).$$

    For the term $T_1$ we have

$$T_1\le\frac{C}{n\,h_K^{j}\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}K_i(x)\mathbf{1}_{B(x,h_K)}(X_i)\Big|\beta_i^{j}(x)-\beta_i^{j}(x_{z(x)})\mathbf{1}_{B(x_{z(x)},h_K)}(X_i)\Big|+\frac{C}{n\,h_K^{j}\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}\beta_i^{j}(x_{z(x)})\mathbf{1}_{B(x_{z(x)},h_K)}(X_i)\Big|K_i(x)\mathbf{1}_{B(x,h_K)}(X_i)-K_i(x_{z(x)})\Big|=:L_1^{j}+L_2^{j}.$$

    Evaluation of the term $L_1^{j}$. Under assumption (U3) we have

$$\mathbf{1}_{B(x,h_K)}(X_i)\Big|\beta_i^{j}(x)-\beta_i^{j}(x_{z(x)})\mathbf{1}_{B(x_{z(x)},h_K)}(X_i)\Big|\le C\,\iota_n\,h_K^{j-1}\,\mathbf{1}_{B(x,h_K)\cap B(x_{z(x)},h_K)}(X_i)+C\,h_K^{j}\,\mathbf{1}_{B(x,h_K)\cap\overline{B(x_{z(x)},h_K)}}(X_i),$$

    which gives the following bound:

$$L_1^{j}\le\frac{C\,\iota_n}{n\,h_K\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}K_i(x)\,\mathbf{1}_{B(x,h_K)\cap B(x_{z(x)},h_K)}(X_i)+\frac{C}{n\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}K_i(x)\,\mathbf{1}_{B(x,h_K)\cap\overline{B(x_{z(x)},h_K)}}(X_i).\tag{A.22}$$

    Making use of the Lipschitz condition on $K(\cdot)$ together with hypothesis (U3), we may write directly

$$\beta_i^{j}(x_{z(x)})\mathbf{1}_{B(x_{z(x)},h_K)}(X_i)\Big|K_i(x)\mathbf{1}_{B(x,h_K)}(X_i)-K_i(x_{z(x)})\Big|\le C\,h_K^{j}\,\iota_n\,\mathbf{1}_{B(x,h_K)\cap B(x_{z(x)},h_K)}(X_i)+C\,h_K^{j}\,K_i(x_{z(x)})\,\mathbf{1}_{\overline{B(x,h_K)}\cap B(x_{z(x)},h_K)}(X_i).$$

    Therefore, we readily obtain

$$L_2^{j}\le\frac{C\,\iota_n}{n\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}\mathbf{1}_{B(x,h_K)\cap B(x_{z(x)},h_K)}(X_i)+\frac{C}{n\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}K_i(x_{z(x)})\,\mathbf{1}_{\overline{B(x,h_K)}\cap B(x_{z(x)},h_K)}(X_i).$$

    Under assumption (U5), combining the last inequality with (A.22) implies that

$$T_1\le\frac{1}{n\,\phi(h_K)}\sup_{x\in S_{\mathcal F}}\sum_{i=1}^{n}Z_i,$$

    where

$$Z_i=\frac{C\,\iota_n}{h_K}\,\mathbf{1}_{B(x,h_K)\cap B(x_{z(x)},h_K)}(X_i).$$

    Using a similar approach to the one employed in the proof of (13.14), under the assumptions (U1), (U2) and (U6), we obtain

$$S_n^{2}=\sum_{i,j}\big|\mathrm{Cov}(Z_i,Z_j)\big|=O\big(n\,\phi(h_K)\big).$$

    Next, by employing a proof similar to the one applied in the evaluation of $T_2$, we infer that

$$T_1=O_{a.co.}\Bigg(\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\Bigg).$$

    Concerning the term $T_3$, it is obvious that

$$T_3\le E\Big(\sup_{x\in S_{\mathcal F}}\big|u_{nj}(x)-u_{nj}(x_{z(x)})\big|\Big),$$

    whence we infer that

$$T_3=O_{a.co.}\Bigg(\sqrt{\frac{\log d_n}{n\,\phi(h_K)}}\Bigg).$$

    This completes the proof.

    Proof of Lemma A.4. By conditioning on $X_1$, we have

$$\sup_{x\in S_{\mathcal F}}\big|E(v_{nj}(x))\big|\le\frac{1}{|E(K_1)|}\,\Bigg|E\Bigg(\Big(\frac{\beta_1}{h_K}\Big)^{j}K_1\Bigg)\Bigg|\,\sup_{X_1\in B(x,h_K)}\big|r(X_1)-r(x)\big|,$$

    so under assumption (U4), and by using the uniform version of Lemma A.1, we obtain

$$\sup_{x\in S_{\mathcal F}}\big|E(v_{nj}(x))\big|=O\big(h_K^{p+1}\big).$$

    The remaining bound, $\sup_{x\in S_{\mathcal F}}\big|E(e_{nj}(x))\big|=O(1)$, was already treated in Eq (A.12).

    Proof of Theorem 4.1. Using reasoning similar to that employed for the regression function, we show that

$$\sqrt{n\,\phi_x(h_K)}\,\big(B_n-B_L^{p}(x,y)\big)\overset{\mathcal D}{\longrightarrow}N\big(0,V_{\mathcal F}(x,y)\big),\tag{A.23}$$

    where

$$B_n=\big(\Upsilon_{n,0},\ldots,\Upsilon_{n,p}\big)^{\top},\qquad V_{\mathcal F}(x,y)=F^{x}(y)\big(1-F^{x}(y)\big)\,a^{\top}Va,$$

    and

$$\Upsilon_{n,j}=\frac{1}{n\,E(K)}\sum_{i=1}^{n}\Big(\frac{\beta(X_i,x)}{h_K}\Big)^{j}K\big(h_K^{-1}\delta(X_i,x)\big)\Big(L\big(h_L^{-1}(Y_i-y)\big)-F^{x}(y)\Big).$$

    For any given vector of real numbers $a=(\alpha_0,\ldots,\alpha_p)^{\top}\ne 0$, we have

$$\sqrt{n\,\phi_x(h_K)}\,a^{\top}B_n=\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(R_i-E(R_i)\big)+\frac{1}{\sqrt n}\sum_{i=1}^{n}E(R_i),$$

    where

$$R_i=\frac{\sqrt{\phi_x(h_K)}}{E(K_1)}\Big(\sum_{l=0}^{p}\alpha_l\,h_K^{-l}\,\beta_i^{l}\Big)K_i\big(L_i-F^{x}(y)\big).\tag{A.24}$$

    According to the Cramér–Wold theorem and Eq (A.24), Eq (A.23) will be verified if we prove the following two statements:

$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(R_i-E(R_i)\big)\overset{\mathcal D}{\longrightarrow}N\Big(0,\,F^{x}(y)\big(1-F^{x}(y)\big)\,a^{\top}Va\Big),\tag{A.25}$$
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}E(R_i)=\sqrt{n\,\phi_x(h_K)}\,a^{\top}B_L^{p}(x,y).\tag{A.26}$$

    Proof of statement (A.25). We utilize Bernstein's big-block method, as employed in Theorem 3.1 of [55]. The set $\{1,2,\ldots,n\}$ is partitioned into $2\upsilon_n+1$ subsets: $\upsilon_n$ large blocks of size $w_n$, $\upsilon_n$ small blocks of size $q_n$, and a residual block, by considering

$$\upsilon_n=\Big\lfloor\frac{n}{w_n+q_n}\Big\rfloor.$$

    Assumption (M6) allows us to define the size of the large block as follows:

$$w_n=\Bigg\lfloor\frac{\big(n\,\phi_x(h_K)\big)^{1/2}}{r_n}\Bigg\rfloor.$$

    Afterward, under the same assumptions and employing straightforward calculations, we obtain

$$\lim_{n\to+\infty}\frac{q_n}{w_n}=0,\qquad\lim_{n\to+\infty}\frac{w_n}{n}=0,\qquad\lim_{n\to+\infty}\frac{w_n}{\sqrt{n\,\phi_x(h_K)}}=0\qquad\text{and}\qquad\lim_{n\to+\infty}\frac{n}{w_n}\,\alpha(q_n)=0.\tag{A.27}$$

    It can be easily inferred that, as $n$ tends to infinity,

$$\frac{\upsilon_n q_n}{n}\simeq\Big(\frac{n}{w_n+q_n}\Big)\frac{q_n}{n}\le\frac{q_n}{w_n+q_n}\le\frac{q_n}{w_n}\longrightarrow 0.\tag{A.28}$$

    Presently, we partition the sum $\sum_{i=1}^{n}\big(R_i-E(R_i)\big)$ into distinct large and small blocks as outlined below. Let

$$I_j=(j-1)(w+q)+1,\qquad l_j=(j-1)(w+q)+w+1,\qquad j=1,2,\ldots,\upsilon_n.$$

    One can see that

$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\underbrace{\big(R_i-E(R_i)\big)}_{E_i}=\sum_{j=1}^{\upsilon_n}\frac{F_j}{\sqrt n}+\sum_{j=1}^{\upsilon_n}\frac{F_j'}{\sqrt n}+\frac{F_n''}{\sqrt n}=:n^{-1/2}\big(S_{1,n}+S_{2,n}+S_{3,n}\big),$$

    where the random variables $F_j$, $F_j'$ and $F_n''$ are defined by

$$F_j=\sum_{i=I_j}^{I_j+w-1}E_i,\qquad F_j'=\sum_{i=l_j}^{l_j+q-1}E_i,\qquad F_n''=\sum_{i=\upsilon_n(w+q)+1}^{n}E_i.\tag{A.29}$$
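    The block construction in (A.29) is easy to visualize. The following sketch (ours, with 0-based Python indices rather than the 1-based indices above) enumerates the big blocks, the small blocks, and the residual block:

```python
# Sketch (ours): Bernstein big-block / small-block partition of {0, ..., n-1}.
def bernstein_blocks(n, w, q):
    upsilon = n // (w + q)
    big   = [list(range(j*(w + q),     j*(w + q) + w))   for j in range(upsilon)]
    small = [list(range(j*(w + q) + w, (j + 1)*(w + q))) for j in range(upsilon)]
    rest  = list(range(upsilon*(w + q), n))              # residual block F''_n
    return big, small, rest

big, small, rest = bernstein_blocks(n=20, w=4, q=2)
print(big[0], small[0], rest)   # [0, 1, 2, 3] [4, 5] [18, 19]
```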

    Subsequently, making use of Slutsky's theorem, we establish that the two terms $\frac{1}{\sqrt n}S_{2,n}$ and $\frac{1}{\sqrt n}S_{3,n}$ converge to zero in probability. Before applying the Lindeberg–Feller conditions to establish the asymptotic normality of the term $\frac{1}{\sqrt n}S_{1,n}$, it is imperative to demonstrate that the variables $F_j$ are asymptotically independent. We first obtain

$$\frac{E(S_{2,n}^{2})}{n}=\sum_{j=1}^{\upsilon_n}\frac{1}{n}\mathrm{Var}(F_j')+\frac{2}{n}\sum_{j=1}^{\upsilon_n}\sum_{\substack{i=1\\ i\ne j}}^{\upsilon_n}\mathrm{Cov}(F_j',F_i')=:\frac{1}{n}(A_1+A_2).$$

    Using second-order stationarity, we get

$$\mathrm{Var}(F_j')=q\,\mathrm{Var}(E_1)+2\sum_{i<j}\mathrm{Cov}(E_i,E_j).$$

    Therefore

$$\frac{A_1}{n}\le\frac{\upsilon_n q_n}{n}\underbrace{\mathrm{Var}(E_1)}_{\to\, V_{\mathcal F}(x,y)}+\frac{2}{n}\sum_{i\ne j}^{n}\mathrm{Cov}(E_i,E_j).$$

    To derive the covariance term, we adopt the identical procedure outlined in the verification of Eq (A.2). We set

$$G_1=\big\{(i,j)\in\{1,2,\ldots,n\}^{2}\ \text{such that}\ 1\le|i-j|\le c_n\big\},$$

    and

$$G_2=\big\{(i,j)\in\{1,2,\ldots,n\}^{2}\ \text{such that}\ c_n+1\le|i-j|\le n-1\big\}.$$

    Based on the aforementioned splitting, we obtain

$$\frac{2}{n}\sum_{i\ne j}^{n}\mathrm{Cov}(E_i,E_j)=\frac{2}{n}\sum_{G_1}\mathrm{Cov}(R_i,R_j)+\frac{2}{n}\sum_{G_2}\mathrm{Cov}(R_i,R_j)=:U_{1,n}+U_{2,n},$$

    where

$$\mathrm{Cov}(R_i,R_j)=\frac{\phi_x(h_K)}{E^{2}(K_1)}\sum_{l=0}^{p}\sum_{m=0}^{p}\alpha_l\alpha_m\,E\Big(h_K^{-l}h_K^{-m}\,\beta_i^{l}\beta_j^{m}\,K_iK_j\,E\big((L_i-F^{x}(y))(L_j-F^{x}(y))\,\big|\,(X_i,X_j)\big)\Big)-\frac{\phi_x(h_K)}{E^{2}(K_1)}\,E\Big(\sum_{l=0}^{p}\alpha_l\,h_K^{-l}\,\beta_i^{l}\,K_i\,\big(L_i-F^{x}(y)\big)\Big)^{2}.$$

    Utilizing the inequality

$$\big|L_i(y)-F^{x}(y)\big|\le 1,\tag{A.30}$$

    stated in Lemma A.1, and given the conditions (H2-ⅱ) and (H4-ⅱ), we obtain

$$\mathrm{Cov}(R_i,R_j)\le\frac{\phi_x(h_K)}{E^{2}(K_1)}\Bigg(\sum_{l=0}^{p}\sum_{m=0}^{p}\alpha_l\alpha_m\,E(K_iK_j)+\Big(\sum_{l=0}^{p}\alpha_l\Big)^{2}E^{2}(K_i)\Bigg)\le\frac{\phi_x(h_K)}{E^{2}(K_1)}\Big(C\,P\big((X_i,X_j)\in B(x,h_K)\times B(x,h_K)\big)+C\,M_1^{2}\,\phi_x^{2}(h_K)\Big)\le C\,\frac{\phi_x(h_K)\,\psi_x(h_K)}{\phi_x^{2}(h_K)}+C\,M_1^{2}\,\phi_x(h_K).$$

    Then, we readily infer

$$U_{1,n}\le\Big(2C\,n^{-1}\,\frac{\phi_x(h_K)\,\psi_x(h_K)}{\phi_x^{2}(h_K)}+2C\,n^{-1}\,M_1^{2}\,\phi_x(h_K)\Big)\,\big|G_1\big|,\qquad\text{with } |G_1|\le n\,c_n.$$

    In relation to the summation over the set $G_2$, Theorem B.3 (ⅱ) gives

$$\sum_{G_2}\big|\mathrm{Cov}(R_i,R_j)\big|\le\sum_{G_2}C\,\big(\alpha(|j-i|)\big)^{1/p}\,\big(E|R_i|^{q}\big)^{1/q}\,\big(E|R_j|^{r}\big)^{1/r}.$$

    Conditioning on $X_i$, and using (A.30) along with assumption (M3), we obtain

$$E\big(|R_i|^{q}\big)=\Big(\frac{\sqrt{\phi_x(h_K)}}{M_1\,\phi_x(h_K)}\Big)^{q}\,E\Big(|K_i|^{q}\,E\big(|L_i-F^{x}(y)|^{q}\mid X_i\big)\Big)\le C\,\big(\phi_x(h_K)\big)^{1-q/2},\qquad\text{so}\qquad\big(E|R_i|^{q}\big)^{1/q}\le C\,\big(\phi_x(h_K)\big)^{-\frac{1}{2}+\frac{1}{q}}.$$

    We readily infer that

$$U_{2,n}\le 2\sum_{G_2}C\,\big(\alpha(|j-i|)\big)^{1/p}\,\big(\phi_x(h_K)\big)^{-1+\frac{1}{q}+\frac{1}{r}}\tag{A.31}$$
$$\le 2\,c_n^{-\delta}\,\big(\phi_x(h_K)\big)^{-1/p}\sum_{|l|>c_n}l^{\delta}\,\big(\alpha(|l|)\big)^{1/p}.\tag{A.32}$$

    Using the fact that $\psi_x(h_K)/\phi_x^{2}(h_K)$ is bounded, we select $c_n=\big(\phi_x(h_K)\big)^{-1/(p\delta)}$, and by assumptions (M4) and (M5) we obtain

$$\frac{2}{n}\sum_{i\ne j}^{n}\mathrm{Cov}(E_i,E_j)=o(1).$$

    Furthermore, for the term $A_2$, we have

$$\frac{A_2}{n}\le\sum_{1\le i<j\le n}\frac{2}{n}\,\mathrm{Cov}(F_j',F_i')\le\sum_{1\le i<j\le n}\frac{2}{n}\,\mathrm{Cov}(R_j,R_i)=o(1).$$

    Consequently, by (A.27), we get

$$\frac{E(S_{2,n}^{2})}{n}\longrightarrow 0,\qquad\text{as } n\to\infty.$$

    Let us now examine the term $S_{3,n}$:

$$\frac{E(S_{3,n}^{2})}{n}\le\frac{1}{n}\sum_{i=\upsilon_n(w+q)+1}^{n}\mathrm{Var}(E_1)+\frac{2}{n}\sum_{1\le i<j\le n}\mathrm{Cov}(E_i,E_j)\le\frac{n-\upsilon_n(w+q)}{n}\,\mathrm{Var}(E_1)+\frac{2}{n}\sum_{1\le i<j\le n}\mathrm{Cov}(E_i,E_j)=o(1).$$

    Besides, from (A.27) and (A.28), we get $\upsilon_n w/n\to 1$. Then, we obtain

$$\frac{E(S_{3,n}^{2})}{n}\longrightarrow 0,\qquad\text{as } n\to\infty.$$

    In order to demonstrate the asymptotic independence of the variables $F_j$, we employ Lemma B.1. Setting $V_j=\exp\big(it\,F_j/\sqrt n\big)$, we obtain the relationship

$$\Bigg|E\Bigg(\exp\Big(\frac{it\,S_{1,n}}{\sqrt n}\Big)\Bigg)-\prod_{j=1}^{\upsilon_n}E\Bigg(\exp\Big(\frac{it\,F_j}{\sqrt n}\Big)\Bigg)\Bigg|\le 16\,\upsilon_n\,\alpha(q)\le\frac{16\,n}{w}\,\alpha(q)\longrightarrow 0.\tag{A.33}$$

    Therefore, according to formula (A.33), the variance of $S_{1,n}$ is computed as follows:

$$\mathrm{Var}(S_{1,n})=\upsilon_n\,w_n\,\mathrm{Var}(R_1),$$

    where

$$\mathrm{Var}(R_1)=\frac{\phi_x(h_K)}{M_1^{2}\,\phi_x^{2}(h_K)}\sum_{l=0}^{p}\alpha_l^{2}\,h_K^{-2l}\,\mathrm{Var}\Big(\beta_1^{l}\,K_1\,\big(L_1-F^{x}(y)\big)\Big).$$

    Using Lemma 6.3 of [14], we obtain

$$\mathrm{Var}\Big(\beta_1^{l}\,K_1\,\big(L_1-F^{x}(y)\big)\Big)=E\big(\beta_1^{2l}\,K_1^{2}\big)\,F^{x}(y)\big(1-F^{x}(y)\big).$$

    Then, using the fact that $\upsilon_n w/n\to 1$ and Lemma A.1, we get

$$\mathrm{Var}(R_1)=\Bigg(\alpha_0^{2}\,\frac{M_2}{M_1^{2}}+\sum_{l=1}^{p}\alpha_l^{2}\,\frac{N(2,2l)}{M_1^{2}}\Bigg)\,F^{x}(y)\big(1-F^{x}(y)\big)=F^{x}(y)\big(1-F^{x}(y)\big)\,a^{\top}Va.$$

    Finally, our attention turns to Lindeberg's central limit theorem concerning $F_j$. It is then enough to demonstrate that, for any $\epsilon>0$,

$$\frac{1}{n}\sum_{j=1}^{\upsilon_n}E\Big[F_j^{2}\,\mathbf{1}_{\{|F_j|>\epsilon\sqrt{n\,V_{\mathcal F}(x,y)}\}}\Big]\longrightarrow 0\qquad\text{as } n\to+\infty.$$

    Under assumption (M2), we have

$$\frac{|F_j|}{\sqrt n}\le C\sum_{l=0}^{p}|\alpha_l|\,\frac{w_n}{\sqrt{n\,\phi_x(h_K)}}.$$

    According to (A.27), we obtain

$$\frac{|F_j|}{\sqrt n}\longrightarrow 0,\qquad\text{as } n\to\infty.$$

    Hence, for any given $\epsilon$ and $n$ sufficiently large, the set $\{|F_j|>\epsilon\sqrt{n\,V_{\mathcal F}(x,y)}\}$ becomes empty. Consequently, the demonstration of (A.25) is concluded.

    Proof of statement (A.26). We have

$$\frac{1}{\sqrt n}\sum_{i=1}^{n}E(R_i)=\sqrt n\,E(R_1)=\sqrt{n\,\phi_x(h_K)}\,\Big(\frac{\alpha_0}{E(K_1)},\frac{\alpha_1}{h_K\,E(K_1)},\ldots,\frac{\alpha_p}{h_K^{p}\,E(K_1)}\Big)\cdot\underbrace{\begin{pmatrix}E(K_1L_1)-F^{x}(y)\,E(K_1)\\ E(K_1\beta_1L_1)-F^{x}(y)\,E(K_1\beta_1)\\ \vdots\\ E(K_1\beta_1^{p}L_1)-F^{x}(y)\,E(K_1\beta_1^{p})\end{pmatrix}}_{N(x,y)}.$$
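    Written out componentwise (our rewriting of the display above), the inner product reads

$$\sqrt n\,E(R_1)=\sqrt{n\,\phi_x(h_K)}\,\sum_{j=0}^{p}\frac{\alpha_j}{h_K^{j}\,E(K_1)}\Big(E\big(K_1\beta_1^{j}L_1\big)-F^{x}(y)\,E\big(K_1\beta_1^{j}\big)\Big),$$

    so it suffices to expand a generic entry of $N(x,y)$, which is done below for the last one.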

    Next, we compute the term $N(x,y)$. Notice that

$$E\big(K_1\beta_1^{p}L_1\big)-F^{x}(y)\,E\big(K_1\beta_1^{p}\big)=E\Big(K_1\beta_1^{p}\big(E(L_1\mid X_1)-F^{x}(y)\big)\Big).$$

    Making use of the Taylor expansion under the assumptions (M2) and (M3), we obtain

$$E(L_1\mid X_1)=\sum_{k=0}^{p}\varphi_{2k}(X_1,y)\,\frac{h_L^{2k}}{(2k)!}\int_{\mathbb R}t^{2k}L(t)\,dt+o\big(h_L^{2p}\big).$$

    Therefore, we readily obtain

$$E\Big(K_1\beta_1^{p}\big(E(L_1\mid X_1)-F^{x}(y)\big)\Big)=\sum_{k=0}^{p}E\big(K_1\beta_1^{p}\,\varphi_{2k}(X_1,y)\big)\,\frac{h_L^{2k}}{(2k)!}\int_{\mathbb R}t^{2k}L(t)\,dt-\varphi_0(x,y)\,E\big(K_1\beta_1^{p}\big)+o\Big(h_L^{2p}\,E\big(K_1\beta_1^{p}\big)\Big).\tag{A.34}$$

    Observing that $\psi_k(0)=0$, we get

$$E\big(K_1\beta_1^{p}\,\varphi_k(X_1,y)\big)=\varphi_k(x,y)\,E\big(K_1\beta_1^{p}\big)+\sum_{a=1}^{p}\frac{\psi_k^{(a)}(0)}{a!}\,E\big(K_1\beta_1^{p+a}\big).\tag{A.35}$$

    By combining Eqs (A.34) and (A.35), we can derive

$$E\Big(K_1\beta_1^{p}\big(E(L_1\mid X_1)-F^{x}(y)\big)\Big)=\sum_{k=0}^{p}\frac{h_L^{2k}}{(2k)!}\int_{\mathbb R}t^{2k}L(t)\,dt\,\Big[\varphi_{2k}(x,y)\,E\big(K_1\beta_1^{p}\big)+\sum_{a=1}^{p}\frac{\psi_k^{(a)}(0)}{a!}\,E\big(K_1\beta_1^{p+a}\big)\Big]-\sum_{k=0}^{p}\frac{h_L^{2k}}{(2k)!}\int_{\mathbb R}t^{2k}L(t)\,dt\,\varphi_0(x,y)\,E\big(K_1\beta_1^{p}\big)+o\Big(h_L^{2p}\,E\big(K_1\beta_1^{p}\big)\Big).$$

    So the statement (A.26) is proved.

    This appendix contains supplementary results that support a more comprehensive understanding of the paper.

    Lemma B.1. [82] Let $V_1,\ldots,V_L$ be random variables measurable with respect to the $\sigma$-algebras $\mathcal F_{i_1}^{j_1},\ldots,\mathcal F_{i_L}^{j_L}$ respectively, with $1\le i_1<j_1<i_2<\cdots<j_L\le n$, $i_{l+1}-j_l\ge w\ge 1$, and $|V_j|\le 1$ for $j=1,\ldots,L$. Then

$$\Bigg|E\Big(\prod_{j=1}^{L}V_j\Big)-\prod_{j=1}^{L}E(V_j)\Bigg|\le 16\,(L-1)\,\alpha(w),$$

    where $\alpha(w)$ is the strongly mixing coefficient.

    Theorem B.2. (Lindeberg central limit theorem). For each $n\ge 1$, let $\{U_{n1},\ldots,U_{nr_n}\}$ be a collection of independent random variables such that $E(U_{nj})=0$ and $\mathrm{Var}(U_{nj})<\infty$ for $j=1,\ldots,r_n$. Set

$$\tilde U_{nj}=\frac{U_{nj}}{\sqrt{\sum_{k=1}^{r_n}\mathrm{Var}(U_{nk})}},\qquad j=1,\ldots,r_n.$$

    Then

$$\sum_{j=1}^{r_n}\tilde U_{nj}\longrightarrow N(0,1)\ \text{in distribution as } n\to\infty,$$

    if for every $\epsilon>0$

$$\lim_{n\to\infty}\sum_{j=1}^{r_n}E\Big[|\tilde U_{nj}|^{2}\,\mathbf{1}\big(|\tilde U_{nj}|>\epsilon\big)\Big]=0.$$
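    As a quick sanity check (ours, not part of the paper), a row of i.i.d. bounded variables trivially satisfies the Lindeberg condition, and the normalized sum is close to standard normal:

```python
# Toy check (ours) of Theorem B.2 with U_{nj} ~ Uniform(-1, 1), Var = 1/3.
import numpy as np

rng = np.random.default_rng(1)
r_n = 500
U = rng.uniform(-1, 1, size=(10_000, r_n))    # 10_000 replications of a row
S = U.sum(axis=1) / np.sqrt(r_n / 3)          # sum of the tilde-U_{nj}
print(round(S.mean(), 3), round(S.var(), 3))  # close to 0 and 1
```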

    The following theorem is Proposition A.10 in [38].

    Theorem B.3. Assume that $(T_n)_{n\in\mathbb Z}$ is $\alpha$-mixing. For some $k\in\mathbb Z$, consider a real random variable $T$ (resp. $T'$) which is $\mathcal A_{-\infty}^{k}$-measurable (resp. $\mathcal A_{n+k}^{+\infty}$-measurable).

    (ⅰ) If $T$ and $T'$ are bounded, then

$$\exists\, C,\ 0<C<+\infty,\qquad\big|\mathrm{Cov}(T,T')\big|\le C\,\alpha(n).$$

    (ⅱ) If, for some positive numbers $p,q,r$ such that $p^{-1}+q^{-1}+r^{-1}=1$, we have $E|T|^{p}<\infty$ and $E|T'|^{q}<\infty$, then

$$\exists\, C,\ 0<C<+\infty,\qquad\big|\mathrm{Cov}(T,T')\big|\le C\,\big(E|T|^{p}\big)^{1/p}\,\big(E|T'|^{q}\big)^{1/q}\,\alpha(n)^{1/r}.$$
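    For reference, part (ⅱ) combines three Hölder exponents with the mixing coefficient; as a plain numeric helper (our sketch, not a library routine):

```python
# Sketch (ours): numeric form of the covariance bound in Theorem B.3 (ii).
def cov_bound(moment_T_p, moment_Tp_q, alpha_n, p, q, r, C=1.0):
    """C * (E|T|^p)^(1/p) * (E|T'|^q)^(1/q) * alpha(n)^(1/r), 1/p+1/q+1/r = 1."""
    assert abs(1/p + 1/q + 1/r - 1) < 1e-12
    return C * moment_T_p**(1/p) * moment_Tp_q**(1/q) * alpha_n**(1/r)

print(cov_bound(2.0, 2.0, 0.01, p=4, q=4, r=2))   # example with p = q = 4, r = 2
```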


    [1] I. M. Almanjahie, S. Bouzebda, Z. C. Elmezouar, A. Laksaci, The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors, Statist. Risk Model., 38 (2021), 47–63. https://doi.org/10.1515/strm-2019-0029 doi: 10.1515/strm-2019-0029
    [2] I. M. Almanjahie, S. Bouzebda, Z. Kaid, A. Laksaci, Nonparametric estimation of expectile regression in functional dependent data, J. Nonparametr. Stat., 34 (2022), 250–281. https://doi.org/10.1080/10485252.2022.2027412 doi: 10.1080/10485252.2022.2027412
    [3] I. M. Almanjahie, S. Bouzebda, Z. Kaid, A. Laksaci, The local linear functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors, Metrika, 2024. https://doi.org/10.1007/s00184-023-00942-0
    [4] T. W. Anderson, An introduction to multivariate statistical analysis, 3 Eds., John Wiley & Sons, Inc., 1958.
    [5] G. Aneiros, P. Vieu, Partial linear modeling with multi-functional covariates, Comput. Stat., 30 (2015), 647–671. https://doi.org/10.1007/s00180-015-0568-8 doi: 10.1007/s00180-015-0568-8
    [6] S. Attaoui, B. Bentata, S. Bouzebda, A. Laksaci, The strong consistency and asymptotic normality of the kernel estimator type in functional single index model in presence of censored data, AIMS Mathematics, 9 (2024), 7340–7371. https://doi.org/10.3934/math.2024356 doi: 10.3934/math.2024356
    [7] S. Ayad, A. Laksaci, S. Rahmani, R. Rouane, On the local linear modelization of the conditional density for functional and ergodic data, Metron, 78 (2020), 237–254. https://doi.org/10.1007/s40300-020-00174-6 doi: 10.1007/s40300-020-00174-6
    [8] A. Baíllo, A. Grané, Local linear regression for functional predictor and scalar response, J. Multivariate Anal., 100 (2009), 102–111. https://doi.org/10.1016/j.jmva.2008.03.008 doi: 10.1016/j.jmva.2008.03.008
    [9] J. Barrientos-Marin, F. Ferraty, P. Vieu, Locally modelled regression and functional data, J. Nonparametr. Stat., 22 (2010), 617–632. https://doi.org/10.1080/10485250903089930 doi: 10.1080/10485250903089930
    [10] R. E. Bellman, Adaptive control processes: A guided tour, Princeton University Press, 1961.
    [11] K. Benhenni, A. H. Hassan, Y. Su, Local polynomial estimation of regression operators from functional data with correlated errors, J. Multivariate Anal., 170 (2019), 80–94. https://doi.org/10.1016/j.jmva.2018.10.008 doi: 10.1016/j.jmva.2018.10.008
    [12] N.-E. Berrahou, S. Bouzebda, L. Douge, Functional uniform-in-bandwidth moderate deviation principle for the local empirical processes involving functional data, Math. Meth. Stat., 33 (2024), 26–69. https://doi.org/10.3103/S1066530724700030 doi: 10.3103/S1066530724700030
    [13] D. Bosq, Linear processes in function spaces, In: Lecture notes in statistics, New York: Springer, 2000. https://doi.org/10.1007/978-1-4612-1154-9
    [14] O. Bouanani, A. Laksaci, M. Rachdi, S. Rahmani, Asymptotic normality of some conditional nonparametric functional parameters in high-dimensional statistics, Behaviormetrika, 46 (2019), 199–233. https://doi.org/10.1007/s41237-018-0057-9 doi: 10.1007/s41237-018-0057-9
    [15] S. Bouzebda, Limit theorems in the nonparametric conditional single-index u-processes for locally stationary functional random fields under stochastic sampling design, Mathematics, 12 (2024), 1996. https://doi.org/10.3390/math12131996 doi: 10.3390/math12131996
    [16] S. Bouzebda, Weak convergence of the conditional single index U-statistics for locally stationary functional time series, AIMS Mathematics, 9 (2024), 14807–14898. https://doi.org/10.3934/math.2024720 doi: 10.3934/math.2024720
    [17] S. Bouzebda, B. Nemouchi, Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data, J. Nonparametr. Stat., 32 (2020), 452–509. https://doi.org/10.1080/10485252.2020.1759597 doi: 10.1080/10485252.2020.1759597
    [18] S. Bouzebda, B. Nemouchi, Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data, Stat. Inference Stoch. Process., 26 (2022), 33–88. https://doi.org/10.1007/s11203-022-09276-6 doi: 10.1007/s11203-022-09276-6
    [19] S. Bouzebda, A. Nezzal, Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data, Jpn. J. Stat. Data Sci., 5 (2022), 431–533. https://doi.org/10.1007/s42081-022-00161-3 doi: 10.1007/s42081-022-00161-3
    [20] S. Bouzebda, A. Nezzal, Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data, AIMS Mathematics, 9 (2024), 4427–4550. https://doi.org/10.3934/math.2024218 doi: 10.3934/math.2024218
    [21] S. Bouzebda, A. Laksaci, M. Mohammedi, Single index regression model for functional quasi-associated time series data, REVSTAT Stat. J., 20 (2022), 605–631. http://dx.doi.org/10.57805/revstat.v20i5.391 doi: 10.57805/revstat.v20i5.391
    [22] S. Bouzebda, A. Laksaci, M. Mohammedi, The k-nearest neighbors method in single index regression model for functional quasi-associated time series data, Rev. Mat. Complut., 36 (2023), 361–391. https://doi.org/10.1007/s13163-022-00436-z doi: 10.1007/s13163-022-00436-z
    [23] S. Bouzebda, S. Nezzal, T. Zari, Uniform consistency for functional conditional u-statistics using delta-sequences, Mathematics, 11 (2023), 161. https://doi.org/10.3390/math11010161 doi: 10.3390/math11010161
    [24] R. C. Bradley, Introduction to strong mixing conditions, Kendrick Press, 2007.
    [25] J. Camacho, J. Pico, A. Ferrer, Data understanding with PCA: Structural and variance information plots, Chemometr. Intell. Lab. Syst., 100 (2010), 48–56. https://doi.org/10.1016/j.chemolab.2009.10.005 doi: 10.1016/j.chemolab.2009.10.005
    [26] H. Cartan, Calcul différentiel, 1967.
    [27] D. Chen, P. Hall, H. Müller, Single and multiple index functional regression models with nonparametric link, Ann. Statist., 39 (2011), 1720–1747. http://dx.doi.org/10.1214/11-AOS882 doi: 10.1214/11-AOS882
    [28] M. Y. Cheng, J. Fan, J. S. Marron, On automatic boundary corrections, Ann. Statist., 25 (1997), 1691–1708. https://doi.org/10.1214/aos/1031594737 doi: 10.1214/aos/1031594737
    [29] Z. Chikr-Elmezouar, I. M. Almanjahie, A. Laksaci, M. Rachdi, FDA: strong consistency of the kNN local linear estimation of the functional conditional density and mode, J. Nonparametr. Stat., 31 (2019), 175–195. https://doi.org/10.1080/10485252.2018.1538450 doi: 10.1080/10485252.2018.1538450
    [30] J. Dauxois, A. Pousse, Y. Romain, Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference, J. Multivariate Anal., 12 (1982), 136–154. https://doi.org/10.1016/0047-259X(82)90088-4 doi: 10.1016/0047-259X(82)90088-4
    [31] J. Demongeot, A. Laksaci, M. Rachdi, S. Rahmani, On the local linear modelization of the conditional distribution for functional data, Sankhya A, 76 (2014), 328–355. https://doi.org/10.1007/s13171-013-0050-z doi: 10.1007/s13171-013-0050-z
    [32] J. Demongeot, A. Naceri, A. Laksaci, M. Rachdi, Local linear regression modelization when all variables are curves, Statist. Probab. Lett., 121 (2017), 37–44. https://doi.org/10.1016/j.spl.2016.09.021 doi: 10.1016/j.spl.2016.09.021
    [33] J. C. Deville, Méthodes statistiques et numériques de l'analyse harmonique, Annales de l'INSEE, 15 (1974), 3–101. https://doi.org/10.2307/20075177 doi: 10.2307/20075177
    [34] S. Didi, S. Bouzebda, Wavelet density and regression estimators for continuous time functional stationary and ergodic processes, Mathematics, 10 (2022), 4356. https://doi.org/10.3390/math10224356 doi: 10.3390/math10224356
    [35] Z. C. Elmezouar, F. Alshahrani, I. M. Almanjahie, S. Bouzebda, Z. Kaid, A. Laksaci, Strong consistency rate in functional single index expectile model for spatial data, AIMS Mathematics, 9 (2024), 5550–5581. https://doi.org/10.3934/math.2024269 doi: 10.3934/math.2024269
    [36] R. L. Eubank, Nonparametric regression and spline smoothing, 2 Eds., CRC Press, 2014. https://doi.org/10.1201/9781482273144
    [37] J. Fan, I. Gijbels, Local polynomial modeling and its applications, Chapman & Hall, 1996.
    [38] F. Ferraty, P. Vieu, Nonparametric functional data analysis, New York: Springer, 2006. https://doi.org/10.1007/0-387-36620-2
    [39] F. Ferraty, A. Laksaci, P. Vieu, Estimating some characteristics of the conditional distribution in nonparametric functional models, Stat. Infer. Stoch. Process., 9 (2006), 47–76. https://doi.org/10.1007/s11203-004-3561-3 doi: 10.1007/s11203-004-3561-3
    [40] F. Ferraty, A. Mas, P. Vieu, Nonparametric regression on functional data: inference and practical aspects, Aust. N. Z. J. Stat., 49 (2007), 267–286. https://doi.org/10.1111/j.1467-842X.2007.00480.x doi: 10.1111/j.1467-842X.2007.00480.x
    [41] F. Ferraty, A. Laksaci, A. Tadj, P. Vieu, Rate of uniform consistency for nonparametric estimates with functional variables, J. Statist. Plann. Inference, 140 (2010), 335–352. https://doi.org/10.1016/j.jspi.2009.07.019 doi: 10.1016/j.jspi.2009.07.019
    [42] F. Ferraty, A. Laksaci, A. Tadj, P. Vieu, Estimation de la fonction de régression pour variable explicative et réponse fonctionnelles dépendantes, C. R. Acad. Sci. Paris, Ser. I, 350 (2012), 717–720. https://doi.org/10.1016/j.crma.2012.07.014 doi: 10.1016/j.crma.2012.07.014
    [43] L. Ferré, A. F. Yao, Smoothed functional inverse regression, Statist. Sinica, 15 (2005), 665–683.
    [44] A. Goia, P. Vieu, A partitioned single functional index model, Comput. Stat., 30 (2015), 673–692. https://doi.org/10.1007/s00180-014-0530-1 doi: 10.1007/s00180-014-0530-1
    [45] A. Goia, P. Vieu, An introduction to recent advances in high/infinite dimensional statistics, J. Multivariate Anal., 146 (2016), 1–6. https://doi.org/10.1016/j.jmva.2015.12.001 doi: 10.1016/j.jmva.2015.12.001
    [46] P. J. Green, B. W. Silverman, Nonparametric regression and generalized linear models, New York: Chapman and Hall/CRC, 1993. https://doi.org/10.1201/b15710
    [47] C. Gu, Smoothing spline ANOVA models, New York: Springer, 2015. https://doi.org/10.1007/978-1-4614-5369-7
    [48] L. Horváth, P. Kokoszka, Inference for functional data with applications, New York: Springer, 2014. https://doi.org/10.1007/978-1-4614-3655-3
    [49] L. Horváth, G. Rice, An introduction to functional data analysis and a principal component approach for testing the equality of mean curves, Rev. Mat. Complut., 28 (2015), 505–548. https://doi.org/10.1007/s13163-015-0169-7 doi: 10.1007/s13163-015-0169-7
    [50] G. M. James, T. J. Hastie, C. A. Sugar, Principal component models for sparse functional data, Biometrika, 87 (2000), 587–602. https://doi.org/10.1093/biomet/87.3.587 doi: 10.1093/biomet/87.3.587
    [51] A. Khaleghi, G. Lugosi, Inferring the mixing properties of an ergodic process, arXiv: 2106.07054, 2021. https://doi.org/10.48550/arXiv.2106.07054
    [52] A. N. Kolmogorov, V. M. Tihomirov, ε-entropy and ε-capacity of sets in function spaces, Uspehi Mat. Nauk, 14 (1959), 3–86.
    [53] J. Kuelbs, W. V. Li, Metric entropy and the small ball problem for Gaussian measures, J. Funct. Anal., 116 (1993), 133–157. https://doi.org/10.1006/jfan.1993.1107 doi: 10.1006/jfan.1993.1107
    [54] W. V. Li, Q. M. Shao, Gaussian processes: inequalities, small ball probabilities and applications, Handbook of Statist., 19 (2001), 533–597. https://doi.org/10.1016/S0169-7161(01)19019-X doi: 10.1016/S0169-7161(01)19019-X
    [55] H. Y. Liang, J. I. Baek, Asymptotic normality of conditional density estimation with left-truncated and dependent data, Stat. Papers, 57 (2016), 1–20. https://doi.org/10.1007/s00362-014-0635-1 doi: 10.1007/s00362-014-0635-1
    [56] E. Liebscher, Estimation of the density and the regression function under mixing conditions, Statist. Decisions, 19 (2001), 9–26. https://doi.org/10.1524/strm.2001.19.1.9 doi: 10.1524/strm.2001.19.1.9
    [57] N. Ling, P. Vieu, Nonparametric modeling for functional data: Selected survey and tracks for future, Statistics, 52 (2018), 934–949. https://doi.org/10.1080/02331888.2018.1487120 doi: 10.1080/02331888.2018.1487120
    [58] S. Lv, H. Lin, H. Lian, J. Huang, Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space, Ann. Statist., 46 (2018), 781–813. https://doi.org/10.1214/17-AOS1567 doi: 10.1214/17-AOS1567
    [59] E. Masry, Multivariate local polynomial regression for time series: uniform strong consistency and rates, J. Time Ser. Anal., 17 (1996), 571–599. https://doi.org/10.1111/j.1467-9892.1996.tb00294.x doi: 10.1111/j.1467-9892.1996.tb00294.x
    [60] E. Masry, Multivariate regression estimation local polynomial fitting for time series, Stochastic Process. Appl., 65 (1996), 81–101. https://doi.org/10.1016/S0304-4149(96)00095-6 doi: 10.1016/S0304-4149(96)00095-6
    [61] E. Masry, Multivariate regression estimation of continuous-time processes from sampled data: Local polynomial fitting approach, IEEE Trans. Inform. Theory, 45 (1999), 1939–1953. https://doi.org/10.1109/18.782116 doi: 10.1109/18.782116
    [62] E. Masry, Nonparametric regression estimation for dependent functional data: asymptotic normality, Stochastic Process. Appl., 115 (2005), 155–177. https://doi.org/10.1016/j.spa.2004.07.006 doi: 10.1016/j.spa.2004.07.006
    [63] E. Masry, Local polynomial estimation of regression functions for mixing processes, Scand. J. Statist., 24 (1997), 165–179. https://doi.org/10.1111/1467-9469.00056 doi: 10.1111/1467-9469.00056
    [64] E. Masry, J. Mielniczuk, Local linear regression estimation for time series with long-range dependence, Stochastic Process. Appl., 82 (1999), 173–193. https://doi.org/10.1016/S0304-4149(99)00015-0 doi: 10.1016/S0304-4149(99)00015-0
    [65] D. J. McDonald, C. R. Shalizi, M. Schervish, Estimating beta-mixing coefficients via histograms, Electron. J. Stat., 9 (2015), 2855–2883. http://dx.doi.org/10.1214/15-EJS1094 doi: 10.1214/15-EJS1094
    [66] M. Mohammedi, S. Bouzebda, A. Laksaci, The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data, J. Multivariate Anal., 181 (2021), 104673. https://doi.org/10.1016/j.jmva.2020.104673 doi: 10.1016/j.jmva.2020.104673
    [67] H. Müller, Peter hall, functional data analysis and random objects, Ann. Statist., 44 (2016), 1867–1887. https://doi.org/10.1214/16-AOS1492 doi: 10.1214/16-AOS1492
    [68] E. A. Nadaraja, On estimating regression, Theory Probab. Appl., 9 (1964), 141–142. https://doi.org/10.1137/1109020
    [69] T. Nicoleris, Y. G. Yatracos, Rates of convergence of estimates, Kolmogorov's entropy and the dimensionality reduction principle in regression, Ann. Statist., 25 (1997), 2493–2511. https://doi.org/10.1214/aos/1030741082 doi: 10.1214/aos/1030741082
    [70] E. Parzen, On estimation of a probability density function and mode, Ann. Math. Statist., 33 (1962), 1065–1076.
    [71] J. Q. Shi, T. Choi, Gaussian process regression analysis for functional data, New York: Chapman and Hall/CRC, 2011. https://doi.org/10.1201/b11038
    [72] M. Rachdi, A. Laksaci, J. Demongeot, A. Abdali, F. Madani, Theoretical and practical aspects of the quadratic error in the local linear estimation of the conditional density for functional data, Comput. Statist. Data Anal., 73 (2014), 53–68. https://doi.org/10.1016/j.csda.2013.11.011 doi: 10.1016/j.csda.2013.11.011
    [73] M. Rachdi, A. Laksaci, Z. Kaid, A. Benchiha, F. A. Al-Awadhi, k-nearest neighbors local linear regression for functional and missing data at random, Statist. Neerlandica, 75 (2021), 42–65. https://doi.org/10.1111/stan.12224 doi: 10.1111/stan.12224
    [74] J. O. Ramsay, B. W. Silverman, Functional data analysis, In: Springer series in statistics, New York: Springer, 2005. https://doi.org/10.1007/b98888
    [75] E. Rio, Théorie asymptotique des processus aléatoires faiblement dépendants, In: Mathématiques et applications, Heidelberg: Springer-Verlag Berlin, 31 (2000).
    [76] M. Rosenblatt, A central limit theorem and a strong mixing condition, Proc. Natl. Acad. Sci., 42 (1956), 43–47. https://doi.org/10.1073/pnas.42.1.43 doi: 10.1073/pnas.42.1.43
    [77] D. W. Scott, M. P. Wand, Feasibility of multivariate density estimates, Biometrika, 78 (1991), 197–205.
    [78] B. W. Silverman, Density estimation for statistics and data analysis, New York: Routledge, 1998. https://doi.org/10.1201/9781315140919
    [79] I. Soukarieh, S. Bouzebda, Weak convergence of the conditional U-statistics for locally stationary functional time series, Stat. Inference Stoch. Process., 27 (2024), 227–304. https://doi.org/10.1007/s11203-023-09305-y doi: 10.1007/s11203-023-09305-y
    [80] G. Strang, Wavelet transforms versus Fourier transforms, Bull. Amer. Math. Soc., 28 (1993), 288–305.
    [81] M. Talagrand, Sharper bounds for Gaussian and empirical processes, Ann. Probab., 22 (1994), 28–76.
    [82] V. A. Volkonskii, Y. A. Rozanov, Some limit theorems for random functions. Ⅰ, Theory Probab. Appl., 4 (1959), 178–197. https://doi.org/10.1137/1104015 doi: 10.1137/1104015
    [83] V. A. Volkonskii, Y. A. Rozanov, Some limit theorems for random functions. Ⅱ, Theory Probab. Appl., 6 (1961), 186–198. https://doi.org/10.1137/1106023 doi: 10.1137/1106023
    [84] G. Wahba, Spline models for observational data, In: CBMS-NSF Regional conference series in applied mathematics, Society for industrial and applied mathematics (SIAM), 1990. https://doi.org/10.1137/1.9781611970128
    [85] G. S. Watson, Smooth regression analysis, Sankhya, 26 (1964), 359–372.
    [86] L. Xiao, Asymptotic theory of penalized splines, Electron. J. Statist., 13 (2019), 747–794. https://doi.org/10.1214/19-EJS1541 doi: 10.1214/19-EJS1541
    [87] L. Xiao, Asymptotic properties of penalized splines for functional data, Bernoulli, 26 (2020), 2847–2875. https://doi.org/10.3150/20-BEJ1209 doi: 10.3150/20-BEJ1209
    [88] F. Yao, H. G. Müller, J. L. Wang, Functional data analysis for sparse longitudinal data, J. Amer. Statist. Assoc., 100 (2005), 577–590. https://doi.org/10.1198/016214504000001745 doi: 10.1198/016214504000001745
    [89] M. Yuan, T. T. Cai, A reproducing kernel Hilbert space approach to functional linear regression, Ann. Statist., 38 (2010), 3412–3444. https://doi.org/10.1214/09-AOS772 doi: 10.1214/09-AOS772
    [90] E. Zeidler, Nonlinear functional analysis and its applications. I, New York: Springer, 1986.
    [91] J. Zhang, Analysis of variance for functional data, New York: Chapman and Hall/CRC, 2013, https://doi.org/10.1201/b15005
    [92] Z. Zhou, Z. Lin, Asymptotic normality of locally modelled regression estimator for functional data, J. Nonparametr. Stat., 28 (2016), 116–131. https://doi.org/10.1080/10485252.2015.1114112 doi: 10.1080/10485252.2015.1114112
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)