In recent decades, there has been a growing interest in the statistical study of functional random variables, which take values in infinite-dimensional spaces. Simply put, functional data analysis (FDA) typically deals with statistical problems where the data consists of a sample of n functions x1=x1(t),…,xn=xn(t), defined on a compact interval of the real line, say [0,1]. FDA focuses on statistical issues, often referred to as "inference in stochastic processes", where sample information is derived from a partial trajectory (x(t),t∈[0,T]) of a stochastic process ({X(t),t≥0}). In this context, the length (T) of the observation interval plays the role of the sample size (n). This increasing interest has brought to light several statistical challenges associated with functional random variables. The drive to explore this field has been fueled by the increasing availability of high-resolution temporal and spatial data, a trend that is particularly notable in fields such as meteorology, medicine, satellite imaging, and various other scientific disciplines. Consequently, the statistical modeling of such data, viewed as stochastic functions, has led to numerous complex theoretical and computational research questions.
To gain a comprehensive understanding of both the theoretical and practical components of functional data analysis, the reader may consult the monographs by [13] for linear models for random variables taking values in a Hilbert space, and [74] for scalar-on-function and function-on-function linear models, as well as functional principal component analysis and parametric discriminant analysis. Ferraty and Vieu [38] primarily concentrated on nonparametric techniques, particularly kernel-based estimation, for scalar-on-function nonlinear regression models; these methods were further expanded to encompass classification and discrimination analysis. In their study, [48] examined the extension of various statistical concepts, including goodness-of-fit tests, portmanteau tests, and change detection, to the functional data framework. Zhang [91] conducted a study on the analysis of variance for functional data, while [71] primarily investigated regression analysis for Gaussian processes. Various semi-parametric models have been explored in the literature, such as functional single index models [44], projection pursuit models [27], partial linear models [5], and functional sliced inverse regression [43]. Additional studies on functional data modeling and analysis have been documented in [45,49,57,67] and, for recent references, [1,2,15,17,18,19,20,21,22,66,79].
One of the most widely used estimators is the Nadaraya-Watson [68,85] estimator. From a function approximation perspective, the Nadaraya-Watson estimator employs local constant approximations. As highlighted by the numerical analyst [80], "Through all of scientific computing runs this common theme: Increase the accuracy at least to second order. What this means is: Get the linear term right". In other words, local constant approximations are often inadequate, and local linear fits are preferable. Local polynomial fitting thus emerges as an attractive method from both theoretical and practical standpoints. It offers several advantages, as noted in [37]: it adapts to various design types, including random and fixed designs, as well as highly clustered and nearly uniform designs.
Moreover, boundary effects are absent: the bias at the boundary remains of the same order as in the interior, without requiring specific boundary kernels. This is a significant departure from other existing methods. With local polynomial fitting, no boundary modifications are necessary, which is particularly beneficial in multidimensional situations where the boundary can involve a substantial number of data points (see [28,78]). Boundary modifications in higher dimensions pose a considerable challenge. The pivotal method proposed by [60] involved utilizing local higher-order polynomial fitting to estimate the function m and its partial derivatives. The author demonstrated mean-square convergence and joint asymptotic normality (see also [61]). This research holds significant implications for estimating widely used ARCH time series. Masry [59] established the strong consistency of the regression estimator and its partial derivatives up to a fixed total order p, obtaining sharp rates of almost sure convergence that are uniform over compact sets. In [63], the local polynomial estimation of regression functions and their derivatives is examined, establishing the joint asymptotic normality of the relevant estimates for both strongly mixing and ρ-mixing processes. Masry and Mielniczuk [64] considered the nonparametric estimation of a multivariate regression function and its derivatives for a regression model with long-range dependent errors. The authors adopted a local linear fitting approach and established the joint asymptotic distributions for the estimators of the regression function and its derivatives.
In the functional data literature, work on local fitting has so far centered almost exclusively on local linear estimation. Rachdi et al. [73] combined the k-nearest neighbors method with the local linear approach to obtain a new estimator of the regression operator, designed for the case where the regressor is functional and the scalar response is subject to random missing observations. In parallel, [7] estimated the conditional density function by the local linear methodology, and Chikr-Elmezouar et al. [29] studied the estimation of the conditional density and mode with functional covariates. For local linear regression, [8] proposed a new estimator and studied its asymptotic behavior in detail. The work of [9] and [11] introduced a general framework describing the local behavior of the regression operator; see [3] for recent references. The foundational work of [92] established mean-squared convergence and asymptotic normality for the local linear estimator. Extending the scope to the case where both the response and the explanatory variable are functional, [32] devised a nonparametric local linear estimator of the regression function. The principal objective of the present investigation is to establish a comprehensive framework for local polynomial fitting in the functional setting and to provide the first complete theoretical validation of the kernel polynomial estimator, namely its uniform consistency rate and its asymptotic distribution. Addressing this problem is not a mere combination of ideas from separate domains: intricate mathematical derivations, resting on large-sample theory, are required to handle the challenges posed by the estimators in our specific context.
The article is structured as follows. Section 2 provides the necessary notation and definitions. The main results are outlined in Section 3. The asymptotic distribution is discussed in Section 3.1, and its application to the confidence regions is presented in Section 3.3. The establishment of uniform almost complete convergence is detailed in Section 3.4. Local linear estimators for the conditional distribution and their asymptotic normality are presented in Section 4. In Section 5, we summarize the findings and highlight remaining open issues. All proofs are deferred to Section 6, with a focus on the most crucial arguments due to the lengthiness of the proofs. Additionally, relevant technical results are provided in the appendix.
We begin by introducing the necessary notation and definitions for the forthcoming sections. Specifically, we recall the definition of the strong mixing property. Let Fki(Z) denote the σ-algebra generated by {Zj:i≤j≤k}. The following definition of the strong mixing coefficient is attributed to [76]. For further details, refer to [24,75,82,83].
Definition 2.1. Let Z={Zi,i=1,2,…} be a strictly stationary sequence of random variables. Given a positive integer n, define
α(n)=sup{|P(A∩B)−P(A)P(B)|:A∈Fk1(Z) and B∈F∞k+n(Z),k∈N∗}. |
The sequence Z is said to be α-mixing (strong mixing) if the mixing coefficient α(n)→0 as n→∞.
The α-mixing condition is the weakest among the commonly used mixing conditions and has numerous practical applications, particularly in economics and finance. For example, several time series models, such as ARCH, ARMA, GARMA, and stochastic volatility models, satisfy the α-mixing condition.
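For illustration only, the following sketch (all names hypothetical, not part of the theory above) simulates a stationary AR(1) path, a simple member of the ARMA family and hence α-mixing, and computes a crude empirical lower bound on α(n) by restricting the supremum in Definition 2.1 to half-line events of the form {X1≤a} and {X1+n≤b}; this bound typically decays as the lag grows.

```python
import numpy as np

# Illustrative sketch (not from the paper): an AR(1) path is ARMA, hence
# alpha-mixing; we compute a crude empirical LOWER bound on alpha(n) by
# restricting the supremum in Definition 2.1 to rectangle events
# A = {X_1 <= a}, B = {X_{1+n} <= b}.
rng = np.random.default_rng(0)

def ar1_path(size, phi=0.5, burn=500):
    """Simulate a stationary AR(1) process X_t = phi * X_{t-1} + eps_t."""
    eps = rng.standard_normal(size + burn)
    x = np.zeros(size + burn)
    for t in range(1, size + burn):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn:]

def alpha_lower_bound(x, lag, n_thresholds=25):
    """Max over threshold pairs (a, b) of |P(A and B) - P(A)P(B)|."""
    u, v = x[:-lag], x[lag:]
    qs = np.quantile(x, np.linspace(0.05, 0.95, n_thresholds))
    best = 0.0
    for a in qs:
        for b in qs:
            p_ab = np.mean((u <= a) & (v <= b))
            best = max(best, abs(p_ab - np.mean(u <= a) * np.mean(v <= b)))
    return best

x = ar1_path(20_000)
for lag in (1, 2, 5, 10):
    print(lag, round(alpha_lower_bound(x, lag), 4))  # typically decays toward 0
```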
Remark 2.2. Asymptotic independence, or "mixing", of the data-generating process is a common assumption in statistical learning for time series. For many typical processes, the general forms of various mixing rates are often assumed to be known, but specific coefficients are not. These mixing assumptions are rarely tested, and there are no established methods for estimating mixing coefficients from data. In [65], an estimator for beta-mixing coefficients based on a single stationary sample path is introduced. Khaleghi and Lugosi [51] propose strongly consistent estimators for the ℓ1 norm of the sequence of α-mixing and β-mixing coefficients of a stationary ergodic process. These estimators are subsequently used to develop strongly consistent goodness-of-fit hypothesis tests. Specifically, Khaleghi and Lugosi [51] develop hypothesis tests to determine whether, under the same summability assumption, the α-mixing or β-mixing coefficients of a process are upper bounded by a given rate function.
We consider a sequence {(Xi,Yi):i≥1} of stationary* α−mixing random copies of the random vector [rv] (X,Y), where X takes its values in some abstract space F and Y in R. Suppose that F is endowed with a semi-metric d(⋅,⋅)† defining a topology to measure the proximity between two elements of F and which is disconnected from the definition of X to avoid measurability problems. We will consider especially the conditional expectation of ψ(Y) given X=x,
r(x)=rψ(x)=E(ψ(Y)∣X=x), | (2.1) |
*In the case of Hilbert space-valued elements, strict stationarity is not necessarily required; second-order stationarity suffices. A Hilbert space-valued sequence {Xt}t∈Z is second-order (or weakly) stationary if E‖Xt‖2<∞, EXt=μ, and E[(Xs−μ)⊗(Xt−μ)]=E[(Xs−t−μ)⊗(X0−μ)] for all s,t∈Z.
We say that {Xt}t∈Z is strictly stationary if the joint distribution of {Xt1,…,Xtn} is the same as the joint distribution of {Xt1+h,…,Xtn+h} for all t1,…,tn∈Z, n≥1, and h≥1.
†A semi-metric (sometimes called pseudo-metric) d(⋅,⋅) is a metric which allows d(x1,x2)=0 for some x1≠x2.
whenever this regression function is meaningful. Here and elsewhere, ψ(⋅) denotes a specified measurable function, which is assumed to be bounded on each compact subinterval of R. Note that we can write
ψ(Y)=r(X)+ϵ,withE(ϵ∣X)=0,E(ϵ2∣X)=σ2(X). |
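To make the model concrete, the following minimal sketch (all choices and names are hypothetical illustrations, not taken from the paper) simulates a dependent sample of discretized curves Xi(t) together with scalar responses Yi=r(Xi)+ϵi; the names simulate_sample and GRID are reused in later sketches.

```python
import numpy as np

# Illustrative data-generating sketch (not from the paper): dependent functional
# covariates X_i(t) observed on a grid, with scalar responses following
# Y_i = r(X_i) + eps_i and E(eps | X) = 0.
rng = np.random.default_rng(1)
GRID = np.linspace(0.0, 1.0, 101)
DT = GRID[1] - GRID[0]                       # grid step, for rectangle-rule integrals

def simulate_sample(n=300, phi=0.5):
    """Curves X_i(t) = A_i sin(2 pi t) + B_i cos(2 pi t); the coefficient pairs
    (A_i, B_i) follow AR(1) dynamics, so the functional sample is dependent."""
    a = np.zeros(n)
    b = np.zeros(n)
    for i in range(1, n):
        a[i] = phi * a[i - 1] + rng.standard_normal()
        b[i] = phi * b[i - 1] + rng.standard_normal()
    X = a[:, None] * np.sin(2 * np.pi * GRID) + b[:, None] * np.cos(2 * np.pi * GRID)
    r_true = np.sum(X**2, axis=1) * DT       # r(X) = integral of X(t)^2 dt (nonlinear)
    Y = r_true + 0.2 * rng.standard_normal(n)
    return X, Y, r_true

X, Y, r_true = simulate_sample()
print(X.shape, Y.shape)                      # (300, 101) (300,)
```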
In this work, we study the problem of nonparametric estimation of the regressor function r(⋅) using, for the first time in the functional data, the local polynomial fitting such that the regressor belongs to an infinite dimensional set. From now on, for the ease of the notation, we set ψ(y)=y. Recall that in functional statistics, there are several approaches to developing the concept of local linear methods. For example (see [9]) the linear approximation of r(⋅) for any z in the neighborhood of x is given by
r(z)=r(x)+bβ(z,x)+o(β(z,x)), |
where the quantities a=r(x) and b are estimated by minimizing the following criterion
min(a,b)∈R2n∑i=1(Yi−a−bβ(Xi,x))2K(h−1Kδ(x,Xi)), | (2.2) |
where the locating functions β(⋅,⋅) and δ(⋅,⋅) are defined on F×F and map into R, with |δ(⋅,⋅)|=d(⋅,⋅) for some metric (or semi-metric) d(⋅,⋅). The function β captures the local behavior of our model, K(⋅) is a kernel function, and hK=hK,n is the bandwidth or smoothing parameter of the kernel K(⋅), controlling the size of the local neighborhood and the degree of smoothing. The performance of the estimate depends crucially on the two locating functions β(⋅,⋅) and δ(⋅,⋅). To clarify, specific forms for δ(⋅,⋅) and β(⋅,⋅) can be given. For example, if the functional data are 'smooth' curves, one might use the following family of locating functions:
loc(q)a(x1,x2)=∫θ(t)(x(q)1(t)−x(q)2(t))dt=⟨θ,x(q)1−x(q)2⟩X, |
where θ is a function that can be adapted to the data, X is a Hilbert space, and ⟨⋅,⋅⟩ denotes the inner product. Choosing β(⋅,⋅) from such a family is motivated by its connection to the following minimization problem:
ˆc(x)=argminc(x)∈Rp+1n∑i=1(Yi−p∑l=0cl(x)⟨θ,X(q)i−x(q)⟩lX)2K(h−1|δ(Xi,x)|), |
which can be viewed as a type of 'local polynomial' regression approach when considering a functional explanatory variable. Metrics, or more generally semi-metrics, based on derivatives could also be suitable for locating one curve relative to another. For example, one might define:
loc(q)b(x1,x2)=(∫(x(q)1(t)−x(q)2(t))2dt)1/2, |
which is a semi-metric. This second family of locating functions is particularly well-suited for δ(⋅,⋅), as it measures the proximity between two elements of X. Let U be an open subset of a real Banach space F. If f:U→R is differentiable p+1 times on U, it can be expanded using Taylor's formula:
f(x)=f(a)+Df(a)⋅h+12!D2f(a)⋅h2+⋯+1p!Dpf(a)⋅hp+Rp(x),
with the following expressions for the remainder term Rp(x) :
Rp(x)=1p!Dp+1f(η)⋅(x−η)ph (Cauchy form of the remainder),
Rp(x)=1(p+1)!Dp+1f(ξ)⋅hp+1 (Lagrange form of the remainder),
Rp(x)=1p!∫10Dp+1f(a+th)⋅((1−t)h)phdt (integral form of the remainder).
Here a and x must be points of U such that the line segment between a and x lie inside U,h is x−a, and the points ξ and η lie on the same line segment, strictly between a and x. If we collect the equal mixed partials (assuming that they are continuous) then
1k!Dkf(a)⋅hk=∑|J|=k1J!∂Jf∂xJhJ, |
where J is a multi-index of m components, and each component Ji indicates how many times the derivative with respect to the i th coordinate should be taken, and the exponent that the ith coordinate of h should be raised to in the monomial hJ. The multi-index J runs through all combinations such that J1+⋯+Jm=|J|=k in the sum. The notation J! means J1!⋯Jm!. Let us specify our setting as in [11]. Let U be an open subset of F. If f:U→R is p+1 times differentiable on U, the Taylor expansion of f around x∈U is
f(y)=p∑j=01j!Djf(x)dj(x,y)+Rp(y), |
where Djf(x) denote the j th Fréchet derivative of f at x, with x and y being points of U and d(⋅,⋅) the metric of F. The remainder is given by
Rp(y)=1p!∫10Dp+1f{x+td(x,y)}dp+1(x,y)(1−t)pdt=o{dp(x,y)}, |
as d(x,y)→0; see [26] and [90] for more details. Then, in our setting, by following a similar reasoning as in [9] and [11], the local polynomial estimator of order p of the regression function denoted by ˆr(x) of r(x) is defined as the first component of the estimate ˆc(x)=(ˆc0(x),…,ˆcp(x)) which is obtained by the following minimization problem
ˆc(x)=argminc(x)∈Rp+1n∑i=1(Yi−p∑l=0cl(x)βl(Xi,x))2K(h−1Kδ(x,Xi)). | (2.3) |
Furthermore, the estimator ˆrl(x)=l!ˆcl(x), where 0≤l≤p can be expressed similarly as an estimator of the existing lth order derivatives of the regression function r(⋅). Then the estimator ˆrl(x) can be written as
ˆc(x)=(⊤QβWQβ)−1⊤QβWY, | (2.4) |
where ⊤Qβ is the matrix defined by
⊤Qβ=(1⋯1β(X1,x)⋯β(Xn,x)⋮⋯⋮βp(X1,x)⋯βp(Xn,x)),Y=⊤(Y1,…,Yn),andW=diag(K(h−1kδ(Xi,x))), |
where ⊤ is the transpose symbol. Let us introduce
un,j=1nE(K)n∑i=1(β(Xi,x)hK)jK(h−1Kδ(Xi,x)),vn,j=1nE(K)n∑i=1(β(Xi,x)hK)jK(h−1Kδ(Xi,x))Yi,
Un=(un,0⋯un,p⋮⋱⋮un,p⋯un,2p),andVn=⊤(vn,0,…,vn,p). |
Keeping in mind this notation, we have the representation
ˆc(x)=diag(1,h−1K,h−2K,…,h−pK)U−1nVn. |
Interestingly, this class of estimators includes both the classical Nadaraya-Watson estimator, which minimizes (2.3) when p=0, and the local linear kernel estimator, which corresponds to p=1.
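As an illustration, a minimal numerical sketch of the estimator follows, assuming discretized curves on a common grid and taking, purely for convenience, both locating functions equal to the projection β(z,x)=δ(x,z)=⟨θ,z−x⟩ with θ≡1 (a member of the family loc_a above); the weighted least squares problem (2.3) is solved directly, and the first component of the solution estimates r(x). The kernel used is an illustrative choice and is not claimed to satisfy every requirement of (H5).

```python
import numpy as np

# Minimal sketch of the estimator (2.3)-(2.4), under illustrative assumptions:
# curves observed on a common grid, and both locating functions taken equal to
# the projection beta(z, x) = delta(x, z) = <theta, z - x> with theta == 1.
def kernel(u):
    return np.where(np.abs(u) <= 1.0, 1.0 - u**2, 0.0)   # illustrative choice

def local_poly_fit(X, Y, x0, grid, h, p=1):
    """Weighted least squares solution c_hat(x0); c_hat[0] estimates r(x0)."""
    dt = grid[1] - grid[0]
    beta = np.sum(X - x0[None, :], axis=1) * dt     # <theta, X_i - x0> with theta == 1
    w = kernel(beta / h)                            # here delta = beta, so |delta| = |beta|
    Q = np.vander(beta, N=p + 1, increasing=True)   # columns 1, beta, ..., beta^p
    sw = np.sqrt(w)
    c_hat, *_ = np.linalg.lstsq(Q * sw[:, None], Y * sw, rcond=None)
    return c_hat

# Hypothetical usage with the simulated sample above:
# c_hat = local_poly_fit(X, Y, X[0], GRID, h=0.5, p=2); r_hat = c_hat[0]
```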
Let x (resp. y) be a fixed point in F (resp. ∈R), and Nx (resp. Ny) denote a fixed neighborhood of x (resp. y).
In the remainder of the paper, we denote the closed ball in F of center x∈F and radius h by
B(x,h):={x′∈F:|δ(x,x′)|≤h}. |
We define
ϕx(l1,l2)=P(l2≤δ(X,x)≤l1), |
where l1 and l2 are two real numbers. To establish the asymptotic behavior of the local polynomial of our estimator, we need some following assumptions.
(H1) For any u>0, ϕx(u):=ϕx(−u,u)>0 where limu→0ϕx(u)=0.
It is easy to see that
φx(u)=P(X∈B(x,u)). |
When the function |δ(⋅,⋅)| satisfies the conditions of a metric or a semi-metric, φx(u) can be interpreted as the probability of a ball in F centered at x with radius u. When u approaches zero, one speaks of "small ball probabilities", a notion that has been extensively studied in probability theory (refer to [54] for a comprehensive summary of this topic in the context of Gaussian processes). The function φx(⋅), which extends the small ball probability concept, plays in the functional setting a role similar to that of the density function in the finite-dimensional context. In multivariate nonparametric analysis, it is customary to estimate a quantity at a given point using a sufficiently large number of observations available in its vicinity; this is usually guaranteed by assuming that the density evaluated at this point is strictly positive. In infinite dimensions, the absence of a reference measure, such as the Lebesgue measure in finite dimensions, necessitates a similar assumption that does not rely on the concept of density. The objective of Hypothesis (H1) is precisely to incorporate this functional aspect: it guarantees that there are sufficiently many observations surrounding x, which justifies estimating the regression operator at the point x. The following conditions are needed in our analysis.
(H2) (Xi,Yi)i∈N is an α-mixing sequence and
(ⅰ) ∃a>0,∃C>0,∀n∈N,α(n)≤Cn−a,
(ⅱ) supi≠jP((Xi,Xj)∈B(x,hK)×B(x,hK))≤ψx(hK), where ψx(hK) is such that there exists ϵ∈]0,1] for which 0<ψx(hK)=O(ϕ1+ϵx(hK));
(H3) (ⅰ) r(⋅) and σ(⋅) are continuous in the neighbourhood of x, which means that r(⋅) and σ(⋅) are both in the set
{f:F→R,lim|δ(x′,x)|→0f(x′)=f(x)}, |
and there exists C>0, such that, almost surely,
supi≠jE(|YiYj|∣(Xi,Xj))≤C<∞, |
(ⅱ) For any k∈{1,2,…,p,p+1}, the quantities Ψ(k)(0) exist, where Ψ(k)(⋅) denotes the kth derivative of Ψ(⋅) defined by Ψ(s)=E(r(X)−r(x)∣β(X,x)=s);
(H4) The locating operators β(⋅,⋅) and δ(⋅,⋅) satisfy the following conditions:
(ⅰ) ∀z∈ F,C1|δ(x,z)|≤|β(x,z)|≤C2|δ(x,z)|;
(ⅱ) supv∈B(x,r)∣β(v,x)−δ(x,v)∣=o(r);
(H5) The kernel K(⋅) is a bounded and positive function which is supported within [−1,1] and for which the first derivative K′(⋅) satisfies: K(1)>0, K′(t)<0, for t∈[−1,1];
(H6) There exists a positive integer n0 for which
−1ϕx(hK)∫1−1ϕx(zhK,hK)ddz(z2K(z))dz>C3>0 for n>n0, |
and
h2K∫B(x,hK)∫B(x,hK)β(u,x)β(t,x)dP(X1,X2)(u,t)=o(∫B(x,hK)∫B(x,hK) β2(t,x)β2(u,x)dP(X1,X2)), |
where dP(X1,X2) is the cumulative distribution of (X1,X2);
(H7) (ⅰ) For a>1, the function ϕx(⋅) satisfies
∃η>0,C2n11−a≤ϕx(hK)≤C1n13−2a−η; |
(ⅱ) The bandwidths hK satisfy
limn→∞hK=0,limn→∞lognnϕx(hK)=0 and ∃β>0such thatlimn→∞nβhK=0. |
Before presenting our result, we need additional notation. In the sequel, we define certain quantities appearing in the dominant terms of the bias and of the variance in the asymptotic results. For j∈{1,2}, let
Mj=Kj(1)−∫1−1(Kj(u))′χx(u)du, | (3.1) |
where χx(⋅) is a function such that
limhK→0ϕx(−hK,thK)ϕx(hK)=χx(t) for any t∈[−1,1]. |
For all a>0, and b>0, let
N(a,b)=Ka(1)−∫1−1(ubKa(u))′χx(u)du. | (3.2) |
Assumption (H2-ⅰ) requires that the mixing coefficients of the dependent case tend to zero at a suitably mild rate. Assumption (H2-ⅱ) describes the behavior of the joint distribution of the pair (Xi,Xj) in relation to its marginal distributions, allowing us to present an explicit asymptotic variance term. Assumption (H3-ⅰ) imposes the usual moment condition on the responses and the covariance structure of the dependent sample, as detailed in [38]. Assumption (H3-ⅱ) specifies the necessary smoothness condition for the current setting. Assumption (H4) is a standard condition in nonparametric estimation. Condition (H5) is very usual in the nonparametric estimation literature devoted to the functional data context. Notice that a symmetric kernel, as considered in [70], is not adequate in this context since the random process d(x,Xi) is positive; therefore we consider K(⋅) with support [0,1]. This is a natural generalization of the assumption usually made on the kernel in the multivariate case, where K(⋅) is supposed to be a spherically symmetric density function. Assumption (H6) describes the behavior of the bandwidth h in relation to the small ball probabilities and the kernel function K(⋅). Assumption (H7-ⅰ) is a technical condition illustrating the relationship between the small ball probability and an arithmetically α-mixing coefficient, as discussed in [42]. Assumption (H7-ⅱ) is satisfied when hK=n−ϱ and ϕx(hK)=hϑK, provided ϑ>0 and β<ϱ<1/ϑ. In the following, we use the notation ZD=N(μ,σ2) to indicate that the random variable Z follows a normal distribution with mean μ and variance σ2. The symbol D→ denotes convergence in distribution, and P→ denotes convergence in probability.
Remark 3.1. According to [40], our methodology is heavily dependent on the distribution function ϕ(⋅). This dependency is evident in our conditions and the convergence rates of our estimate (via the asymptotic behavior of the quantity nϕ(h)). More precisely, the behavior of ϕ(⋅) around 0 is of paramount importance. Consequently, the small ball probabilities of the underlying functional variable X are crucial. In probability theory, the calculation of P(‖X−x‖<s) for "small" s (i.e., for s tending toward zero) and for a fixed x is known as the "small ball problem". Unfortunately, there are solutions for very few random variables (or processes) X, even when x=0. In several functional spaces, taking x≠0 presents formidable obstacles that may be insurmountable. Typically, authors emphasize Gaussian random variables. We refer you to [54] for a summary of the key findings regarding the probability of small balls. If X is a Gaussian random element on the separable Banach space E and x belongs to the reproducing kernel Hilbert space associated with X, then the following well-known result holds:
P(‖X−x‖≤s)∼CxP(‖X‖≤s),ass→0. |
As far as we know, the results available in the published literature are essentially all of the form
P(‖X−x‖<s)∼cxs−αexp(−C/sβ), |
where α,β,cx, and C are positive constants, and ‖⋅‖ may be a supremum norm, an Lp norm, or a Besov norm. The interested reader can refer to [12,16,20,38,39,40,79] for further discussion. Notably, the pioneering book by [38] extensively comments on the links between nonparametric functional statistics, small-ball probability theory, and the topological structure of the functional space E.
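Although the small ball probability is rarely available in closed form, its empirical counterpart, the proportion of sample curves falling in the ball B(x,h) (the quantity Fx,n,2(h) used below), is straightforward to compute. The following minimal sketch, with hypothetical names, illustrates this for the L2 semi-metric.

```python
import numpy as np

# Illustrative sketch: the empirical small-ball probability
# phi_x_hat(h) = (1/n) * #{ i : d(x, X_i) <= h }, with d taken here as the L2
# semi-metric between discretized curves.  Names refer to the hypothetical
# simulation sketch given earlier.
def small_ball_probability(X, x0, grid, radii):
    dt = grid[1] - grid[0]
    d = np.sqrt(np.sum((X - x0[None, :]) ** 2, axis=1) * dt)   # L2 semi-metric
    return np.array([np.mean(d <= h) for h in radii])

# print(small_ball_probability(X, X[0], GRID, radii=[0.1, 0.25, 0.5, 1.0]))
```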
In the following theorem, we present the limiting law.
Theorem 3.2. Assume that the assumptions (H1)–(H7) are satisfied. We have, as n→∞,
√nϕx(hK){diag(1,hK,…,hpK)(ˆr(x)−r(x)ˆΨ(1)(0)−Ψ(1)(0)⋮ˆΨ(p)(0)−Ψ(p)(0))−hp+1K(p+1)!Ψ(p+1)(0)S−1U}D→N(0,σ2(x)S−1VS−1), |
where
S=(1⋯N(1,p)M1⋮⋱⋮N(1,p)M1⋯N(1,2p)M1),V=diag(M2M21,N(2,2)M21,…,N(2,2p)M21),U=(N(1,p+1)M1⋮N(1,2p+1)M1), |
where ˆcp(x) is a consistent estimator of Ψ(p)(0). Henceforth, we will denote ˆcp(x) as ˆΨ(p)(0), and let ˆr(x)=ˆc0(x).
The proof of Theorem 3.2 is postponed until Section 6. To eliminate the bias term, we need the following additional assumption.
(H8) Assume that limn→∞nh2p+2Kϕx(hK)=0.
Corollary 3.3. Assume that the conditions (H1)–(H8) are satisfied. We have, as n→∞,
√nϕx(hK){diag(1,hK⋯hpK)(ˆr(x)−r(x)ˆΨ(1)(0)−Ψ(1)(0)⋮ˆΨ(p)(0)−Ψ(p)(0))}D→N(0,σ2(x)S−1VS−1). |
Remark 3.4. To cancel the bias term, we need nh2p+2Kϕx(hK)→0, as n→∞. Consequently, this condition and the condition nϕx(hK)→∞ are both satisfied as soon as hK=n−ξ with 0<ξ<1 and ϕx(hK)=hcK with 1ξ−(2p+2)<c<1ξ.
A lower α th quantile of the vector Vn is any quantity τnα∈Rp+1 satisfying τnα=inf{ε:P(Vn≤ε)≥α}, where ε is an infimum over the given set only if there does not exist a ε1<ε in Rp+1 such that P(Vn≤ε1)≥α. We can, without loss of generality, assume P(Vn≤τnα)=α. Since variance matrix Σ(x)=σ2(x)S−1VS−1 is unknown and assumed non-singular, see Corollary 4 of Appendix 1 of [4], there exists a non-singular matrix C(x) such that
C(x)Σ(x)−1C(x)=Ip+1. |
Let us first give a consistent estimate of Σ(x). Making use of the decomposition of χx(t) in (3.1), one may estimate χx(t) by
ˆχx(t)=Fx,n,1(−hK,thK)/Fx,n,2(hK),
where
Fx,n,1(t,u)=1nn∑i=11{t≤δ(x,Xi)≤u} and Fx,n,2(t)=1nn∑i=11{|δ(x,Xi)|≤t}. |
Hence we have the following estimates, for j=1,2,
ˆMj=Kj(1)−∫1−1(Kj(u))′ˆχx(u)du. | (3.3) |
In a similar way, for all a>0, and b>0, we have
ˆN(a,b)=Ka(1)−∫1−1(ubKa(u))′ˆχx(u)du. | (3.4) |
Hence one can consistently estimate Σ(x) by ˆΣ(x)=ˆσ2(x)ˆS−1ˆVˆS−1, where
ˆS=(1⋯ˆN(1,p)ˆM1⋮⋱⋮ˆN(1,p)ˆM1⋯ˆN(1,2p)ˆM1),ˆV=diag(ˆM2ˆM21,ˆN(2,2)ˆM21,…,ˆN(2,2p)ˆM21),ˆU=(ˆN(1,p+1)ˆM1⋮ˆN(1,2p+1)ˆM1).
For n large enough, we have
ˆC(x)ˆΣ(x)−1ˆC(x)=Ip+1. |
An application of Slutsky's theorem together with Corollary 3.3 gives the following result.
Corollary 3.5. Assume that the conditions (H1)–(H8) are satisfied. We have, as n→∞,
√nϕx(hK)ˆC(x){diag(1,hK⋯hpK)(ˆr(x)−r(x)ˆΨ(1)(0)−Ψ(1)(0)⋮ˆΨ(p)(0)−Ψ(p)(0))}D→N(0,Ip+1). |
This corollary can be applied directly to the construction of the confidence regions.
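As an illustration, the following sketch (hypothetical names, restricted to the simplest case p=0 for brevity) builds a pointwise confidence interval by plugging the empirical estimates of ϕx(hK), χx, M1, M2 and σ2(x) into the standardized limit of Corollary 3.5.

```python
import numpy as np

# Illustrative sketch of a pointwise confidence interval, in the simplest case
# p = 0 (Nadaraya-Watson), where the limiting variance reduces to
# sigma^2(x) * M2 / M1^2 after standardization by sqrt(n * phi_x(h)).
# The plug-in quantities mirror the empirical definitions above.
def kernel(u):
    return np.where(np.abs(u) <= 1.0, 1.0 - u**2, 0.0)

def pointwise_ci(X, Y, x0, grid, h, z=1.96):
    dt = grid[1] - grid[0]
    delta = np.sum(X - x0[None, :], axis=1) * dt             # projection locating function
    w = kernel(delta / h)
    phi_hat = np.mean(np.abs(delta) <= h)                    # F_{x,n,2}(h)
    r_hat = np.sum(w * Y) / np.sum(w)                        # p = 0 estimator of r(x0)
    sigma2_hat = np.sum(w * (Y - r_hat) ** 2) / np.sum(w)    # local residual variance

    # chi_x_hat(t) = F_{x,n,1}(-h, t h) / F_{x,n,2}(h), evaluated on a grid of t
    t = np.linspace(-1.0, 1.0, 201)
    chi_hat = np.array([np.mean((delta >= -h) & (delta <= s * h)) for s in t]) / phi_hat
    # M_j_hat = K^j(1) - int_{-1}^{1} (K^j)'(u) chi_hat(u) du  (numerical derivative)
    def M_hat(j):
        kj = kernel(t) ** j
        return kernel(1.0) ** j - np.sum(np.gradient(kj, t) * chi_hat) * (t[1] - t[0])

    se = np.sqrt(sigma2_hat * M_hat(2) / (M_hat(1) ** 2 * len(Y) * phi_hat))
    return r_hat - z * se, r_hat + z * se

# low, high = pointwise_ci(X, Y, X[0], GRID, h=0.5)   # 95% interval with z = 1.96
```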
In this part, we will establish the rate of uniform almost complete convergence over some subset of F. The following definition is needed.
Definition 3.6. Let ϵ>0 be a given number. A finite set of points x1,x2,…,xN in F is called an ϵ−net for S if S⊂∪Nk=1B(xk,ϵ). The quantity ΨS(ϵ)=log(Nϵ(S)), where Nϵ(S) is the minimal number of open balls in F of radius ϵ needed to cover S, is called Kolmogorov's ϵ−entropy of the set S.
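As a simple illustration of this definition, the following sketch (hypothetical names) builds a greedy ϵ-net of a finite set of discretized curves under the L2 semi-metric; the logarithm of its cardinality provides an upper bound on the Kolmogorov ϵ-entropy of that finite set.

```python
import numpy as np

# Illustrative sketch of Definition 3.6: a greedy epsilon-net of a finite set
# of discretized curves under the L2 semi-metric.  The curves and GRID come
# from the hypothetical simulation sketch given earlier.
def greedy_epsilon_net(curves, grid, eps):
    dt = grid[1] - grid[0]
    centers = []
    for x in curves:
        covered = any(np.sqrt(np.sum((x - c) ** 2) * dt) <= eps for c in centers)
        if not covered:
            centers.append(x)              # open a new ball centered at x
    return centers

# net = greedy_epsilon_net(X, GRID, eps=0.5)
# print(len(net), np.log(len(net)))        # covering number and entropy bound
```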
The concept of entropy, introduced by Kolmogorov in the mid-1950s (see [52]), serves as a measure of set complexity, indicating that high entropy implies a significant amount of information is required to accurately describe an element within a given tolerance ε. Consequently, the selection of the topological structure, specifically the choice of the semi-metric, plays a crucial role when examining asymptotic results that are uniform over a subset S of F. In particular, we subsequently observe that a well-chosen semi-metric can enhance the concentration of the probability measure for the functional variable X, while minimizing the ε-entropy of the subset S. Ferraty and Vieu [38] emphasized this phenomenon of concentration of the probability measure for the functional variable by calculating small ball probabilities in different standard scenarios (refer to [41]). For readers interested in these concepts (entropy and small ball probabilities) and/or the utilization of Kolmogorov's ε-entropy in dimensionality reduction problems, we recommend referring to [53] or/and [69], respectively. We establish the uniform almost-complete convergence of our estimator on some subset SF such that:
SF⊂N⋃k=1B(xk,ιn), |
where xk∈F, ιn is a sequence of positive real numbers, and dn denotes the number of balls in the cover. To establish our result, we need the following hypotheses.
(U1) The hypothesis (H1) satisfies the following condition: there exists a differentiable function ϕ(⋅), such that
∀x∈F,0<C1ϕ(hK)≤P(X∈B(x,hK))≤C2ϕ(hK),
and there exists n0 such that for each n<n0
ϕ′(n)<C, |
where ϕ′(⋅) represents the first derivative of ϕ(⋅) and ϕ(0)=0;
(U2) The hypothesis (H2) is satisfied uniformly on x∈SF;
(U3) The function β(⋅,⋅) satisfies (H4) and the following Lipschitz condition, there exists a positive constant C, for all x1,x2 in F, we have
∣β(x,x1)−β(x,x2)∣≤Cd(x1,x2); |
(U4) The regression operator r(⋅) satisfies: there exists a positive constant C and s>0, such for each x∈F and z∈B(x,hK), we have
∣r(x)−r(z)∣≤Cds(x,z);
(U5) The kernel function K(⋅) satisfies (H5) and is Lipschitzian on [−1,1];
(U6) In addition to (H7), the Kolmogorov's ε-entropy of SF satisfies:
(ⅰ) ∃n0,∀n>n0,(logn)2/(nϕ(h))<ψSF(lognn)<nϕ(h)/logn.
(ⅱ) ∃λ>1, such that
∞∑n=1exp{(1−λ)ψSF(lognn)}=∞∑n=1d1−λn<∞. |
Conditions (U1) and (U2) are, respectively, the uniform counterparts of (H1) and (H2). Assumption (U3), initially introduced and discussed by [9], plays a pivotal role in our methodology, particularly when computing the leading dominant terms in our asymptotic results. Moreover, (U4) is essential for evaluating the bias component of the uniform convergence rates, and (U5) is a classical requirement in functional estimation within finite or infinite-dimensional spaces, with examples of kernel functions satisfying this condition available in [38]. The final condition regarding entropy, (U6), implies that ψSF(lognn)=o(nϕ(hK)) as n tends to infinity. In certain "usual" scenarios, such as a Hilbert space with a projection semi-metric, one can take ΨSF(lognn)∼Clogn, and (U6-ⅰ) is then fulfilled whenever (logn)2=O(nϕ(h)). For further insights, one may refer to Example 4 in [41].
We are now equipped to state the main result of this section concerning the uniform almost complete convergence with rate.
Theorem 3.7. Assume that the assumptions (U1)–(U6) are satisfied. We have as n→∞,
supx∈SF‖ˆc(x)−c(x)‖F=O(hp+1K)+O‡a.co.(√logdnnϕ(hK))
‡Let (un) for n∈N be a sequence of real random variables. We say that (un) converges almost-completely (a.co.) toward zero if, and only if, for all ϵ>0, ∑∞n=1P(|un|>ϵ)<∞. Moreover, we say that the rate of almost-complete convergence of (un) toward zero is of order vn (with vn→0), and we write un=Oa.co.(vn) if, and only if, there exists ϵ>0 such that ∑∞n=1P(|un|>ϵvn)<∞. This type of convergence implies both almost sure convergence and convergence in probability.
where
ˆc(x)=(ˆr(x),hKˆΨ(1)(0),…,hpKˆΨ(p)(0)). |
In this section, we establish results analogous to Theorem 3.2 for the local polynomial estimator of order p of the conditional cumulative distribution function, which we denote by ˆFx(⋅). Notice that the conditional distribution function can be expressed as a regression through an appropriate choice of the function ψ(Y) (see Eq (2.1)): if ψy(Y)=1Y≤y, the conditional distribution function is defined by
∀y∈R:Fx(y)=P(Y≤y|X=x). |
The functional local polynomial estimator of Fx(y) is based on the minimization, with respect to ˆa=(ˆa0,…,ˆap), of
ˆa=argmin(a0,…,ap)∈Rp+1n∑i=1(L(h−1L(y−Yi))−p∑l=0al(x)βl(Xi,x))2K(h−1Kδ(x,Xi)), | (4.1) |
where L(⋅) is a cumulative distribution function and (hL=hL,n) is a sequence of positive real numbers. In functional statistics, the estimation of the conditional cumulative distribution function holds significant importance both theoretically and practically, as is evident in applications such as reliability and survival analysis. Additionally, within nonparametric statistics, it underlies numerous prediction tools, including the conditional density, conditional quantiles, and the conditional mode. It is worth noting that previous work applies particularly to the case p=1: the almost-complete convergence of the estimator ˆFx(⋅) for p=1 has been investigated by [31], and [14] has explored the asymptotic normality of the local linear estimator of the conditional distribution function in the case of independent observations. For the development of our estimator, we adopt a similar approach as for the regression operator. Consequently, the local polynomial estimator of Fx(y) can be explicitly expressed as follows:
(ˆFx(y),ˆa1,…ˆap)=diag(1,h−1K,h−2K,…,h−pK)A−1nBn, |
where
An=(Λn,0⋯Λn,p⋮⋱⋮Λn,p⋯Λn,2p),andBn=⊤(Υn,0,…,Υn,p). |
We denote
Λn,j=1nE(K)n∑i=1(β(Xi,x)hK)jK(h−1Kδ(Xi,x)),Υn,j=1nE(K)n∑i=1(β(Xi,x)hK)jK(h−1Kδ(Xi,x))L(h−1L(Yi−y)).
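For illustration, the following sketch (hypothetical names, same simplifying choices as in the earlier regression sketch) computes the local polynomial conditional-CDF estimator by solving the weighted least squares problem (4.1), with L taken as the standard normal distribution function.

```python
import numpy as np
from math import erf

# Minimal sketch of the conditional-CDF estimator defined by (4.1), under the
# same illustrative choices as before (projection locating functions and a
# simple kernel K); L is the standard normal CDF.
def normal_cdf(u):
    return 0.5 * (1.0 + erf(u / np.sqrt(2.0)))

def kernel(u):
    return np.where(np.abs(u) <= 1.0, 1.0 - u**2, 0.0)

def local_poly_cdf(X, Y, x0, y, grid, hK, hL, p=1):
    """Return a_hat; a_hat[0] estimates F^x(y) = P(Y <= y | X = x0)."""
    dt = grid[1] - grid[0]
    beta = np.sum(X - x0[None, :], axis=1) * dt
    w = kernel(beta / hK)
    smoothed = np.array([normal_cdf((y - yi) / hL) for yi in Y])   # L(h_L^{-1}(y - Y_i))
    Q = np.vander(beta, N=p + 1, increasing=True)
    sw = np.sqrt(w)
    a_hat, *_ = np.linalg.lstsq(Q * sw[:, None], smoothed * sw, rcond=None)
    return a_hat

# F_hat_at_1 = local_poly_cdf(X, Y, X[0], 1.0, GRID, hK=0.5, hL=0.2)[0]
```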
To establish the asymptotic convergence of ˆFx(y) we need some notation and assumptions. For any l∈{0,2,…,2p}, we set
φl(⋅,y)=∂lF⋅(y)∂yl and ψl(s)=E[φl(X,y)−φl(x,y)|β(X,x)=s]. |
(M1) For any k∈{1,…,p+1}, the quantities ψ(k)l(0) exist, where ψ(k)l(⋅) denotes the kth order derivative of ψl(⋅).
(M2) The kernel K(⋅) satisfies the assumption (H5) and its first derivative K(1)(⋅) satisfies:
K2(1)−∫1−1(K2(u))′Ψx(u)du>0. |
(M3) The kernel function L(⋅) is a positive, symmetric, bounded, integrable function such that
∫Rz2pL(z)dz<∞. |
(M4) The α-mixing coefficients of the sequence (Xi,Yi)i∈N satisfy (H2) and the following condition:
+∞∑l=1lδ(α(l))1p<∞ for some p>0 and δ>1p. |
(M5) The function ϕx(⋅) satisfies the assumption (H1).
(M6) Let (wn) and (qn) be sequences of positive integers tending to infinity, such that (wn+qn)≤n, and
(ⅰ) qn=o((nϕx(hK))12) and limn→+∞(nϕx(hK))12α(qn)=0,
(ⅱ) rnqn=o((nϕx(hK))12) and limn→+∞rn(nϕx(hK))12α(qn)=0, where rn is the largest integer, such that rn(wn+qn)=O(n).
Theorem 4.1. Under assumptions (M1)–(M6), we have
√nϕx(hK){diag(1,hK,…,hpK)(^Fx(y)−Fx(y)^a1−Ψ(1)0(0)⋮^ap−Ψ(p)0(0))−BpK(x,y)−BpL(x,y)}D→N(0,VLK(x,y)), |
where
BpL(x,y)=p∑j=0[E(βj1K1)hjKE(K1)I+p∑k=1p∑a=1h2kL(2k)!∫t2kL′(t)dtΨ(a)k(0)E(βj+a1K1)hjKE(K1)],I=(h2L2φ2(x,y)∫t2L′(t)dt,…,h2pL(2p)!φ2p(x,y)∫t(2p)L′(t)dt)T,BpK(x,y)=hp+1K(p+1)!Ψ(p+1)0(0)S−1U,VLK(x,y)=Fx(y)(1−Fx(y))S−1VS−1. |
Remark 4.2. Let x∈F be a fixed element, and y∈R. If we define ψy(Y)=1]−∞,y](Y), then the operator rψ(x)=E(ψy(Y)∣X=x) represents the conditional cumulative distribution function (CDF) of Y given X=x, denoted as F(y∣x)=P(Y≤y∣X=x). This can be estimated as ˜F(y∣x):=ˆrψ,n(x). For a given α∈(0,1), the α-th order conditional quantile of the distribution of Y given X=x is defined as qα(x)=inf{y∈R:F(y∣x)≥α}. If F(⋅∣x) is strictly increasing and continuous in a neighborhood of qα(x), then F(⋅∣x) has a unique quantile of order α at qα(x), i.e., F(qα(x)∣x)=α. In such cases:
qα(x)=F−1(α∣x)=inf{y∈R:F(y∣x)≥α}, |
which is uniquely estimated by ˆqn,α(x)=˜F−1(α∣x). Conditional quantiles have been extensively studied when the predictor X is of finite dimension. Since F(qα(x)∣x)=α=˜F(ˆqn,α(x)∣x), and ˜F(⋅∣x) is continuous and strictly increasing, we have, for any ϵ>0, there exists η(ϵ)>0 such that for all y:
|˜F(y∣x)−˜F(qα(x)∣x)|≤η(ϵ)⇒|y−qα(x)|≤ϵ.
This implies that, for any ϵ>0, there exists η(ϵ)>0 such that:
P(|ˆqn,α(x)−qα(x)|≥η(ϵ))≤P(|˜F(ˆqn,α(x)∣x)−˜F(qα(x)∣x)|≥η(ϵ))=P(∣F(qα(x)∣x)−˜F(qα(x)∣x)≥η(ϵ)). |
This result may be immediately used to establish the consistency of the quantile estimator. This topic will be investigated in future research.
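For illustration, the following sketch (hypothetical names, building on the conditional-CDF sketch above) estimates the conditional quantile qα(x) by numerically inverting y↦˜F(y∣x) over a grid of candidate values.

```python
import numpy as np

# Illustrative sketch of Remark 4.2: invert the estimated conditional CDF over
# a grid of candidate y values; local_poly_cdf is the hypothetical sketch above.
def conditional_quantile(X, Y, x0, grid, hK, hL, alpha=0.5, p=1):
    y_grid = np.linspace(Y.min(), Y.max(), 200)
    F_vals = np.array([local_poly_cdf(X, Y, x0, y, grid, hK, hL, p)[0] for y in y_grid])
    F_vals = np.maximum.accumulate(np.clip(F_vals, 0.0, 1.0))   # enforce monotonicity
    idx = int(np.searchsorted(F_vals, alpha))                   # first y with F >= alpha
    return y_grid[min(idx, len(y_grid) - 1)]

# q_median = conditional_quantile(X, Y, X[0], GRID, hK=0.5, hL=0.2, alpha=0.5)
```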
Remark 4.3. Over time, it has become increasingly clear that the performance of kernel estimates deteriorates as the dimensionality of the problem increases. This decline is primarily due to the fact that, in high-dimensional spaces, local neighborhoods often lack sufficient sample observations unless the sample size is extraordinarily large. Consequently, in kernel estimation, computing local averages is not feasible unless the bandwidth is significantly wide. This overarching issue is known as the curse of dimensionality [10]. The paper by [77] provides a comprehensive analysis of the feasibility and challenges of high-dimensional estimation, complete with examples and computations. For recent references, see [6,15,16,35]. To address the difficulties associated with high-dimensional kernel estimation and simplify the process, a wide range of dimension reduction techniques have been developed. These techniques aim to explain the main features of a set of sample curves using a small set of uncorrelated variables. One effective solution is to generalize principal component analysis (PCA) to the context of a continuous-time stochastic process [33]. The asymptotic properties of the estimators of Functional Principal Component Analysis (FPCA) have been extensively studied in the general context of functional variables [30]. Nonparametric methods have been developed to perform FPCA for cases involving a small number of irregularly spaced observations of each sample curve [50,88]. As in the multivariate case, interpreting principal component scores and loadings is a valuable tool for uncovering relationships among the variables in a functional data set. To avoid misinterpretation of PCA results, a new type of plot, named Structural and Variance Information plots, was recently introduced by [25]. Numerical results suggest that regularization using a smoothness measure often yields more satisfactory outcomes than the FPCA approach. The FPCA method assumes that the top eigenfunctions of the predictors, which are unrelated to the coefficient function or the response, provide a good representation of the coefficient function. While it is generally believed that neither approach will consistently outperform the other, the Reproducing Kernel Hilbert Space (RKHS) method is an interesting alternative that merits attention [58,89]. Spline smoothing is one of the most popular and powerful techniques in nonparametric regression [36,46,47,84]. Penalized splines have gained widespread use in recent years due to their computational simplicity and their connection to mixed effects models [86]. These methods have also found significant success in Functional Data Analysis (FDA), as evidenced by tools like the R package refund. However, there is a consensus that the asymptotic theory of penalized splines is difficult to derive, and many theoretical gaps remain, even in the standard nonparametric regression setting. To our knowledge, there are very few theoretical studies on penalized splines for FDA [87]. Most approaches to functional regression are based on minimizing some L2-norm and are therefore sensitive to outliers. Finally, we can mention methods based on the delta sequences [23] and the wavelet methods [34].
Local polynomial fitting exhibits various statistically significant characteristics, particularly in the intricate domain of multivariate analysis. As functional data analysis gains traction in the field of data science, there arises a need for a specialized theory focused on local polynomial fitting. This study employs local higher-order polynomial fitting to address the complex challenge of estimating the regression function operator and its partial derivatives for stationary mixing random processes, represented as (Yi,Xi). For the first time, we have achieved significant progress by demonstrating the joint asymptotic normality of the estimates for the regression function and its partial derivatives, particularly applicable to strongly mixed processes. Additionally, we derive precise formulas for the bias and the variance-covariance matrix of the asymptotic distribution. We establish the uniform strong consistency of the regression function and its partial derivatives across compact subsets, providing a detailed analysis of their convergence rates. Importantly, these conclusions are grounded in general conditions that form the foundation of the underlying models. To illustrate practical utility, we utilize our findings to compute confidence regions for individual points. Furthermore, we extend our concepts to encompass the nonparametric conditional distribution and acquire its limiting distribution. There are multiple avenues for further methodological development. As a prospective direction, relaxing the stationarity assumption and exploring similar uniform limit theorems for local stationary functional ergodic processes would be fruitful. Additionally, considering the functional kNN local polynomial approach and expectile regression estimators in future investigations could yield alternative estimators benefiting from the advantages of both methods, as discussed in [20] and referenced in [19]. Investigating the extension of our framework to the censored data setting would also be of interest. In an upcoming investigation, obtaining a precise uniform-in-bandwidth limit law for the proposed estimators will be essential. This outcome would allow the bandwidth to vary across a comprehensive range, ensuring the estimator's consistency and serving as a valuable practical guideline for bandwidth selection in nonparametric functional data analysis. It is crucial to acknowledge that advancing research in this direction poses a more challenging task compared to previous efforts. The primary challenge lies in the need to develop new probabilistic results, such as inequalities and maximal moment inequalities specifically tailored for independent and identically distributed (i.i.d.) samples, as discussed in [81].
This section is devoted to the proof of our main result. The previously presented notation continues to be used in the following. Before studying the limit law of ˆc(x), we need to center the vector Vn as follows. Let
u∗n,j=1nE(K)n∑i=1(βihK)jKi(Yi−r(x)), | (6.1) |
where V∗n=⊤(u∗n,0,…,u∗n,p). Making use of assumption (H4) and by the Taylor expansion, we obtain
E(r(Xi)∣βi)=r(x)+E(r(Xi)−r(x)∣βi)=r(x)+Ψ(βi)=r(x)+p+1∑k=11k!Ψ(k)(0)βki+o(βp+1i).
Therefore, we have
Γ:=⊤((r(X1),…,r(Xn))=Qβ(r(x)Ψ(1)(0)⋮Ψ(p)(0))+1(p+1)!Ψ(p+1)(0)⊤(βp+11,…,βp+1n)+⊤(o(βp+11),…,o(βp+1n)). |
Making used of Eq (6.1), we have:
Un−1V∗n=diag(1,hK,h2K,…,hpK)(ˆc(x)−(⊤QβWQβ)−1QβWΓ)=diag(1,hK,h2K,…,hpK)(ˆr(x)−r(x)ˆΨ(1)(0)−Ψ(1)(0)⋮ˆΨ(p)(0)−Ψ(p)(0))−hp+1K(p+1)!Ψ(p+1)(0)U−1n(un,p+1⋮un,2p+1)−o(hp+1K)U−1n(un,p+1⋮un,2p+1). | (6.2) |
The rest of the proof for this theorem relies on the following lemmas, the proofs of which are provided in the appendix.
Lemma 6.1. Under the assumptions (H1)–(H7), we have, as n→∞,
UnP→SandU−1n(un,p+1⋮un,2p+1)P→S−1U. |
Hence, in view of Lemma 6.1, Theorem 3.2 will follow once we establish the following lemma.
Lemma 6.2. Under the assumptions of Theorem 3.2, we have, as n→∞,
√nϕx(hK)V∗nD→N(0,σ2(x)V). |
Oussama Bouannani: Conceptualization, formal analysis, investigation, methodology, validation, writing – original draft, review, edition; Salim Bouzebda: Conceptualization, formal analysis, methodology, validation, writing – original draft, review, edition. All authors have read and approved the final version of the manuscript for publication.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the three referees for their invaluable feedback. Their insightful comments have greatly refined and focused the original work, resulting in a markedly improved presentation.
The authors declare no conflict of interest. Salim Bouzebda is the(a) Guest Editor of special issue "Advances in Statistical Inference and Stochastic Processes: Theory and Applications" for AIMS Mathematics. Salim Bouzebda was not involved in the editorial review and the decision to publish this article.
Lemma A.1. (See [92]) Under assumptions (H1), (H2-ii) and (H5)–(H7-i), we have
(1) E(Kj1)=Mjϕx(hK), for j=1,2;
(2) E(Ka1β1)=o(hKϕx(hK)), for all a>0;
(3) E(Ka1βb1)=N(a,b)hbKϕx(hK)+o(hbKϕx(hK)), for all a>0 and b=2,4;
(4) For all (k,l)∈N∗×N,E(Kk1|β1|l)≤ChlKϕx(hK);
(5) For all (a,c,l,s)∈N∗×N∗×N×N,E(Ka1Kc2|β1|l|β2|s)≤Chl+sKψx(hK).
Proof of Lemma 6.1. To prove this lemma, it suffices to show that the means converge and the variances vanish; convergence in probability then follows. Specifically, it suffices to prove
E(un,0)=1,E(un,1)→0,E(un,p)→N(1,p)M1,E(un,2p+1)→N(1,2p+1)M1,
and
Var(un,j)→0,forj∈{0,1,p,2p+1}. | (A.1) |
Using Lemma A.1, we get
E(un,0)=1,E(un,1)=E(hKβ1K1)E(K1)=o(1),E(un,l)=E(h−lKβl1K1)E(K1)⟶N(1,l)M1forl∈{p,2p+1}. |
Concerning Eq (A.1), we can write
Var(un,j)=1n2E(K1)2(n∑i=1Var(βjih−jKKi)+2∑1≤i≠l≤nCov(βjih−jKKi,βjlh−jKKl))=Qn1+Qn2. |
For the term Qn1, we have
Qn1=1n2E(K1)2n∑i=1Var(βjih−jKKi)≤1nE(K1)2E(β2j1h−2jKK21)≤1nϕx(hK). | (A.2) |
Moreover, to handle the term Qn2, we use the same technique as in [62]: we define the sets W1 and W2 as follows
W1={(i,l)∈{1,…,n} such that 1≤∣i−l∣≤un}, |
and
W2={(i,l)∈{1,…,n} such that un+1≤∣i−l∣≤n−1}, |
where the sequence un is selected in such a way that un→+∞ as n→+∞. Based on the aforementioned splitting, we obtain
Q1,n=1n2E2(K1)∑W1Cov(h−jKβjiKi,h−jKβljKl), | (A.3) |
and
Q2,n=1n2E2(K1)∑W2Cov(h−jKβjiKi,h−jKβjlKl). | (A.4)
For the sum Q1,n, by assumptions (H2-ii), (H5) and by Lemma of [92], we have
∣Cov(h−2jKβjiKi,βjlKl)∣≤∣E(h−2jKβjiKiβjlKl)∣+E2(h−jKβj1K1)≤C(E(KiKl)+E2(K1))≤C(ϕ1+ϵx(hK)+ϕ2x(hK))≤C(ϕ1+ϵx(hK)). |
Then, we readily infer
∣Q1,n∣≤1n2E2(K1)∑W1Cϕ1+ϵx(hK)≤unnϕ−1+ϵx(hK). |
On the other hand, to bound the covariance sum Q2,n, we use the covariance inequality for bounded mixing processes; then, for any l≠i, we have
∣Cov(h−jKβjiKi,h−jKβjlKl)∣≤Cα(|i−l|). |
Next, using the inequality
∑j≥x+1j−a≤∫u≥xu−a, |
we derive
n∑i=1∑W2α(|i−l|)≤Cn(un)1−aa−1. | (A.5) |
Therefore, we readily obtain
|Q2,n|≤C(un)1−anϕ2x(hK). |
Subsequently, by choosing un=⌊ϕ1+ϵx(hK))−1/a⌋ and by assumption (H7), we obtain
Q1,n→0 and Q2,n→0 as n→∞. |
Hence the proof is complete.
Proof of Lemma 6.2. We consider a given vector of real numbers, denoted as a=⊤(α0…αp)≠0, then
√nϕx(hK)a⊤V∗n=1√nn∑i=1(Zi−E(Zi))+1√nn∑i=1E(Zi), | (A.6) |
where
Zi=√nϕx(hK)E(K1)(p∑j=0αjh−jKβji)Ki(Yi−r(x)). |
By the Cramér-Wold theorem and Slutsky's theorem, to establish Lemma 6.2 it suffices, in view of (A.6), to prove the following two claims:
1√nn∑i=1E(Zi)=√nϕx(hK)Ψ′(0)M1(α0o(hK)+p∑j=1ajhj+1KN(1,j+1)), | (A.7) |
and
1√nn∑i=1(Zi−E(Zi))D→N(0,σ2(x)a⊤Va). | (A.8) |
First, we proof Claim (A.7). We have
1√nn∑i=1E(Zi)=√nE(Z1)=√nϕx(hK)E(K1)E((α0+p∑j=1αjh−jKβj1)K1(Y1−r(x)))=nϕx(hK)E(K1)a0E(K1(Y1−r(x)))⏟S1+nϕx(hK)E(K1)p∑j=1ajE(h−jkβjK1(Y1−r(x)))⏟S2. |
Next, by the same arguments as those used by [72] and [40], we get
S1=α0Ψ′(0)E(K1Ψ(β1))+o(E(K1β1))=α0Ψ′(0)o(hKϕx(hK), |
and
S2=p∑i=1ajΨ′(0)hj+1KN(1,j+1)ϕx(hK), |
then, using Lemma A.1, we obtain
1√nn∑i=1E(Zi)=√nϕx(hK)(α0Ψ′(0)M1o(hK)+p∑i=1ajΨ′(0)hj+1KN(1,j+1)). | (A.9) |
Now, we prove Claim (A.8) using the CLT of [56] (see Corollary 2.2, page 196), which rests on the asymptotic behavior of the quantity
limn→∞n∑i=1E[Δ2i], | (A.10) |
where Δi=1√n(Zi−E(Zi)) in addition to the assumptions:
Assumption 1. (A1) There exists a sequence τn=o(√n) such that:
(i) τn≤(maxi=1,…,nCi)−1, where Ci=esssupω∈Ω|Δi|,
(ii) nτn α(ϵτn)→0, for all ϵ>0.
Assumption 2. (A2) There exists a sequence (mn) of positive integers tending to ∞, such that
(i) nmnγn=o(1), where γn:=max1≤i≠j≤n(E|ΔiΔj|),
(ii) (∑∞j=mn+1α(j))∑ni=1Ci=o(1).
We start by computing the limit of (A.10). In order to do so, let us observe that
n∑i=1E(Δ2i)=Var(Z1)=ϕx(hK)E2(K1)E((α0+p∑j=1αjh−jKβj1)2K21(Y1−r(x))2)−ϕx(hK)E2(K1)E2((α0+p∑j=1αjh−jKβj1)K1(Y1−r(x)))=A1−A2. |
Using the fact that
A1=ϕx(hK)E2(K1)(α20E(K21(Y1−r(x))2)⏟I1+2α0p∑j=1αjE(h−jKβj1K21(Y1−r(x))2⏟I2)+ϕx(hK)E2(K1)E((p∑j=1αjh−jKβj1)2K21(Y1−r(x))2⏟I3), |
in combination with Lemma A.1, we readily find
I1=a20σ2(x)M1ϕx(hK),I2=a0p∑j=1ajσ2(x)O(ϕx(hK)), |
and
I3=p∑j=1a2jE(h−2jKβ2j1(K21(Y1−r(X1))2))+2p∑1≤i≠j≤najaiE(h−j−iKβj+i1(K21(Y1−r(X1))2))=σ2(x)M1p∑j=1a2jN(2,2j)ϕx(hK)+2σ2(x)M1p∑1≤i≠j≤najaiO(ϕx(hK)). |
Under assumptions (H1), we have
limn→+∞A1=σ2(x)M1M22(a20+p∑j=1ajN(2,2j))=σ2(x)a⊤Va. |
Subsequently, the assertion (A.7) suggests that E(Z1)→0 as n→∞. Consequently, we infer that A2→0 as n→∞. Concerning Assumption (1), the fact that the kernel K(⋅) is bounded, allows us to infer that
Ci=O(1√nϕx(hK)). |
Therefore, an appropriate choice is the following
τn=√nϕx(hK)logn. |
Furthermore, this choice gives, for all ϵ>0
nτn α(ϵτn)≤C(n1−(a+1)/2(ϕx(hK))−(a+1)/2(logn)(a+1)/2)≤Cn1−(a+1)/2+(a+1)/2(a−1)(logn)(a+1)/2≤Cn(3a−a2)/2(a−1)(logn)(a+1)/2→0 since a>3. |
Concerning Assumption (2), using assumptions (H2-ⅱ) and (H3), we have
E|ΔiΔj|≤ϕx(hK)nE(K1)2∑0≤j≠i≤p|αiαj|E(KiKj)≤∑0≤j≠i≤p|αiαj|nϕx(hK)supi≠jP((Xi,Xj)∈B(x,hK)×B(x,hK))≤Cϕϵx(hK)n. |
Elsewhere, using the fact that
∑j≥x+1j−a≤∫u≥xu−a=[(a−1)xa−1]−1, | (A.11) |
we obtain
∞∑j=mn+1α(j)≤∞∑j=mnα(j)≤∫t≥mnt−adt=m1−ana−1, |
thus,
(∞∑j=mn+1α(j))n∑i=1Ci=O(m1−ana−1√nϕx(hK)). |
We select
mn=⌊(ϕx(hK)nlogn)⌋1/(2(1−a)), |
such that ⌊⋅⌋ represents the function of the integer part. It is evident that, under assumption (H7), mn→∞. Furthermore, by substituting the expression for mn, we readily get
∞∑j=mn+1α(j)n∑i=1Ci=o(1). |
Once more, considering assumption (H2-ⅰ) and (H7), we have
mnγn≤Cn−1−1/(2(1−a)(ϕx(hK))1+1/(2(1−a)(logn)−1/(2(1−a))≤n(−3+2a)/(2(1−a)(ϕx(hK))(3−2a)/(2(1−a)(logn)−1/(2(1−a))≤n−1+η(2a−3)/(2(1−a)(logn)−1/(2(1−a))=o(n−1). |
Hence the proof is complete.
Proof of Theorem 3.7. According to the formula of (6.2), we deduce the following error formula
ˆc(x)−c(x)=U−1n(x)V∗n(x)−U−1n(x)en(x)+hp+1K(p+1)!Ψ(p+1)(0)U−1n(x)en(x)+O(hp+1K)U−1n(x)en(x), |
where en(x)=⊤(un,p+1(x),…,un,2p+1(x)). Hence, Theorem 3.7 is a direct result of the following lemmas.
Lemma A.2. Assume that the conditions (U1), (U3), (U5) and (U6) are fulfilled. We have, as n→∞,
supx∈SF|unj(x)−E(unj(x))|=Oa.co.(√logdnnϕx(hK)). |
Lemma A.3. Assume that the conditions (U1)–(U6) are fulfilled. We have, as n→∞,
supx∈SF|v∗nj(x)−E(v∗nj(x))|=Oa.co.(√logdnnϕx(hK)), |
supx∈SF|enj(x)−E(enj(x))|=Oa.co.(√logdnnϕx(hK)). |
Lemma A.4. Assume that the conditions (U1)–(U5) are fulfilled. We have, as n→∞,
supx∈SF|E(v∗nj(x))|=O(hp+1K),supx∈SF|E(enj(x))|=O(1). |
Proof of Lemma A.2. The proof of this lemma relies on Proposition A.11-ⅰ in [38]. In order to apply the latter, we need to compute a certain quantity. In a first attempt, we evaluate the expression E(unj(x)). For this, we use Lemma A.1 and by the fact that observations are identically distributed, we have
E(unj(x))=1hjKE(K1)E(βj1K1) | (A.12) |
=O(1). | (A.13) |
Moreover, for the covariance term, we have :
S2n,α,l(x)=n∑i=1n∑j=1|Cov(Aαi(x),Alj(x))|, | (A.14)
where
Ali(x)=1hlK(βli(x)Ki(x)−E(βli(x)Ki(x))),forl∈{1,2,…,p}. | (A.15)
Under some of the dependence assumptions, we treat the term of S2n,α,l(x). Then we have
S2n,α,l(x)=Rn,1(x)+Rn,2(x)+nVar(Al1(x)), |
with
Rn,1(x)=∑W1Cov(Alj(x),Aαi(x));W1={(i,j)∈{1,2,…,n} such that 1≤∣i−j∣≤un},
and
Rn,2(x)=∑W2Cov(Alj(x),Aαi(x));W2={(i,j)∈{1,2,…,n} such that un+1≤∣i−j∣≤n−1}.
For the variance term, using Lemma A.1, we deduce that
nVar(Al1(x))=O(nϕx(hK)). | (A.16) |
Concerning the covariance term Rn,1(x), we use the same idea as in Eq (A.3); with the choice un=ϕ−ϵx(hK), we get
Rn,1(x)=O(nϕx(hK)), | (A.17) |
and for the term Rn,2(x) we apply the Rio inequality, see Theorem B.3 (ⅱ). For this technique, we need to compute the absolute moments of the random variable Alj(x). Then, using Newton's binomial expansion, we have
E[|Alj(x)|m]=E|h−lmK(Kj(x)βlj(x)−E[Kj(x)βlj(x)])m|=h−lmKE|m∑i=0Cim(Kj(x)βlj)i(x)(E[Kj(x)βlj(x)])m−i(−1)m−i|≤h−lmKm∑i=0CimE|Kj(x)βlj|i(x)|E[Kj(x)βlj(x)]|m−i≤h−lmKm∑i=0CimE|Kij(x)βlij(x)||E[Kj(x)βlj(x)|m−i, |
where Cim=m!i!(m−i)!. Next, by applying Lemma A.1, we get
E[|Alj|m]=O(ϕx(hK)). |
By following the same reasoning used to establish Eq (A.4) with the choice un=ϕ−ϵx(hK) permits to get:
Rn,2(x)=O(nϕx(hK)). | (A.18) |
Finally, by Eqs (A.16), (A.17) and (A.18), we deduce that
S2n,α,l(x)=O(nϕx(hK)). | (A.19) |
On the other hand, we can write
unj(x)−E(unj(x))=1nϕx(hK)n∑i=1Ali(x), | (A.20) |
where Ali(x) is defined in (A.15). Now, we apply the Definition 3.6 to show the uniform convergence of our estimator on a subset S of F. For this aim, we take
z(x)=argminj∈{1,…,dn}|δ(x,xj)|. |
Let us consider the following decomposition:
supx∈SF|unj(x)−E(unj(x))|≤supx∈SF|unj(x)−unj(xz(x)))|⏟T1+supx∈SF|unj(xz(x))−E(unj(xz(x)))|⏟T2+supx∈SF|E(unj(xz(x)))−E(unj(x))|⏟T3. |
Concening the term T2, for all ϵ>0, we have
P(T2>ϵ)=P(maxj∈{1,…,dn}|unj(x)−E(unj(xz(x)))|>ϵ)≤dnmaxj∈{1,…,dn}P(|unj(x)−E(unj(xz(x)))|>ϵ)≤dnmaxj∈{1,…,dn}P(|Anj(xz(x)))|>ϵnϕ(hK)). |
Next, we can apply the Proposition A.11 of [38] for any m>2,τ>0,ϖ≥1 and for certain 0<C<∞,
P(T2>τ)<C(A1+A2), |
where
A1=dn(1+τ2n2ϕ(hK)2ϖS2n,α,l)−ϖ/2,A2=dnnϖ−1(ϖτnϕ(hK))(a+1)m/(a+m). | (A.21) |
By using equation of (A.19) and under assumption (U1), we obtain that
S2n,α,l=supx∈SFS2n,α,l(x)=O(nϕ(hK)). |
Now, for η>0, we choose
τ=η√logdnnϕ(hK)andϖ=(log(dn))2. |
Therefore, by utilizing the fact that ΨSF(ϵ)=log(dn) and by taking C η2=λ, we obtain under the assumption (U6) the following terms
A1=O(d1−λn)andA2=O(n1−λ′),for λ,λ′>0.
Finally, for η large enough, we infer
P(T2>η√logdnnϕ(hK))<C(d1−λn+n1−λ′). |
Using the fact that
∞∑n=1d1−λn<∞and ∞∑n=11n1+λ′<∞, |
we get
T2=Oa.co.(√logdnnϕ(hK)). |
For the term T1 we have
T1≤CnhjKϕ(hK)supx∈SFn∑i=1Ki(x)1B(x,hK)(Xi)∣βji(x)−βji(xzx)1B(xzx,hK)(Xi)∣+CnhjKϕ(hK)supx∈SFn∑i=1βji(xzx)1B(xzx,hK)(Xi)∣Ki(x)1B(x,hK)(Xi)−Ki(xzx)∣:=Lj1+Lj2. |
Evaluation of the term Lj1. Under assumption (U3) we have
1B(x,hK)(Xi)∣βji(x)−βji(xzx)1B(xzx,hK)(Xi)∣≤Cιnhj−1K1B(x,hK)⋂B(xzx,hK)(Xi)+ChjK1B(x,hK)⋂¯B(xzx,hK)(Xi), |
which gives the following expression
Lj1≤CιnnhKϕ(hK)supx∈SFn∑i=1Ki(x)1B(x,hK)⋂B(xzx,hK)(Xi)+Cnϕ(hK)supx∈SFn∑i=1Ki(x)1B(x,hK)⋂¯B(xzx,hK)(Xi). | (A.22) |
Making use of the Lipschitz condition on K(⋅) and by hypotheses (U3) enables us to directly write
∣βji(xz(x))∣1B(xz(x),hK)(Xi)∣Ki(x)1B(x,hK)(Xi)−Ki(xz(x))∣≤ChjKιn1B(x,hK)⋂B(xz(x),hK)(Xi)+ChjKKi(xz(x))1¯B(x,hK)⋂B(xz(x),hK)(Xi). |
Therefore, we readily obtain
Lj2≤Cιnnϕ(hK)supx∈SFn∑i=11B(x,hK)⋂B(xz(x),hK)(Xi)+Cnϕ(hK)supx∈SFn∑i=1Ki(xz(x))1¯B(x,hK)⋂B(xz(x),hK)(Xi). |
Under assumption (U5), combining the last inequality with (A.22) implies that
T1≤1nϕ(hK)supx∈SFn∑i=1Zi,
where
Zi=CιnhK1B(x,hK)⋃B(xz(x),hK)(Xi).
Using a similar approach to the one employed in the proof of (13.14), under the assumptions (U1), (U2) and (U6), we obtain
S2n=n∑i=1n∑j=1∣Cov(Zi,Zj)∣=O(nϕ(hK)).
Next, by employing a similar proof to the one applied in the evaluation of T2, we infer that
T1=Oa.co.(√logdnnϕ(hK)). |
Concerning the term T3, it is obvious that
T3≤E(supx∈SF|unj(x)−unj(xz(x))|),
we infer that
T3=Oa.co.(√logdnnϕ(hK)). |
This completes the proof.
Proof of Lemma A.4. By conditioning on X1, we have
supx∈SF|E(v∗nj(x))|≤1|E(K1)||E((β1h−1K)jK1)|supX1∈B(x,hK)|r(X1)−r(x)|,
so under assumption (U4) and by using the uniform version of Lemma A.1, we obtain
supx∈SF|E(v∗nj(x))|=O(hp+1K). |
The term supx∈SF|E(enj(x))|=O(1) was already treated in Eq (A.12).
Proof of Theorem 4.1. Using reasoning similar to that employed for the regression function, we show that
√nϕx(hK)(B∗n−BpL(x,y))D→N(0,VF(x,y)), | (A.23) |
where
B∗n=⊤(Υ∗n,0,…,Υ∗n,p),VF(x,y)=Fx(y)(1−Fx(y))a⊤Va
and
Υ∗n,j=1nE(K1)n∑i=1(β(Xi,x)hK)jK(h−1Kδ(Xi,x))(L(h−1L(Yi−y))−Fx(y)).
For any given vector of real numbers a=⊤(α0,…,αp)≠0, we have
√nϕx(hK)a⊤B∗n=1√nn∑i=1(Ri−E(Ri))+1√nn∑i=1E(Ri), |
where
Ri=√ϕx(hK)E(K1)(p∑l=0αlh−lKβli)Ki(Li−Fx(y)). | (A.24) |
According to the Cramér-Wold theorem and Eq (A.24), Eq (A.23) will be verified if we prove the following two statements:
1√nn∑i=1(Ri−E(Ri))D→N(0,Fx(y)(1−Fx(y))a⊤Va), | (A.25) |
1√nn∑i=1(E(Ri))=√nϕx(hK)BpL(x,y). | (A.26) |
Proof of statement A.25. We utilize Bernstein's big-block method, as employed in Theorem 3.1 of [55]. The set {1,2,…,n} is partitioned into 2υ+1 subsets consisting of alternating large blocks of size wn and small blocks of size qn, where
υ=⌊n/(wn+qn)⌋.
Assumption (M6) allows us to define the size of the large block as follows
wn=⌊(nϕx(hK))1/2/rn⌋.
Afterward, under the same assumptions and employing straightforward calculations, we obtain:
limn→+∞qn/wn=0, limn→+∞wn/n=0, limn→+∞wn/√nϕx(hK)=0 and limn→+∞(n/wn)α(qn)=0. | (A.27) |
It can be easily inferred that, as n tends to infinity,
υqn/n≃(n/(wn+qn))(qn/n)≃qn/(wn+qn)≃qn/wn⟶0. | (A.28) |
Presently, we partition the sum n∑i=1(Ri−E(Ri)) into distinct large and small blocks as outlined below. Let
Ij=(j−1)(w+q)+1,lj=(j−1)(w+q)+w+1,j=1,2,…,υ. |
One can see that
1√nn∑i=1(Ri−E(Ri))⏟Ei=υ∑j=1Fj/√n+υ∑j=1F′j/√n+Fn/√n=:n−1/2(S1,n+S2,n+S3,n),
where the random variables Fj,F′j and Fn are defined by
Fj=Ij+w−1∑i=IjEi,F′j=lj+q−1∑i=ljEi,Fn=n∑i=υ(w+q)+1Ei. | (A.29) |
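The block construction itself is purely combinatorial; the sketch below partitions {1,…,n} into the index sets underlying Fj, F′j and Fn. The numerical values of n, wn and qn are hypothetical and serve only to illustrate the construction, not the choice dictated by (M6).

```python
# Illustrative sketch of Bernstein's big-block/small-block partition.
import numpy as np

def block_partition(n, w, q):
    """Index sets of the large blocks F_j, the small blocks F'_j and the remainder F_n."""
    upsilon = n // (w + q)
    large, small = [], []
    for j in range(upsilon):
        start = j * (w + q)                                   # I_j - 1 (0-based indexing)
        large.append(np.arange(start, start + w))             # indices summed in F_j
        small.append(np.arange(start + w, start + w + q))     # indices summed in F'_j
    remainder = np.arange(upsilon * (w + q), n)               # indices summed in F_n
    return large, small, remainder

# Hypothetical sizes: n = 1000, w_n = 40, q_n = 8.
E = np.random.default_rng(2).normal(size=1000)    # stand-ins for E_i = R_i - E(R_i)
large, small, rem = block_partition(len(E), 40, 8)
S1 = sum(E[idx].sum() for idx in large)    # S_{1,n}
S2 = sum(E[idx].sum() for idx in small)    # S_{2,n}
S3 = E[rem].sum()                          # S_{3,n}
```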
Subsequently, making use of Slutsky's theorem, we establish that the two terms 1√nS2,n and 1√nS3,n converge to zero in probability. Before applying the Lindeberg-Feller conditions to establish the asymptotic normality of the term 1√nS1,n, it is imperative to demonstrate that the variables Fj are asymptotically independent. We first compute
E(S22,n)n=υ∑j=11nVar(F′j)+υ∑j=1υ∑i=1i≠j2nCov(F′j,F′i)=:1n(A1+A2). |
Using second-order stationarity, we get
Var(F′j)=qVar(E1)+2q∑i≠jCov(Ei,Ej). |
Therefore
A1n≤υqnVar(E1)⏟VF(x,y)+2nn∑i≠jCov(Ei,Ej). |
To handle the covariance term, we adopt the same procedure as in the verification of Eq (A.2). We set
G1={(i,j)∈{1,2,…,n} such that 1≤∣i−j∣≤cn},
and
G2={(i,j)∈{1,2,…,n} such that cn+1≤∣i−j∣≤n−1}.
Based on the aforementioned splitting, we obtain
2nn∑i≠jCov(Ei,Ej)=2n∑G1Cov(Ri,Rj)+2n∑G2Cov(Ri,Rj)=:U1,n+U2,n. |
Note that
Cov(Ri,Rj)=ϕx(hK)E2(K1)p∑l=0p∑m=0αlαmE(h−lKh−mKβliβmjKiKjE((Li−Fx(y))(Lj−Fx(y))∣(Xi,Xj)))−ϕx(hK)E2(K1)(E(p∑l=0αlh−lKβl1K1(L1−Fx(y))))2.
Utilizing the inequality
∣Li(y)−Fx(y)∣≤1, | (A.30) |
together with Lemma A.1 and the conditions (H2-ⅱ) and (H4-ⅱ), we obtain the following
∣Cov(Ri,Rj)∣≤ϕx(hK)E2(K1)(p∑l=0p∑m=0∣αlαm∣E(KiKj)+(p∑l=0∣αl∣)2E2(Ki))≤ϕx(hK)E2(K1)(CP((Xi,Xj)∈B(x,hK)×B(x,hK))+C′M21ϕ2x(hK))≤Cϕx(hK)ψx(hK)ϕ2x(hK)+C′M21nϕx(hK). |
Then, we readily infer
∣U1,n∣≤(2Cn−1ϕx(hK)ψx(hK)ϕ2x(hK)+2C′n−1M21ϕx(hK))∑G11⏟ncn.
In relation to the summation over the set G2, the utilization of Theorem B.3 (ⅱ) gives
∑G2∣Cov(Ri,Rj)∣≤∑G2C(α|j−i|)1p(E|Ri|q)1q(E|Rj|r)1r. |
Conditioning on Xi and using (A.30) along with assumption (M3), we obtain
(E(|Ri|q))1/q≤√ϕx(hK)M1ϕx(hK)(E(|Ki|qE(|Li−Fx(y)|q∣Xi)))1/q≤C(ϕx(hK))−1/2+1/q.
We readily infer that
∣U2,n∣≤2∑G2C(α|j−i|)1p(ϕx(hK))−1+1q+1r | (A.31) |
≤2c−δn(ϕx(hK))−1p∑|l|>cnlδ(α(|l|))1p. | (A.32) |
Using the fact that ψx(hK)/ϕ2x(hK) is bounded, we select cn=⌊ϕx(hK)−1/(pδ)⌋ and, by assumptions (M4) and (M5), we obtain
2nn∑i≠jCov(Ei,Ej)=o(1). |
Furthermore, for the term A2, we have
∣A2∣n≤∑1≤i<j≤n2nCov(F′j,F′i)=∑1≤i<j≤n2n∣Cov(Rj,Ri)∣⏟o(1). |
Consequently, by (A.27), we get
E(S22,n)/n⟶0, as n⟶∞.
Let us now examine the term S23,n:
E(S23,n)n≤1nn∑i=υ(w+q)+1Var(E1)+2n∑1≤i<j≤n∣Cov(Ei,Ej)∣≤n−υ(w+q)nVar(E1)+2n∑1≤i<j≤n∣Cov(Ei,Ej)∣⏟o(1). |
Besides, from (A.27) and (A.28), we get υw/n⟶1. Then, we obtain
E(S23,n)/n⟶0, as n⟶∞.
In order to demonstrate the asymptotic independence of the variables Fj, we employ Lemma B.1. Setting Vj=exp(itFj/√n), we obtain the relationship
|E(exp(itS1,n/√n))−υ∏j=1E(exp(itFj/√n))|≤16υα(q)≅(n/w)α(q)⟶0. | (A.33) |
Therefore, according to formula (A.33), the variance of the term S1,n is computed as follows:
Var(n−1/2S1,n)=(υw/n)Var(R1),
where
Var(R1)=ϕx(hK)M21ϕ2x(hK)p∑l=0α2lh−2lKVar(βl1K1(L1−Fx(y))).
Using Lemma 6.3 of [14], we obtain
Var(βl1K1(L1−Fx(y)))=E(β2l1K21)Fx(y)(1−Fx(y)).
Then, by using the fact that υw/n⟶1 and Lemma A.1, we get
Var(R1)=(α20M2M21+p∑l=1α2lN(2,2l)M21)Fx(y)(1−Fx(y))=Fx(y)(1−Fx(y))a⊤Va.
Finally, our attention turns to Lindeberg's central limit theorem concerning Fj. It is then enough to demonstrate that for any ϵ>0,
1nυ∑j=1E[F2j1|Fj|>ϵ√nVF(x,y)]⟶0 as n⟶+∞. |
Under assumption (M2), we have
∣Fj∣/√n≤(p∑l=0∣αl∣)w/√nϕx(hK).
According to (A.27), we obtain
∣Fj∣/√n⟶0, as n⟶∞.
Hence, for any given ϵ and when n is sufficiently large, the set {|Fj|>ϵ√nVF(x,y)} becomes empty. Consequently, the demonstration of (A.25) is now concluded.
Proof of statement A.26. We have
1√nn∑i=1(E(Ri))=√nE(R1)=√nϕx(hK)(α0E(K1),α1h−1KE(K1),…,αph−pKE(K1)).(E(K1L1)−Fx(y)E(K1)E(K1β1L1)−Fx(y)E(K1β1)⋮E(K1βp1L1)−Fx(y)E(K1βp1)⏟N(x,y)). |
Next, we compute the term N(x,y). Notice that
E(K1βp1L1)−Fx(y)E(K1βp1)=E(K1βp1(E(L1∣X1)−Fx(y))).
Making use of the Taylor expansion under the assumptions (M2) and (M3), we obtain
E(L1∣X1)=p∑k=0φ2k(X1,y)h2kL(2k)!∫Rt2kL′(t)dt+o(h2pL). |
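Before proceeding, the expansion above can be checked numerically in a toy setting. The sketch below is only an illustration under assumed ingredients (a one-dimensional Gaussian conditional law, the integrated Epanechnikov kernel as L, and p=1); it is not part of the proof.

```python
# Illustrative Monte Carlo check of E[L(h_L^{-1}(y - Y)) | X] ≈ F^x(y)
# + (h_L^2 / 2) φ^(2)(x, y) ∫ t^2 L'(t) dt, in a toy setting where Y | X ~ N(0, 1).
import math
import numpy as np

rng = np.random.default_rng(4)
y, h_L = 0.5, 0.2
Y = rng.normal(size=2_000_000)          # draws from the assumed conditional law of Y given X

def L(u):
    """Integrated Epanechnikov kernel: L'(t) = 0.75 (1 - t^2) on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.75 * (u - u ** 3 / 3) + 0.5

mc = L((y - Y) / h_L).mean()                               # left-hand side, by Monte Carlo
F = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))             # φ_0(x, y) = F^x(y)
pdf_y = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)  # Gaussian density at y
d2F = -y * pdf_y                                           # φ^(2)(x, y), second derivative in y
int_t2 = 0.2                                               # ∫ t^2 L'(t) dt for the Epanechnikov kernel
approx = F + 0.5 * h_L ** 2 * d2F * int_t2
print(mc, approx)    # the two values agree up to Monte Carlo error and O(h_L^4)
```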
Therefore, we readily obtain
E(K1βp1(E(L1∣X1)−Fx(y)))=p∑k=0E(K1βp1φk(X1,y))h2kL(2k)!∫Rt2kL′(t)dt−φ0(x,y)E(K1βp1)+o(h2pLE(K1βp1)). | (A.34) |
Observing that ψk(0)=0, we get
E(K1βp1φk(X1,y))=φk(x,y)E(K1βp1)+p∑a=1ψ(a)k(0)a!E(K1βp+a1). | (A.35) |
By combining Eqs (A.34) and (A.35), we can derive
E(K1βp1(E(L1∣X1)−Fx(y)))=p∑k=0h2kL(2k)!∫Rt2kL′(t)dt[φk(x,y)E(K1βp1)+p∑a=1ψ(a)k(0)a!E(K1βp+a1)]−φ0(x,y)E(K1βp1)+o(h2pLE(K1βp1)).
So the statement (A.26) is proved.
This appendix collects auxiliary results that are used in the proofs above.
Lemma B.1. [82] Let V1,…,VL be random variables measurable with respect to the σ-algebras Fj1i1,…,FjLiL respectively with 1⩽i1<j1<i2<⋯<jL⩽n,il+1−jl⩾w⩾1 and |Vj|⩽1 for j=1,…, L. Then
|E(L∏j=1Vj)−L∏j=1E(Vj)|⩽16(L−1)α(w), |
where α(w) is the strongly mixing coefficient.
Theorem B.2. (Lindeberg central limit theorem). For each n≥1, let {Un1,…,Unrn} be a collection of independent random variables such that E(Unj)=0 and Var(Unj)<∞ for j=1,…,rn. Set
˜Unj=Unj√rn∑k=1VarUnk,j=1,…,rn. |
Then
rn∑j=1˜Unj→N(0,1) in distribution as n→∞, |
if for every ϵ>0
limn→∞rn∑j=1E|˜Unj|21(|˜Unj|>ϵ)=0. |
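As a toy illustration of the standardization in Theorem B.2 (with an assumed triangular array of Uniform(−1,1) variables, for which E(Unj)=0, Var(Unj)=1/3 and the Lindeberg condition holds trivially since the summands are bounded), one may check numerically that the standardized sum is approximately standard normal.

```python
# Illustrative sketch of the standardization in the Lindeberg CLT (toy example).
import numpy as np

rng = np.random.default_rng(5)
r_n, n_rep = 400, 20000
U = rng.uniform(-1.0, 1.0, size=(n_rep, r_n))   # U_{n1}, ..., U_{n r_n}, replicated n_rep times
U_tilde = U / np.sqrt(r_n / 3.0)                # tilde U_{nj} = U_{nj} / sqrt(sum_k Var(U_{nk}))
S = U_tilde.sum(axis=1)                         # sum over j of tilde U_{nj}
print(round(S.mean(), 3), round(S.std(), 3))    # approximately 0 and 1
```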
The following theorem is Proposition A.10 in [38].
Theorem B.3. Assume that (Tn)n∈Z is α-mixing. For some k∈Z, consider a real random variable T (resp. T′) which is Ak−∞-measurable (resp. A+∞n+k-measurable).
(i) If T and T′ are bounded, then:
∃C,0<C<+∞,Cov(T,T′)≤Cα(n). |
(ii) If, for some positive numbers p,q,r such that p−1+q−1+r−1=1, we have ETp<∞ and ET′q<∞, then:
∃C,0<C<+∞,Cov(T,T′)≤C(ETp)1p(ET′q)1qα(n)1r. |