
Random forests are a class of non-parametric statistical machine learning algorithms used for regression and classification tasks. They can handle sparse, high-dimensional tasks with high accuracy while avoiding overfitting, and they are considered to be among the most accurate general-purpose learning algorithm classes. They are routinely used in many fields, including bio-informatics [15], economics [30], biology [10], and linguistics [18].
The most widely used random forest algorithm was introduced by Breiman [13], who was inspired by the work on random subspaces of Ho [19] and on geometrical feature selection of Amit and Geman [2]. In Breiman's random forest, the trees are grown based on the classification and regression trees (CART) procedure, where both the splitting directions and the training sets are randomized. Despite the few parameters that need to be tuned [16,21], their mathematical properties are still areas of active research [6,22]. A significant distinction within the class of random forest algorithms lies in the way each individual tree is constructed and, in particular, in the dependence of each tree on the data set. Some researchers consider random forests designed independently of the data set [7,14,25].
In 2012, Biau [5] studied a random forest model proposed by Breiman in which the construction is independent of the data set, called in the literature the centered random forest. In [5], an upper bound on the rate of consistency of the algorithm and its adaptation to sparsity were proven. More precisely, for a data set of $n$ samples in a space of dimension $d$, the convergence rate was $O\left(n^{-\frac{1}{\frac{4}{3}d\log 2+1}}\right)$. In 2021, Klusowski [20] improved the rate of convergence to $O\left(\left(n\log^{\frac{d-1}{2}}n\right)^{-\frac{1+\delta}{d\log 2+1}}\right)$, where $\delta$ is a positive constant that depends on the dimension $d$ of the feature space and converges to zero as $d$ approaches infinity. In the same paper, Klusowski proved that this rate of convergence is sharp, although it fails to reach the minimax rate of consistency over the class of Lipschitz functions, $O\left(n^{-\frac{2}{d+2}}\right)$ [29]. There is also important work on the consistency of algorithms that depend on the data [23,27,28]. For a comprehensive overview of both theoretical and practical aspects of random forests, see [8], which surveys the subject up to 2016.
An important tool for algorithmically manipulating random forests is kernel methods. Breiman [11] observed the connection between kernel theory and random forests by showing the equivalence between tree construction and kernel action. Later this was formalized by Geurts et al. in [17]. In the same direction, Scornet in [26] defined the Kernel Random Forest (KeRF) by modifying the original algorithm, and provided theoretical and practical results. In particular, Scornet provided explicit kernels for some generalizations of the algorithms, their rates of consistency, and comparisons with the corresponding random forests. Furthermore, Arnould et al. [3] investigated the trade-off between the interpolation of several random forest algorithms and their consistency results.
In the first part of the paper, we provide the notation and the definitions of the centered and uniform random forests and their corresponding kernel-related formulations. In addition, we improve the rate of consistency for the centered KeRF algorithm. Let $k\ge 1$ be the depth of the trees used to estimate the target variable $Y$ (see Section 2 for definitions and notation).
Theorem 1. Suppose that $X=(X_1,\dots,X_d)$ and $Y$ are related by $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$; $X$ is uniformly distributed in $[0,1]^d$; and $m$ is a regression function, which we assume to be Lipschitz. Then, there exists a constant $\tilde{C}$ such that, for every $n>1$ and every $x\in[0,1]^d$,
$$\mathbb{E}\left(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\right)^2\le \tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Here, $m(x)=\mathbb{E}[Y|X=x]$ is the predicted value of $Y$ for $X=x\in[0,1]^d$, while $\tilde{m}^{Cen}_{\infty,n}(x)$ is the estimate for $m$ provided by the kernel version of the centered random forest algorithm.
Similarly, with $\tilde{m}^{Un}_{\infty,n}(x)$ playing for the uniform KeRF algorithm the role $\tilde{m}^{Cen}_{\infty,n}(x)$ had above, we have:
Theorem 2. Let $X$, $Y$, $m$, and $\epsilon$ be as in Theorem 1, with $Y=m(X)+\epsilon$. Then there exists a constant $\tilde{C}$ such that, for every $n>1$ and every $x\in[0,1]^d$,
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le \tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
Moreover, in Section 4, we provide numerical examples and experiments concerning the tuning parameter $k$, which is the tree depth of the two kernel-based random forest algorithms, by comparing the $L^2$-error for different values and under specific assumptions on the data set.
In the final part of the article, we consider the reproducing kernel $K$ used in the centered KeRF algorithm per se. It is rewarding to look at it as defined on the finite Abelian group $\mathbb{Z}_2^{kd}$, where, as above, $d$ is the dimension of the vector $X$ and $k$ is the depth of the tree. By using elementary Fourier analysis on groups, we obtain several equivalent expressions for $K$ and its group transform, we characterize the functions belonging to the corresponding Reproducing Kernel Hilbert Space (RKHS) $H_K$, we derive results on multipliers, and we obtain bounds for the dimension of $H_K$, which is much smaller than what one might expect.
A usual problem in machine learning is, based on $n$ observations of a random vector $(X,Y)\in\mathcal{X}\times\mathbb{R}\subseteq\mathbb{R}^d\times\mathbb{R}$, to estimate the function $m(x)=\mathbb{E}(Y|X=x)$. In classification problems, $Y$ ranges over a finite set. In particular, we assume that we are given a training sample $\mathcal{D}_n=\{(X_1,Y_1),\dots,(X_n,Y_n)\}$ of independent random variables with a shared joint distribution $P_{X,Y}$, where $X_i\in[0,1]^d$ and $Y_i\in\mathbb{R}$ for every $i=1,\dots,n$. The goal is to use the data set to construct an estimate $m_n:\mathcal{X}\subseteq[0,1]^d\to\mathbb{R}$ of the function $m$. Our convergence rate requires an a priori assumption on the regularity of the function $m$. Following [26], we suppose that $m$ belongs to the class of $L$-Lipschitz functions,
$$|m(x)-m(x')|\le L\,\|x-x'\|.$$
Here, as in [26], we consider on $\mathbb{R}^d$ the distance
$$\|x-x'\|=\sum_{j=1}^d|x_j-x'_j|.$$
Next, we provide the general random forest framework by first defining the notion of a random tree. Additionally, we present two specific variations of the original random forest algorithm, namely the centered and uniform random forest algorithms.
Assume that $\Theta_i$, for $i=1,\dots,M$, is a collection of independent random variables, distributed as $\Theta$. The random variables $\Theta_i$ are used to sample the training set or to select the positions for splitting. The detailed construction in the case of the centered random forest is given in the Appendix.
Definition 1. For the $j$-th tree in the forest, the predicted value at $x$ will be denoted by
$$m_{n,\Theta_j,\mathcal{D}_n}(x)=\sum_{i=1}^n\frac{\mathbb{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}\,Y_i}{N_{n,\Theta_j,\mathcal{D}_n}(x)},$$
where $A_{n,\Theta_j,\mathcal{D}_n}(x)$ is the cell containing $x$ and $N_{n,\Theta_j,\mathcal{D}_n}(x)$ is the number of points that fall into the cell that $x$ belongs to.
For a fixed value of $x\in[0,1]^d$, the value of the tree is the empirical expectation of $Y$ in the unique cell containing $x$, which is, hopefully, a good guess for the target value corresponding to $x$.
A random forest is a finite collection (average) of independent, finite random trees $\{\Theta_1,\dots,\Theta_M\}$.
Definition 2. The finite $M$ forest is
$$m_{M,n}(x)=\frac{1}{M}\sum_{j=1}^M m_{n,\Theta_j,\mathcal{D}_n}(x).$$
From a modeling point of view, we let $M\to\infty$ and consider the infinite forest estimate
$$m_{\infty,n,\mathcal{D}_n}(x)=\mathbb{E}_{\Theta}\left(m_{n,\Theta,\mathcal{D}_n}(x)\right).$$
The convergence holds almost surely, conditionally on $\mathcal{D}_n$, by the law of large numbers [12] and [25, Theorem 3.1].
In 2016, Scornet [26] introduced kernel methods in the random forest world (KeRF), producing a kernel-based algorithm, together with estimates on how this compares with the old one, described above.
To understand the intuition behind the KeRF construction, we reformulate the random forest algorithm.
For all $x\in[0,1]^d$,
$$m_{M,n}(x)=\frac{1}{M}\sum_{j=1}^M\left(\sum_{i=1}^n\frac{\mathbb{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}\,Y_i}{N_{n,\Theta_j,\mathcal{D}_n}(x)}\right).$$
Therefore we can define the weight of every observation $Y_i$ as
$$W_{i,j,n}(x)=\frac{\mathbb{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}}{N_{n,\Theta_j,\mathcal{D}_n}(x)}.$$
Hence it is clear that the value of the weights changes significantly with the number of points in each cell. A way to overcome this nuisance is to consider simultaneously all the tree cells containing $x$, as the tree is randomly picked in the forest.
For all $x\in[0,1]^d$,
$$\tilde{m}_{M,n,\Theta_1,\dots,\Theta_M}(x)=\frac{1}{\sum_{j=1}^M N_{n,\Theta_j}(x)}\sum_{j=1}^M\sum_{i=1}^n Y_i\,\mathbb{1}_{X_i\in A_{n,\Theta_j}(x)}.$$
This way, empty cells do not affect the computation of the prediction function of the algorithm.
It is proven in [26] that this estimator indeed has a kernel representation.
Proposition 1. [26, Proposition 1] For all $x\in[0,1]^d$, almost surely it holds that
$$\tilde{m}_{M,n,\Theta_1,\dots,\Theta_M}(x)=\frac{\sum_{i=1}^n K_{M,n}(x,X_i)\,Y_i}{\sum_{i=1}^n K_{M,n}(x,X_i)},$$
where
$$K_{M,n}(x,z)=\frac{1}{M}\sum_{i=1}^M\mathbb{1}_{x\in A_{n,\Theta_i,\mathcal{D}_n}(z)}$$
is the proximity function of the $M$ forest.
The infinite random forest arises as the number of trees tends to infinity.
Definition 3. The infinite KeRF is defined as
$$\tilde{m}_{\infty,n}(x)=\lim_{M\to\infty}\tilde{m}_{M,n}(x,\Theta_1,\dots,\Theta_M).$$
The kernel extends to the infinite random forest as well.
Proposition 2. [26, Proposition 2] Almost surely, for all $x,y\in[0,1]^d$,
$$\lim_{M\to\infty}K_{M,n}(x,y)=K_n(x,y),$$
where
$$K_n(x,y)=P_{\Theta}\left(x\in A_n(y,\Theta)\right)$$
is the probability that $x$ and $y$ belong to the same cell in the infinite forest.
In this paper, we say that an estimator function $m_n$ of $m$ is consistent if the following $L^2$-type convergence holds:
$$\mathbb{E}\left(m_n(x)-m(x)\right)^2\to 0$$
as $n\to\infty$.
In the centered and uniform forest algorithms, the way the data set $\mathcal{D}_n$ is partitioned is independent of the data set itself.
The centered forest is designed as follows:
1) Fix $k\in\mathbb{N}$.
2) At each node of each individual tree, choose a coordinate uniformly from $\{1,2,\dots,d\}$.
3) Split the node at the midpoint of the interval of the selected coordinate.
Repeat steps 2) and 3) $k$ times. At the end, we have $2^k$ leaves, or cells. A toy example of this iterative process for $k=1,2$ is in Figures 1 and 2. Our estimation at a point $x$ is achieved by averaging the $Y_i$ corresponding to the $X_i$ in the cell containing $x$.
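To make the construction concrete, here is a minimal Python sketch of a centered forest prediction (the function names and toy data are ours, for illustration only). For predicting at a point $x$, only the root-to-leaf path of $x$ matters, so the sketch draws the $k$ split coordinates along that path and averages the $Y_i$ in the resulting cell, as in Definitions 1 and 2.

```python
import numpy as np

def cell_mask_centered(x, X, k, rng):
    """Boolean mask of the training points lying in the depth-k centered cell
    of x: follow x's root-to-leaf path, splitting a uniformly chosen
    coordinate at the midpoint of the current cell at every level."""
    d = len(x)
    low, high = np.zeros(d), np.ones(d)
    mask = np.ones(len(X), dtype=bool)
    for _ in range(k):
        j = rng.integers(d)                 # coordinate chosen uniformly (step 2)
        mid = 0.5 * (low[j] + high[j])      # midpoint split (step 3)
        if x[j] < mid:
            high[j] = mid
            mask &= X[:, j] < mid
        else:
            low[j] = mid
            mask &= X[:, j] >= mid
    return mask

def centered_forest_predict(x, X, Y, k, M, rng):
    """Finite M-forest estimate of Definition 2; an empty cell contributes 0."""
    preds = []
    for _ in range(M):
        mask = cell_mask_centered(x, X, k, rng)
        preds.append(Y[mask].mean() if mask.any() else 0.0)
    return float(np.mean(preds))

rng = np.random.default_rng(0)
X = rng.random((500, 2))                             # uniform data in [0, 1]^2
Y = X.sum(axis=1) + 0.1 * rng.standard_normal(500)   # a Lipschitz target plus noise
print(centered_forest_predict(np.array([0.3, 0.7]), X, Y, k=4, M=200, rng=rng))
```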
Scornet in [26] introduced the corresponding kernel-based centered random forest, providing the proximity kernel function explicitly.
Proposition 3. A centered random forest kernel with parameter $k\in\mathbb{N}$ has the following multinomial expression [26, Proposition 5]:
$$K^{Cen}_k(x,z)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{j=1}^d\mathbb{1}_{\lceil 2^{k_j}x_j\rceil=\lceil 2^{k_j}z_j\rceil},$$
where $K^{Cen}_k$ is the kernel of the corresponding centered random forest.
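The multinomial formula can be evaluated exactly by summing over the compositions of $k$, and checked against a Monte Carlo estimate of the cell-sharing probability of Proposition 2. The sketch below (our code, with hypothetical names) does both; for points with coordinates in $(0,1]$ the two values should agree up to sampling error.

```python
import math
import numpy as np

def compositions(k, d):
    """All d-tuples of non-negative integers summing to k."""
    if d == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def centered_kernel(x, z, k):
    """Exact multinomial expression of Proposition 3 for K_k^Cen(x, z)."""
    d = len(x)
    total = 0.0
    for comp in compositions(k, d):
        if all(math.ceil(2**c * xi) == math.ceil(2**c * zi)
               for c, xi, zi in zip(comp, x, z)):
            total += (math.factorial(k)
                      / math.prod(math.factorial(c) for c in comp)) / d**k
    return total

def mc_kernel(x, z, k, trials=100_000, seed=1):
    """Monte Carlo estimate of P_Theta(x and z share a depth-k centered cell)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        lo, hi, together = np.zeros(len(x)), np.ones(len(x)), True
        for _ in range(k):
            j = rng.integers(len(x))
            mid = 0.5 * (lo[j] + hi[j])
            if (x[j] < mid) != (z[j] < mid):   # this split separates the points
                together = False
                break
            if x[j] < mid:
                hi[j] = mid
            else:
                lo[j] = mid
        hits += together
    return hits / trials

x, z = np.array([0.30, 0.62]), np.array([0.34, 0.57])
print(centered_kernel(x, z, k=3), mc_kernel(x, z, k=3))
```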
The uniform random forest was introduced by Biau et al. [7] and, like the centered random forest, is another toy model of Breiman's random forest. The algorithm forms a partition of $[0,1]^d$ as follows:
1) Fix $k\in\mathbb{N}$.
2) At each node of each individual tree, choose a coordinate uniformly from $\{1,2,\dots,d\}$.
3) Split the cell uniformly at random along the side corresponding to the selected coordinate.
Repeat steps 2) and 3) $k$ times. At the end, we have $2^k$ leaves. Our final estimation at a point $x$ is achieved by averaging the $Y_i$ corresponding to the $X_i$ in the cell containing $x$.
Again, Scornet in [26, Proposition 6] derived the corresponding kernel for the uniform random forest.
Proposition 4. The corresponding proximity kernel for the uniform KeRF with parameter $k\in\mathbb{N}$ and $x\in[0,1]^d$ has the following form:
$$K^{Un}_k(0,x)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{m=1}^d\left(1-x_m\sum_{j=0}^{k_m-1}\frac{(-\ln x_m)^j}{j!}\right),$$
with the convention that
$$\sum_{j=0}^{-1}\frac{(-\ln x_m)^j}{j!}=0,$$
and by continuity we can extend the kernel also to vectors with zero components.
Unfortunately, it is very hard to obtain a general formula for $K^{Un}_k(x,y)$, so we consider instead a translation-invariant uniform KeRF:
$$\tilde{m}^{Un}_{\infty,n}(x)=\frac{\sum_{i=1}^n Y_i\,K^{Un}_k(0,|X_i-x|)}{\sum_{i=1}^n K^{Un}_k(0,|X_i-x|)}.$$
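A direct Python sketch of this estimator, assuming the display above (names are ours; the tiny clip of zero components stands in for the extension by continuity):

```python
import math
import numpy as np

def compositions(k, d):
    """All d-tuples of non-negative integers summing to k."""
    if d == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def uniform_kernel_at_zero(u, k):
    """Proposition 4: K_k^Un(0, u) for u in [0, 1]^d."""
    d = len(u)
    u = np.clip(u, 1e-12, 1.0)    # extension by continuity at zero components
    total = 0.0
    for comp in compositions(k, d):
        w = math.factorial(k) / math.prod(math.factorial(c) for c in comp) / d**k
        factor = 1.0
        for km, um in zip(comp, u):
            s = sum((-math.log(um))**j / math.factorial(j) for j in range(km))
            factor *= 1.0 - um * s            # the empty sum (km = 0) gives 0
        total += w * factor
    return total

def uniform_kerf_predict(x, X, Y, k):
    """Translation-invariant uniform KeRF estimate displayed above."""
    w = np.array([uniform_kernel_at_zero(np.abs(Xi - x), k) for Xi in X])
    return float(w @ Y / w.sum())

rng = np.random.default_rng(0)
X = rng.random((300, 2))
Y = X.sum(axis=1) + 0.1 * rng.standard_normal(300)
print(uniform_kerf_predict(np.array([0.3, 0.7]), X, Y, k=4))
```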
In this section, after providing some measure-concentration-type results [9], we improve the rate of consistency of the centered KeRF algorithm. The following lemmata provide the inequalities needed to derive upper bounds for averages of i.i.d. random variables. Lacking a reference, for completeness we provide detailed proofs of these lemmata. Throughout this article all random variables are real-valued, and we write $\|X\|_{L^p}:=\left(\mathbb{E}|X|^p\right)^{1/p}$ and $\|X\|_\infty:=\inf\{t:\ P(|X|\le t)=1\}$.
Lemma 1. Let $X_1,\dots,X_n$ be a sequence of independent and identically distributed real random variables with $\mathbb{E}(X_i)=0$. Assume also that there is a uniform bound for the $L^1$-norm and the supremum norm, i.e., $\mathbb{E}(|X_i|)\le C$ and $\|X_i\|_\infty\le CM$ for every $i=1,\dots,n$. Then for every $t\in(0,1)$,
$$P\left(\left\{\frac{\left|\sum_{i=1}^n X_i\right|}{n}\ge t\right\}\right)\le 2e^{-\tilde{C}_C\frac{t^2 n}{M}}$$
for some positive constant $\tilde{C}_C$ that depends only on $C$.
Proof. For all $x\in[0,1]$ one has $e^x\le 1+x+x^2$. Using the hypothesis, for every $\lambda\le\frac{1}{CM}$,
$$e^{\lambda X_i}\le 1+\lambda X_i+(\lambda X_i)^2\ \Rightarrow\ \mathbb{E}e^{\lambda X_i}\le 1+\lambda^2\mathbb{E}(X_i)^2\le 1+\lambda^2\|X_i\|_1\|X_i\|_\infty\le 1+\lambda^2C^2M\le e^{\lambda^2C^2M}.$$
By the independence of the random variables $X_i$,
$$\mathbb{E}e^{\sum_{i=1}^n\lambda X_i}=\prod_{i=1}^n\mathbb{E}e^{\lambda X_i}\le\prod_{i=1}^n e^{\lambda^2C^2M}=e^{n\lambda^2C^2M}.$$
Therefore, by Markov's inequality,
$$P\left(\left\{\frac{\sum_{i=1}^n X_i}{n}\ge t\right\}\right)\le e^{-\lambda tn}\,\mathbb{E}e^{\sum_{i=1}^n\lambda X_i}\le e^{-\lambda tn}e^{n\lambda^2C^2M}=e^{n\lambda^2C^2M-\lambda tn}.$$
Finally, if $C\ge\frac14$ we choose $\lambda=\frac{t}{2C^2M}$, and otherwise $\lambda=\frac{t}{16CM}$, obtaining
$$P\left(\left\{\frac{\sum_{i=1}^n X_i}{n}\ge t\right\}\right)\le e^{-\tilde{C}_C\frac{t^2n}{M}}.$$
Replacing $X_i$ with $-X_i$, we conclude the proof.
Lemma 2. Let $X_1,\dots,X_n$ be a sequence of non-negative independent and identically distributed random variables with $\mathbb{E}(X_i)\le 2$ and $\|X_i\|_\infty\le M$ for every $i=1,\dots,n$. Let also $\epsilon_1,\dots,\epsilon_n$ be independent random variables, each normally distributed with zero mean and finite variance $\sigma^2$, and independent of the $X_i$.
Then for every $t\in(0,1)$,
$$P\left(\frac{1}{n}\sum_{i=1}^n|\epsilon_iX_i|\ge t\right)\le 2\exp\left(-\frac{Ct^2n}{M}\right),$$
with the positive constant $C$ depending only on $\sigma$.
Proof.
\begin{align*}
P\left(\frac{1}{n}\sum_{i=1}^n\epsilon_iX_i\ge t\right)&=P\left(\exp\left(\frac{\lambda}{n}\sum_{i=1}^n\epsilon_iX_i\right)\ge\exp(\lambda t)\right)\quad\text{(for a positive $\lambda$)}\\
&\le\exp(-\lambda t)\,\mathbb{E}\exp\left(\frac{\lambda}{n}\sum_{i=1}^n\epsilon_iX_i\right)\quad\text{(by Chebyshev's inequality)}\\
&=\exp(-\lambda t)\prod_{i=1}^n\mathbb{E}\exp\left(\frac{\lambda}{n}\epsilon_iX_i\right)\quad\text{(by independence)}\\
&=\exp(-\lambda t)\prod_{i=1}^n\left(1+\sum_{k=2}^\infty\frac{\lambda^k\,\mathbb{E}X_i^k\,\mathbb{E}\epsilon_i^k}{n^kk!}\right)\\
&\le\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\sum_{k=2}^\infty\frac{\lambda^kM^k\,\mathbb{E}\epsilon_i^k}{n^kk!}\right)\\
&=\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\left(\mathbb{E}\exp\left(\frac{\lambda M}{n}\epsilon_i\right)-1\right)\right)\\
&\le\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\\
&=\exp(-\lambda t)\exp\left(\sum_{i=1}^n\log\left(1+\frac{2}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\right)\\
&\le\exp(-\lambda t)\exp\left(\frac{2n}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\\
&\le\exp(-\lambda t)\exp\left(\frac{2n}{M}\cdot\frac{2\lambda^2\sigma^2M^2}{n^2}\right)\quad\left(\text{for }\lambda\le\frac{n}{\sigma M}\right)\\
&=\exp\left(-\lambda t+\frac{4M\lambda^2\sigma^2}{n}\right).
\end{align*}
Finally, we select $\lambda=\frac{tn}{8M\sigma^2}$ when $\sigma\ge\frac18$, and $\lambda=\frac{tn}{M\sigma}$ when $\sigma\le\frac18$, so that
$$P\left(\frac{1}{n}\sum_{i=1}^n\epsilon_iX_i\ge t\right)\le\exp\left(-\frac{Ct^2n}{M}\right).$$
Replacing $X_i$ with $-X_i$, we conclude the proof.
Theorem 3. Let $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$. Assume also that $X$ is uniformly distributed in $[0,1]^d$ and that $m$ is a Lipschitz function. Then there exists a constant $\tilde{C}$ such that, for every $n>1$ and every $x\in[0,1]^d$,
$$\mathbb{E}\left(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\right)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Proof. Following the notation in [26], let $x\in[0,1]^d$ and $\|m\|_\infty=\sup_{x\in[0,1]^d}|m(x)|$; by the construction of the algorithm,
$$\tilde{m}^{Cen}_{n,\infty}(x)=\frac{\sum_{i=1}^n Y_iK_k(x,X_i)}{\sum_{i=1}^n K_k(x,X_i)}.$$
Let
$$A_n(x)=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_iK_k(x,X_i)-\mathbb{E}(YK_k(x,X))}{\mathbb{E}(K_k(x,X))}\right),$$
$$B_n(x)=\frac{1}{n}\sum_{i=1}^n\left(\frac{K_k(x,X_i)-\mathbb{E}(K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right),$$
and
$$M_n(x)=\frac{\mathbb{E}(YK_k(x,X))}{\mathbb{E}(K_k(x,X))}.$$
Hence, we can reformulate the estimator as
$$\tilde{m}^{Cen}_{n,\infty}(x)=\frac{M_n(x)+A_n(x)}{B_n(x)+1}.$$
Let $t\in(0,\frac12)$, and let $\mathcal{C}_t(x)$ be the event $\{|A_n(x)|\le t,\ |B_n(x)|\le t\}$.
\begin{align*}
\mathbb{E}\left(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\right)^2&=\mathbb{E}\left(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\right)^2\mathbb{1}_{\mathcal{C}_t(x)}+\mathbb{E}\left(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\right)^2\mathbb{1}_{\mathcal{C}^c_t(x)}\\
&\le\mathbb{E}\left(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\right)^2\mathbb{1}_{\mathcal{C}^c_t(x)}+c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2t^2,
\end{align*}
where the last inequality was obtained in [26, p.1496]. Moreover, as in [26],
$$\mathbb{E}\left(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\right)^2\mathbb{1}_{\mathcal{C}^c_t(x)}\le c_3(\log n)\left(P\left(\mathcal{C}^c_t(x)\right)\right)^{1/2}.$$
In order to find the rate of consistency, we need a bound for the probability $P(\mathcal{C}^c_t(x))$. Obviously,
$$P\left(\mathcal{C}^c_t(x)\right)\le P\left(|A_n(x)|>t\right)+P\left(|B_n(x)|>t\right).$$
We will work separately to obtain an upper bound for both probabilities.
Proposition 5. Let
$$\tilde{X}_i=\frac{K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}-1$$
be a sequence of i.i.d. random variables. Then for any $t\in(0,1)$,
$$P\left(\left\{\frac{\left|\sum_{i=1}^n\tilde{X}_i\right|}{n}\ge t\right\}\right)=P\left(|B_n(x)|\ge t\right)\le 2e^{-\tilde{C}_1\frac{t^2n}{2^k}}$$
for some positive constant $\tilde{C}_1$.
Proof. It is easy to verify that $\mathbb{E}\tilde{X}_i=0$ and
$$|\tilde{X}_i|=\left|\frac{K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}-1\right|\le\frac{K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}+1,$$
hence $\mathbb{E}|\tilde{X}_i|\le 2$.
Finally,
$$\|\tilde{X}_i\|_\infty=\sup\left\{\left|\frac{K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}-1\right|\right\}\le\frac{1}{\mathbb{E}(K_k(x,X))}\sup K_k(x,X_i)+1\le 2^k+1\le 2^{k+1}.$$
By Lemma 1,
$$P\left(\left\{\frac{\left|\sum_{i=1}^n\tilde{X}_i\right|}{n}\ge t\right\}\right)=P\left(|B_n(x)|\ge t\right)\le 2e^{-\tilde{C}_1\frac{t^2n}{2^k}}.$$
We also need a bound for $P(|A_n(x)|>t)$, where
$$A_n(x)=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_iK_k(x,X_i)-\mathbb{E}(YK_k(x,X))}{\mathbb{E}(K_k(x,X))}\right).$$
Proposition 6. Let
$$\tilde{Z}_i=\frac{Y_iK_k(x,X_i)-\mathbb{E}(YK_k(x,X))}{\mathbb{E}(K_k(x,X))}$$
for $i=1,\dots,n$. Then for every $t\in(0,1)$,
$$P\left(\left\{\frac{\left|\sum_{i=1}^n\tilde{Z}_i\right|}{n}\ge t\right\}\right)=P\left(|A_n(x)|\ge t\right)\le 4e^{-C\frac{t^2n}{2^k}}$$
for some constant $C$ depending only on $\sigma$ and $\|m\|_\infty$.
Proof.
\begin{align*}
A_n(x)&=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_iK_k(x,X_i)-\mathbb{E}(YK_k(x,X))}{\mathbb{E}(K_k(x,X))}\right)\\
&=\frac{1}{n}\sum_{i=1}^n\left(\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right)+\frac{1}{n}\sum_{i=1}^n\left(\frac{\epsilon_iK_k(x,X_i)-\mathbb{E}(\epsilon K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right)\\
&=\frac{1}{n}\sum_{i=1}^n\left(\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right)+\frac{1}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}(K_k(x,X))}.
\end{align*}
Therefore,
$$P\left(|A_n(x)|\ge t\right)\le P\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right|\ge t\right)+P\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}(K_k(x,X))}\right|\ge t\right).$$
Let
$$Z_i=\frac{2\left(m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))\right)}{\mathbb{E}(K_k(x,X))},$$
a sequence of i.i.d. random variables. It is easy to verify that the $Z_i$ are centered and that
$$|Z_i|=2\left|\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right|\le 2\|m\|_\infty\,\frac{K_k(x,X_i)+\mathbb{E}(K_k(x,X))}{\mathbb{E}(K_k(x,X))}.$$
Hence $\mathbb{E}|Z_i|\le 4\|m\|_\infty$. Finally,
$$\|Z_i\|_\infty=\sup\{|Z_i|\}=2\sup\left\{\left|\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right|\right\}\le 2\|m\|_\infty\left(2^k+1\right)\le 4\|m\|_\infty 2^k.$$
By Lemma 1,
$$P\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}(m(X)K_k(x,X))}{\mathbb{E}(K_k(x,X))}\right|\ge t\right)\le 2e^{-C\frac{nt^2}{2^k}}.$$
Furthermore, let
$$\tilde{W}_i=\frac{2\epsilon_iK_k(x,X_i)}{\mathbb{E}(K_k(x,X))}$$
for $i=1,\dots,n$, a sequence of independent and identically distributed random variables. We can verify that, for every $i=1,\dots,n$,
$$\mathbb{E}\left(\frac{2K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}\right)\le 2.$$
Finally,
$$\sup\left\{\left|\frac{2K_k(x,X_i)}{\mathbb{E}(K_k(x,X))}\right|\right\}\le\frac{2}{\mathbb{E}(K_k(x,X))}\sup\{K_k(x,X_i)\}\le 2^{k+1}.$$
By Lemma 2 it is clear that
$$P\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}(K_k(x,X))}\right|\ge t\right)\le 2e^{-C_2\frac{nt^2}{2^k}}.$$
We conclude the proposition by observing that
$$P\left(|A_n(x)|\ge t\right)\le 4e^{-\min\{C_2,C\}\frac{nt^2}{2^k}}.$$
Finally, let us compute the rate of consistency of the centered KeRF algorithm. By Propositions 5 and 6, one has that
$$\left(P\left(\mathcal{C}^c_t(x)\right)\right)^{1/2}\le\left(P\left(|A_n(x)|>t\right)+P\left(|B_n(x)|>t\right)\right)^{1/2}\le c_3e^{-c_4\frac{nt^2}{2^k}},$$
for some constants $c_3,c_4$ independent of $k$ and $n$.
Thus,
$$\mathbb{E}\left(\tilde{m}_{\infty,n}(x)-m(x)\right)^2\le c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2t^2+c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}.$$
We compute the minimum of the right-hand side of the inequality for $t\in(0,1)$:
$$2c_2t-2tc_4\frac{n}{2^k}c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}=0\ \Rightarrow\ e^{-c_4\frac{t^2n}{2^k}}=\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\quad\text{and}\quad t^2=\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right).$$
Hence, the inequality becomes
\begin{align*}
\mathbb{E}\left(\tilde{m}_{\infty,n}(x)-m(x)\right)^2&\le c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right)+c_3\log n\,\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\\
&=c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\,e\right).
\end{align*}
For every $\epsilon_n\in(0,2]$ it holds that $\log x\le\frac{1}{\epsilon_n}x^{\epsilon_n}$ for $x\ge1$. Then one has that
$$\mathbb{E}\left(\tilde{m}_{\infty,n}(x)-m(x)\right)^2\le c_1\left(1-\frac{1}{2d}\right)^{2k}+\frac{c_2}{c_4\epsilon_n}\left(\frac{e\,c_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}.$$
We pick
$$k=c(d)\log_2\left(\frac{n}{(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}}\right);$$
thus,
$$\frac{c_2}{c_4\epsilon_n}\left(\frac{e\,c_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}\le\frac{c'}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}$$
for a constant $c'$ independent of $n$, and
\begin{align*}
c_1\left(1-\frac{1}{2d}\right)^{2k}&=c_1\left(1-\frac{1}{2d}\right)^{2c(d)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1 2^{2c(d)\log_2\left(1-\frac{1}{2d}\right)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}\\
&=c_1 n^{2c(d)\log_2\left(1-\frac{1}{2d}\right)}\frac{1}{(\log n)^{2c(d)\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{2d}\right)}}.
\end{align*}
Therefore, we balance the two terms by choosing
$$c(d)=\frac{\epsilon_n-1}{2\log_2\left(1-\frac{1}{2d}\right)-(1-\epsilon_n)}.$$
Finally, replacing $\log_2\left(1-\frac{1}{2d}\right)$ by $-\frac{1}{2d\log 2}$,
\begin{align*}
c_1 n^{2c(d)\log_2\left(1-\frac{1}{2d}\right)}\frac{1}{(\log n)^{2c(d)\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{2d}\right)}}
&=c_1 n^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{2d\log 2}\right)-(1-\epsilon_n)}\left(-\frac{1}{2d\log 2}\right)}\,\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{2d\log 2}\right)-(1-\epsilon_n)}\cdot\frac{\epsilon_n}{1-\epsilon_n}\left(-\frac{1}{2d\log 2}\right)}}\\
&=c_1 n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}
\end{align*}
and, for the second term, with the same arguments,
$$\frac{\tilde{c}}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}=\frac{\tilde{c}}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}$$
for a constant $\tilde{c}$ independent of $\epsilon_n$; hence,
$$\mathbb{E}\left(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\right)^2\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)},$$
and consequently,
\begin{align*}
\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}&=\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}\times n^{\frac{\epsilon_n}{(1+d\log 2)(1+(1-\epsilon_n)d\log 2)}}\\
&\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{d\log 2(1-\epsilon_n)}\right)}\times(\log n)^{\frac{\log n}{\log\log n}\left(\frac{\epsilon_n}{(d\log 2)^2(1-\epsilon_n)}\right)}.
\end{align*}
Finally, we finish the proof by selecting $\epsilon_n=\frac{1}{\log n}$, which yields
$$\mathbb{E}\left(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\right)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Theorem 4. Let $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$. Assume also that $X$ is uniformly distributed in $[0,1]^d$ and that $m$ is a Lipschitz function. Then, letting $k\to\infty$, there exists a constant $\tilde{C}$ such that, for every $n>1$ and every $x\in[0,1]^d$,
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
Proof. Arguing as in the proof for the centered random forest, we can verify that
$$\left(P\left(\mathcal{C}^c_t(x)\right)\right)^{1/2}\le\left(P\left(|A_n(x)|>t\right)+P\left(|B_n(x)|>t\right)\right)^{1/2}\le c_3e^{-c_4\frac{nt^2}{2^k}}$$
for some constants $c_3,c_4$ independent of $k$ and $n$. The rate of consistency for the uniform KeRF is obtained by minimizing, in terms of $n$, the right-hand side of the inequality
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2t^2+c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}.$$
We compute the minimum of the right-hand side of the inequality for $t\in(0,1)$:
$$2c_2t-2tc_4\frac{n}{2^k}c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}=0\ \Rightarrow\ e^{-c_4\frac{t^2n}{2^k}}=\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\quad\text{and}\quad t^2=\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right).$$
Hence, the inequality becomes
\begin{align*}
\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2&\le c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right)+c_3\log n\,\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\\
&=c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\,e\right).
\end{align*}
For every $\epsilon_n\in(0,2]$ it holds that $\log x\le\frac{1}{\epsilon_n}x^{\epsilon_n}$ for $x\ge1$. Then one has that
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le c_1\left(1-\frac{1}{3d}\right)^{2k}+\frac{c_2}{c_4\epsilon_n}\left(\frac{e\,c_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}.$$
We pick
$$k=c(d)\log_2\left(\frac{n}{(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}}\right).$$
Therefore,
$$\frac{c_2}{c_4\epsilon_n}\left(\frac{e\,c_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}\le\frac{c'}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}$$
for a constant $c'$ independent of $n$, and
\begin{align*}
c_1\left(1-\frac{1}{3d}\right)^{2k}&=c_1\left(1-\frac{1}{3d}\right)^{2c(d)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1 2^{2c(d)\log_2\left(1-\frac{1}{3d}\right)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}\\
&=c_1 n^{2c(d)\log_2\left(1-\frac{1}{3d}\right)}\frac{1}{(\log n)^{2c(d)\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{3d}\right)}}.
\end{align*}
Therefore, we choose
$$c(d)=\frac{\epsilon_n-1}{2\log_2\left(1-\frac{1}{3d}\right)-(1-\epsilon_n)}.$$
Finally, replacing $\log_2\left(1-\frac{1}{3d}\right)$ by $-\frac{1}{3d\log 2}$,
\begin{align*}
c_1 n^{2c(d)\log_2\left(1-\frac{1}{3d}\right)}\frac{1}{(\log n)^{2c(d)\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{3d}\right)}}
&=c_1 n^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{3d\log 2}\right)-(1-\epsilon_n)}\left(-\frac{1}{3d\log 2}\right)}\,\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{3d\log 2}\right)-(1-\epsilon_n)}\cdot\frac{\epsilon_n}{1-\epsilon_n}\left(-\frac{1}{3d\log 2}\right)}}\\
&=c_1 n^{-\left(\frac{2(1-\epsilon_n)}{2+(1-\epsilon_n)3d\log 2}\right)}(\log n)^{\left(\frac{2\epsilon_n}{2+3d\log 2(1-\epsilon_n)}\right)}
\end{align*}
and, for the second term, with the same arguments,
$$\frac{\tilde{c}}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}=\frac{\tilde{c}}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)},$$
for a constant $\tilde{c}$ independent of $\epsilon_n$; hence,
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)},$$
and consequently,
\begin{align*}
\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}&=\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}\times n^{\frac{\epsilon_n}{\left(1+\frac{3}{2}d\log 2\right)\left(1+(1-\epsilon_n)\frac{3}{2}d\log 2\right)}}\\
&\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}\times(\log n)^{\frac{\log n}{\log\log n}\left(\frac{\epsilon_n}{\left(\frac{3}{2}d\log 2\right)^2(1-\epsilon_n)}\right)}.
\end{align*}
Finally, we finish the proof by selecting $\epsilon_n=\frac{1}{\log n}$, which yields
$$\mathbb{E}\left(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\right)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
In the following section, we summarize the rates of convergence for the centered KeRF and the uniform KeRF, and we compare them with the minimax rate of convergence over the class of Lipschitz functions [29]. In addition, we provide numerical simulations where we compare the $L^2$-error for different choices of the tree depth. All experiments were performed with the software Python (https://www.python.org/), mainly with the numpy library; random sets uniformly distributed in $[0,1]^d$ were created, for various examples of the dimension $d$ and the function $Y$. For every experiment the set is divided into a training set (80%) and a testing set (20%); then the $L^2$-error $\left(\sum_{X_i\in\text{test set}}\left(\tilde{m}(X_i)-Y_i\right)^2\right)$ and the standard deviation of the error are computed.
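A minimal harness reproducing this protocol might look as follows (the target function and sizes are our own illustrative choices, not necessarily those behind the figures):

```python
import numpy as np

rng = np.random.default_rng(42)

def run_experiment(predict, n=1000, d=2, noise=0.05):
    """Draw uniform data on [0,1]^d, split 80/20, and return the L2-error
    of `predict(x, X_train, Y_train)` summed over the test set."""
    X = rng.random((n, d))
    m = lambda x: np.sin(2 * np.pi * x[..., 0]) + x[..., 1]  # a Lipschitz target (our choice)
    Y = m(X) + noise * rng.standard_normal(n)
    n_tr = int(0.8 * n)
    X_tr, Y_tr = X[:n_tr], Y[:n_tr]
    X_te, Y_te = X[n_tr:], Y[n_tr:]
    preds = np.array([predict(x, X_tr, Y_tr) for x in X_te])
    return float(np.sum((preds - Y_te) ** 2))

# Plug in any estimator with the signature predict(x, X, Y),
# e.g. one of the KeRF sketches given earlier.
nearest = lambda x, X, Y: Y[np.argmin(np.abs(X - x).sum(axis=1))]  # trivial baseline
print(run_experiment(nearest))
```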
For the centered KeRF, we compare three different values of the tree depth which, from the theory, provide different convergence rates: first, the choice of $k$ in [26, Theorem 1] that provides the previous convergence rate; second, the selection of $k$ as derived in Theorem 1; and, third, the case where the estimator, in probability, interpolates the data set, but the known convergence rate is slow [3, Theorem 4.1], $O\left((\log n)^{-\frac{d-5}{6}}\right)$ for the dimension of the feature space $d>5$.
For the uniform KeRF, we compare the two values of the tree depth as derived from [26] and Theorem 2; nevertheless, it is not known whether the uniform KeRF algorithm converges when our estimator function interpolates the data set. Of course, in practice, since real data might violate the assumptions of the theorems, one should use cross-validation for tuning the parameters of the algorithms.
Comparing the rates of consistency for the centered KeRF and the depth of the corresponding trees:
● Scornet's rate of convergence [26, Theorem 1]:
$$n^{-\left(\frac{1}{d\log 2+3}\right)}(\log n)^2,\quad\text{with}\quad k=\left\lceil\frac{1}{\log 2+\frac{3}{d}}\log\left(\frac{n}{\log^2 n}\right)\right\rceil.$$
● New rate of convergence:
$$n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n),\quad\text{with}\quad k=\left\lceil\frac{\frac{1}{\log n}-1}{2\log_2\left(1-\frac{1}{2d}\right)-\left(1-\frac{1}{\log n}\right)}\,\log_2\left(\frac{n}{(\log n)^{\frac{1/\log n}{1-1/\log n}}}\right)\right\rceil.$$
● Minimax rate of consistency over the class of Lipschitz functions [29]: $n^{-\frac{2}{d+2}}$.
Thus, theoretically, the improved rate of consistency is achieved when trees grow at a deeper level compared with the parameter selection in [26, Theorem 1].
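For concreteness, the two depth prescriptions can be evaluated numerically; here is a small sketch under our reading of the formulas above:

```python
import math

def depth_scornet(n, d):
    """Tree depth from [26, Theorem 1], as displayed above."""
    return math.ceil(math.log(n / math.log(n) ** 2) / (math.log(2) + 3 / d))

def depth_improved(n, d):
    """Tree depth from Theorem 1, with eps_n = 1/log(n)."""
    eps = 1 / math.log(n)
    c_d = (eps - 1) / (2 * math.log2(1 - 1 / (2 * d)) - (1 - eps))
    return math.ceil(c_d * math.log2(n / math.log(n) ** (eps / (1 - eps))))

for n in (10**3, 10**4, 10**5):
    for d in (2, 5, 10):
        print(f"n={n:6d} d={d:2d}  k_Scornet={depth_scornet(n, d):3d}"
              f"  k_improved={depth_improved(n, d):3d}")
```

For the sampled values (e.g., $n=10^5$, $d=2$ gives $k=4$ versus $k=9$), the improved rate indeed corresponds to deeper trees.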
As is evident from Figure 3, the improvement in the convergence rate is more significant in low-dimensional feature spaces. The constant $\tilde{C}=\tilde{C}(d)$ of Theorem 1 depends on the dimension $d$ of the space. The convergence rates in the literature do not try to provide a good estimate for $\tilde{C}$, and they are significant for fixed values of $d$ only.
Finally, we note that Klusowski's rate of convergence in [20], $O\left(\left(n\log^{\frac{d-1}{2}}n\right)^{-\frac{1+\delta}{d\log 2+1}}\right)$, where $\delta$ is a positive constant that depends on the dimension $d$ of the feature space and converges to zero as $d$ approaches infinity, is sharp and better than the rate $O\left(n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n)\right)$ of Theorem 1 for small values of $d$. For large values of $n$ and $d$ the two estimates are essentially the same, but, for now, we do not know whether the rate of convergence of the centered KeRF can be improved in general.
Comparing the rates of convergence for the uniform KeRF and the depth of the corresponding trees:
● Scornet's rate of convergence [26, Theorem 2]:
$$n^{-\left(\frac{2}{3d\log 2+6}\right)}(\log n)^2,\quad\text{with}\quad k=\left\lceil\frac{1}{\log 2+\frac{3}{d}}\log\left(\frac{n}{\log^2 n}\right)\right\rceil.$$
● New rate of convergence:
$$n^{-\left(\frac{2}{3d\log 2+2}\right)}(\log n),\quad\text{with}\quad k=\left\lceil\frac{\frac{1}{\log n}-1}{2\log_2\left(1-\frac{1}{3d}\right)-\left(1-\frac{1}{\log n}\right)}\,\log_2\left(\frac{n}{(\log n)^{\frac{1/\log n}{1-1/\log n}}}\right)\right\rceil.$$
● Minimax rate of convergence over the class of Lipschitz functions [29]: $n^{-\frac{2}{d+2}}$.
Thus, theoretically, as in the case of centered random KeRF, the improved rate of consistency is achieved when trees grow at a deeper level compared with the parameter selection in [26, Theorem 2].
The same considerations we made for the centered KeRF on the dependence of the constant $\tilde{C}$ on $d$ hold in the uniform KeRF case, as is evident from Figure 4. Moreover, as of now, it is still unknown to us whether the convergence rate of the uniform KeRF can be improved.
Finally, numerical simulations of the $L^2$-error of the centered KeRF approximation for three different values of $k$ are reported in Figure 5, with the standard deviations of the errors in Figure 6. In the Appendix, more simulations and plots for different target functions and for both algorithms are illustrated.
In this paper, we have obtained improved rates of convergence for two kernel-based random forests, the centered KeRF and the uniform KeRF. In addition, we have provided numerical simulations for both algorithms concerning the parameters of the methods. Finally, we have explored the reproducing kernel Hilbert space related to the centered KeRF, providing bounds for the dimension of the aforementioned space.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The first author was funded by the PhD scholarship PON-DOT1303154-3, the second author was partially funded by project GNAMPA 2022 "HOLOMORPHIC FUNCTIONS IN ONE AND SEVERAL VARIABLES" and by Horizon H2020-MSCA-RISE-2017 project 777822 "GHAIA".
All authors declare no conflicts of interest in this paper.
In this last section, we provide more experiments on low-dimensional regression examples with additive noise for the centered and the uniform KeRF. In particular, we calculate and compare the $L^2$-errors and the standard deviations against different sample sizes, for different values of the parameter $k$ of the estimator. Moreover, in the following subsection, we study the RKHS produced by the kernel
$$K^{Cen}_k(x,z)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{j=1}^d\mathbb{1}_{\lceil 2^{k_j}x_j\rceil=\lceil 2^{k_j}z_j\rceil},$$
defined on the abelian group $\mathbb{Z}_2^{kd}$. To conclude, we recall some notation for finite abelian groups, necessary to define the aforementioned RKHS and to estimate its dimension.
We provide here an alternative description of the centered random forest algorithm, where the dyadic tiling of the hypercube motivates us to define the kernel on the abelian group $\mathbb{Z}_2^{kd}$. First, we define a random tree $\Theta$. Start with a random variable $\Theta_0$, uniformly distributed in $\{1,\dots,d\}$, and split
$$I:=[0,1]^d=I^{\Theta_0}_0\cup I^{\Theta_0}_1,$$
where
$$I^{\Theta_0}_l=[0,1]\times\dots\times\left[\frac{l}{2},\frac{l+1}{2}\right]\times\dots\times[0,1],$$
and for $l=0,1$ the splitting is performed in the $\Theta_0$-th coordinate. Choose then random variables $\Theta_{1,l}$ ($l=0,1$), distributed as $\Theta_0$, and split each
$$I^{\Theta_0}_l=I^{\Theta_0,\Theta_1}_{l,0}\cup I^{\Theta_0,\Theta_1}_{l,1},$$
where, as before, the splitting is performed at the $\Theta_1$-th coordinate, and $I^{\Theta_0,\Theta_1}_{l,0}$ is the lower half of $I^{\Theta_0}_l$. Iterate the same procedure $k$ times. In order to do that, we need random variables $\Theta_{j;\eta_0,\dots,\eta_j}$, with $\eta_l\in\{1,\dots,d\}$ and $j=1,\dots,k$. We assume that all such random variables are independent. It is useful to think of $\Theta=\{\Theta_{j;\eta_0,\dots,\eta_j}\}$ as indexed by a $d$-adic tree and, in fact, we refer to $\Theta$ as a random tree in $[0,1]^d$. We call cells, or leaves, each of the $2^k$ rectangles into which $[0,1]^d$ is split at the end of the $k$-th subdivision.
Figure A1 shows the $L^2$-error of the centered KeRF approximation for a three-dimensional feature space, and Figure A2 the standard deviation of the errors.
Figures A3 and A4 show the $L^2$-error and the standard deviation for the uniform KeRF in a two-dimensional feature space, and Figures A5 and A6 present a three-dimensional example.
B.1. The Fourier analysis of the kernel
The kernel in the centered KeRF defines an RKHS $H_K$ structure on a set $\Gamma$ having $2^{kd}$ points [4]. In fact, $\Gamma$ has a group structure, and Fourier analysis can be used. Much research has been done in RKHS theory (see [1]), and in this section we study the structure of $H_K$ in itself. A priori, $H_K$ might have any dimension less than or equal to $\sharp\Gamma$. We show in fact that its dimension is much lower than that, a fact which is somehow surprising, and we believe it is interesting in itself. Furthermore, we prove that there are no nonconstant multipliers in the space $H_K$. For completeness, we provide definitions and notation on Fourier analysis on Abelian groups in Appendix C.
We identify every real number $x\in[0,1]$ with its binary expansion $x=0.x_1x_2x_3\dots$, with $x_i\in\{0,1\}$ for $i\in\mathbb{N}$. Here we consider the group
\begin{equation} G=\mathbb{Z}_2^{kd}\ni x=\left(x^j_i\right)^{j=1,\dots,d}_{i=1,\dots,k}=(x^1|x^2|\dots|x^d), \end{equation} | (B.1)
where each column $x^j=(x^j_1,\dots,x^j_k)^T$ collects the first $k$ binary digits of the $j$-th coordinate.
The kernel K:\Gamma\times\Gamma\to\mathbb{C} corresponding to the kernel K^{Cen}_{k} is,
\begin{eqnarray} K(a, b)& = &\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\frac{1}{d^k}\binom{k}{l}\prod\limits_{j = 1}^d\chi\left(a^j_1 = b^j_1, \dots, a^j_{k_j} = b^j_{k_j}\right)\\ & = &\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\frac{1}{d^k}\binom{k}{l}\prod\limits_{j = 1}^d\prod\limits_{i = 1}^{k_j}\chi\left(a^j_i = b^j_i\right)\\ & = &\varphi(a-b), \end{eqnarray} | (B.2) |
where \binom{k}{l} is the multidimensional binomial coefficient, \chi_E is the characteristic function of the set E , and a-b is the difference in the group \mathbb{Z}_2^{kd} . Incidentally, (B.2) shows that the kernel K can be viewed as a convolution kernel on the appropriate group structure. For the last equality, we consider the binary representation of a number in (0, 1] whose digits are not definitely vanishing. The fact that 0 does not have such representation is irrelevant since the probability that one of the coordinates of the data vanishes is zero.
We now compute the anti-Fourier transform \mu = \check{\varphi} . We know that \sharp(\Gamma) = 2^{kd} , and that the characters of \mathbb{Z}_2^{kd} have the form
\begin{equation} \gamma_a(x) = (-1)^{a\cdot x}, \ x\in \mathbb{Z}_2^{kd}, \ a\in \widehat{\mathbb{Z}_2^{kd}}, \ a\cdot x = a^1_1 x^1_1+\dots+a^d_k x^d_k. \end{equation} | (B.3)
Hence,
\begin{eqnarray} 2^{kd}d^k\mu(x)& = &d^k\sum\limits_{a\in\Gamma}\varphi(a)\gamma_a(x)\\ & = &d^k\sum\limits_{a\in\Gamma}\varphi(a)(-1)^{a\cdot x}\\ & = &\sum\limits_{a\in\Gamma}\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\binom{k}{l}\prod\limits_{j = 1}^d\prod\limits_{i = 1}^{k_j}\chi\left(a^j_i = 0\right) (-1)^{a\cdot x}\\ & = &\sum\limits_{a\in\Gamma}\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\binom{k}{l}\prod\limits_{j = 1}^d(-1)^{\tilde{a}_j^{k_j}\cdot\tilde{x}_j^{k_j}}\prod\limits_{i = 1}^{k_j}\left[\chi\left(a^j_i = 0\right)(-1)^{a^j_i x^j_i}\right]\\ &\ &\text{where }\tilde{a}_j^{k_j} = \begin{pmatrix} a^j_{k_j+1}\\ \dots\\ a^j_k \end{pmatrix}\text{ is the lower,}\ (k-k_j) -\text{dimensional} \\ &\ &\text{part of the column}\ a^j , \\ & = &\sum\limits_{a\in\Gamma}\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\binom{k}{l}\prod\limits_{j = 1}^d(-1)^{\tilde{a}_j^{k_j}\cdot\tilde{x}_j^{k_j}}\prod\limits_{i = 1}^{k_j}\chi\left(a^j_i = 0\right)\\ & = &\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d}}{{|l| = k}}}\binom{k}{l}\sum\limits_{\genfrac{}{}{0pt}{}{{a\in\Gamma}}{{a^1_1 = \dots a^1_{k_1} = a^2_1 = \dots = a^d_{k_d} = 0}}}\prod\limits_{j = 1}^d(-1)^{\tilde{a}_j^{k_j}\cdot\tilde{x}_j^{k_j}}. \end{eqnarray} | (B.4)
The last expression vanishes exactly when for all l , there are some 1\le j\le d , and some k_j+1\le i\le k such that x^j_i = 1 , due to the presence of the factor (-1)^{a^j_i x^j_i} = (-1)^{a^j_i} which takes values \pm1 on summands having, two by two, the same absolute values.
If, on the contrary, there is l such that for all 1\le j\le d , and k_j+1\le i\le k , we have that x^j_i = 0 , then \mu(x)\ne0 . Since |l| = k and there are kd binary digits involved in the expression of x , the latter occurs exactly when the binary matrix representing x has a large lower region in which all entries are 0 . More precisely, the number of vanishing entries must be at least
\begin{equation} (k-k_1)+\dots+(k-k_d) = (d-1)k. \end{equation} | (B.5)
The number N(d, k) of such matrices is the dimension of H_K , the Hilbert space having K as a reproducing kernel.
Next, we prove some estimates for the dimension of the RKHS.
We summarize the main items in the following statement.
Theorem B.1. Let K:\Gamma\times\Gamma\to\mathbb{C} be the kernel in (B.2), K(a, b) = \varphi(a-b) , and let
\begin{equation} E_K = \mathit{\text{supp}}(\check{\varphi})\subseteq G. \end{equation} | (B.6)
Then,
(i) As a linear space, H_K = L_{E_K} , where
\begin{eqnarray} E_K& = &\{x = (x^1|\dots|x^d):\ x^j_i = 0\ \mathit{\text{for}}\ k_j+1\le i\le k, \ \mathit{\text{for some}}\ l\\ & = &(k_1, \dots, k_d)\in\mathbb{N}^d\ \mathit{\text{with}}\ k_1+\dots+k_d = k\}; \end{eqnarray} | (B.7)
(ii) For x\in E_K ,
\begin{equation} \check{\varphi}(x) = \frac{1}{2^{k}d^k}\sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d, \ |l| = k}}{{x^j_i = 0\ \mathit{\text{for}}\ k_j+1\le i\le k}}}\binom{k}{l}. \end{equation} | (B.8)
To obtain the expression in (B.8), we used the fact that
$$\sharp\{a:\ a^1_1 = \dots = a^1_{k_1} = a^2_1 = \dots = a^d_{k_d} = 0\} = 2^{(d-1)k}.$$
B.2. Some properties of H_K
Linear relations. Among all functions \psi:\Gamma\to\mathbb{C} , those belonging to H_K (i.e., those belonging to L_{E_K} ) are characterized by a set of linear equations,
\begin{equation} 0 = 2^{kd}d^k\mu(x) = \sum\limits_{\genfrac{}{}{0pt}{}{{l\in\mathbb{N}^d, \ |l| = k}}{{x^j_i = 0 \text{ for }k_j+1\le i\le k}}}\binom{k}{l}\text{ for }x\notin E_K. \end{equation} | (B.9)
Multipliers. A multiplier of H_K is a function m:\Gamma\to\mathbb{C} such that m\psi\in H_K whenever \psi\in H_K .
Proposition B.1. The space H_K has no nonconstant multiplier.
In particular, it does not enjoy the complete Pick property, which has been subject of intensive research for the past twenty-five years [1].
Proof. The space H_K coincides as a linear space with L_{E_K} . Let \Lambda_E = \check{L_E} , which is spanned by \{\delta_x:\ x\in E\} . Observe that, since 0 = (0|\dots|0)\in E_K , the constant functions belong to H_K , hence, any multiplier m of H_K lies in H_K ; m = m\cdot 1\in H_K .
Suppose m is not constant. Then, \check{m}(a)\ne0 for some a\in E_K , a\ne0 . Let a be an element in E_K such that \check{m}(a)\ne0 . Since H_K\ni m\cdot\widehat{\delta_x} for all x in E_K , and m\cdot\widehat{\delta_x} = \widehat{\check{m}\ast\delta_x} , we have that the support of \check{m}\ast\delta_x lies in E_K . Now,
\check{m}\ast\delta_x(y) = \check{m}(x-y), |
hence, we have that, for any x in E_K , y = x-a lies in E_K as well. This forces a = 0 , hence m to be constant.
B.3. Bounds for dimension and generating functions
Theorem B.2. For fixed d\ge 1 , we have the estimates:
\begin{equation} \mathit{\text{dim}}(H_K)\sim\frac{2^{k-d+1}k^{d-1}}{(d-1)!}, \ \mathit{\text{hence}}\ \frac{\mathit{\text{dim}}(H_K)}{2^{kd}}\sim\frac{k^{d-1}}{2^{d-1}(d-1)!2^{k(d-1)}}. \end{equation} | (B.10) |
Proof. Let $l_1,l_2,\dots,l_d$ be such that
$$0\le l_1+l_2+\dots+l_d=m\le k,$$
where $m$ is a parameter, and let $\lambda=\lvert\{j:\ l_j\ge1\}\rvert$ be the number of nonzero $l_j$'s, where $\lvert\cdot\rvert$ denotes the size of the set; of course, $0\le m\le k$ and $0\le\lambda\le d\wedge m$. The goal is to obtain a bound for
$$N(k,d)=\sum_{m=0}^k\sum_{\lambda=0}^{d\wedge m}2^{m-\lambda}\binom{d}{\lambda}\left|\left\{(l_1,l_2,\dots,l_\lambda):\ l_1+l_2+\dots+l_\lambda=m,\ l_j\ge1\right\}\right|.$$
Let $A(m,\lambda)$ be the coefficient of $x^m$ in the power series
\begin{align*} (x+x^{2}+\dots)^{\lambda} & = \left(x(1+x+x^{2}+\dots)\right)^{\lambda}\\ & = \frac{x^{\lambda}}{(1-x)^{\lambda}}, \end{align*}
so that $2^{m}A(m,\lambda)$ is the coefficient of $x^m$ in $\frac{(2x)^{\lambda}}{(1-2x)^{\lambda}}$, and therefore $2^{m-\lambda}A(m,\lambda)$ is the coefficient of $x^m$ in $\frac{x^{\lambda}}{(1-2x)^{\lambda}}$. Now consider the inner sum: $B(m,d):=\sum_{\lambda=0}^{d\wedge m}\binom{d}{\lambda}2^{m-\lambda}A(m,\lambda)$ is the coefficient of $x^m$ in
\begin{align*} \sum\limits_{\lambda = 0}^{d}{ d \choose \lambda}\left(\frac{x}{1-2x}\right)^{\lambda} & = \left( 1+ \frac{x}{1-2x}\right)^d\\ & = \left(\frac{1-x}{1-2x}\right)^d. \end{align*}
Again, by the same combinatorial argument (summing $B(m,d)$ over $0\le m\le k$), we are looking for the $k$-th coefficient of the function
f(x) = \frac{1}{1-x}(\frac{1-x}{1-2x})^d . |
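This count is easy to check numerically for small $k$ and $d$: the $k$-th Taylor coefficient of $f$ has the closed form $\sum_i(-1)^i\binom{d-1}{i}\binom{k-i+d-1}{d-1}2^{k-i}$, and it can be compared with a brute-force enumeration of the matrices described by (B.7). A sketch of ours, with hypothetical names:

```python
import math
from itertools import product

def dim_HK_bruteforce(k, d):
    """Count binary k x d matrices whose columnwise 'last 1' positions sum
    to at most k -- the support condition (B.7), i.e., dim(H_K) = N(k, d)."""
    count = 0
    for bits in product((0, 1), repeat=k * d):
        total = 0
        for j in range(d):
            col = bits[j * k:(j + 1) * k]
            total += max((i + 1 for i, b in enumerate(col) if b), default=0)
        if total <= k:
            count += 1
    return count

def dim_HK_series(k, d):
    """k-th Taylor coefficient of f(x) = (1-x)^(d-1) / (1-2x)^d at x = 0."""
    return sum((-1) ** i * math.comb(d - 1, i)
               * math.comb(k - i + d - 1, d - 1) * 2 ** (k - i)
               for i in range(min(d - 1, k) + 1))

for k, d in [(1, 2), (3, 2), (4, 3)]:
    print(k, d, dim_HK_bruteforce(k, d), dim_HK_series(k, d))  # values agree
```

For instance, for $k=3$ and $d=2$, both computations give $20$.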
Back to the estimate, let $a_k$ be the $k$-th coefficient of the power series of $f$ centered at $z=0$. For a fixed $0<r<\frac12$,
\max\limits_{ |z| = r } | f(z)| = \max\limits_{ |z| = r } \bigg| \frac{(1-z)^{d-1}}{(1-2z)^{d}} \bigg| = \max\limits_{ |z| = r }\bigg| \frac{1}{1-z} \bigg(\frac{1-z}{1-2z}\bigg)^{d} \bigg| \leq 2 \max\limits_{ \theta \in ( -\pi, \pi) } \bigg| \frac{1-re^{i \theta}}{1-2re^{i \theta}} \bigg| ^{d}. |
After some calculations, since $r$ is fixed, one finds that the maximum is attained at $\theta=0$. So
\max\limits_{ |z| = r } | f(z)| \leq 2 (\frac{1-r}{1-2r})^{d} . |
Our estimate finally becomes:
\begin{align*} |a_{k}| &\leq \frac{2 \big( \frac{1-r}{1-2r} \big)^{d} }{r^{k}}\\ & = \frac{2(1-r)^{d}}{r^{k} (1-2r)^{d} }\\ & \lesssim k^{d} 2^{k}\left(\frac{1}{2} + \frac{1}{2k}\right)^{d} \quad \quad \left( \text{choosing } r = \frac{1}{2}\left( 1-\frac{1}{k}\right) \right)\\ & = k^{d} \left(1+ \frac{1}{k}\right)^{d} 2^{k-d}. \end{align*}
Thus an estimate for the dimension of H_K , relative to \sharp\Gamma = 2^{kd} , is
\frac{|a_k|}{2^{kd}}\lesssim \frac{k^{d} (1+ \frac{1}{k})^{d} 2^{k(1-d)}}{2^d}. |
Here is another estimate for the dimension of H_K . For f(z) = \sum_{n = 0}^\infty a_n z^n , we have
|a_n|\le\frac{\max\{|f\left(r e^{i t}\right)|:\ |t|\le\pi\}}{r^n}. |
Consider the function
f(z) = \frac{(1-z)^{d-1}}{(1-2z)^d} |
in |z| < 1/2 and let r = \frac{1-1/k}{2} . Then,
\begin{eqnarray*} |a_k|&\le&\frac{(3/2)^{d-1}}{(1/k)^d(1-1/k)^k2^{-k}}\\ &\le&(3/2)^{d-1}2^ke k^d. \end{eqnarray*} |
Thus,
\frac{|a_k|}{2^{kd}}\lesssim\frac{k^d (3/2)^d}{2^{k(d-1)}}. |
Recursively working out the generating function one gets the estimates in (B.10).
Following the notation of [24], we recall here the basic notions of Fourier theory for a finite, abelian group G , which we employed above. Here, G is endowed with its counting measure. The dual group \Gamma = \widehat{G} of G is populated by labels a for homomorphisms \gamma_a:G\to\mathbb{T} = \{e^{i t}:\ t\in\mathbb{R}\} . Given a function f:G\to\mathbb{C} , its Fourier transform \widehat{f}:\Gamma\to\mathbb{C} is defined as
\begin{equation} \widehat{f}(a) = \sum\limits_{x\in G}f(x)\overline{\gamma_a(x)}. \end{equation} | (C.1) |
We make \Gamma into a (finite), additive group by setting
\gamma_{a+b} = \gamma_a\cdot\gamma_b, \text{ and }\gamma_x(a): = \gamma_a(x). |
It turns out that G and \Gamma have the same number of elements, \sharp(G) = \sharp(\Gamma) . Some basic properties are:
\begin{eqnarray*} f(x)& = &\frac{1}{\sharp(\Gamma)}\sum\limits_{a\in \Gamma}\widehat{f}(a)\gamma_a(x)\text{ (inverse Fourier transform) }\\ \sum\limits_{x\in G}|f(x)|^2& = &\frac{1}{\sharp(\Gamma)}\sum\limits_{a\in \Gamma}|\widehat{f}(a)|^2\text{ (Plancherel) }\\ \widehat{f\ast g}& = &\widehat{f}\cdot\widehat{g}, \end{eqnarray*} |
where
\begin{equation} (f\ast g)(x) = \sum\limits_{y\in G}f(x-y)g(y). \end{equation} | (C.2) |
We write
\begin{equation} \check{\varphi}(x) = \sharp(\Gamma)^{-1}\sum\limits_{a\in\Gamma}\varphi(a)\gamma_a(x), \text{ so that }\widehat{\check{\varphi}} = \varphi. \end{equation} | (C.3) |
The unit element of convolution in G is \delta_0 .
In the other direction, for \varphi, \psi:\Gamma\to\mathbb{C} we define
\begin{equation} (\varphi\ast\psi)(a) = \frac{1}{\sharp(\Gamma)}\sum\limits_{b\in\Gamma}\varphi(a-b)\psi(b), \end{equation} | (C.4) |
and similarly to above, \widehat{\check{\varphi}\check{\psi}} = \varphi\ast\psi . The unit element on convolution in \Gamma is \sharp(\Gamma)\delta_0 .
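These identities are finite-dimensional and easy to verify numerically; here is a small sketch of ours on $G=\mathbb{Z}_2^m$, encoding group elements as $m$-bit integers (so the group difference is the bitwise XOR, and $\gamma_a(x)=(-1)^{a\cdot x}$ with $a\cdot x$ the parity of $a\,\&\,x$):

```python
import numpy as np

m = 3
N = 2 ** m                      # |G| = |Gamma| = 2^m

def ft(f):
    """Fourier transform (C.1): f_hat(a) = sum_x f(x) * (-1)^{a.x}."""
    return np.array([sum(f[x] * (-1) ** bin(a & x).count("1") for x in range(N))
                     for a in range(N)])

rng = np.random.default_rng(0)
f, g = rng.random(N), rng.random(N)

# Convolution (C.2); in Z_2^m the group difference x - y is the bitwise XOR.
conv = np.array([sum(f[x ^ y] * g[y] for y in range(N)) for x in range(N)])

print(np.allclose(ft(conv), ft(f) * ft(g)))               # hat(f*g) = hat(f).hat(g)
print(np.isclose((f ** 2).sum(), (ft(f) ** 2).sum() / N))  # Plancherel
```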
A function \varphi on \Gamma is positive definite if, for every choice of coefficients c:\Gamma\to\mathbb{C} ,
$$\sum\limits_{a, b\in\Gamma} c(a) \overline{c(b)}\varphi(b-a)\ge0.$$
Theorem C.1. [Bochner's theorem] A function \varphi:\Gamma\to\mathbb{C} is positive definite if and only if there exists \mu:G\to\mathbb{R}_+ such that \varphi = \widehat{\mu} .
The theorem holds in great generality, and its proof in the finite group case is elementary. We include it because it highlights the relationship between the measure \mu on G and the positive definite function (the kernel) \varphi .
Proof. For the "if" direction, suppose \varphi = \widehat{\mu} with \mu:G\to\mathbb{R}_+ . Then
\begin{eqnarray} \sharp(\Gamma)^{-2}\sum\limits_{a, b\in\Gamma}\widehat{\mu}(b-a) c(a)\overline{c(b)}& = &\sum\limits_{x\in G}\sharp(\Gamma)^{-2}\sum\limits_{a, b\in\Gamma}\mu(x)\overline{\gamma_{b-a}(x)}c(a)\overline{c(b)}\\ & = &\sum\limits_{x\in G}\sharp(\Gamma)^{-2}\sum\limits_{a, b\in\Gamma}\mu(x)\overline{\gamma_{b}(x)}\overline{c(b)}\gamma_a(x)c(a)\\ & = &\sum\limits_{x\in G}\mu(x)\left|\sharp(\Gamma)^{-1}\sum\limits_{a\in\Gamma}c(a)\gamma_a(x)\right|^2\\ & = &\sum\limits_{x\in G}\mu(x)\left|\check{c}(x)\right|^2\ge0. \end{eqnarray} | (C.5) |
For the "only if" direction: since for all b in \Gamma ,
\begin{eqnarray} \mu(x)\sharp(\Gamma)& = &\sum\limits_{a\in\Gamma}\varphi(a)\gamma_x(a) \\ & = &\sum\limits_{a\in\Gamma}\varphi(a-b)\gamma_x(a-b)\\ & = &\sum\limits_{a\in\Gamma}\varphi(a-b)\gamma_x(a)\overline{\gamma_x(b)}, \end{eqnarray} | (C.6) |
we have
\begin{equation} \mu(x)\sharp(\Gamma)^2 = \sum\limits_{a, b\in\Gamma}\varphi(a-b)\gamma_x(a)\overline{\gamma_x(b)}\ge0, \end{equation} | (C.7) |
by the assumption.
We now come to reproducing kernels on \Gamma which are based on positive definite functions \varphi:\Gamma\to\mathbb{R}_+ . Set
\begin{equation} K(a, b) = \varphi(a-b) = K_b(a), \ K:\Gamma\times\Gamma\to\mathbb{C}, \end{equation} | (C.8)
and set
\begin{equation} H_K = \text{span}\{K_b:\ b\in\Gamma\}\ni\sum\limits_{b\in\Gamma}c(b)K_b, \end{equation} | (C.9)
where H_K is the Hilbert space having K as reproducing kernel. We wish to have a more precise understanding of it.
We start by expressing the norm of an element of H_K in several equivalent ways,
\begin{eqnarray} \left\|\sum\limits_{b\in\Gamma}c(b)K_b\right\|_{H_K}^2& = &\sum\limits_{a, b\in\Gamma}\overline{c(a)}c(b)\langle K_b, K_a\rangle\\ & = &\sum\limits_{a, b\in\Gamma}\overline{c(a)}c(b)K(a, b) = \sum\limits_{a, b\in\Gamma}\overline{c(a)}c(b)\widehat{\mu}(a-b)\\ & = &\sum\limits_{a, b\in\Gamma}\overline{c(a)}c(b)\sum\limits_{x\in G}\mu(x)\gamma_{b-a}(x)\\ & = &\sum\limits_{x\in G}\mu(x)\sum\limits_{a, b\in\Gamma}\overline{c(a)}c(b)\gamma_{b}(x)\overline{\gamma_a(x)}\\ & = &\sum\limits_{x\in G}\mu(x)\left|\sum\limits_{b\in\Gamma}c(b)\gamma_b(x)\right|^2\\ & = &\sharp(\Gamma)^2\sum\limits_{x\in G}\mu(x)\left|\check{c}(x)\right|^2\\ & = &\sharp(\Gamma)^2\sum\limits_{x\in G}\left|\mu(x)^{1/2}\check{c}(x)\right|^2. \end{eqnarray} | (C.10) |
In other terms,
\begin{equation} \sharp(\Gamma)^{-1}\sum\limits_{b\in\Gamma}c(b)K_b\mapsto \check{c} \end{equation} | (C.11) |
is an isometry of H_K onto L^2(\mu) . This will become important later, when we verify that for our kernels \text{supp}(\mu) is sparse in G . In fact, \text{dim}(H_K) = \sharp(\text{supp}(\mu)) .
Corollary C.1. As a linear space, H_K is determined by \mathit{\text{supp}}(\mu) :
\psi\in H_K\ \mathit{\text{if and only if}}\ \mathit{\text{supp}}\ (\check{\psi})\subseteq\mathit{\text{supp}}(\mu). |
Let E\subseteq G . We denote
\begin{equation} L_E = \{G\xrightarrow{\psi}\mathbb{C}:\ \text{supp}(\check{\psi})\subseteq E\}. \end{equation} | (C.12) |
Next, we look for the natural orthonormal system provided by the Fourier isometry (C.11). For x\in G , let \check{c_x} = \mu(x)^{-1/2}\delta_x : \{\check{c_x}:\ x\in E: = \text{supp}(\mu)\} is an orthonormal system for L^2(\mu) , and so \{e_x:\ x\in E\} is an orthonormal basis for H_K , where
\begin{eqnarray} c_x(b) = \sum\limits_{y\in G}\mu(x)^{-1/2}\delta_x(y)\overline{\gamma_b(y)} = \mu(x)^{-1/2}\overline{\gamma_b(x)}, \end{eqnarray} | (C.13) |
and
\begin{eqnarray} e_x(a)& = &\sharp(\Gamma)^{-1}\sum\limits_{b\in\Gamma}c_x(b)K_b(a)\\ & = &\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum\limits_{b\in\Gamma}K_b(a)\overline{\gamma_b(x)}\\ & = &\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum\limits_{b\in\Gamma}\varphi(a-b)\overline{\gamma_b(x)}\\ & = &\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum\limits_{b\in\Gamma}\varphi(a-b)\gamma_{a-b}(x)\overline{\gamma_a(x)}\\ & = &\mu(x)^{-1/2}\mu(x)\overline{\gamma_a(x)}\\ & = &\mu(x)^{1/2}\overline{\gamma_a(x)}. \end{eqnarray} | (C.14) |
Let's verify that we obtain the reproducing kernel from the o.n.b. as expected,
\begin{eqnarray} \sum\limits_{x\in\Gamma}e_x(a)\overline{e_x(b)}& = &\sum\limits_{x\in\Gamma}\mu(x)\overline{\gamma_x(a)}\gamma_x(b)\\ & = &\sum\limits_{x\in\Gamma}\mu(x)\overline{\gamma_x(a-b)}\\ & = &\widehat{\mu}(a-b)\\ & = &\varphi(a-b). \end{eqnarray} | (C.15) |
Remark C.1. Since any finite, abelian group can be written as the direct product of cyclic groups,
\begin{equation} G = \bigoplus\limits_{l = 1}^L \mathbb{Z}_{m_l}, \end{equation} | (C.16) |
its dual \Gamma can be written in the same way, because \widehat{\mathbb{Z}_m}\equiv\mathbb{Z}_m . From the Fourier point of view, the only difference is that, if on G we consider the counting measure, then on \Gamma we consider normalized counting measure, as we did above.