
Random forests are a class of non-parametric statistical machine learning algorithms used for regression and classification tasks. Random forest algorithms can handle sparse, high-dimensional tasks with high accuracy while avoiding overfitting. In particular, random forests are considered to be among the most accurate classes of learning algorithms for general tasks. They are routinely used in many fields including bio-informatics [15], economics [30], biology [10], and linguistics [18].
The most widely used random forest algorithm was introduced by Breiman [13], who was inspired by the work on random subspaces of Ho [19] and the geometrical feature selection of Amit and Geman [2]. In Breiman's random forest, the trees are grown based on the classification and regression trees (CART) procedure, where both the splitting directions and the training sets are randomized. However, although only a few parameters need to be tuned [16,21], their mathematical properties are still an area of active research [6,22]. A significant distinction within the class of random forest algorithms lies in the way each individual tree is constructed and, in particular, in the dependence of each tree on the data set. Some authors consider random forests whose construction is independent of the data set [7,14,25].
In 2012, Biau [5] studied a random forest model proposed by Breiman whose construction is independent of the data set, called in the literature the centered random forest. In [5], an upper bound on the rate of consistency of the algorithm and its adaptation to sparsity were proven. More precisely, regarding the first item, for a data set of $n$ samples in a space of dimension $d$, the convergence rate was $O\big(n^{-\frac{1}{\frac{4}{3}d\log 2+1}}\big)$. In 2021, Klusowski [20] improved the rate of convergence to $O\big((n\log^{\frac{d-1}{2}}n)^{-\frac{1+\delta}{d\log 2+1}}\big)$, where $\delta$ is a positive constant that depends on the dimension $d$ of the feature space and converges to zero as $d$ approaches infinity. In addition, in the same paper, Klusowski proved that this rate of convergence is sharp, although it fails to reach the minimax rate of consistency over the class of Lipschitz functions, $O\big(n^{-\frac{2}{d+2}}\big)$ [29]. There is also important work on the consistency of algorithms that depend on the data [23,27,28]. For a comprehensive overview of both theoretical and practical aspects of random forests, see [8], which surveys the subject up to 2016.
An important tool for algorithmically manipulating random forests is kernel methods. Breiman [11] observed the connection between kernel theory and random forests by showing the equivalence between tree construction and kernel action. Later, this was formalized by Geurts et al. in [17]. In the same direction, Scornet [26] defined kernel random forests (KeRF) by modifying the original algorithm, and provided theoretical and practical results. In particular, in this important work, Scornet provided explicit kernels for some variants of the algorithm, their rates of consistency, and comparisons with the corresponding random forests. Furthermore, Arnould et al. [3] investigated the trade-off between the interpolation of several random forest algorithms and their consistency results.
In the first part of the paper, we provide the notation and the definitions of the centered and uniform random forests and their corresponding kernel-related formulations. In addition, we improve the rate of consistency for the centered KeRF algorithm. Let k≥1 be the depth of the trees used to estimate the target variable Y (see Section 2 for definitions and notation).
Theorem 1. Suppose that $X=(X_1,\dots,X_d)$ and $Y$ are related by $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$, $X$ is uniformly distributed in $[0,1]^d$, and $m$ is a regression function, which we assume to be Lipschitz. Then, there exists a constant $\tilde{C}$ such that, for every $n>1$ and for every $x\in[0,1]^d$,
$$\mathbb{E}\big(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Here, $m(x)=\mathbb{E}[Y|X=x]$ is the predicted value of $Y$ for $X=x\in[0,1]^d$, while $\tilde{m}^{Cen}_{\infty,n}(x)$ is the estimate for $m$ provided by the kernel version of the centered random forest algorithm.
Similarly, with $\tilde{m}^{Un}_{\infty,n}(x)$ playing for the uniform KeRF algorithm the role $\tilde{m}^{Cen}_{\infty,n}(x)$ had above, we have:
Theorem 2. Let $X$, $Y$, $m$, and $\epsilon$ be as in Theorem 1, with $Y=m(X)+\epsilon$. Then there exists a constant $\tilde{C}$ such that, for every $n>1$ and for every $x\in[0,1]^d$,
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
Moreover, in Section 4, we provide numerical examples and experiments concerning the tuning parameter k, which is the tree depth of the two kernel-based random forest algorithms, by comparing the L2-error for different values and under specific assumptions on the data set.
In the final part of the article, we consider the reproducing kernel $K$ used in the centered KeRF algorithm per se. It is rewarding to view it as defined on the finite Abelian group $\mathbb{Z}_2^{kd}$, where, as above, $d$ is the dimension of the vector $X$ and $k$ is the depth of the tree. By using elementary Fourier analysis on groups, we obtain several equivalent expressions for $K$ and its group transform, we characterize the functions belonging to the corresponding reproducing kernel Hilbert space (RKHS) $H_K$, we derive results on multipliers, and we obtain bounds for the dimension of $H_K$, which is much smaller than what one might expect.
A usual problem in machine learning is, based on $n$ observations of a random vector $(X,Y)\in\mathcal{X}\times\mathbb{R}\subseteq\mathbb{R}^d\times\mathbb{R}$, to estimate the function $m(x)=\mathbb{E}(Y|X=x)$. In classification problems, $Y$ ranges over a finite set. In particular, we assume that we are given a training sample $\mathcal{D}_n=\{(X_1,Y_1),\dots,(X_n,Y_n)\}$ of independent random variables, where $X_i\in[0,1]^d$ and $Y_i\in\mathbb{R}$ for every $i=1,\dots,n$, with a shared joint distribution $P_{X,Y}$. The goal is to use the data set to construct an estimate $m_n:\mathcal{X}\subseteq[0,1]^d\to\mathbb{R}$ of the function $m$. Our convergence rate requires an a priori assumption on the regularity of the function $m$. Following [26], we suppose that $m$ belongs to the class of $L$-Lipschitz functions,
$$|m(x)-m(x')|\le L\cdot\|x-x'\|.$$
Here, as in [26], we consider on $\mathbb{R}^d$ the distance
$$\|x-x'\|=\sum_{j=1}^d|x_j-x'_j|.$$
Next, we provide the general random forest framework by first defining the notion of a random tree. Additionally, we present two specific variations of the original random forest algorithm, namely, the centered and the uniform random forest algorithms.
Let $\Theta_i$, for $i=1,\dots,M$, be a collection of independent random variables, distributed as $\Theta$. The random variables $\Theta_i$ are used to resample the training set or to select the directions for splitting. The detailed construction in the case of the centered random forest is given in the Appendix.
Definition 1. For the $j$-th tree in the forest, the predicted value at $x$ will be denoted by
$$m_{n,\Theta_j,\mathcal{D}_n}(x)=\sum_{i=1}^n\frac{\mathbf{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}\,Y_i}{N_{n,\Theta_j,\mathcal{D}_n}(x)},$$
where $A_{n,\Theta_j,\mathcal{D}_n}(x)$ is the cell containing $x$ and $N_{n,\Theta_j,\mathcal{D}_n}(x)$ is the number of points that fall into the cell containing $x$.
For a fixed value of $x\in[0,1]^d$, the value of the tree is the empirical mean of the $Y_i$ in the unique cell containing $x$, which is, one hopes, a good guess for the target value corresponding to $x$.
A random forest is a finite collection (an average) of independent, finite random trees $\{\Theta_1,\dots,\Theta_M\}$.
Definition 2. The finite $M$-forest is
$$m_{M,n}(x)=\frac{1}{M}\sum_{j=1}^M m_{n,\Theta_j,\mathcal{D}_n}(x).$$
From a modeling point of view, we let $M\to\infty$ and consider the infinite forest estimate
$$m_{\infty,n,\mathcal{D}_n}(x)=\mathbb{E}_\Theta\big(m_{n,\Theta,\mathcal{D}_n}(x)\big).$$
The convergence holds almost surely, conditionally on $\mathcal{D}_n$, by the law of large numbers [12] and [25, Theorem 3.1].
In 2016, Scornet [26] introduced kernel methods in the random forest world (KeRF), producing a kernel-based algorithm, together with estimates of how it compares with the original algorithm described above.
To understand the intuition behind KeRF construction, we reformulate the random forest algorithm.
For all $x\in[0,1]^d$,
$$m_{M,n}(x)=\frac{1}{M}\sum_{j=1}^M\left(\sum_{i=1}^n\frac{\mathbf{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}\,Y_i}{N_{n,\Theta_j,\mathcal{D}_n}(x)}\right).$$
Therefore we can define the weight of every observation $Y_i$ as
$$W_{i,j,n}(x)=\frac{\mathbf{1}_{X_i\in A_{n,\Theta_j,\mathcal{D}_n}(x)}}{N_{n,\Theta_j,\mathcal{D}_n}(x)}.$$
Hence, the weights depend strongly on the number of points in each cell. A way to overcome this nuisance is to consider simultaneously all the tree cells containing $x$, as the tree is picked at random in the forest.
For all $x\in[0,1]^d$,
$$\tilde{m}_{M,n,\Theta_1,\dots,\Theta_M}(x)=\frac{1}{\sum_{j=1}^M N_{n,\Theta_j}(x)}\sum_{j=1}^M\sum_{i=1}^n Y_i\,\mathbf{1}_{X_i\in A_{n,\Theta_j}(x)}.$$
This way, empty cells do not affect the computation of the prediction function of the algorithm.
It is proven in [26] that this estimator indeed admits a kernel representation.
Proposition 1. [26, Proposition 1] For all $x\in[0,1]^d$, almost surely, it holds that
$$\tilde{m}_{M,n,\Theta_1,\dots,\Theta_M}(x)=\frac{\sum_{i=1}^n K_{M,n}(x,X_i)\,Y_i}{\sum_{i=1}^n K_{M,n}(x,X_i)},$$
where
$$K_{M,n}(x,z)=\frac{1}{M}\sum_{i=1}^M\mathbf{1}_{x\in A_{n,\Theta_i,\mathcal{D}_n}(z)}$$
is the proximity function of the $M$-forest.
The infinite random forest arises as the number of trees tends to infinity.
Definition 3. The infinite KeRF is defined as
$$\tilde{m}_{\infty,n}(x)=\lim_{M\to\infty}\tilde{m}_{M,n}(x,\Theta_1,\dots,\Theta_M).$$
The kernel extends likewise to the infinite random forest.
Proposition 2. [26, Proposition 2] Almost surely, for all $x,y\in[0,1]^d$,
$$\lim_{M\to\infty}K_{M,n}(x,y)=K_n(x,y),$$
where
$$K_n(x,y)=\mathbb{P}_\Theta\big(x\in A_n(y,\Theta)\big)$$
is the probability that $x$ and $y$ belong to the same cell in the infinite forest.
In this paper, we say that an estimator $m_n$ of $m$ is consistent if the following $L_2$-type convergence holds:
$$\mathbb{E}\big(m_n(x)-m(x)\big)^2\to 0$$
as $n\to\infty$.
In the centered and uniform forest algorithms, the way the data set Dn is partitioned is independent of the data set itself.
The centered forest is designed as follows:
1) Fix $k\in\mathbb{N}$.
2) At each node of each individual tree, choose a coordinate uniformly from $\{1,2,\dots,d\}$.
3) Split the node at the midpoint of the interval of the selected coordinate.
Repeat steps 2) and 3) $k$ times. At the end, we have $2^k$ leaves, or cells. A toy example of this iterative process for $k=1,2$ is in Figures 1 and 2. Our estimation at a point $x$ is achieved by averaging the $Y_i$ corresponding to the $X_i$ in the cell containing $x$.
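For concreteness, here is a minimal sketch (in Python with numpy, the library used for the experiments of Section 4) of a finite-$M$ centered KeRF estimate in the kernel form of Proposition 1: each random tree enters only through the leaf cell containing the query point $x$, and the proximity function is averaged over $M$ independent trees. The function names and default parameters are ours, chosen for illustration only; ties at cell boundaries are ignored, since they occur with probability zero under a uniform design.

```python
import numpy as np

def centered_cell(x, k, rng):
    """Leaf cell of x after k centered splits (one random tree's path through x)."""
    d = len(x)
    low, high = np.zeros(d), np.ones(d)
    for _ in range(k):
        j = rng.integers(d)                 # split coordinate, uniform on {0,...,d-1}
        mid = 0.5 * (low[j] + high[j])      # centered split: midpoint of the current side
        if x[j] <= mid:
            high[j] = mid
        else:
            low[j] = mid
    return low, high

def centered_kerf_finite(x, X, Y, M=500, k=5, seed=0):
    """Finite-M centered KeRF estimate at x, in the form of Proposition 1."""
    rng = np.random.default_rng(seed)
    prox = np.zeros(len(X))                 # proximity K_{M,n}(x, X_i)
    for _ in range(M):
        low, high = centered_cell(x, k, rng)
        prox += np.all((X >= low) & (X <= high), axis=1) / M
    s = prox.sum()
    return float((prox * Y).sum() / s) if s > 0 else float(Y.mean())
```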
Scornet [26] introduced the corresponding kernel-based centered random forest, providing the proximity kernel function explicitly.
Proposition 3. The centered random forest kernel with parameter $k\in\mathbb{N}$ has the following multinomial expression [26, Proposition 5]:
$$K_k^{Cen}(x,z)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{j=1}^d\mathbf{1}_{\lceil 2^{k_j}x_j\rceil=\lceil 2^{k_j}z_j\rceil},$$
where $K_k^{Cen}$ is the kernel of the corresponding centered random forest.
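As $M\to\infty$, the Monte Carlo proximity function converges to this closed-form kernel, which can be evaluated exactly by summing over the compositions of $k$ into $d$ non-negative parts. The sketch below is our own illustration (the sum has $\binom{k+d-1}{d-1}$ terms, so it is practical only for moderate $d$ and $k$); it computes $K_k^{Cen}$ and the corresponding infinite centered KeRF prediction.

```python
import numpy as np
from math import factorial, ceil

def compositions(k, d):
    """All d-tuples of non-negative integers summing to k."""
    if d == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def centered_kernel(x, z, k):
    """Exact K_k^Cen(x, z) from the multinomial expression of Proposition 3."""
    d = len(x)
    total = 0.0
    for l in compositions(k, d):
        weight = factorial(k) / np.prod([factorial(kj) for kj in l]) / d**k
        if all(ceil(2**kj * x[j]) == ceil(2**kj * z[j]) for j, kj in enumerate(l)):
            total += weight
    return total

def centered_kerf_predict(x, X, Y, k):
    """Infinite centered KeRF estimate: kernel-weighted average of the Y_i."""
    w = np.array([centered_kernel(x, Xi, k) for Xi in X])
    return float((w * Y).sum() / w.sum()) if w.sum() > 0 else float(np.mean(Y))
```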
The uniform random forest was introduced by Biau et al. [7] and is, like the centered random forest, another toy model of Breiman's random forest. The algorithm forms a partition of $[0,1]^d$ as follows:
1) Fix $k\in\mathbb{N}$.
2) At each node of each individual tree, choose a coordinate uniformly from $\{1,2,\dots,d\}$.
3) Split the cell uniformly at random along the side of the selected coordinate.
Repeat steps 2) and 3) $k$ times. At the end, we have $2^k$ leaves. Our final estimation at a point $x$ is achieved by averaging the $Y_i$ corresponding to the $X_i$ in the cell containing $x$.
Again, Scornet [26, Proposition 6] derived the corresponding proximity kernel for the uniform random forest.
Proposition 4. The proximity kernel of the uniform KeRF with parameter $k\in\mathbb{N}$ has, for $x\in[0,1]^d$, the following form:
$$K_k^{Un}(0,x)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{m=1}^d\left(1-x_m\sum_{j=0}^{k_m-1}\frac{(-\ln x_m)^j}{j!}\right),$$
with the convention that
$$\sum_{j=0}^{-1}\frac{(-\ln x_m)^j}{j!}=0,$$
and by continuity we can extend the kernel also to vectors with zero components.
Unfortunately, it is very hard to obtain a general formula for $K^{Un}(x,y)$, so we consider instead the translation-invariant uniform KeRF
$$\tilde{m}^{Un}_{\infty,n}(x)=\frac{\sum_{i=1}^n Y_i\,K_k^{Un}(0,|X_i-x|)}{\sum_{i=1}^n K_k^{Un}(0,|X_i-x|)}.$$
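The following sketch, again our own illustration, evaluates $K_k^{Un}(0,\cdot)$ from Proposition 4 (handling the empty-sum convention and the continuous extension at zero components) and uses it for the translation-invariant uniform KeRF prediction. The compositions helper is repeated so the snippet is self-contained.

```python
import numpy as np
from math import factorial, log

def compositions(k, d):
    """All d-tuples of non-negative integers summing to k."""
    if d == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def uniform_kernel_at_zero(x, k):
    """K_k^Un(0, x) from Proposition 4, for x in [0,1]^d."""
    d = len(x)
    total = 0.0
    for l in compositions(k, d):
        weight = factorial(k) / np.prod([factorial(km) for km in l]) / d**k
        prod = 1.0
        for m, km in enumerate(l):
            if km == 0 or x[m] == 0.0:
                prod *= 1.0                  # empty-sum convention / continuity at 0
            else:
                s = sum((-log(x[m]))**j / factorial(j) for j in range(km))
                prod *= 1.0 - x[m] * s
        total += weight * prod
    return total

def uniform_kerf_predict(x, X, Y, k):
    """Translation-invariant uniform KeRF estimate at x."""
    w = np.array([uniform_kernel_at_zero(np.abs(Xi - x), k) for Xi in X])
    return float((w * Y).sum() / w.sum()) if w.sum() > 0 else float(np.mean(Y))
```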
In this section, after providing some measure-concentration-type results [9], we improve the rate of consistency of the centered KeRF algorithm. The following lemmata provide the inequalities needed to derive upper bounds for averages of i.i.d. random variables. Lacking a reference, for completeness, we provide detailed proofs of these lemmata. Moreover, we assume throughout this article that all random variables are real-valued, and we write $\|X\|_{L^p}:=(\mathbb{E}|X|^p)^{\frac{1}{p}}$ and $\|X\|_\infty:=\inf\{t:\mathbb{P}(|X|\le t)=1\}$.
Lemma 1. Let $X_1,\dots,X_n$ be a sequence of real independent and identically distributed random variables with $\mathbb{E}(X_i)=0$. Assume also that there is a uniform bound for the $L^1$-norm and the supremum norm, i.e., $\mathbb{E}(|X_i|)\le C$ and $\|X_i\|_\infty\le CM$ for every $i=1,\dots,n$. Then for every $t\in(0,1)$,
$$\mathbb{P}\left(\left\{\frac{|\sum_{i=1}^n X_i|}{n}\ge t\right\}\right)\le 2e^{-\tilde{C}_C\frac{t^2n}{M}}$$
for some positive constant $\tilde{C}_C$ that depends only on $C$.
Proof. For all $x\in[0,1]$ one has $e^x\le 1+x+x^2$. By using the hypothesis, for every $\lambda\le\frac{1}{CM}$,
$$e^{\lambda X_i}\le 1+\lambda X_i+(\lambda X_i)^2\ \Rightarrow\ \mathbb{E}e^{\lambda X_i}\le 1+\lambda^2\mathbb{E}(X_i)^2\le 1+\lambda^2\|X_i\|_1\|X_i\|_\infty\le 1+\lambda^2C^2M\le e^{\lambda^2C^2M}.$$
By the independence of the random variables $X_i$,
$$\mathbb{E}e^{\sum_{i=1}^n\lambda X_i}=\prod_{i=1}^n\mathbb{E}e^{\lambda X_i}\le\prod_{i=1}^n e^{\lambda^2C^2M}=e^{n\lambda^2C^2M}.$$
Therefore, by Markov's inequality,
$$\mathbb{P}\left(\left\{\frac{\sum_{i=1}^n X_i}{n}\ge t\right\}\right)\le e^{-\lambda tn}\,\mathbb{E}e^{\sum_{i=1}^n\lambda X_i}\le e^{-\lambda tn}e^{n\lambda^2C^2M}=e^{n\lambda^2C^2M-\lambda tn}.$$
Finally, if $C\ge\frac{1}{4}$ we choose $\lambda=\frac{t}{2C^2M}$, and otherwise $\lambda=\frac{t}{16CM}$, so that
$$\mathbb{P}\left(\left\{\frac{\sum_{i=1}^n X_i}{n}\ge t\right\}\right)\le e^{-\tilde{C}_C\frac{t^2n}{M}}.$$
By replacing $X_i$ with $-X_i$ we conclude the proof.
Lemma 2. Let $X_1,\dots,X_n$ be a sequence of non-negative independent and identically distributed random variables with $\mathbb{E}(X_i)\le 2$ and $\|X_i\|_\infty\le M$ for every $i=1,\dots,n$. Let also $\epsilon_1,\dots,\epsilon_n$ be independent random variables following a normal distribution with zero mean and finite variance $\sigma^2$, and assume that the $\epsilon_i$ are independent of the $X_i$ for every $i=1,\dots,n$.
Then for every $t\in(0,1)$,
$$\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n\epsilon_iX_i\right|\ge t\right)\le 2\exp\left(-C\frac{t^2n}{M}\right),$$
with the positive constant $C$ depending only on $\sigma$.
Proof.
$$\begin{aligned}
\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n\epsilon_iX_i\ge t\right)&=\mathbb{P}\left(\exp\left(\frac{\lambda}{n}\sum_{i=1}^n\epsilon_iX_i\right)\ge\exp(\lambda t)\right)\quad\text{(for a positive $\lambda$)}\\
&\le\exp(-\lambda t)\,\mathbb{E}\exp\left(\frac{\lambda}{n}\sum_{i=1}^n\epsilon_iX_i\right)\quad\text{(by Chebyshev's inequality)}\\
&=\exp(-\lambda t)\prod_{i=1}^n\mathbb{E}\exp\left(\frac{\lambda}{n}\epsilon_iX_i\right)\quad\text{(by independence)}\\
&=\exp(-\lambda t)\prod_{i=1}^n\left(1+\sum_{k=2}^\infty\frac{\lambda^k\,\mathbb{E}X_i^k\,\mathbb{E}\epsilon_i^k}{n^kk!}\right)\\
&\le\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\sum_{k=2}^\infty\frac{\lambda^kM^k\,\mathbb{E}\epsilon_i^k}{n^kk!}\right)\\
&=\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\left(\mathbb{E}\exp\left(\frac{\lambda M}{n}\epsilon_i\right)-1\right)\right)\\
&\le\exp(-\lambda t)\prod_{i=1}^n\left(1+\frac{2}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\\
&=\exp(-\lambda t)\exp\left(\sum_{i=1}^n\log\left(1+\frac{2}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\right)\\
&\le\exp(-\lambda t)\exp\left(\sum_{i=1}^n\frac{2}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\\
&\le\exp(-\lambda t)\exp\left(\frac{2n}{M}\left(\exp\left(\frac{\lambda^2\sigma^2M^2}{n^2}\right)-1\right)\right)\\
&\le\exp(-\lambda t)\exp\left(\frac{2n}{M}\cdot\frac{2\lambda^2\sigma^2M^2}{n^2}\right)\quad\left(\text{for }\lambda\le\frac{n}{\sigma M}\right)\\
&=\exp\left(-\lambda t+\frac{4M\lambda^2\sigma^2}{n}\right).
\end{aligned}$$
Finally, we select $\lambda=\frac{tn}{8M\sigma^2}$ when $\sigma\ge\frac{1}{8}$, and $\lambda=\frac{tn}{M\sigma}$ when $\sigma\le\frac{1}{8}$, so that
$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n\epsilon_iX_i\ge t\right)\le\exp\left(-C\frac{t^2n}{M}\right).$$
Replacing $X_i$ with $-X_i$ we conclude the proof.
Theorem 3.1. Let $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$. Assume also that $X$ is uniformly distributed in $[0,1]^d$ and that $m$ is a Lipschitz function. Then there exists a constant $\tilde{C}$ such that, for every $n>1$ and for every $x\in[0,1]^d$,
$$\mathbb{E}\big(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Proof. Following the notation in [26], let $x\in[0,1]^d$, $\|m\|_\infty=\sup_{x\in[0,1]^d}|m(x)|$, and, by the construction of the algorithm,
$$\tilde{m}^{Cen}_{n,\infty}(x)=\frac{\sum_{i=1}^n Y_iK_k(x,X_i)}{\sum_{i=1}^n K_k(x,X_i)}.$$
Let
$$A_n(x)=\frac{1}{n}\sum_{i=1}^n\frac{Y_iK_k(x,X_i)-\mathbb{E}\big(YK_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)},$$
$$B_n(x)=\frac{1}{n}\sum_{i=1}^n\frac{K_k(x,X_i)-\mathbb{E}\big(K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)},$$
and
$$M_n(x)=\frac{\mathbb{E}\big(YK_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}.$$
Hence, we can reformulate the estimator as
$$\tilde{m}^{Cen}_{n,\infty}(x)=\frac{M_n(x)+A_n(x)}{B_n(x)+1}.$$
Let $t\in(0,\frac{1}{2})$ and let $C_t(x)$ be the event $\{|A_n(x)|\le t,\ |B_n(x)|\le t\}$. Then
$$\begin{aligned}
\mathbb{E}\big(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\big)^2&=\mathbb{E}\big(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\big)^2\mathbf{1}_{C_t(x)}+\mathbb{E}\big(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\big)^2\mathbf{1}_{C^c_t(x)}\\
&\le\mathbb{E}\big(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\big)^2\mathbf{1}_{C^c_t(x)}+c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2t^2,
\end{aligned}$$
where the last inequality was obtained in [26, p. 1496]. Moreover, in [26],
$$\mathbb{E}\big(\tilde{m}^{Cen}_{n,\infty}(x)-m(x)\big)^2\mathbf{1}_{C^c_t(x)}\le c_3(\log n)\big(\mathbb{P}(C^c_t(x))\big)^{\frac{1}{2}}.$$
In order to find the rate of consistency, we need a bound for the probability $\mathbb{P}(C^c_t(x))$. Obviously,
$$\mathbb{P}(C^c_t(x))\le\mathbb{P}(|A_n(x)|>t)+\mathbb{P}(|B_n(x)|>t).$$
We bound the two probabilities separately.
Proposition 5. Let
$$\tilde{X}_i=\frac{K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}-1,\qquad i=1,\dots,n,$$
a sequence of i.i.d. random variables. Then for any $t\in(0,1)$,
$$\mathbb{P}\left(\left\{\frac{|\sum_{i=1}^n\tilde{X}_i|}{n}\ge t\right\}\right)=\mathbb{P}\big(|B_n(x)|\ge t\big)\le 2e^{-\tilde{C}_1\frac{t^2n}{2^k}}$$
for some positive constant $\tilde{C}_1$.
Proof. It is easy to verify that $\mathbb{E}\tilde{X}_i=0$ and
$$|\tilde{X}_i|=\left|\frac{K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}-1\right|\le\frac{K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}+1,$$
hence $\mathbb{E}|\tilde{X}_i|\le 2$.
Finally,
$$\|\tilde{X}_i\|_\infty=\sup\{|\tilde{X}_i|\}=\sup\left\{\left|\frac{K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}-1\right|\right\}\le\frac{\sup K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}+1\le 2^k+1\le 2^{k+1}.$$
By Lemma 1,
$$\mathbb{P}\left(\left\{\frac{|\sum_{i=1}^n\tilde{X}_i|}{n}\ge t\right\}\right)=\mathbb{P}\big(|B_n(x)|\ge t\big)\le 2e^{-\tilde{C}_1\frac{t^2n}{2^k}}.$$
We also need a bound for $\mathbb{P}(|A_n(x)|>t)$, where
$$A_n(x)=\frac{1}{n}\sum_{i=1}^n\frac{Y_iK_k(x,X_i)-\mathbb{E}\big(YK_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}.$$
Proposition 6. Let
$$\tilde{Z}_i=\frac{Y_iK_k(x,X_i)-\mathbb{E}\big(YK_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}$$
for $i=1,\dots,n$. Then for every $t\in(0,1)$,
$$\mathbb{P}\left(\left\{\frac{|\sum_{i=1}^n\tilde{Z}_i|}{n}\ge t\right\}\right)=\mathbb{P}\big(|A_n(x)|\ge t\big)\le 4e^{-C\frac{t^2n}{2^k}},$$
for some constant $C$ depending only on $\sigma$ and $\|m\|_\infty$.
Proof.
$$\begin{aligned}
A_n(x)&=\frac{1}{n}\sum_{i=1}^n\frac{Y_iK_k(x,X_i)-\mathbb{E}\big(YK_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\\
&=\frac{1}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}+\frac{1}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)-\mathbb{E}\big(\epsilon K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\\
&=\frac{1}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}+\frac{1}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}.
\end{aligned}$$
Therefore,
$$\mathbb{P}\big(|A_n(x)|\ge t\big)\le\mathbb{P}\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\ge t\right)+\mathbb{P}\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\ge t\right).$$
Let
$$Z_i=\frac{2\big(m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)\big)}{\mathbb{E}\big(K_k(x,X)\big)},$$
a sequence of i.i.d. random variables. It is easy to verify that the $Z_i$ are centered and that
$$\frac{|Z_i|}{2}=\left|\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\le\|m\|_\infty\,\frac{K_k(x,X_i)+\mathbb{E}\big(K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}.$$
Hence, $\mathbb{E}|Z_i|\le 4\|m\|_\infty$. Finally,
$$\|Z_i\|_\infty=\sup\{|Z_i|\}=2\sup\left\{\left|\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\right\}\le 2\|m\|_\infty(2^k+1)\le 4\|m\|_\infty 2^k.$$
By Lemma 1,
$$\mathbb{P}\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{m(X_i)K_k(x,X_i)-\mathbb{E}\big(m(X)K_k(x,X)\big)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\ge t\right)\le 2e^{-C\frac{nt^2}{2^k}}.$$
Furthermore, let
$$\tilde{W}_i=\frac{2\epsilon_iK_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}$$
for $i=1,\dots,n$, a sequence of independent and identically distributed random variables. We can verify that, for every $i=1,\dots,n$,
$$\mathbb{E}\left(\frac{2K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}\right)\le 2$$
and
$$\sup\left\{\left|\frac{2K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\right\}\le\frac{2\sup K_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}\le 2^{k+1}.$$
By Lemma 2, it is clear that
$$\mathbb{P}\left(\left|\frac{2}{n}\sum_{i=1}^n\frac{\epsilon_iK_k(x,X_i)}{\mathbb{E}\big(K_k(x,X)\big)}\right|\ge t\right)\le 2e^{-C_2\frac{nt^2}{2^k}}.$$
We conclude the proposition by observing that
$$\mathbb{P}\big(|A_n(x)|\ge t\big)\le 4e^{-\min\{C_2,C\}\frac{nt^2}{2^k}}.$$
Finally, let us compute the rate of consistency of the centered KeRF algorithm. By Propositions 5 and 6, one has that
$$\big(\mathbb{P}(C^c_t(x))\big)^{\frac{1}{2}}\le\big(\mathbb{P}(|A_n(x)|>t)+\mathbb{P}(|B_n(x)|>t)\big)^{\frac{1}{2}}\le c_3e^{-c_4\frac{nt^2}{2^k}},$$
for some constants $c_3,c_4$ independent of $k$ and $n$.
Thus,
$$\mathbb{E}\big(\tilde{m}_{\infty,n}(x)-m(x)\big)^2\le c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2t^2+c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}.$$
We compute the minimum of the right-hand side of the inequality for $t\in(0,1)$:
$$2c_2t-2tc_3c_4\log n\,\frac{n}{2^k}e^{-c_4\frac{t^2n}{2^k}}=0\ \Rightarrow\ e^{-c_4\frac{t^2n}{2^k}}=\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\quad\text{and}\quad t^2=\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right).$$
Hence, the inequality becomes
$$\begin{aligned}
\mathbb{E}\big(\tilde{m}_{\infty,n}(x)-m(x)\big)^2&\le c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right)+c_3\log n\,\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\\
&=c_1\left(1-\frac{1}{2d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{ec_3c_4}{c_2}\frac{n\log n}{2^k}\right).
\end{aligned}$$
For every $\epsilon_n\in(0,2]$ it holds that $\log x\le\frac{1}{\epsilon_n}x^{\epsilon_n}$. Then one has that
$$\mathbb{E}\big(\tilde{m}_{\infty,n}(x)-m(x)\big)^2\le c_1\left(1-\frac{1}{2d}\right)^{2k}+\frac{c_2}{c_4\epsilon_n}\left(\frac{ec_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}.$$
We pick
$$k=c(d)\log_2\frac{n}{(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}},$$
thus,
$$\frac{c_2}{c_4\epsilon_n}\left(\frac{ec_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}\le\frac{c'}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))},$$
for a constant $c'$ independent of $n$, and
$$c_1\left(1-\frac{1}{2d}\right)^{2k}=c_1\left(1-\frac{1}{2d}\right)^{2c(d)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1\,2^{2c(d)\log_2\left(1-\frac{1}{2d}\right)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1\,n^{2c(d)\log_2\left(1-\frac{1}{2d}\right)}\frac{1}{(\log n)^{c(d)\frac{2\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{2d}\right)}}.$$
Therefore, we choose
$$c(d)=\frac{\epsilon_n-1}{2\log_2\left(1-\frac{1}{2d}\right)-(1-\epsilon_n)}.$$
Finally,
$$\begin{aligned}
c_1\,n^{2c(d)\log_2\left(1-\frac{1}{2d}\right)}\frac{1}{(\log n)^{c(d)\frac{2\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{2d}\right)}}
&=c_1\,n^{\frac{2(\epsilon_n-1)\log_2\left(1-\frac{1}{2d}\right)}{2\log_2\left(1-\frac{1}{2d}\right)-(1-\epsilon_n)}}\times\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\log_2\left(1-\frac{1}{2d}\right)-(1-\epsilon_n)}\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{2d}\right)}}\\
&=c_1\,n^{\frac{2(\epsilon_n-1)\left(-\frac{1}{2d\log 2}\right)}{2\left(-\frac{1}{2d\log 2}\right)-(1-\epsilon_n)}}\times\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{2d\log 2}\right)-(1-\epsilon_n)}\frac{\epsilon_n}{1-\epsilon_n}\left(-\frac{1}{2d\log 2}\right)}}\\
&=c_1\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)},
\end{aligned}$$
and, for the second term, with the same arguments,
$$\frac{\tilde{c}}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}=\frac{\tilde{c}}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)},$$
for a constant $\tilde{c}$ independent of $\epsilon_n$; hence,
$$\mathbb{E}\big(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\big)^2\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)},$$
and consequently,
$$\begin{aligned}
\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}&=\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+d\log 2(1-\epsilon_n)}\right)}\times n^{\frac{\epsilon_n}{(1+d\log 2)(1+(1-\epsilon_n)d\log 2)}}\\
&\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{d\log 2(1-\epsilon_n)}\right)}\times(\log n)^{\frac{\log n}{\log\log n}\frac{\epsilon_n}{(d\log 2)^2(1-\epsilon_n)}}.
\end{aligned}$$
Finally, we finish the proof by selecting $\epsilon_n=\frac{1}{\log n}$, and
$$\mathbb{E}\big(\tilde{m}^{Cen}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n).$$
Theorem 4. Let $Y=m(X)+\epsilon$, where $\epsilon$ is a zero-mean Gaussian noise with finite variance, independent of $X$. Assume also that $X$ is uniformly distributed in $[0,1]^d$ and that $m$ is a Lipschitz function. Then there exists a constant $\tilde{C}$ such that, for every $n>1$ and for every $x\in[0,1]^d$,
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
Proof. Arguing as in the proof for the centered random forest, we can verify that
$$\big(\mathbb{P}(C^c_t(x))\big)^{\frac{1}{2}}\le\big(\mathbb{P}(|A_n(x)|>t)+\mathbb{P}(|B_n(x)|>t)\big)^{\frac{1}{2}}\le c_3e^{-c_4\frac{nt^2}{2^k}}$$
for some constants $c_3,c_4$ independent of $k$ and $n$. The rate of consistency for the uniform KeRF follows by minimizing, in terms of $n$, the right-hand side of
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2t^2+c_3\log n\,e^{-c_4\frac{t^2n}{2^k}}.$$
We compute the minimum of the right-hand side of the inequality for $t\in(0,1)$:
$$2c_2t-2tc_3c_4\log n\,\frac{n}{2^k}e^{-c_4\frac{t^2n}{2^k}}=0\ \Rightarrow\ e^{-c_4\frac{t^2n}{2^k}}=\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\quad\text{and}\quad t^2=\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right).$$
Hence, the inequality becomes
$$\begin{aligned}
\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2&\le c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{c_3c_4}{c_2}\frac{n\log n}{2^k}\right)+c_3\log n\,\frac{c_2}{c_3c_4}\frac{2^k}{n\log n}\\
&=c_1\left(1-\frac{1}{3d}\right)^{2k}+c_2\frac{1}{c_4}\frac{2^k}{n}\log\left(\frac{ec_3c_4}{c_2}\frac{n\log n}{2^k}\right).
\end{aligned}$$
For every $\epsilon_n\in(0,2]$ it holds that $\log x\le\frac{1}{\epsilon_n}x^{\epsilon_n}$. Then one has that
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le c_1\left(1-\frac{1}{3d}\right)^{2k}+\frac{c_2}{c_4\epsilon_n}\left(\frac{ec_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}.$$
We pick
$$k=c(d)\log_2\frac{n}{(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}}.$$
Therefore,
$$\frac{c_2}{c_4\epsilon_n}\left(\frac{ec_3c_4}{c_2}\right)^{\epsilon_n}\left(\frac{2^k}{n}(\log n)^{\frac{\epsilon_n}{1-\epsilon_n}}\right)^{1-\epsilon_n}\le\frac{c'}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))},$$
for a constant $c'$ independent of $n$, and
$$c_1\left(1-\frac{1}{3d}\right)^{2k}=c_1\left(1-\frac{1}{3d}\right)^{2c(d)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1\,2^{2c(d)\log_2\left(1-\frac{1}{3d}\right)\log_2\frac{n}{(\log n)^{\epsilon_n/(1-\epsilon_n)}}}=c_1\,n^{2c(d)\log_2\left(1-\frac{1}{3d}\right)}\frac{1}{(\log n)^{c(d)\frac{2\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{3d}\right)}}.$$
Therefore, we choose
$$c(d)=\frac{\epsilon_n-1}{2\log_2\left(1-\frac{1}{3d}\right)-(1-\epsilon_n)}.$$
Finally,
$$\begin{aligned}
c_1\,n^{2c(d)\log_2\left(1-\frac{1}{3d}\right)}\frac{1}{(\log n)^{c(d)\frac{2\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{3d}\right)}}
&=c_1\,n^{\frac{2(\epsilon_n-1)\log_2\left(1-\frac{1}{3d}\right)}{2\log_2\left(1-\frac{1}{3d}\right)-(1-\epsilon_n)}}\times\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\log_2\left(1-\frac{1}{3d}\right)-(1-\epsilon_n)}\frac{\epsilon_n}{1-\epsilon_n}\log_2\left(1-\frac{1}{3d}\right)}}\\
&=c_1\,n^{\frac{2(\epsilon_n-1)\left(-\frac{1}{3d\log 2}\right)}{2\left(-\frac{1}{3d\log 2}\right)-(1-\epsilon_n)}}\times\frac{1}{(\log n)^{\frac{2(\epsilon_n-1)}{2\left(-\frac{1}{3d\log 2}\right)-(1-\epsilon_n)}\frac{\epsilon_n}{1-\epsilon_n}\left(-\frac{1}{3d\log 2}\right)}}\\
&=c_1\,n^{-\left(\frac{2(1-\epsilon_n)}{2+(1-\epsilon_n)3d\log 2}\right)}(\log n)^{\left(\frac{2\epsilon_n}{2+3d\log 2(1-\epsilon_n)}\right)},
\end{aligned}$$
and, for the second term, with the same arguments,
$$\frac{\tilde{c}}{\epsilon_n}\,n^{(c(d)-1)(1-\epsilon_n)}(\log n)^{\epsilon_n(1-c(d))}=\frac{\tilde{c}}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)},$$
for a constant $\tilde{c}$ independent of $\epsilon_n$; hence,
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)},$$
and consequently,
$$\begin{aligned}
\frac{C}{\epsilon_n}\,n^{-\left(\frac{1-\epsilon_n}{1+(1-\epsilon_n)\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}&=\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{1+\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}\times n^{\frac{\epsilon_n}{\left(1+\frac{3}{2}d\log 2\right)\left(1+(1-\epsilon_n)\frac{3}{2}d\log 2\right)}}\\
&\le\frac{C}{\epsilon_n}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n)^{\left(\frac{\epsilon_n}{\frac{3}{2}d\log 2(1-\epsilon_n)}\right)}\times(\log n)^{\frac{\log n}{\log\log n}\frac{\epsilon_n}{\left(\frac{3}{2}d\log 2\right)^2(1-\epsilon_n)}}.
\end{aligned}$$
Finally, we finish the proof by selecting $\epsilon_n=\frac{1}{\log n}$, and
$$\mathbb{E}\big(\tilde{m}^{Un}_{\infty,n}(x)-m(x)\big)^2\le\tilde{C}\,n^{-\left(\frac{1}{1+\frac{3}{2}d\log 2}\right)}(\log n).$$
In this section, we summarize the rates of convergence for the centered KeRF and the uniform KeRF, and we compare them with the minimax rate of convergence over the class of Lipschitz functions [29]. In addition, we provide numerical simulations in which we compare the $L_2$-error for different choices of the tree depth. All experiments were performed with the software Python (https://www.python.org/), mainly with the numpy library; random sets uniformly distributed in $[0,1]^d$ were created for various examples of the dimension $d$ and the function $Y$. For every experiment the set is divided into a training set (80%) and a testing set (20%); then the $L_2$-error, $\sum_{X_i\in\text{test set}}(\tilde{m}(X_i)-Y_i)^2$, and the standard deviation of the error are computed.
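A minimal sketch of this experimental protocol is given below. It assumes that a KeRF predictor from one of the earlier snippets (e.g. centered_kerf_predict) is available; the target function, the noise level, and the helper names are placeholders of ours, not fixed by the text.

```python
import numpy as np

def l2_error_experiment(m, predictor, n, d, k, noise_sd=0.05, seed=0):
    """Draw n uniform points in [0,1]^d, set Y = m(X) + eps, split 80/20,
    and return the test-set L2 error of the given KeRF predictor."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, d))
    Y = np.array([m(x) for x in X]) + rng.normal(0.0, noise_sd, n)
    n_train = int(0.8 * n)
    X_tr, Y_tr = X[:n_train], Y[:n_train]
    X_te, Y_te = X[n_train:], Y[n_train:]
    preds = np.array([predictor(x, X_tr, Y_tr, k) for x in X_te])
    return float(np.sum((preds - Y_te) ** 2))

# Example with a hypothetical target, comparing several depths k on the same data:
# errs = [l2_error_experiment(lambda x: np.sin(2 * np.pi * x[0]),
#                             centered_kerf_predict, n=500, d=2, k=k) for k in (2, 5, 8)]
```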
For the centered KeRF we compare three different values of the tree depth, which by the theory yield different convergence rates: first, the choice of $k$ in [26, Theorem 1], which provides the previous convergence rate; second, the choice of $k$ derived in Theorem 1; and, third, the case where the estimator, in probability, interpolates the data set, for which the known convergence rate is slow, $O\big(\log^{-\frac{d-5}{6}}n\big)$ for feature-space dimension $d>5$ [3, Theorem 4.1].
For the uniform KeRF, we compare the two values of the tree depth derived from [26] and Theorem 2; nevertheless, it is not known whether the uniform KeRF algorithm converges when the estimator interpolates the data set. Of course, in practice, since real data might violate the assumptions of the theorems, one should use cross-validation to tune the parameters of the algorithms.
Comparing the rates of consistency for centered KeRF and the depth of the corresponding trees:
● Scornet's rate of convergence [26, Theorem 1]:
$$n^{-\left(\frac{1}{d\log 2+3}\right)}(\log n)^2,\qquad\text{with }k=\left\lceil\frac{1}{\log 2+\frac{3}{d}}\log\frac{n}{\log^2 n}\right\rceil.$$
● New rate of convergence (the two depth choices are compared numerically in the sketch after this list):
$$n^{-\left(\frac{1}{1+d\log 2}\right)}(\log n),\qquad\text{with }k=\left\lceil\frac{\frac{1}{\log n}-1}{2\log_2\left(1-\frac{1}{2d}\right)-\left(1-\frac{1}{\log n}\right)}\,\log_2\frac{n}{(\log n)^{\frac{1/\log n}{1-1/\log n}}}\right\rceil.$$
● Minimax rate of consistency over the class of Lipschitz functions [29]: $n^{-\frac{2}{d+2}}$.
Thus, theoretically, the improved rate of consistency is achieved when trees grow at a deeper level compared with the parameter selection in [26, Theorem 1].
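The sketch below makes the comparison of the two depth prescriptions concrete; the function names and the example values of $(n,d)$ are ours, and the formulas are read directly off the two bullets above.

```python
import numpy as np

def depth_scornet_centered(n, d):
    """Tree depth prescribed by [26, Theorem 1]."""
    return int(np.ceil(np.log(n / np.log(n) ** 2) / (np.log(2) + 3.0 / d)))

def depth_new_centered(n, d):
    """Tree depth used in Theorem 1: eps = 1/log n,
    c(d) = (eps - 1) / (2*log2(1 - 1/(2d)) - (1 - eps)),
    k = ceil(c(d) * log2(n / (log n)^(eps/(1-eps))))."""
    eps = 1.0 / np.log(n)
    c_d = (eps - 1.0) / (2.0 * np.log2(1.0 - 1.0 / (2 * d)) - (1.0 - eps))
    return int(np.ceil(c_d * np.log2(n / np.log(n) ** (eps / (1.0 - eps)))))

# e.g. for n = 1000, d = 2 the new prescription grows noticeably deeper trees:
# depth_scornet_centered(1000, 2) -> 2,  depth_new_centered(1000, 2) -> 5  (approximately)
```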
As is evident from Figure 3, the improvement in the convergence rate is more significant in low-dimensional feature spaces. The constant $\tilde{C}=\tilde{C}(d)$ of Theorem 1 depends on the dimension $d$ of the space. The convergence rates in the literature do not attempt a good estimate of $\tilde{C}$, so they are meaningful for fixed values of $d$ only.
Finally, we note that Klusowski's rate of convergence in [20], $O\big((n\log^{\frac{d-1}{2}}n)^{-\frac{1+\delta}{d\log 2+1}}\big)$, where $\delta$ is a positive constant that depends on the dimension $d$ of the feature space and converges to zero as $d$ approaches infinity, is sharp and better than the one in Theorem 1, $O\big(n^{-\left(\frac{1}{1+d\log 2}\right)}\log n\big)$, for small values of $d$. For large values of $n$ and $d$ the two estimates are essentially the same, but, for now, we do not know whether the rate of convergence of the centered KeRF can be improved in general.
Comparing the rates of convergence for uniform KeRF and the depth of the corresponding trees:
● Scornet's rate of convergence [26, Theorem 2]:
$$n^{-\left(\frac{2}{3d\log 2+6}\right)}(\log n)^2,\qquad\text{with }k=\left\lceil\frac{1}{\log 2+\frac{3}{d}}\log\frac{n}{\log^2 n}\right\rceil.$$
● New rate of convergence:
$$n^{-\left(\frac{2}{3d\log 2+2}\right)}(\log n),\qquad\text{with }k=\left\lceil\frac{\frac{1}{\log n}-1}{2\log_2\left(1-\frac{1}{3d}\right)-\left(1-\frac{1}{\log n}\right)}\,\log_2\frac{n}{(\log n)^{\frac{1/\log n}{1-1/\log n}}}\right\rceil.$$
● Minimax rate of convergence over the class of Lipschitz functions [29]: $n^{-\frac{2}{d+2}}$.
Thus, theoretically, as in the case of centered random KeRF, the improved rate of consistency is achieved when trees grow at a deeper level compared with the parameter selection in [26, Theorem 2].
The same considerations we made for the centered KeRF on the dependence of the constant $\tilde{C}$ on $d$ hold in the uniform KeRF case, as is evident from Figure 4. Moreover, as of now, it is still unknown to us whether the convergence rate of the uniform KeRF can be improved.
Finally, numerical simulations of the $L_2$-error of the centered KeRF approximation for three different values of $k$ are reported in Figure 5, with the standard deviations of the errors in Figure 6. In the Appendix, more simulations and plots for different target functions and for both algorithms are illustrated.
In this paper, we have obtained improved rates of convergence for two kernel-based random forests, the centered KeRF and the uniform KeRF. In addition, we provided numerical simulations for both algorithms concerning the parameters of the methods. Finally, we explored the reproducing kernel Hilbert space related to the centered KeRF providing bounds for the dimension of the aforementioned space.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The first author was funded by the PhD scholarship PON-DOT1303154-3, the second author was partially funded by project GNAMPA 2022 "HOLOMORPHIC FUNCTIONS IN ONE AND SEVERAL VARIABLES" and by Horizon H2020-MSCA-RISE-2017 project 777822 "GHAIA".
All authors declare no conflicts of interest in this paper.
In this last section, we provide further experiments on low-dimensional regression examples with additive noise for the centered and the uniform KeRF. In particular, we calculate and compare the $L_2$-errors and the standard deviations against different sample sizes for different values of the parameter $k$ of the estimator. Moreover, in the following subsection, we study the corresponding RKHS produced by the kernel
$$K_k^{Cen}(x,z)=\sum_{\sum_{j=1}^d k_j=k}\frac{k!}{k_1!\cdots k_d!}\left(\frac{1}{d}\right)^k\prod_{j=1}^d\mathbf{1}_{\lceil 2^{k_j}x_j\rceil=\lceil 2^{k_j}z_j\rceil},$$
defined on the abelian group $\mathbb{Z}_2^{kd}$. To conclude, we recall some notation for finite abelian groups, necessary to define the aforementioned RKHS and to estimate its dimension.
We provide here an alternative description of the centered random forest algorithm, in which the dyadic tiling of the hypercube motivates us to define the kernel on the abelian group $\mathbb{Z}_2^{kd}$. First, we define a random tree $\Theta$. Start with a random variable $\Theta_0$, uniformly distributed in $\{1,\dots,d\}$, and split
$$I:=[0,1]^d=I^{\Theta_0}_0\cup I^{\Theta_0}_1,$$
where
$$I^{\Theta_0}_l=[0,1]\times\cdots\times[l/2,(l+1)/2]\times\cdots\times[0,1],$$
and for $l=0,1$ the splitting is performed in the $\Theta_0$-th coordinate. Choose then random variables $\Theta_{1,l}$ ($l=0,1$), distributed as $\Theta_0$, and split each
$$I^{\Theta_0}_l=I^{\Theta_0,\Theta_1}_{l,0}\cup I^{\Theta_0,\Theta_1}_{l,1},$$
where, as before, the splitting is performed at the $\Theta_1$-th coordinate, and $I^{\Theta_0,\Theta_1}_{l,0}$ is the lower half of $I^{\Theta_0}_l$. Iterate the same procedure $k$ times. In order to do that, we need random variables $\Theta_{j;\eta_0,\dots,\eta_j}$, with $\eta_l\in\{1,\dots,d\}$ and $j=1,\dots,k$. We assume that all such random variables are independent. It is useful to think of $\Theta=\{\Theta_{j;\eta_0,\dots,\eta_j}\}$ as indexed by a $d$-adic tree, and, in fact, we refer to $\Theta$ as a random tree in $[0,1]^d$. We call cells, or leaves, each of the $2^k$ rectangles into which $[0,1]^d$ is split at the end of the $k$-th subdivision.
Figure A1 shows the $L_2$-error of the centered KeRF approximation for a three-dimensional feature space, and Figure A2 the standard deviation of the errors.
Figures A3 and A4 show the $L_2$-error and the standard deviation for the uniform KeRF in a two-dimensional feature space, and Figures A5 and A6 present a three-dimensional example.
B.1. The Fourier analysis of the kernel
The kernel in the centered KeRF defines an RKHS structure $H_K$ on a set $\Gamma$ having $2^{kd}$ points [4]. In fact, $\Gamma$ has a group structure, and Fourier analysis can be used. Much research has been done in RKHS theory (see [1]), and in this section we study the structure of $H_K$ in itself. A priori, $H_K$ might have any dimension less than or equal to $\#\Gamma$. We show, in fact, that its dimension is much lower than that, a fact which is somewhat surprising and, we believe, interesting in itself. Furthermore, we prove that there are no nonconstant multipliers in the space $H_K$. For completeness, we provide definitions and notation for Fourier analysis on Abelian groups in Appendix C.
We identify every real number $x\in[0,1]$ with its binary expansion $x=0.x_1x_2x_3\dots$, with $x_i\in\{0,1\}$ for $i\in\mathbb{N}$. Here we consider the group
$$G=\mathbb{Z}_2^{kd}\ni x=(x^j_i)_{\substack{i=1,\dots,k\\ j=1,\dots,d}}=(x^1|x^2|\dots|x^d),\qquad\text{where }x^j=(x^j_1,\dots,x^j_k)\text{ is the }j\text{-th column}.\qquad(B.1)$$
The kernel $K:\Gamma\times\Gamma\to\mathbb{C}$ corresponding to the kernel $K^{Cen}_k$ is
$$K(a,b)=\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\frac{1}{d^k}\binom{k}{l}\prod_{j=1}^d\chi\big(a^j_1=b^j_1,\dots,a^j_{k_j}=b^j_{k_j}\big)=\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\frac{1}{d^k}\binom{k}{l}\prod_{j=1}^d\prod_{i=1}^{k_j}\chi\big(a^j_i=b^j_i\big)=\varphi(a-b),\qquad(B.2)$$
where $\binom{k}{l}$ is the multidimensional binomial coefficient, $\chi_E$ is the characteristic function of the set $E$, and $a-b$ is the difference in the group $\mathbb{Z}_2^{kd}$. Incidentally, (B.2) shows that the kernel $K$ can be viewed as a convolution kernel on the appropriate group structure. For the last equality, we consider the binary representation of a number in $(0,1]$ whose digits are not eventually vanishing. The fact that $0$ does not have such a representation is irrelevant, since the probability that one of the coordinates of the data vanishes is zero.
We now compute the anti-Fourier transform $\mu=\check{\varphi}$. We know that $\sharp(\Gamma)=2^{kd}$, and that the characters of $\mathbb{Z}_2^{kd}$ have the form
$$\gamma_a(x)=(-1)^{a\cdot x},\qquad x\in\mathbb{Z}_2^{kd},\ a\in\widehat{\mathbb{Z}_2^{kd}},\qquad a\cdot x=a^1_1x^1_1+\cdots+a^d_kx^d_k.\qquad(B.3)$$
Hence,
$$\begin{aligned}
2^{kd}d^k\mu(x)&=d^k\sum_{a\in\Gamma}\varphi(a)\gamma_a(x)=d^k\sum_{a\in\Gamma}\varphi(a)(-1)^{a\cdot x}\\
&=\sum_{a\in\Gamma}\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\binom{k}{l}\prod_{j=1}^d\prod_{i=1}^{k_j}\chi\big(a^j_i=0\big)\,(-1)^{a\cdot x}\\
&=\sum_{a\in\Gamma}\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\binom{k}{l}\prod_{j=1}^d(-1)^{\tilde{a}^{k_j}_j\cdot\tilde{x}^{k_j}_j}\prod_{i=1}^{k_j}\Big[\chi\big(a^j_i=0\big)(-1)^{a^j_ix^j_i}\Big],\quad\text{where }\tilde{a}^{k_j}_j=(a^j_{k_j+1},\dots,a^j_k)\text{ is the lower, }(k-k_j)\text{-dimensional part of the column }a^j,\\
&=\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\binom{k}{l}\sum_{a\in\Gamma}\prod_{j=1}^d(-1)^{\tilde{a}^{k_j}_j\cdot\tilde{x}^{k_j}_j}\prod_{i=1}^{k_j}\chi\big(a^j_i=0\big)\\
&=\sum_{\substack{l\in\mathbb{N}^d\\|l|=k}}\binom{k}{l}\sum_{\substack{a\in\Gamma\\ a^1_1=\dots=a^1_{k_1}=a^2_1=\cdots=a^d_{k_d}=0}}\prod_{j=1}^d(-1)^{\tilde{a}^{k_j}_j\cdot\tilde{x}^{k_j}_j}.\qquad(B.4)
\end{aligned}$$
The last expression vanishes exactly when, for all $l$, there are some $1\le j\le d$ and some $k_j+1\le i\le k$ such that $x^j_i=1$, due to the presence of the factor $(-1)^{a^j_ix^j_i}=(-1)^{a^j_i}$, which takes values $\pm 1$ on summands having, two by two, the same absolute values.
If, on the contrary, there is $l$ such that for all $1\le j\le d$ and $k_j+1\le i\le k$ we have $x^j_i=0$, then $\mu(x)\ne 0$. Since $|l|=k$ and there are $kd$ binary digits involved in the expression of $x$, the latter occurs exactly when the binary matrix representing $x$ has a large lower region in which all entries are $0$. More precisely, the number of vanishing entries must be at least
$$(k-k_1)+\cdots+(k-k_d)=(d-1)k.\qquad(B.5)$$
The number N(d,k) of such matrices is the dimension of HK, the Hilbert space having K as a reproducing kernel.
Next, we prove some estimates for the dimension of the RKHS.
We summarize the main items in the following statement.
Theorem B.1. Let $K:\Gamma\times\Gamma\to\mathbb{C}$ be the kernel in (B.2), $K(a,b)=\varphi(a-b)$, and let
$$E_K=\mathrm{supp}(\check{\varphi})\subseteq G.\qquad(B.6)$$
Then:
(i) As a linear space, $H_K=L_{E_K}$, where
$$E_K=\Big\{x=(x^1|\dots|x^d):\ x^j_i=0\ \text{for}\ k_j+1\le i\le k,\ \text{for some}\ l=(k_1,\dots,k_d)\in\mathbb{N}^d\ \text{with}\ k_1+\cdots+k_d=k\Big\};\qquad(B.7)$$
(ii) For $x\in E_K$,
$$\check{\varphi}(x)=\frac{1}{2^kd^k}\sum_{\substack{l\in\mathbb{N}^d,\ |l|=k\\ x^j_i=0\ \text{for}\ k_j+1\le i\le k}}\binom{k}{l}.\qquad(B.8)$$
To obtain the expression in (B.8), we used the fact that
$$\sharp\{a:\ a^1_1=\dots=a^1_{k_1}=a^2_1=\cdots=a^d_{k_d}=0\}=2^{(d-1)k}.$$
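As a small brute-force check of Theorem B.1 (our own illustration), the sketch below evaluates $\varphi$ from (B.2), inverts it by direct summation over $\mathbb{Z}_2^{kd}$, and counts the support of $\check{\varphi}$, which should agree with $N(d,k)=\dim(H_K)$. Since the sums range over all $2^{kd}$ group elements, it is meant for small $k$ and $d$ only.

```python
import numpy as np
from math import factorial
from itertools import product

def compositions(k, d):
    """All d-tuples of non-negative integers summing to k."""
    if d == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def phi(c, k, d):
    """phi(c) = K(c, 0) as in (B.2); c is a k x d binary matrix, column j = c[:, j]."""
    val = 0.0
    for l in compositions(k, d):
        weight = factorial(k) / np.prod([factorial(kj) for kj in l]) / d**k
        if all(not c[:kj, j].any() for j, kj in enumerate(l)):
            val += weight
    return val

def dim_HK_bruteforce(k, d):
    """Dimension of H_K = #supp(phi-check), by direct Fourier inversion on Z_2^{kd}."""
    pts = [np.array(b, dtype=int).reshape(k, d) for b in product((0, 1), repeat=k * d)]
    vals = np.array([phi(c, k, d) for c in pts])
    dim = 0
    for x in pts:
        signs = np.array([(-1) ** int((a * x).sum()) for a in pts])
        if abs(np.dot(vals, signs)) > 1e-9:     # 2^{kd} * phi-check(x) != 0
            dim += 1
    return dim

# e.g. dim_HK_bruteforce(3, 2) counts the 3x2 binary matrices belonging to E_K.
```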
B.2. Some properties of HK
Linear relations. Among all functions $\psi:\Gamma\to\mathbb{C}$, those belonging to $H_K$ (i.e., those belonging to $L_{E_K}$) are characterized by a set of linear equations,
$$0=2^kd^k\mu(x)=\sum_{\substack{l\in\mathbb{N}^d,\ |l|=k\\ x^j_i=0\ \text{for}\ k_j+1\le i\le k}}\binom{k}{l}\qquad\text{for }x\notin E_K.\qquad(B.9)$$
Multipliers. A multiplier of HK is a function m:Γ→C such that mψ∈HK whenever ψ∈HK.
Proposition B.1. The space HK has no nonconstant multiplier.
In particular, it does not enjoy the complete Pick property, which has been the subject of intensive research for the past twenty-five years [1].
Proof. The space $H_K$ coincides as a linear space with $L_{E_K}$. Let $\Lambda_E=\check{L}_E$, which is spanned by $\{\delta_x:\ x\in E\}$. Observe that, since $0=(0|\dots|0)\in E_K$, the constant functions belong to $H_K$; hence any multiplier $m$ of $H_K$ lies in $H_K$, since $m=m\cdot 1\in H_K$.
Suppose $m$ is not constant. Then $\check{m}(a)\ne 0$ for some $a\in E_K$ with $a\ne 0$. Since $m\cdot\hat{\delta}_x\in H_K$ for all $x$ in $E_K$, and $m\cdot\hat{\delta}_x=\widehat{\check{m}*\delta_x}$, the support of $\check{m}*\delta_x$ lies in $E_K$. Now,
$$\check{m}*\delta_x(y)=\check{m}(x-y),$$
hence, for any $x$ in $E_K$, $y=x-a$ lies in $E_K$ as well. This forces $a=0$, hence $m$ to be constant.
B.3. Bounds for dimension and generating functions
Theorem B.2. For fixed $d\ge 1$, we have the estimates:
$$\dim(H_K)\sim\frac{2^{k-d+1}k^{d-1}}{(d-1)!},\qquad\text{hence}\qquad\frac{\dim(H_K)}{2^{kd}}\sim\frac{k^{d-1}}{2^{d-1}(d-1)!\,2^{k(d-1)}}.\qquad(B.10)$$
Proof. For a matrix $x\in E_K$, let $l_j\ge 0$ be the height of its $j$-th column, that is, the largest index $i$ with $x^j_i=1$ (with $l_j=0$ if the column vanishes), so that $x\in E_K$ exactly when
$$0\le l_1+l_2+\cdots+l_d=m\le k,$$
where $m$ is a parameter. Let also $\lambda=\#\{j:\ l_j\ge 1\}$ be the number of non-vanishing columns, where $\#$ denotes the cardinality of a set; of course $0\le m\le k$ and $0\le\lambda\le d\wedge m$. Since a column of height $l_j\ge 1$ can be completed in $2^{l_j-1}$ ways, our goal is to compute
$$N(k,d)=\sum_{m=0}^k\sum_{\lambda=0}^{d\wedge m}2^{m-\lambda}\binom{d}{\lambda}\,\#\big\{(l_1,\dots,l_\lambda):\ l_1+\cdots+l_\lambda=m,\ l_j\ge 1\ \text{for all }j\big\}.$$
Let $A(m,\lambda)$ be the coefficient of $x^m$ in the power series
$$(x+x^2+\cdots+x^m+\cdots)^\lambda=\big(x(1+x+x^2+\cdots)\big)^\lambda=\frac{x^\lambda}{(1-x)^\lambda},$$
so that $2^mA(m,\lambda)$ is the coefficient of $x^m$ in $\frac{(2x)^\lambda}{(1-2x)^\lambda}$, and therefore $2^{m-\lambda}A(m,\lambda)$ is the coefficient of $x^m$ in $\frac{x^\lambda}{(1-2x)^\lambda}$. Writing $[x^m]g(x)$ for the $m$-th Taylor coefficient of $g$, the inner sum $B(m,d)$ becomes
$$B(m,d)=\sum_{\lambda=0}^{d\wedge m}\binom{d}{\lambda}2^{m-\lambda}A(m,\lambda)=[x^m]\sum_{\lambda=0}^{d}\binom{d}{\lambda}\left(\frac{x}{1-2x}\right)^\lambda=[x^m]\left(1+\frac{x}{1-2x}\right)^d=[x^m]\left(\frac{1-x}{1-2x}\right)^d.$$
By the same combinatorial argument, summing over $0\le m\le k$ amounts to looking at the $k$-th coefficient of the function
$$f(x)=\frac{1}{1-x}\left(\frac{1-x}{1-2x}\right)^d.$$
Back to the estimate: let $a_k$ be the $k$-th coefficient of the power series of $f$ centered at $z=0$. For $0<r<\frac{1}{2}$,
$$\max_{|z|=r}|f(z)|=\max_{|z|=r}\left|\frac{(1-z)^{d-1}}{(1-2z)^d}\right|=\max_{|z|=r}\left|\frac{1}{1-z}\left(\frac{1-z}{1-2z}\right)^d\right|\le 2\max_{\theta\in(-\pi,\pi)}\left|\frac{1-re^{i\theta}}{1-2re^{i\theta}}\right|^d.$$
After some calculations, since $r$ is fixed, one finds that the maximum is achieved at $\theta=0$, so
$$\max_{|z|=r}|f(z)|\le 2\left(\frac{1-r}{1-2r}\right)^d.$$
Our estimate finally becomes, by Cauchy's inequality,
$$|a_k|\le\frac{2}{r^k}\left(\frac{1-r}{1-2r}\right)^d=\frac{2}{\left(1-\frac{1}{k}\right)^k}\,k^d2^k\left(\frac{1}{2}+\frac{1}{2k}\right)^d\quad\left(\text{since }r=\frac{1}{2}\left(1-\frac{1}{k}\right)\right)\ \lesssim\ k^d\left(1+\frac{1}{k}\right)^d2^{k-d}.$$
Thus, an estimate for the relative dimension of $H_K$ is
$$\frac{|a_k|}{2^{kd}}\lesssim\frac{k^d\left(1+\frac{1}{k}\right)^d}{2^d\,2^{k(d-1)}}.$$
Here is another estimate for the dimension of $H_K$. For $f(z)=\sum_{n=0}^\infty a_nz^n$, we have
$$|a_n|\le\frac{\max\{|f(re^{it})|:\ |t|\le\pi\}}{r^n}.$$
Consider the function
$$f(z)=\frac{(1-z)^{d-1}}{(1-2z)^d}$$
in $|z|<1/2$ and let $r=\frac{1-1/k}{2}$. Then,
$$|a_k|\le\frac{(3/2)^{d-1}k^d}{(1-1/k)^k\,2^{-k}}\lesssim(3/2)^{d-1}\,e\,k^d\,2^k.$$
Thus,
$$\frac{|a_k|}{2^{kd}}\lesssim\frac{k^d(3/2)^d}{2^{k(d-1)}}.$$
Recursively working out the generating function one gets the estimates in (B.10).
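The generating function also gives the dimension exactly: expanding $(1-x)^{d-1}$ against the binomial series of $(1-2x)^{-d}$ yields a closed finite sum for $N(k,d)=[x^k]f(x)$. The sketch below (our own illustration) computes it and compares it with the leading-order estimate in (B.10).

```python
from math import comb, factorial

def dim_HK(k, d):
    """N(k, d) = [x^k] (1-x)^{d-1} / (1-2x)^d, the exact dimension of H_K."""
    return sum((-1) ** j * comb(d - 1, j) * 2 ** (k - j) * comb(k - j + d - 1, d - 1)
               for j in range(min(d - 1, k) + 1))

def dim_HK_leading_order(k, d):
    """Leading-order estimate 2^{k-d+1} k^{d-1} / (d-1)! from (B.10)."""
    return 2 ** (k - d + 1) * k ** (d - 1) / factorial(d - 1)

# e.g. for d = 3 the ratio dim_HK(k, 3) / dim_HK_leading_order(k, 3) tends to 1 as k grows,
# while dim_HK(k, 3) / 2**(3 * k) decays like k^2 / 2^(2k + 3), as stated in (B.10).
```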
Following the notation of [24], we recall here the basic notions of Fourier theory for a finite, abelian group $G$, which we employed above. Here, $G$ is endowed with its counting measure. The dual group $\Gamma=\hat{G}$ of $G$ is populated by labels $a$ for the homomorphisms $\gamma_a:G\to\mathbb{T}=\{e^{it}:\ t\in\mathbb{R}\}$. Given a function $f:G\to\mathbb{C}$, its Fourier transform $\hat{f}:\Gamma\to\mathbb{C}$ is defined as
$$\hat{f}(a)=\sum_{x\in G}f(x)\overline{\gamma_a(x)}.\qquad(C.1)$$
We make $\Gamma$ into a (finite), additive group by setting
$$\gamma_{a+b}=\gamma_a\cdot\gamma_b,\qquad\text{and}\qquad\gamma_x(a):=\gamma_a(x).$$
It turns out that $G$ and $\Gamma$ have the same number of elements, $\sharp(G)=\sharp(\Gamma)$. Some basic properties are:
$$f(x)=\frac{1}{\sharp(\Gamma)}\sum_{a\in\Gamma}\hat{f}(a)\gamma_a(x)\quad\text{(inverse Fourier transform)},\qquad\sum_{x\in G}|f(x)|^2=\frac{1}{\sharp(\Gamma)}\sum_{a\in\Gamma}|\hat{f}(a)|^2\quad\text{(Plancherel)},\qquad\widehat{f*g}=\hat{f}\cdot\hat{g},$$
where
$$(f*g)(x)=\sum_{y\in G}f(x-y)g(y).\qquad(C.2)$$
We write
$$\check{\varphi}(x)=\sharp(\Gamma)^{-1}\sum_{a\in\Gamma}\varphi(a)\gamma_a(x),\qquad\text{so that}\qquad\hat{\check{\varphi}}=\varphi.\qquad(C.3)$$
The unit element of convolution in G is δ0.
In the other direction, for $\varphi,\psi:\Gamma\to\mathbb{C}$ we define
$$(\varphi*\psi)(a)=\frac{1}{\sharp(\Gamma)}\sum_{b\in\Gamma}\varphi(a-b)\psi(b),\qquad(C.4)$$
and, similarly to the above, $\widehat{\check{\varphi}\check{\psi}}=\varphi*\psi$. The unit element of convolution in $\Gamma$ is $\sharp(\Gamma)\delta_0$.
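As a quick numerical illustration of these conventions (our own sketch, identifying $G\cong\Gamma\cong\mathbb{Z}_2^m$ and taking the characters $\gamma_a(x)=(-1)^{a\cdot x}$), one can verify the convolution and Plancherel identities above on random functions.

```python
import numpy as np
from itertools import product

def group(m):
    """All elements of Z_2^m as integer vectors."""
    return [np.array(b, dtype=int) for b in product((0, 1), repeat=m)]

def fourier(f, G):
    """hat f(a) = sum_x f(x) * (-1)^{a.x}  (characters are real, so no conjugation needed)."""
    return np.array([sum(f[i] * (-1) ** int(a @ x) for i, x in enumerate(G)) for a in G])

def convolve(f, g, G):
    """(f * g)(x) = sum_y f(x - y) g(y); on Z_2^m, x - y is the componentwise XOR."""
    index = {tuple(x): i for i, x in enumerate(G)}
    return np.array([sum(f[index[tuple(np.bitwise_xor(x, y))]] * g[j]
                         for j, y in enumerate(G)) for x in G])

m = 3
G = group(m)
f, g = np.random.randn(2 ** m), np.random.randn(2 ** m)
assert np.allclose(fourier(convolve(f, g, G), G), fourier(f, G) * fourier(g, G))
assert np.allclose(np.sum(f ** 2), np.sum(fourier(f, G) ** 2) / 2 ** m)   # Plancherel
```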
A function $\varphi$ on $\Gamma$ is positive definite if
$$\sum_{a,b\in\Gamma}c(a)\overline{c(b)}\varphi(b-a)\ge 0$$
for every choice of coefficients $c:\Gamma\to\mathbb{C}$.
Theorem C.1 (Bochner's theorem). A function $\varphi:\Gamma\to\mathbb{C}$ is positive definite if and only if there exists $\mu:G\to\mathbb{R}_+$ such that $\varphi=\hat{\mu}$.
The theorem holds in great generality, and its proof in the finite group case is elementary. We include it because it highlights the relationship between the measure μ on G and the positive definite function (the kernel) φ.
Proof. For the "if" direction, suppose $\varphi=\hat{\mu}$. Then
$$\begin{aligned}
\sharp(\Gamma)^{-2}\sum_{a,b\in\Gamma}\hat{\mu}(b-a)c(a)\overline{c(b)}&=\sharp(\Gamma)^{-2}\sum_{x\in G}\sum_{a,b\in\Gamma}\mu(x)\overline{\gamma_{b-a}(x)}c(a)\overline{c(b)}\\
&=\sharp(\Gamma)^{-2}\sum_{x\in G}\sum_{a,b\in\Gamma}\mu(x)\overline{\gamma_b(x)}\,\overline{c(b)}\,\gamma_a(x)c(a)\\
&=\sum_{x\in G}\mu(x)\left|\sharp(\Gamma)^{-1}\sum_{a\in\Gamma}c(a)\gamma_a(x)\right|^2=\sum_{x\in G}\mu(x)|\check{c}(x)|^2\ge 0.\qquad(C.5)
\end{aligned}$$
For the "only if" direction, set $\mu=\check{\varphi}$. Since, for all $b$ in $\Gamma$,
$$\mu(x)\sharp(\Gamma)=\sum_{a\in\Gamma}\varphi(a)\gamma_x(a)=\sum_{a\in\Gamma}\varphi(a-b)\gamma_x(a-b)=\sum_{a\in\Gamma}\varphi(a-b)\gamma_x(a)\overline{\gamma_x(b)},\qquad(C.6)$$
we have
$$\mu(x)\sharp(\Gamma)^2=\sum_{a,b\in\Gamma}\varphi(a-b)\gamma_x(a)\overline{\gamma_x(b)}\ge 0,\qquad(C.7)$$
by the assumption.
We now come to reproducing kernels on $\Gamma$ which are based on positive definite functions $\varphi:\Gamma\to\mathbb{R}_+$. Set
$$K(a,b)=\varphi(a-b)=K_b(a),\qquad K:\Gamma\times\Gamma\to\mathbb{C},\qquad(C.8)$$
and set
$$H_K=\mathrm{span}\{K_b:\ b\in\Gamma\}\ni\sum_{b\in\Gamma}c(b)K_b,\qquad(C.9)$$
where $H_K$ is the Hilbert space having $K$ as reproducing kernel. We wish to have a more precise understanding of it.
We start by expressing the norm of an element of $H_K$ in several equivalent ways:
$$\begin{aligned}
\left\|\sum_{b\in\Gamma}c(b)K_b\right\|^2_{H_K}&=\sum_{a,b\in\Gamma}\overline{c(a)}c(b)\langle K_b,K_a\rangle=\sum_{a,b\in\Gamma}\overline{c(a)}c(b)K(a,b)=\sum_{a,b\in\Gamma}\overline{c(a)}c(b)\hat{\mu}(a-b)\\
&=\sum_{a,b\in\Gamma}\overline{c(a)}c(b)\sum_{x\in G}\mu(x)\gamma_{b-a}(x)=\sum_{x\in G}\mu(x)\sum_{a,b\in\Gamma}\overline{c(a)}c(b)\gamma_b(x)\overline{\gamma_a(x)}\\
&=\sum_{x\in G}\mu(x)\left|\sum_{b\in\Gamma}c(b)\gamma_b(x)\right|^2=\sharp(\Gamma)^2\sum_{x\in G}\mu(x)|\check{c}(x)|^2=\sharp(\Gamma)^2\sum_{x\in G}\big|\mu(x)^{1/2}\check{c}(x)\big|^2.\qquad(C.10)
\end{aligned}$$
In other terms,
$$\sharp(\Gamma)^{-1}\sum_{b\in\Gamma}c(b)K_b\ \mapsto\ \check{c}\qquad(C.11)$$
is an isometry of $H_K$ onto $L^2(\mu)$. This will become important later, when we verify that for our kernels $\mathrm{supp}(\mu)$ is sparse in $G$. In fact, $\dim(H_K)=\sharp(\mathrm{supp}(\mu))$.
Corollary C.1. As a linear space, $H_K$ is determined by $\mathrm{supp}(\mu)$:
$$\psi\in H_K\quad\text{if and only if}\quad\mathrm{supp}(\check{\psi})\subseteq\mathrm{supp}(\mu).$$
Let $E\subseteq G$. We denote
$$L_E=\big\{\psi:\Gamma\to\mathbb{C}:\ \mathrm{supp}(\check{\psi})\subseteq E\big\}.\qquad(C.12)$$
Next, we look for the natural orthonormal system provided by the Fourier isometry (C.11). For $x\in G$, let $\check{c}_x=\mu(x)^{-1/2}\delta_x$: then $\{\check{c}_x:\ x\in E:=\mathrm{supp}(\mu)\}$ is an orthonormal system for $L^2(\mu)$, and so $\{e_x:\ x\in E\}$ is an orthonormal basis for $H_K$, where
$$c_x(b)=\sum_{y\in G}\mu(x)^{-1/2}\delta_x(y)\overline{\gamma_b(y)}=\mu(x)^{-1/2}\overline{\gamma_b(x)},\qquad(C.13)$$
and
$$\begin{aligned}
e_x(a)&=\sharp(\Gamma)^{-1}\sum_{b\in\Gamma}c_x(b)K_b(a)=\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum_{b\in\Gamma}K_b(a)\overline{\gamma_b(x)}=\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum_{b\in\Gamma}\varphi(a-b)\overline{\gamma_b(x)}\\
&=\frac{\mu(x)^{-1/2}}{\sharp(\Gamma)}\sum_{b\in\Gamma}\varphi(a-b)\gamma_{a-b}(x)\overline{\gamma_a(x)}=\mu(x)^{-1/2}\mu(x)\overline{\gamma_a(x)}=\mu(x)^{1/2}\overline{\gamma_a(x)}.\qquad(C.14)
\end{aligned}$$
Let us verify that we obtain the reproducing kernel from the orthonormal basis, as expected:
$$\sum_{x\in E}e_x(a)\overline{e_x(b)}=\sum_{x\in G}\mu(x)\overline{\gamma_x(a)}\gamma_x(b)=\sum_{x\in G}\mu(x)\overline{\gamma_x(a-b)}=\hat{\mu}(a-b)=\varphi(a-b).\qquad(C.15)$$
Remark C.1. Since any finite, abelian group can be written as the direct product of cyclic groups,
$$G=\bigoplus_{l=1}^L\mathbb{Z}_{m_l},\qquad(C.16)$$
its dual $\Gamma$ can be written in the same way, because $\widehat{\mathbb{Z}_m}\equiv\mathbb{Z}_m$. From the Fourier point of view, the only difference is that, if on $G$ we consider the counting measure, then on $\Gamma$ we consider the normalized counting measure, as we did above.