A recent line of work established approximation complexity estimates for deep ReLU networks approximating bandlimited functions in the MSE (mean squared error) sense. In this note, we significantly strengthen that result by estimating the approximation complexity in the L∞ sense. The key step of the proof is a frequency decomposition lemma, which may be of independent interest.
Citation: Liang Chen, Wenjun Liu. On the uniform approximation estimation of deep ReLU networks via frequency decomposition[J]. AIMS Mathematics, 2022, 7(10): 19018-19025. doi: 10.3934/math.20221045
The approximation theory of neural networks has received wide attention in recent years [6,12,18,21,22,24,25,26,27,30,31,32,33,34,35]. Two advantages of neural networks are widely accepted: one is that neural networks can overcome the curse of dimensionality [1,4,5,7,9,11,15,16,17]; the other is that, as the number of layers increases, deep neural networks can fit highly oscillatory or high-frequency functions [3,17,18,19,21,23]. For bandlimited functions, the approximation complexity of deep ReLU networks has been studied in the MSE sense [3,17]. Compared with the results for shallow neural networks [1], these results show that deep ReLU networks can approximate ultrawide-bandwidth high-dimensional signals with very low complexity. In this paper, we continue this line of research and estimate the approximation complexity of deep ReLU networks in the L∞ sense; this estimate is therefore significantly stronger than the previous results [3,17]. It is worth mentioning that the approximation complexity estimates for shallow neural networks differ significantly under different error norms [9,13]. Similarly, for deep ReLU networks we cannot resort to Maurey's theorem [20] in the L∞ case. Instead, we use the fat-shattering dimension theory [8] to establish a frequency decomposition lemma, which is our main result.
Lemma 1.1. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a bandlimited function of the form
$$ f(x) = \int_{[-M,M]^d} \hat{f}(w)\, e^{i2\pi x\cdot w}\, dw $$
with
$$ \int_{[-M,M]^d} |\hat{f}(w)|\, dw =: C_f < \infty. $$
Then there exist $2l$ frequencies
$$ w_{1,1}, w_{1,2}, \ldots, w_{1,l},\ w_{2,1}, w_{2,2}, \ldots, w_{2,l} \in [-M,M]^d $$
and $2l$ coefficients
$$ a_{1,1}, a_{1,2}, \ldots, a_{1,l},\ a_{2,1}, a_{2,2}, \ldots, a_{2,l} \in \mathbb{R} $$
satisfying
$$ \sum_{j=1}^{l} \big(|a_{1,j}| + |a_{2,j}|\big) \le 2C_f $$
and
$$ l \ge C\,\big(dC_f^2/\epsilon^2\big)\, \log\!\big(C_D M + \log(C_f/\epsilon)\big)\, \ln^2(C_f/\epsilon), $$
such that
$$ \sup_{x\in D}\Big|f(x) - \sum_{j=1}^{l}\big(a_{1,j}\cos(2\pi w_{1,j}\cdot x) + a_{2,j}\sin(2\pi w_{2,j}\cdot x)\big)\Big| < \epsilon, \tag{1.1} $$
where $C$ is some universal constant, $0<\epsilon<1/2$, $D$ is any bounded subset of $\mathbb{R}^d$, and
$$ C_D := \sup_{x\in[-1,1]^d}\ \sup_{y\in D} |x\cdot y|. $$
On the other hand, a feedforward neural network $\widetilde{X}$ consists of layers of nodes, each node of the form
$$ y = \sigma\Big(\sum_{k=1}^{N} w_k x_k + b\Big), \tag{1.2} $$
where $\{x_k : k=1,2,\ldots,N\}$ are nodes in the previous layers and $\sigma$ is an activation function. If $\sigma(x)=\max(x,0)$, the multilayer feedforward neural network is a deep ReLU (Rectified Linear Unit) network. The complexity of deep neural networks approximating sinusoidal functions has been discussed in [3,17,19].
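A node of the form (1.2) and the depth-oscillation phenomenon mentioned in the introduction can be sketched in a few lines. This is an illustrative sketch only; the function names (`node`, `hat`) are ours, not notation from the paper, and the hat-map construction is the standard sawtooth argument used in the cited works on linear regions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def node(x, w, b):
    # One node of the form (1.2): y = sigma(sum_k w_k * x_k + b).
    return relu(np.dot(w, x) + b)

def hat(x):
    # The hat map g(x) = 2*relu(x) - 4*relu(x - 1/2), a two-node
    # ReLU layer mapping [0, 1] onto [0, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5)

# Composing the hat map k times produces a triangle wave with
# 2^(k-1) teeth on [0, 1]: depth creates high-frequency oscillation.
x = np.linspace(0.0, 1.0, 1025)
y = x.copy()
for _ in range(4):
    y = hat(y)
# y attains the value 1 exactly at the 8 points x = (2j+1)/16.
```

Each extra layer doubles the number of oscillations while adding only a constant number of nodes, which is the mechanism behind results such as Lemma 1.2 below.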
Lemma 1.2. ([3]) There exists a deep ReLU network $\widetilde{X}_\epsilon$ with $O\big(\log(C_D M)\log_2^2(1/\epsilon)\big)$ layers and $O\big(\log(C_D M)\log_2^3(1/\epsilon)\big)$ nodes such that
$$ \big|\widetilde{X}_\epsilon(t) - \cos(2\pi w\cdot t)\big| \le \epsilon \quad \text{for } w\in[-M,M]^d,\ t\in D. \tag{1.3} $$
Based on Lemma 1.1 and Lemma 1.2, the following main theorem can be obtained.
Theorem 1.3. Let $f:D\to\mathbb{R}$ be a bandlimited function of the form
$$ f(x) = \int_{[-M,M]^d} \hat{f}(w)\, e^{i2\pi x\cdot w}\, dw $$
with
$$ C_f := \int_{[-M,M]^d} |\hat{f}(w)|\, dw < \infty. $$
Then there exists a deep ReLU network $\widetilde{X}_\epsilon$ with
$$ O\big(\log(C_D M)\log_2^2(1/\epsilon)\big) $$
layers and
$$ O\Big(\big(\log(C_D M)\log_2^3(1/\epsilon)\, dC_f^2/\epsilon^2\big)\log_2\!\big(C_D M + \log(C_f/\epsilon)\big)\ln^2(C_f/\epsilon)\Big) $$
nodes such that
$$ \sup_{x\in D}\big|f(x) - \widetilde{X}_\epsilon(x)\big| < \epsilon. \tag{1.4} $$
For any complex number $z\in\mathbb{C}$, $\Re z$ and $\Im z$ denote the real part and the imaginary part of $z$, respectively. We denote by $C_j$, $j=1,2,\ldots$, some universal constants.
To prove Lemma 1.1, some notation and lemmas are needed. Let $H$ be a class of real-valued functions defined on $X$ and let $\gamma\ge 0$ be a real number. The fat-shattering dimension of $H$, denoted $\operatorname{fat}_H(\gamma)$, is the largest integer $m$ with the following property: there exist a subset $V=\{x_1,x_2,\ldots,x_m\}$ of the domain $X$ and real numbers $r_1,r_2,\ldots,r_m$ such that for every $b\in\{0,1\}^m$ there is a function $f_b$ in $H$ with $f_b(x_i)\ge r_i+\gamma$ if $b_i=1$ and $f_b(x_i)<r_i-\gamma$ if $b_i=0$, for $1\le i\le m$. The fat-shattering dimension is a generalization of the Vapnik–Chervonenkis dimension [29], which is often used to estimate approximation complexity [10,14,35].
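For a finite class on a finite domain, this definition can be checked by brute force, which makes the combinatorial content concrete. The following sketch is ours (the witness levels $r_i$ are searched over a small grid, which suffices for the two toy classes below): the class of all $\{0,1\}$-valued functions on three points $\gamma$-shatters all three points for $\gamma<1/2$, whereas monotone threshold functions shatter only one.

```python
import itertools

def fat_dim(H, n_points, gamma, r_grid):
    # Brute-force fat-shattering dimension of a finite class H, where
    # each f in H is given as its tuple of values on n_points domain
    # points; witness levels r_i are searched over the grid r_grid.
    best = 0
    for m in range(1, n_points + 1):
        shattered = False
        for V in itertools.combinations(range(n_points), m):
            for r in itertools.product(r_grid, repeat=m):
                # V is gamma-shattered with witnesses r iff every
                # pattern b in {0,1}^m is realized by some f in H.
                if all(any(all((f[i] >= r[k] + gamma) if b[k]
                               else (f[i] < r[k] - gamma)
                               for k, i in enumerate(V))
                           for f in H)
                       for b in itertools.product((0, 1), repeat=m)):
                    shattered = True
                    break
            if shattered:
                break
        if shattered:
            best = m
    return best

# All 0/1-valued functions on three points: fat(0.4) = 3.
H_all = list(itertools.product((0, 1), repeat=3))
# Monotone threshold functions on an ordered 3-point domain: fat(0.4) = 1.
H_thr = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
```

The second class illustrates why the dimension controls complexity: no threshold function can realize the pattern "low then high", so two points can never be shattered.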
Lemma 2.1. ([2]) Let $H$ be a class of functions from some domain $X$ into $[0,1]$. Then for all distributions $P$ over $X$ and all $0<\epsilon,\delta<1/2$,
$$ \Pr_{x_1,x_2,\ldots,x_l}\Big[\sup_{f\in H}\Big|\frac{1}{l}\sum_{i=1}^{l} f(x_i) - \mathbb{E}_{x\sim P}[f(x)]\Big| \ge \epsilon\Big] \le \delta, $$
where $\{x_i\}_{i=1}^{l}$ are $l$ independent draws from $P$, $\Pr_{x_1,x_2,\ldots,x_l}$ denotes the joint probability distribution for the random variables $x_1,x_2,\ldots,x_l$, and
$$ l = O\Big(\frac{1}{\epsilon^2}\Big(\operatorname{fat}_H(\epsilon/5)\ln^2\frac{1}{\epsilon} + \ln\frac{1}{\delta}\Big)\Big). $$
Lemma 2.2. ([28]) Let $p_1,p_2,\ldots,p_l$ be $d$-variable real polynomials of degree at most $q$. If $l\ge d$, then the number of vectors
$$ \big(\operatorname{sgn}(p_1(x)), \operatorname{sgn}(p_2(x)), \ldots, \operatorname{sgn}(p_l(x))\big) $$
is at most $(4elq/d)^d$, where $\operatorname{sgn}(x):=\begin{cases} 1, & x\ge 0, \\ -1, & x<0. \end{cases}$
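Warren's bound can be sanity-checked numerically. The sketch below (parameters and names are ours) draws $l=6$ random bivariate cubics and counts the distinct sign vectors observed over many sample points; since sampling can only reveal sign vectors that actually occur, the count must respect $(4elq/d)^d$:

```python
import numpy as np

rng = np.random.default_rng(0)

d, l, q = 2, 6, 3  # d variables, l >= d polynomials, degree at most q

# Random coefficients on the monomials x^a * y^b with a + b <= q.
monomials = [(a, b) for a in range(q + 1) for b in range(q + 1 - a)]
coeffs = rng.standard_normal((l, len(monomials)))

def sign_vector(pt):
    # (sgn(p_1(pt)), ..., sgn(p_l(pt))) with sgn as in Lemma 2.2.
    vals = coeffs @ np.array([pt[0]**a * pt[1]**b for a, b in monomials])
    return tuple(1 if v >= 0 else -1 for v in vals)

patterns = {sign_vector(rng.uniform(-5.0, 5.0, size=2))
            for _ in range(20000)}
warren_bound = (4 * np.e * l * q / d) ** d  # (36e)^2, roughly 9.6e3
```

Here $2^l = 64$ patterns are combinatorially possible, but the zero sets of six cubic curves partition the plane into far fewer sign-constant regions, in line with the lemma.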
By Taylor expansion, sines and cosines can be approximated by polynomials.
Lemma 2.3. Let $0<\epsilon<1$, $a>0$, and $f(x)=\sin(x)$ or $\cos(x)$. Then there exist $C_1>0$ and a polynomial $Q$ of degree at most $C_1(a+\log(1/\epsilon))$ such that $\sup_{x\in[-a,a]}|f(x)-Q(x)|\le\epsilon$.
Proof. The Taylor expansion of an $n$-times differentiable function $f$ at $0$ with the Lagrange remainder is given by
$$ f(x) = \sum_{j=0}^{n-1}\frac{f^{(j)}(0)}{j!}x^j + \frac{f^{(n)}(x_0)}{n!}x^n, \tag{2.1} $$
where $x_0$ lies between $0$ and $x$, $f^{(j)}$ denotes the derivative of order $j$, and $n!=n(n-1)\cdots 1$. Applying the Stirling formula to $n!$, we have
$$ n! \ge \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n > \Big(\frac{n}{e}\Big)^n, \quad n\in\mathbb{N}. \tag{2.2} $$
Note that $|\sin^{(n)}(x)|\le 1$ and $|\cos^{(n)}(x)|\le 1$ for all $x\in\mathbb{R}$. Thus, by (2.1) and (2.2), for all $n\ge\max\{2ea,\ \frac{1}{\log 2}\log\frac{1}{\epsilon}\}$ we have
$$ \sup_{x\in[-a,a]}\Big|\frac{f^{(n)}(x_0)}{n!}x^n\Big| \le \frac{a^n}{n!} < \Big(\frac{ea}{n}\Big)^n \le \frac{1}{2^n} \le \epsilon. $$
The proof is hence complete.
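The degree prescribed by this proof is easy to verify numerically. In the sketch below (the concrete choices $a=\pi$ and $\epsilon=10^{-6}$ are ours), the remainder bound $a^n/n!$ and the actual uniform error of the truncated Taylor series of $\cos$ both fall below $\epsilon$:

```python
import math
import numpy as np

a, eps = math.pi, 1e-6
# Degree from the proof of Lemma 2.3: n >= max(2ea, log(1/eps)/log 2).
n = math.ceil(max(2 * math.e * a, math.log(1 / eps) / math.log(2)))

# Lagrange remainder bound a^n / n! < (ea/n)^n < 2^(-n) <= eps.
remainder = a**n / math.factorial(n)

# Degree-(n-1) Taylor polynomial of cos at 0 (even powers only).
x = np.linspace(-a, a, 2001)
P = sum((-1)**k * x**(2 * k) / math.factorial(2 * k)
        for k in range(n // 2 + 1) if 2 * k <= n - 1)
sup_err = float(np.max(np.abs(np.cos(x) - P)))
```

For these values $n=20$, and both `remainder` and `sup_err` are on the order of $10^{-9}$, comfortably below $\epsilon$.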
Proof of Lemma 1.1. Since $f$ is a real-valued function, we have for all $x\in\mathbb{R}^d$
$$ \begin{aligned} f(x) &= \int_{[-M,M]^d}\hat{f}(w)e^{i2\pi w\cdot x}\,dw \\ &= \int_{[-M,M]^d}\big(\Re\hat{f}(w)+i\,\Im\hat{f}(w)\big)\big(\cos(2\pi w\cdot x)+i\sin(2\pi w\cdot x)\big)\,dw \\ &= \int_{[-M,M]^d}\cos(2\pi w\cdot x)\,\Re\hat{f}(w) - \sin(2\pi w\cdot x)\,\Im\hat{f}(w)\,dw \\ &=: f_1 - f_2 - f_3 + f_4, \end{aligned} \tag{2.3} $$
where
$$ f_1(x) := \int_{[-M,M]^d}\cos(2\pi w\cdot x)\max\big(0,\Re\hat{f}(w)\big)\,dw, $$
$$ f_2(x) := \int_{[-M,M]^d}\cos(2\pi w\cdot x)\max\big(-\Re\hat{f}(w),0\big)\,dw, $$
$$ f_3(x) := \int_{[-M,M]^d}\sin(2\pi w\cdot x)\max\big(0,\Im\hat{f}(w)\big)\,dw, $$
$$ f_4(x) := \int_{[-M,M]^d}\sin(2\pi w\cdot x)\max\big(-\Im\hat{f}(w),0\big)\,dw. $$
Let $F_1 := \max(0,\Re\hat{f})$. Without loss of generality, we assume that
$$ \int_{[-M,M]^d}F_1(w)\,dw > 0. $$
We only discuss the approximation of the first term
$$ f_1(x) = \int_{[-M,M]^d}\cos(2\pi w\cdot x)F_1(w)\,dw, \quad x\in\mathbb{R}^d. $$
The other three terms $f_2, f_3, f_4$ in (2.3) can be handled in a similar manner. Observe that
$$ |w\cdot x| \le C_D M \quad \text{for all } w\in[-M,M]^d \text{ and } x\in D. $$
By Lemma 2.3, there exist $C_2>0$ and a polynomial $Q$ of degree at most $C_2(C_D M + \log(1/\epsilon))$ such that
$$ |\cos(2\pi w\cdot x) - Q(2\pi w\cdot x)| < \epsilon \quad \text{for all } w\in[-M,M]^d \text{ and } x\in D. \tag{2.4} $$
Let
$$ \tilde{f}(x) := \int_{[-M,M]^d}Q(2\pi w\cdot x)F_1(w)\,dw, \quad x\in\mathbb{R}^d. $$
Then we get
$$ |\tilde{f}(x) - f_1(x)| < \epsilon\int_{[-M,M]^d}F_1(w)\,dw, \quad x\in D. \tag{2.5} $$
Set
$$ G := \big\{g_x(w) = \cos(2\pi x\cdot w) : x\in D,\ w\in[-M,M]^d\big\} $$
and
$$ G_\epsilon := \big\{g_x(w) = Q(2\pi x\cdot w) : x\in D,\ w\in[-M,M]^d\big\}. $$
By Lemma 2.2, the number of vectors
$$ \big(\operatorname{sgn}(Q(2\pi w_1\cdot x)), \operatorname{sgn}(Q(2\pi w_2\cdot x)), \ldots, \operatorname{sgn}(Q(2\pi w_l\cdot x))\big) $$
is at most $\big(4elC_2(C_D M + \log(1/\epsilon))/d\big)^d$. Thus,
$$ 2^{\operatorname{fat}_{G_\epsilon}(0)} \le \big(4e\operatorname{fat}_{G_\epsilon}(0)\,C_2(C_D M + \log(1/\epsilon))/d\big)^d, $$
that is,
$$ \operatorname{fat}_{G_\epsilon}(0) \le d\log_2\big(4eC_2(C_D M + \log(1/\epsilon))\big) + d\log_2\Big(\frac{\operatorname{fat}_{G_\epsilon}(0)}{d}\Big). $$
Note that
$$ d\log_2\Big(\frac{\operatorname{fat}_{G_\epsilon}(0)}{d}\Big) \le \frac{1}{2}\operatorname{fat}_{G_\epsilon}(0) \quad \text{for } \operatorname{fat}_{G_\epsilon}(0) > 4d, $$
then
$$ \operatorname{fat}_G\Big(\epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big) \le \operatorname{fat}_{G_\epsilon}(0) \le 2d\log_2\big(8eC_2(C_D M + \log(1/\epsilon))\big) + 4d. $$
By Lemma 2.1, choosing $\delta<1/4$ and
$$ l \gtrsim \frac{1}{\epsilon^2}\Big(d\log_2\big(C_D M + \log(1/\epsilon)\big)\ln^2\frac{1}{\epsilon} + \ln\frac{1}{\delta}\Big), $$
we have
$$ \Pr_{w_1,w_2,\ldots,w_l}\Big(\sup_{x\in D}\Big|\frac{\int_{[-M,M]^d}F_1(w)\,dw}{l}\sum_{j=1}^{l}\cos(2\pi w_j\cdot x) - f_1(x)\Big| > \epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big) \le \delta < 1, $$
where $\{w_i\}_{i=1}^{l}$ are $l$ independent draws from the probability distribution with density
$$ \frac{F_1(w)}{\int_{[-M,M]^d}F_1(w)\,dw}. $$
Therefore, there exist $w_1,w_2,\ldots,w_l$ such that
$$ \sup_{x\in D}\Big|\frac{\int_{[-M,M]^d}F_1(w)\,dw}{l}\sum_{j=1}^{l}\cos(2\pi w_j\cdot x) - f_1(x)\Big| \le O\Big(\epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big). \tag{2.6} $$
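The sampling argument behind (2.6) can be illustrated numerically. For simplicity the sketch below (all names and parameters are ours) takes $F_1$ to be a nonnegative discrete measure, so that $f_1$ is a finite sum and the Monte Carlo estimate can be compared against it exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy f1 whose spectrum F1 is a nonnegative discrete measure on K
# atoms v_k in [-M, M]^d, so the integral defining f1 is a finite sum.
d, K, M = 2, 5, 3.0
v = rng.uniform(-M, M, size=(K, d))   # frequency atoms
c = rng.uniform(0.1, 0.6, size=K)     # F1 weights; total mass c.sum()

def f1(x):
    return float(np.sum(c * np.cos(2 * np.pi * v @ x)))

# Draw l frequencies w_j from the normalized spectrum F1 / int F1 and
# form the empirical average appearing in (2.6).
l = 40000
w = v[rng.choice(K, size=l, p=c / c.sum())]

def f1_mc(x):
    return float(c.sum() / l * np.sum(np.cos(2 * np.pi * w @ x)))

# Uniform error over a grid D in [-1, 1]^2; it shrinks on the order of
# (total mass) / sqrt(l), as in the sampling argument.
grid = [np.array([s, t]) for s in np.linspace(-1, 1, 9)
        for t in np.linspace(-1, 1, 9)]
sup_err = max(abs(f1(x) - f1_mc(x)) for x in grid)
```

Note that repeated frequencies among the $w_j$ merge into fewer cosine terms with summed coefficients, which is how the sparse representation in (1.1) arises.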
The term $f_2$ can be approximated in the same way as $f_1$ in (2.6), and likewise $f_3$ and $f_4$ can be approximated by linear combinations of sines. The sum of the $2l$ coefficients in (1.1) is bounded by
$$ \sum_{j=1}^{l}\big(|a_{1,j}| + |a_{2,j}|\big) \le \int_{[-M,M]^d}\big(\max(0,\Re\hat{f}(w)) + \max(0,-\Re\hat{f}(w)) + \max(0,\Im\hat{f}(w)) + \max(0,-\Im\hat{f}(w))\big)\,dw \le 2C_f. \tag{2.7} $$
The proof is hence complete.
In this paper we prove a frequency decomposition lemma and, as a corollary, establish a uniform approximation estimate for bandlimited functions via deep ReLU networks. In future work we will discuss the learnability of neural networks for bandlimited functions. These theoretical results deepen the understanding of machine learning methods in signal processing.
The authors are thankful to their Universities for their support. This work was supported by the National Natural Science Foundation of China (No. 11661065).
The authors declare that they have no competing interests.
[1] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE T. Inform. Theory, 39 (1993), 930–945. https://doi.org/10.1109/18.256500
[2] P. L. Bartlett, P. M. Long, More theorems about scale-sensitive dimensions and learning, Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995, 392–401.
[3] L. Chen, C. Wu, A note on the expressive power of deep rectified linear unit networks in high-dimensional spaces, Math. Methods Appl. Sci., 42 (2019), 3400–3404. https://doi.org/10.1002/mma.5575
[4] N. Cohen, O. Sharir, A. Shashua, On the expressive power of deep learning: A tensor analysis, 29th Annual Conference on Learning Theory, 2016, 698–728.
[5] R. Eldan, O. Shamir, The power of depth for feedforward neural networks, 29th Annual Conference on Learning Theory, 2016, 907–940.
[6] I. Gühring, G. Kutyniok, P. Petersen, Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms, Anal. Appl., 18 (2020), 803–859. https://doi.org/10.1142/S0219530519410021
[7] A. Jentzen, D. Salimova, T. Welti, A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients, arXiv, 2018. https://arXiv.org/abs/1809.07321
[8] M. J. Kearns, R. E. Schapire, Efficient distribution-free learning of probabilistic concepts, J. Comput. Syst. Sci., 48 (1994), 464–497. https://doi.org/10.1016/S0022-0000(05)80062-5
[9] J. M. Klusowski, A. R. Barron, Approximation by combinations of ReLU and squared ReLU ridge functions with ℓ1 and ℓ0 controls, IEEE T. Inform. Theory, 64 (2018), 7649–7656. https://doi.org/10.1109/TIT.2018.2874447
[10] V. N. Konovalov, On the orders of nonlinear approximations for classes of functions of given form, Math. Notes, 78 (2005), 88–104. https://doi.org/10.1007/s11006-005-0102-3
[11] H. Lee, R. Ge, T. Ma, A. Risteski, S. Arora, On the ability of neural nets to express distributions, Proceedings of the 2017 Conference on Learning Theory, 65 (2017), 1271–1296.
[12] S. Liang, R. Srikant, Why deep neural networks for function approximation? arXiv, 2016. https://arXiv.org/abs/1610.04161
[13] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory, 95 (1998), 215–228. https://doi.org/10.1006/jath.1997.3217
[14] V. Maiorov, J. Ratsaby, On the degree of approximation by manifolds of finite pseudo-dimension, Constr. Approx., 15 (1999), 291–300. https://doi.org/10.1007/s003659900108
[15] H. N. Mhaskar, T. Poggio, Deep vs. shallow networks: An approximation theory perspective, Anal. Appl., 14 (2016), 829–848. https://doi.org/10.1142/S0219530516400042
[16] H. Montanelli, Q. Du, New error bounds for deep ReLU networks using sparse grids, SIAM J. Math. Data Sci., 1 (2019), 78–92. https://doi.org/10.1137/18M1189336
[17] H. Montanelli, H. Yang, Q. Du, Deep ReLU networks overcome the curse of dimensionality for bandlimited functions, arXiv, 2019. https://arXiv.org/abs/1903.00735
[18] G. F. Montúfar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, Adv. Neural Inform. Proc. Syst., 2014, 2924–2932.
[19] D. Perekrestenko, P. Grohs, D. Elbrächter, H. Bölcskei, The universal approximation power of finite-width deep ReLU networks, arXiv, 2018. https://arXiv.org/abs/1806.01528
[20] G. Pisier, Remarques sur un résultat non publié de B. Maurey, Séminaire Anal. Fonct. (Polytechnique), 1981, 1–12.
[21] M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein, On the expressive power of deep neural networks, Proceedings of the 34th International Conference on Machine Learning, 70 (2017), 2847–2854.
[22] U. Shaham, A. Cloninger, R. R. Coifman, Provable approximation properties for deep neural networks, Appl. Comput. Harmon. Anal., 44 (2018), 537–557. https://doi.org/10.1016/j.acha.2016.04.003
[23] T. Serra, C. Tjandraatmadja, S. Ramalingam, Bounding and counting linear regions of deep neural networks, Proceedings of the 35th International Conference on Machine Learning, 2018, 4558–4566.
[24] Z. Shen, H. Yang, S. Zhang, Nonlinear approximation via compositions, Neural Networks, 119 (2019), 74–84. https://doi.org/10.1016/j.neunet.2019.07.011
[25] Z. Shen, H. Yang, S. Zhang, Deep network approximation with discrepancy being reciprocal of width to power of depth, arXiv, 2020.
[26] Z. Shen, H. Yang, S. Zhang, Neural network approximation: Three hidden layers are enough, Neural Networks, 141 (2021), 160–173. https://doi.org/10.1016/j.neunet.2021.04.011
[27] M. Telgarsky, Representation benefits of deep feedforward networks, arXiv, 2015. https://arXiv.org/abs/1509.08101
[28] H. E. Warren, Lower bounds for approximation by nonlinear manifolds, Trans. Amer. Math. Soc., 133 (1968), 167–178. https://doi.org/10.1090/S0002-9947-1968-0226281-1
[29] V. N. Vapnik, A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, In: V. Vovk, H. Papadopoulos, A. Gammerman, Measures of complexity, Springer, 1971, 11–30. https://doi.org/10.1007/978-3-319-21852-6_3
[30] D. Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Networks, 94 (2017), 103–114. https://doi.org/10.1016/j.neunet.2017.07.002
[31] D. X. Zhou, Universality of deep convolutional neural networks, Appl. Comput. Harmon. Anal., 48 (2020), 787–794. https://doi.org/10.1016/j.acha.2019.06.004
[32] Y. Xu, H. Zhang, Convergence of deep convolutional neural networks, Neural Networks, 153 (2022), 553–563. https://doi.org/10.1016/j.neunet.2022.06.031
[33] R. DeVore, B. Hanin, G. Petrova, Neural network approximation, Acta Numer., 30 (2021), 327–444. https://doi.org/10.1017/S0962492921000052
[34] I. Gühring, M. Raslan, Approximation rates for neural networks with encodable weights in smoothness spaces, Neural Networks, 134 (2021), 107–130. https://doi.org/10.1016/j.neunet.2020.11.010
[35] D. Dũng, V. K. Nguyen, Deep ReLU neural networks in high-dimensional approximation, Neural Networks, 142 (2021), 619–635. https://doi.org/10.1016/j.neunet.2021.07.027