Research article

On the uniform approximation estimation of deep ReLU networks via frequency decomposition

  • Received: 30 March 2022 Revised: 07 August 2022 Accepted: 08 August 2022 Published: 29 August 2022
  • MSC : 41A30, 41A63

  • A recent line of work established approximation complexity estimates for deep ReLU networks on bandlimited functions in the MSE (mean squared error) sense. In this note, we significantly strengthen this result: we estimate the approximation complexity in the $L^\infty$ sense. The key to the proof is a frequency decomposition lemma which may be of independent interest.

    Citation: Liang Chen, Wenjun Liu. On the uniform approximation estimation of deep ReLU networks via frequency decomposition[J]. AIMS Mathematics, 2022, 7(10): 19018-19025. doi: 10.3934/math.20221045




    The approximation theory of neural networks has received wide attention in recent years [6,12,18,21,22,24,25,26,27,30,31,32,33,34,35]. Two advantages of neural networks are widely accepted: one is that neural networks can overcome the curse of dimensionality [1,4,5,7,9,11,15,16,17]; the other is that, as the number of layers increases, deep neural networks can fit highly oscillatory or high-frequency functions [3,17,18,19,21,23]. For bandlimited functions, the approximation complexity of deep ReLU networks has been studied [3,17] in the MSE sense. Compared with the results for shallow neural networks [1], these results show that deep ReLU networks can approximate ultrawide-bandwidth high-dimensional signals with very low complexity. In this paper, we continue this line of research and estimate the approximation complexity of deep ReLU networks in the $L^\infty$ sense; this estimate is therefore significantly stronger than the previous results [3,17]. It is worth mentioning that the approximation complexity estimates for shallow neural networks differ significantly for different error norms [9,13]. Similarly, for deep ReLU networks we cannot resort to Maurey's theorem [20] in the $L^\infty$ case. Instead, we use the fat-shattering dimension theory [8] to establish a frequency decomposition lemma, which is our main result.

    Lemma 1.1. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a bandlimited function of the form

    $$f(x)=\int_{[-M,M]^d}\hat{f}(w)e^{i2\pi x\cdot w}\,dw,$$

    and

    $$\int_{[-M,M]^d}|\hat{f}(w)|\,dw=:C_f<\infty.$$

    Then there exist $2l$ frequencies

    $$w_{1,1},w_{1,2},\ldots,w_{1,l},\,w_{2,1},w_{2,2},\ldots,w_{2,l}\in[-M,M]^d$$

    and $2l$ coefficients

    $$a_{1,1},a_{1,2},\ldots,a_{1,l},\,a_{2,1},a_{2,2},\ldots,a_{2,l}\in\mathbb{R}$$

    satisfying

    $$\sum_{j=1}^{l}(|a_{1,j}|+|a_{2,j}|)\le 2C_f$$

    and

    $$l\le C\,(dC_f^2/\epsilon^2)\log\big(C_DM+\log(C_f/\epsilon)\big)\ln^2(C_f/\epsilon),$$

    such that

    $$\sup_{x\in D}\Big|f(x)-\sum_{j=1}^{l}\big(a_{1,j}\cos(2\pi w_{1,j}\cdot x)+a_{2,j}\sin(2\pi w_{2,j}\cdot x)\big)\Big|<\epsilon, \tag{1.1}$$

    where $C$ is some universal constant, $0<\epsilon<1/2$, $D$ is any bounded subset of $\mathbb{R}^d$, and

    $$C_D:=\sup_{x\in[-1,1]^d}\sup_{y\in D}|x\cdot y|.$$

    On the other hand, a feedforward neural network $\tilde{X}$ consists of layers of nodes, each node of the form

    $$y=\sigma\Big(\sum_{k=1}^{N}w_kx_k+b\Big), \tag{1.2}$$

    where $\{x_k:k=1,2,\ldots,N\}$ are nodes in the previous layers and $\sigma$ is an activation function. If $\sigma(x)=\max(x,0)$, the multilayer feedforward neural network is a deep ReLU (Rectified Linear Unit) network. The complexity of deep neural networks approximating sinusoidal functions has been discussed in [3,17,19].
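As a minimal Python sketch (our own illustration, not from the paper), a node of the form (1.2) with the ReLU activation, assembled into a tiny two-layer network that computes $|t|=\sigma(t)+\sigma(-t)$:

```python
def relu(x):
    # ReLU activation: sigma(x) = max(x, 0)
    return max(x, 0.0)

def node(x, w, b, sigma=relu):
    # One feedforward node, Eq. (1.2): y = sigma(sum_k w_k * x_k + b)
    return sigma(sum(wk * xk for wk, xk in zip(w, x)) + b)

def two_layer(x, W1, b1, W2, b2):
    # Hidden layer of ReLU nodes followed by a linear output node
    hidden = [node(x, w, b) for w, b in zip(W1, b1)]
    return sum(wk * h for wk, h in zip(W2, hidden)) + b2

def abs_net(t):
    # |t| = relu(t) + relu(-t): a two-node ReLU network
    return two_layer([t], [[1.0], [-1.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
```

For instance, `abs_net(-3.0)` returns `3.0`, matching $|{-3}|$.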

    Lemma 1.2. ([3]) There exists a deep ReLU network $\tilde{X}_\epsilon$ with $O(\log(C_DM)\log_2^2(1/\epsilon))$ layers and $O(\log(C_DM)\log_2^3(1/\epsilon))$ nodes such that

    $$|\tilde{X}_\epsilon(t)-\cos(2\pi w\cdot t)|\le\epsilon\quad\text{for }w\in[-M,M]^d,\ t\in D. \tag{1.3}$$

    Based on Lemma 1.1 and Lemma 1.2, the following main theorem can be obtained.

    Theorem 1.3. Let $f:D\to\mathbb{R}$ be a bandlimited function of the form

    $$f(x)=\int_{[-M,M]^d}\hat{f}(w)e^{i2\pi x\cdot w}\,dw,$$

    and

    $$C_f:=\int_{[-M,M]^d}|\hat{f}(w)|\,dw<\infty.$$

    Then there exists a deep ReLU network $\tilde{X}_\epsilon$ with

    $$O\big(\log(C_DM)\log_2^2(1/\epsilon)\big)$$

    layers and

    $$O\Big(\big(\log(C_DM)\log_2^3(1/\epsilon)\,dC_f^2/\epsilon^2\big)\log_2\big(C_DM+\log(C_f/\epsilon)\big)\ln^2(C_f/\epsilon)\Big)$$

    nodes such that

    $$\sup_{x\in D}|f(x)-\tilde{X}_\epsilon(x)|<\epsilon. \tag{1.4}$$
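Schematically, the node count in Theorem 1.3 is the product of the per-sinusoid cost from Lemma 1.2 and the number of sinusoids $l$ from Lemma 1.1 (a bookkeeping sketch, not a statement from the paper):

```latex
\underbrace{O\big(\log(C_DM)\log_2^3(1/\epsilon)\big)}_{\text{nodes per sinusoid (Lemma 1.2)}}
\times
\underbrace{O\big((dC_f^2/\epsilon^2)\log_2(C_DM+\log(C_f/\epsilon))\ln^2(C_f/\epsilon)\big)}_{\text{number of sinusoids }l\text{ (Lemma 1.1)}}
```

The depth, on the other hand, stays at that of a single sinusoid network, since the $l$ subnetworks can be placed in parallel.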

    For any complex value $z\in\mathbb{C}$, $\Re z$ and $\Im z$ denote the real part and the imaginary part of $z$, respectively. We denote by $C_j$, $j=1,2,\ldots$ some universal constants.

    To prove Lemma 1.1, some notations and lemmas are needed. Given a class $H$ of real-valued functions defined on $X$ and a real number $\gamma\ge 0$, the fat-shattering dimension of $H$, denoted $\mathrm{fat}_H(\gamma)$, is the largest integer $m$ with the following property: there exist a subset $V=\{x_1,x_2,\ldots,x_m\}$ of the domain $X$ and real numbers $r_1,r_2,\ldots,r_m$ such that for all $b\in\{0,1\}^m$ there is a function $f_b$ in $H$ with $f_b(x_i)\ge r_i+\gamma$ if $b_i=1$ and $f_b(x_i)<r_i-\gamma$ if $b_i=0$, for $1\le i\le m$. The fat-shattering dimension is a generalization of the Vapnik-Chervonenkis dimension [29], which is often used to estimate approximation complexity [10,14,35].

    Lemma 2.1. ([2]) Let $H$ be a class of functions from some domain $X$ into $[0,1]$. Then for all distributions $P$ over $X$ and for all $0<\epsilon,\delta<1/2$,

    $$\Pr_{x_1,x_2,\ldots,x_l}\Big[\sup_{f\in H}\Big|\frac{1}{l}\sum_{i=1}^{l}f(x_i)-\mathbb{E}_{x\sim P}[f(x)]\Big|\ge\epsilon\Big]\le\delta,$$

    where $\{x_i\}_{i=1}^{l}$ are $l$ independent draws from $P$, $\Pr_{x_1,x_2,\ldots,x_l}$ denotes the joint probability distribution for the random variables $x_1,x_2,\ldots,x_l$, and

    $$l=O\Big(\frac{1}{\epsilon^2}\Big(\mathrm{fat}_H(\epsilon/5)\ln^2\frac{1}{\epsilon}+\ln\frac{1}{\delta}\Big)\Big).$$

    Lemma 2.2. ([28]) Let $p_1,p_2,\ldots,p_l$ be $d$-variable real polynomials of degree at most $q$. If $l\ge d$, then the number of vectors

    $$\big(\mathrm{sgn}(p_1(x)),\mathrm{sgn}(p_2(x)),\ldots,\mathrm{sgn}(p_l(x))\big)$$

    is at most $(4elq/d)^d$, where $\mathrm{sgn}(x):=\begin{cases}1,&x\ge 0,\\-1,&x<0.\end{cases}$
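To make Lemma 2.2 concrete, here is a small Python check (our own illustration, with hypothetical polynomials $p_i(x)=x-c_i$): for $l=4$ univariate linear polynomials ($d=q=1$), the distinct sign vectors along the real line number $l+1$, well below Warren's bound $(4elq/d)^d$.

```python
import math

def sgn(x):
    # sgn as in Lemma 2.2: +1 for x >= 0, -1 for x < 0
    return 1 if x >= 0 else -1

# l univariate (d = 1) polynomials of degree q = 1: p_i(x) = x - c_i
cs = [1.0, 2.0, 3.0, 4.0]
l, d, q = len(cs), 1, 1

def pattern(x):
    return tuple(sgn(x - c) for c in cs)

# sample x finely across and beyond the roots of the p_i
xs = [i / 100.0 for i in range(-100, 600)]
patterns = {pattern(x) for x in xs}

warren_bound = (4 * math.e * l * q / d) ** d  # Warren's bound (4elq/d)^d
```

Here `len(patterns)` is $l+1=5$, comfortably below `warren_bound` $\approx 43.5$.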

    By Taylor expansion, sines and cosines can be approximated by polynomials.

    Lemma 2.3. Let $0<\epsilon<1$, $a>0$ and $f(x)=\sin(x)$ or $\cos(x)$. Then there exist $C_1>0$ and a polynomial $Q$ of degree at most $C_1(a+\log(1/\epsilon))$ such that $\sup_{x\in[-a,a]}|f(x)-Q(x)|\le\epsilon$.

    Proof. The Taylor expansion of an $n$ times differentiable function $f$ at $0$ with the Lagrange remainder is

    $$f(x)=\sum_{j=0}^{n-1}\frac{f^{(j)}(0)}{j!}x^j+\frac{f^{(n)}(x_0)}{n!}x^n, \tag{2.1}$$

    where $x_0$ is between $0$ and $x$, $f^{(j)}$ denotes the derivative of order $j$, and $n!=n(n-1)\cdots 1$. Applying Stirling's formula to $n!$, we have

    $$n!\ge\sqrt{2\pi n}\Big(\frac{n}{e}\Big)^n>\Big(\frac{n}{e}\Big)^n,\quad n\in\mathbb{N}. \tag{2.2}$$

    Note that $|\sin^{(n)}(x)|\le 1$ and $|\cos^{(n)}(x)|\le 1$ for all $x\in\mathbb{R}$. Thus, by (2.1) and (2.2), for all $n\ge\max\{2ea,\frac{1}{\log 2}\log\frac{1}{\epsilon}\}$ we have

    $$\sup_{x\in[-a,a]}\Big|\frac{f^{(n)}(x_0)}{n!}x^n\Big|\le\frac{a^n}{n!}<\Big(\frac{ea}{n}\Big)^n\le\frac{1}{2^n}\le\epsilon.$$

    The proof is hence complete.
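A quick numerical sanity check of Lemma 2.3 in Python (our own sketch; the grid and parameter values are illustrative): truncating the cosine series at degree $n\ge\max\{2ea,\log_2(1/\epsilon)\}$ drives the uniform error on $[-a,a]$ below $\epsilon$.

```python
import math

def taylor_cos(x, n):
    # Taylor partial sum of cos at 0, keeping terms of degree < n
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range((n + 1) // 2))

a, eps = 2.0, 1e-3
# degree threshold from the proof: n >= max(2ea, log2(1/eps))
n = max(math.ceil(2 * math.e * a), math.ceil(math.log2(1 / eps)))
xs = [-a + 2 * a * i / 1000 for i in range(1001)]
sup_err = max(abs(math.cos(x) - taylor_cos(x, n)) for x in xs)
```

With $a=2$ and $\epsilon=10^{-3}$ this gives $n=11$ and a uniform error well below $\epsilon$.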

    Proof of Lemma 1.1. Since $f$ is real-valued, for all $x\in\mathbb{R}^d$ we have

    $$\begin{aligned}f(x)&=\int_{[-M,M]^d}\hat{f}(w)e^{i2\pi w\cdot x}\,dw\\&=\Re\int_{[-M,M]^d}\big(\Re\hat{f}(w)+i\,\Im\hat{f}(w)\big)\big(\cos(2\pi w\cdot x)+i\sin(2\pi w\cdot x)\big)\,dw\\&=\int_{[-M,M]^d}\big(\cos(2\pi w\cdot x)\,\Re\hat{f}(w)-\sin(2\pi w\cdot x)\,\Im\hat{f}(w)\big)\,dw\\&=:f_1-f_2-f_3+f_4,\end{aligned} \tag{2.3}$$

    where

    $$f_1(x):=\int_{[-M,M]^d}\cos(2\pi w\cdot x)\max\big(0,\Re\hat{f}(w)\big)\,dw,$$
    $$f_2(x):=\int_{[-M,M]^d}\cos(2\pi w\cdot x)\max\big(-\Re\hat{f}(w),0\big)\,dw,$$
    $$f_3(x):=\int_{[-M,M]^d}\sin(2\pi w\cdot x)\max\big(0,\Im\hat{f}(w)\big)\,dw,$$
    $$f_4(x):=\int_{[-M,M]^d}\sin(2\pi w\cdot x)\max\big(-\Im\hat{f}(w),0\big)\,dw.$$

    Let $F_1:=\max(0,\Re\hat{f})$. Without loss of generality, we assume that

    $$\int_{[-M,M]^d}F_1(w)\,dw>0.$$

    We only discuss the approximation of the first term

    $$f_1(x)=\int_{[-M,M]^d}\cos(2\pi w\cdot x)F_1(w)\,dw,\quad x\in\mathbb{R}^d.$$

    The other three terms $f_2,f_3,f_4$ in (2.3) can be handled in a similar manner. Observe that

    $$|w\cdot x|\le C_DM\quad\text{for all }w\in[-M,M]^d\text{ and }x\in D.$$

    By Lemma 2.3, there exist $C_2>0$ and a polynomial $Q$ of degree $C_2(C_DM+\log(1/\epsilon))$ such that

    $$|\cos(2\pi w\cdot x)-Q(2\pi w\cdot x)|<\epsilon\quad\text{for all }w\in[-M,M]^d\text{ and }x\in D. \tag{2.4}$$

    Let

    $$\tilde{f}(x):=\int_{[-M,M]^d}Q(2\pi w\cdot x)F_1(w)\,dw,\quad x\in\mathbb{R}^d.$$

    Then we get

    $$|\tilde{f}(x)-f_1(x)|<\epsilon\int_{[-M,M]^d}F_1(w)\,dw,\quad x\in D. \tag{2.5}$$

    Set

    $$G:=\big\{g_x(w)=\cos(2\pi x\cdot w):x\in D,\ w\in[-M,M]^d\big\}$$

    and

    $$G_\epsilon:=\big\{g_x(w)=Q(2\pi x\cdot w):x\in D,\ w\in[-M,M]^d\big\}.$$

    By Lemma 2.2, the number of vectors

    $$\big(\mathrm{sgn}(Q(2\pi w_1\cdot x)),\mathrm{sgn}(Q(2\pi w_2\cdot x)),\ldots,\mathrm{sgn}(Q(2\pi w_l\cdot x))\big)$$

    is at most $\big(4elC_2(C_DM+\log(1/\epsilon))/d\big)^d$. Thus,

    $$2^{\mathrm{fat}_{G_\epsilon}(0)}\le\big(4e\,\mathrm{fat}_{G_\epsilon}(0)\,C_2(C_DM+\log(1/\epsilon))/d\big)^d,$$

    that is,

    $$\mathrm{fat}_{G_\epsilon}(0)\le d\log_2\big(4eC_2(C_DM+\log(1/\epsilon))\big)+d\log_2\Big(\frac{\mathrm{fat}_{G_\epsilon}(0)}{d}\Big).$$

    Note that

    $$d\log_2\Big(\frac{\mathrm{fat}_{G_\epsilon}(0)}{d}\Big)\le\frac{1}{2}\mathrm{fat}_{G_\epsilon}(0)\quad\text{for }\mathrm{fat}_{G_\epsilon}(0)>4d,$$

    then

    $$\mathrm{fat}_G\Big(\epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big)\le\mathrm{fat}_{G_\epsilon}(0)\le 2d\log_2\big(8eC_2(C_DM+\log(1/\epsilon))\big)+4d.$$

    By Lemma 2.1, choosing $\delta<1/4$ and

    $$l\ge\frac{C_3}{\epsilon^2}\Big(d\log_2\big(C_DM+\log(1/\epsilon)\big)\ln^2\frac{1}{\epsilon}+\ln\frac{1}{\delta}\Big),$$

    we have

    $$\Pr_{w_1,w_2,\ldots,w_l}\Big(\sup_{x\in D}\Big|\frac{\int_{[-M,M]^d}F_1(w)\,dw}{l}\sum_{j=1}^{l}\cos(2\pi w_j\cdot x)-f_1(x)\Big|>\epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big)\le\delta<1,$$

    where $\{w_i\}_{i=1}^{l}$ obeys the probability distribution generated by the density function

    $$\frac{F_1(w)}{\int_{[-M,M]^d}F_1(w)\,dw}.$$

    Therefore, there exist $w_1,w_2,\ldots,w_l$ such that for all $x\in D$,

    $$\Big|\frac{\int_{[-M,M]^d}F_1(w)\,dw}{l}\sum_{j=1}^{l}\cos(2\pi w_j\cdot x)-f_1(x)\Big|\le O\Big(\epsilon\int_{[-M,M]^d}F_1(w)\,dw\Big). \tag{2.6}$$

    The approximation of $f_2$ is similar to that of $f_1$ given in (2.6), and approximations of $f_3$ and $f_4$ are obtained analogously by linear combinations of sines. The sum of the $2l$ coefficients in (1.1) is bounded by

    $$\sum_{j=1}^{l}(|a_{1,j}|+|a_{2,j}|)\le\int_{[-M,M]^d}\Big(\max\big(0,\Re\hat{f}(w)\big)+\max\big(-\Re\hat{f}(w),0\big)+\max\big(0,\Im\hat{f}(w)\big)+\max\big(-\Im\hat{f}(w),0\big)\Big)\,dw\le 2C_f. \tag{2.7}$$

    The proof is hence complete.
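The sampling step behind (2.6) can be illustrated numerically in one dimension (a sketch under our own assumptions: $d=1$, the uniform density $F_1(w)=1/(2M)$ on $[-M,M]$, for which $f_1(x)=\sin(2\pi Mx)/(2\pi Mx)$): drawing $l$ frequencies from $F_1$ and averaging cosines approximates $f_1$ uniformly on $D=[-1,1]$.

```python
import math, random

random.seed(0)
M, l = 4.0, 20000

def f1(x):
    # closed form of f1 for the uniform density F1(w) = 1/(2M) on [-M, M]
    if x == 0:
        return 1.0
    return math.sin(2 * math.pi * M * x) / (2 * math.pi * M * x)

# draw l frequencies from F1 and form the empirical cosine average of (2.6)
ws = [random.uniform(-M, M) for _ in range(l)]

def f1_mc(x):
    return sum(math.cos(2 * math.pi * w * x) for w in ws) / l

xs = [-1 + i / 250 for i in range(501)]
sup_err = max(abs(f1(x) - f1_mc(x)) for x in xs)
```

As the lemma predicts, the uniform error decays on the order of $1/\sqrt{l}$; with $l=20000$ it is far below $0.1$ on this grid.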

    In this paper we prove a frequency decomposition lemma which may be of independent interest and, as a corollary, establish a uniform approximation estimate for bandlimited functions via deep ReLU networks. In future work we will discuss the learnability of neural networks for bandlimited functions. These theoretical results deepen the understanding of machine learning methods in signal processing.

    The authors are thankful to their universities for their support. This work was supported by the National Natural Science Foundation of China (No. 11661065).

    The authors declare that they have no competing interests.



    [1] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE T. Inform. Theory, 39 (1993), 930–945. https://doi.org/10.1109/18.256500 doi: 10.1109/18.256500
    [2] P. L. Bartlett, P. M. Long, More theorems about scale-sensitive dimensions and learning, Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995, 392–401.
    [3] L. Chen, C. Wu, A note on the expressive power of deep rectified linear unit networks in high-dimensional spaces, Math. Methods Appl. Sci., 42 (2019), 3400–3404. https://doi.org/10.1002/mma.5575 doi: 10.1002/mma.5575
    [4] N. Cohen, O. Sharir, A. Shashua, On the expressive power of deep learning: A tensor analysis, 29th Annual Conference on Learning Theory, 2016,698–728.
    [5] R. Eldan, O. Shamir, The power of depth for feedforward neural networks, 29th Annual Conference on Learning Theory, 2016,907–940.
    [6] I. Gühring, G. Kutyniok, P. Petersen, Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms, Anal. Appl., 18 (2020), 803–859. https://doi.org/10.1142/S0219530519410021 doi: 10.1142/S0219530519410021
    [7] A. Jentzen, D. Salimova, T. Welti, A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients, arXiv, 2018. https://arXiv.org/abs/1809.07321
    [8] M. J. Kearns, R. E. Schapire, Efficient distribution-free learning of probabilistic concepts, J. Comput. Syst. Sci., 48 (1994), 464–497. https://doi.org/10.1016/S0022-0000(05)80062-5 doi: 10.1016/S0022-0000(05)80062-5
    [9] J. M. Klusowski, A. R. Barron, Approximation by combinations of ReLU and squared ReLU ridge functions with $\ell^1$ and $\ell^0$ controls, IEEE T. Inform. Theory, 64 (2018), 7649–7656. https://doi.org/10.1109/TIT.2018.2874447 doi: 10.1109/TIT.2018.2874447
    [10] V. N. Konovalov, On the orders of nonlinear approximations for classes of functions of given form, Math. Notes, 78 (2005), 88–104. https://doi.org/10.1007/s11006-005-0102-3 doi: 10.1007/s11006-005-0102-3
    [11] H. Lee, R. Ge, T. Ma, A. Risteski, S. Arora, On the ability of neural nets to express distributions, Proceedings of the 2017 Conference on Learning Theory, 65 (2017), 1271–1296.
    [12] S. Liang, R. Srikant, Why deep neural networks for function approximation? arXiv, 2016. https://arXiv.org/abs/1610.04161
    [13] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory, 95 (1998), 215–228. https://doi.org/10.1006/jath.1997.3217 doi: 10.1006/jath.1997.3217
    [14] V. Maiorov, J. Ratsaby, On the degree of approximation by manifolds of finite pseudo-dimension, Constr. Approx., 15 (1999), 291–300. https://doi.org/10.1007/s003659900108 doi: 10.1007/s003659900108
    [15] H. N. Mhaskar, T. Poggio, Deep vs. shallow networks: An approximation theory perspective, Anal. Appl., 14 (2016), 829–848. https://doi.org/10.1142/S0219530516400042 doi: 10.1142/S0219530516400042
    [16] H. Montanelli, Q. Du, New error bounds for deep ReLU networks using sparse grids, SIAM J. Math. Data Sci., 1 (2019), 78–92. https://doi.org/10.1137/18M1189336 doi: 10.1137/18M1189336
    [17] H. Montanelli, H. Yang, Q. Du, Deep ReLU networks overcome the curse of dimensionality for bandlimited functions, arXiv, 2019. https://arXiv.org/abs/1903.00735
    [18] G. F. Montúfar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, Adv. Neural Inform. Proc. Syst., 2014, 2924–2932.
    [19] D. Perekrestenko, P. Grohs, D. Elbrachter, H. Bölcskei, The universal approximation power of finite-width deep ReLU networks, arXiv, 2018. https://arXiv.org/abs/1806.01528
    [20] G. Pisier, Remarques sur un résultat non publié de B. Maurey, Séminaire Anal. Fonct. (Polytechnique), 1981, 1–12.
    [21] M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein, On the expressive power of deep neural networks, Proceedings of the 34th International Conference on Machine Learning, 70 (2017), 2847–2854.
    [22] U. Shaham, A. Cloninger, R. R. Coifman, Provable approximation properties for deep neural networks, Appl. Comput. Harmon. Anal., 44 (2018), 537–557. https://doi.org/10.1016/j.acha.2016.04.003 doi: 10.1016/j.acha.2016.04.003
    [23] T. Serra, C. Tjandraatmadja, S. Ramalingam, Bounding and counting linear regions of deep neural networks, Proceedings of the 35th International Conference on Machine Learning, 2018, 4558–4566.
    [24] Z. Shen, H. Yang, S. Zhang, Nonlinear approximation via compositions, Neural Networks, 119 (2019), 74–84. https://doi.org/10.1016/j.neunet.2019.07.011 doi: 10.1016/j.neunet.2019.07.011
    [25] Z. Shen, H. Yang, S. Zhang, Deep network approximation with discrepancy being reciprocal of width to power of depth, arXiv, 2020.
    [26] Z. Shen, H. Yang, S. Zhang, Neural network approximation: Three hidden layers are enough, Neural Networks, 141 (2021), 160–173. https://doi.org/10.1016/j.neunet.2021.04.011 doi: 10.1016/j.neunet.2021.04.011
    [27] M. Telgarsky, Representation benefits of deep feedforward networks, arXiv, 2015. https://arXiv.org/abs/1509.08101
    [28] H. E. Warren, Lower bounds for approximation by nonlinear manifolds, Trans. Amer. Math. Soc., 133 (1968), 167–178. https://doi.org/10.1090/S0002-9947-1968-0226281-1 doi: 10.1090/S0002-9947-1968-0226281-1
    [29] V. N. Vapnik, A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, In: V. Vovk, H. Papadopoulos, A. Gammerman, Measures of complexity, Springer, 1971, 11–30. https://doi.org/10.1007/978-3-319-21852-6_3
    [30] D. Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Networks, 94 (2017), 103–114. https://doi.org/10.1016/j.neunet.2017.07.002 doi: 10.1016/j.neunet.2017.07.002
    [31] D. X. Zhou, Universality of deep convolutional neural networks, Appl. Comput. Harmon. Anal., 48 (2020), 787–794. https://doi.org/10.1016/j.acha.2019.06.004 doi: 10.1016/j.acha.2019.06.004
    [32] Y. Xu, H. Zhang, Convergence of deep convolutional neural networks, Neural Networks, 153 (2022), 553–563. https://doi.org/10.1016/j.neunet.2022.06.031 doi: 10.1016/j.neunet.2022.06.031
    [33] R. DeVore, H. Boris, P. Guergana, Neural network approximation, Acta Numer., 30 (2021), 327–444. https://doi.org/10.1017/S0962492921000052 doi: 10.1017/S0962492921000052
    [34] I. Gühring, R. Mones, Approximation rates for neural networks with encodable weights in smoothness spaces, Neural Networks, 134 (2021), 107–130. https://doi.org/10.1016/j.neunet.2020.11.010 doi: 10.1016/j.neunet.2020.11.010
    [35] D. Dũng, V. K. Nguyen, Deep ReLU neural networks in high-dimensional approximation, Neural Networks, 142 (2021), 619–635. https://doi.org/10.1016/j.neunet.2021.07.027 doi: 10.1016/j.neunet.2021.07.027
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)