A recent line of work established approximation complexity estimates for deep ReLU networks on bandlimited functions in the MSE (mean squared error) sense. In this note, we significantly strengthen this result: we estimate the approximation complexity in the $ L_{\infty} $ (uniform) sense. The key to the proof is a frequency decomposition lemma, which may be of independent interest.
Citation: Liang Chen, Wenjun Liu. On the uniform approximation estimation of deep ReLU networks via frequency decomposition. AIMS Mathematics, 2022, 7(10): 19018–19025. doi: 10.3934/math.20221045
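For orientation, here is a minimal sketch of the setting in standard notation; the bandwidth parameter $ M $, the domain $ [0, 1]^d $, and the network $ \Phi $ are illustrative assumptions rather than the paper's exact statement. A function is bandlimited with bandwidth $ M $ if its Fourier representation only involves frequencies in a ball of radius $ M $:
\[
\mathcal{B}_M = \Bigl\{\, f : [0,1]^d \to \mathbb{R} \;:\; f(x) = \int_{\|w\|_2 \le M} \hat{f}(w)\, e^{2\pi i \langle w, x \rangle}\, \mathrm{d}w \,\Bigr\}.
\]
The two error notions compared in the abstract are
\[
\|f - \Phi\|_{L_2([0,1]^d)} = \Bigl( \int_{[0,1]^d} |f(x) - \Phi(x)|^2 \, \mathrm{d}x \Bigr)^{1/2} \quad \text{(MSE sense)}
\]
and
\[
\|f - \Phi\|_{L_{\infty}([0,1]^d)} = \sup_{x \in [0,1]^d} |f(x) - \Phi(x)| \quad \text{(uniform sense)}.
\]
Since $ [0,1]^d $ has unit volume, $ \|g\|_{L_2} \le \|g\|_{L_{\infty}} $ for any measurable $ g $, so a uniform bound of a given size implies the corresponding MSE bound; this is the sense in which an $ L_{\infty} $ estimate strengthens an MSE one.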