We consider the rescaled pure greedy learning algorithm (RPGLA) with dependent samples drawn according to a non-identical sequence of probability distributions. Generalization bounds are obtained by applying the independent-blocks technique and accounting for the drift error. We derive a satisfactory learning rate for the algorithm under the assumption that the sampling process is stationary $ \beta $-mixing, and show that the optimal rate $ O(n^{-1}) $ is recovered for i.i.d. processes.
Citation: Qin Guo, Binlei Cai. Learning capability of the rescaled pure greedy algorithm with non-iid sampling[J]. Electronic Research Archive, 2023, 31(3): 1387-1404. doi: 10.3934/era.2023071
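The rescaled pure greedy algorithm underlying RPGLA alternates a standard greedy selection step with a rescaling of the whole current approximant, so that no line search is needed at each iteration. A minimal sketch in $\mathbb{R}^d$ with a finite, unit-norm dictionary (the function and variable names here are illustrative, not from the paper) might look like:

```python
import numpy as np

def rpga(f, dictionary, steps):
    """Sketch of the rescaled pure greedy algorithm (RPGA).

    f          : target vector in R^d
    dictionary : array of shape (m, d); rows are assumed unit-norm
    steps      : number of greedy iterations
    """
    approx = np.zeros_like(f, dtype=float)
    for _ in range(steps):
        residual = f - approx
        # Greedy step: pick the dictionary element most correlated
        # with the current residual.
        inner = dictionary @ residual
        j = np.argmax(np.abs(inner))
        approx = approx + inner[j] * dictionary[j]
        # Rescaling step: replace the approximant by its best scalar
        # multiple, i.e. project f onto the line spanned by approx.
        denom = approx @ approx
        if denom > 0:
            approx = ((f @ approx) / denom) * approx
    return approx
```

With an orthonormal dictionary the greedy steps pick out the largest coefficients one at a time, and the rescaling leaves them unchanged, so after $d$ steps the target is recovered exactly.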