We consider the regression problem of estimating functions defined on $ \mathbb{R}^D $ but supported on a $ d $-dimensional manifold $ \mathcal{M} \subset \mathbb{R}^D $ with $ d \ll D $. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $ \mathcal{M} $ at multiple scales and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite-sample bounds that hold with high probability for rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function were defined on a known Euclidean domain of dimension $ d $, rather than on an unknown manifold embedded in $ \mathbb{R}^D $. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $ D $ and exponential in $ d $. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
Citation: Wenjing Liao, Mauro Maggioni, Stefano Vigogna, Multiscale regression on unknown manifolds, Mathematics in Engineering, 4 (2022), 1-25. doi: 10.3934/mine.2022028
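To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch in Python of its main ingredients: a multiscale partition of the data, local PCA coordinates of dimension $ d $ within each cell, local least-squares polynomial (here, linear) fits, and a hard threshold on the fine-minus-coarse correction that plays the role of the wavelet coefficient. This is not the authors' implementation: the partition is built with k-means over just two scales for simplicity (rather than a full multiscale tree), and the helper names, cell counts, intrinsic dimension `d`, and threshold `tau` are illustrative assumptions.

```python
# A minimal illustrative sketch (not the authors' implementation) of the
# ingredients named in the abstract: a multiscale partition, local PCA
# coordinates of dimension d per cell, local least-squares linear fits, and a
# hard threshold on the fine-minus-coarse correction (the "wavelet" term).
# The partition uses k-means over two scales only; d, the cell counts, and the
# threshold tau below are illustrative assumptions, not the paper's choices.
import numpy as np
from sklearn.cluster import KMeans


def fit_local_linear(X_cell, y_cell, d):
    """Local PCA coordinates of dimension d plus an affine least-squares fit."""
    center = X_cell.mean(axis=0)
    # Top-d principal directions of the cell: a local tangent-space estimate.
    _, _, Vt = np.linalg.svd(X_cell - center, full_matrices=False)
    V = Vt[:d].T                                   # D x d basis of local coordinates
    Z = (X_cell - center) @ V                      # n_cell x d local coordinates
    A = np.hstack([np.ones((len(Z), 1)), Z])       # affine design matrix
    coef, *_ = np.linalg.lstsq(A, y_cell, rcond=None)
    return center, V, coef


def predict_local(x, model):
    center, V, coef = model
    z = (x - center) @ V
    return coef[0] + z @ coef[1:]


def multiscale_regression(X, y, d, n_coarse=6, n_fine=3, tau=0.02):
    """Two-scale estimator: keep a fine-scale cell only if its correction is large."""
    coarse = KMeans(n_clusters=n_coarse, n_init=10, random_state=0).fit(X)
    cells = []
    for j in range(n_coarse):
        idx = np.where(coarse.labels_ == j)[0]
        coarse_model = fit_local_linear(X[idx], y[idx], d)
        fine = KMeans(n_clusters=min(n_fine, len(idx)),
                      n_init=10, random_state=0).fit(X[idx])
        fine_models = []
        for k in range(fine.n_clusters):
            sub = idx[fine.labels_ == k]
            fm = fit_local_linear(X[sub], y[sub], d)
            # Size of the fine-minus-coarse correction on the sub-cell.
            delta = np.mean([abs(predict_local(X[i], fm)
                                 - predict_local(X[i], coarse_model)) for i in sub])
            fine_models.append((fm, delta >= tau))    # hard thresholding
        cells.append((coarse_model, fine, fine_models))

    def predict(x):
        coarse_model, fine, fine_models = cells[coarse.predict(x[None, :])[0]]
        fm, keep = fine_models[fine.predict(x[None, :])[0]]
        return predict_local(x, fm) if keep else predict_local(x, coarse_model)

    return predict


# Toy usage: a noisy function on a circle (d = 1) embedded in R^10.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 500)
X = np.zeros((500, 10))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
y = np.sin(3.0 * t) + 0.1 * rng.normal(size=500)
predict = multiscale_regression(X, y, d=1)
print(predict(X[0]), y[0])
```

In this toy form, keeping a fine-scale cell only when its correction exceeds `tau` mimics the adaptive behavior described in the abstract: regions where the function is smooth are served by the coarse fit, while regions with more local structure retain their fine-scale fit.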