Composite sparsity generalizes standard sparsity by imposing sparsity on a linear transformation of the variables. In this paper, we study the composite sparse optimization problem of minimizing the sum of a nondifferentiable loss function and an $\ell_0$ penalty on a matrix times the coefficient vector. First, we consider an exact continuous relaxation problem with a capped-$\ell_1$ penalty that has the same optimal solutions as the primal problem. Specifically, we introduce the notion of a lifted stationary point of the relaxation problem and then establish the equivalence of the original and relaxed problems. Second, we propose a smoothing gradient descent (SGD) algorithm for the continuous relaxation problem, which solves the subproblem inexactly since the objective function is nonseparable. We show that if the sequence generated by the SGD algorithm has an accumulation point, then it is a lifted stationary point. Finally, we present several computational examples to illustrate the efficiency of the algorithm.
Citation: Wei Yang, Lili Pan, Jinhui Wan, Smoothing gradient descent algorithm for the composite sparse optimization, AIMS Mathematics, 9 (2024), 33401–33422. https://doi.org/10.3934/math.20241594
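To make the problem class concrete, here is a minimal sketch of the two formulations described in the abstract, written in standard notation for capped-$\ell_1$ relaxations. The symbols $f$, $A$, $\lambda$, $\nu$ and the exact form of $\phi_\nu$ are our illustrative labels, following the usual capped-$\ell_1$ construction rather than the paper's own definitions, which may differ in scaling:

$$ \min_{x \in \mathbb{R}^n} \; f(x) + \lambda \, \|Ax\|_0, \qquad A \in \mathbb{R}^{m \times n}, \ \lambda > 0, $$

where $f$ is a (possibly nondifferentiable) loss and $\|\cdot\|_0$ counts nonzero entries, together with its continuous relaxation

$$ \min_{x \in \mathbb{R}^n} \; f(x) + \lambda \sum_{i=1}^{m} \phi_\nu\big((Ax)_i\big), \qquad \phi_\nu(t) = \min\{1, \, |t|/\nu\}, \ \nu > 0. $$

Since $\phi_\nu(t) = 1$ whenever $|t| \geq \nu$ and $\phi_\nu(0) = 0$, the relaxation agrees with the $\ell_0$ count on vectors whose nonzero components of $Ax$ all exceed $\nu$ in magnitude, which is the intuition behind the exactness result stated in the abstract.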
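The smoothing-gradient-descent idea can also be illustrated in code. The sketch below is a generic smoothing scheme, not the paper's Algorithm: it assumes an $\ell_1$ data-fidelity loss $\|Bx - b\|_1$ as the nondifferentiable loss, replaces $|t|$ by a Huber-type smoothing $\theta_\mu(t)$, runs plain gradient steps on the smoothed surrogate, and shrinks the smoothing parameter $\mu$ geometrically. All names (`smoothing_gd`, `huber`, the parameters `lam`, `nu`, `mu`, `sigma`, the fixed step size, and the iteration counts) are hypothetical choices for this sketch; the paper's step rules, subproblem handling, and stationarity checks differ in detail.

```python
import numpy as np

def huber(t, mu):
    # Huber-type smoothing of |t|: quadratic near 0, equal to |t| for |t| > mu.
    return np.where(np.abs(t) > mu, np.abs(t), t**2 / (2 * mu) + mu / 2)

def huber_grad(t, mu):
    # Derivative of the smoothing above: t/mu near 0, sign(t) outside.
    return np.where(np.abs(t) > mu, np.sign(t), t / mu)

def smoothing_gd(B, b, A, lam=0.1, nu=0.5, mu=1.0, sigma=0.5,
                 step=1e-2, outer=30, inner=200):
    """Gradient descent on a smoothed surrogate of
        ||Bx - b||_1 + lam * sum_i min(1, |(Ax)_i| / nu),
    shrinking the smoothing parameter mu after each inner loop.
    Illustrative sketch only; not the paper's SGD algorithm."""
    x = np.zeros(B.shape[1])
    for _ in range(outer):
        for _ in range(inner):
            r, z = B @ x - b, A @ x
            # Gradient of the smoothed l1 loss sum_i theta_mu(r_i).
            g_loss = B.T @ huber_grad(r, mu)
            # Capped penalty: gradient is theta_mu'(z_i)/nu where the cap
            # min{1, .} is inactive (theta_mu(z_i) < nu), and 0 otherwise.
            active = huber(z, mu) < nu
            g_pen = A.T @ (np.where(active, huber_grad(z, mu), 0.0) / nu)
            x -= step * (g_loss + lam * g_pen)
        mu *= sigma  # tighten the smoothing between inner loops
    return x
```

A toy call such as `smoothing_gd(np.random.randn(40, 100), np.random.randn(40), np.eye(100))` recovers the standard (non-composite) capped-$\ell_1$ case, since with $A = I$ the penalty acts on the coefficients themselves; a difference matrix for $A$ gives a fused-lasso-type composite penalty instead.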