This work considers stochastic optimization problems in which the objective function values can only be computed by a blackbox corrupted by random noise following an unknown distribution. The proposed method is based on sequential stochastic optimization (SSO): the original problem is decomposed into a sequence of subproblems. Each subproblem is solved with a zeroth-order version of a sign stochastic gradient descent with momentum algorithm (ZO-signum) and with increasingly fine precision. This decomposition enables thorough exploration of the search space while maintaining the efficiency of the algorithm once it approaches the solution. Under a Lipschitz continuity assumption on the blackbox, a convergence rate in mean is derived for the ZO-signum algorithm. Moreover, if the blackbox is smooth and convex, or locally convex around its minima, a convergence rate to an $ \epsilon $-optimal point of the problem can be obtained for the SSO algorithm. Numerical experiments compare the SSO algorithm with other state-of-the-art algorithms and demonstrate its competitiveness.
Citation: Charles Audet, Jean Bigeon, Romain Couderc, Michael Kokkolaras. Sequential stochastic blackbox optimization with zeroth-order gradient estimators[J]. AIMS Mathematics, 2023, 8(11): 25922-25956. doi: 10.3934/math.20231321
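To make the algorithmic pipeline described in the abstract concrete, the following is a minimal Python sketch of a sequential scheme of this kind, assuming a standard forward-difference Gaussian-smoothing gradient estimator and a signum-with-momentum update. The function names (`zo_gradient_estimate`, `zo_signum`, `sso`), the geometric precision schedule, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-3, n_samples=10):
    """Forward-difference Gaussian-smoothing gradient estimator (standard zeroth-order scheme)."""
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = np.random.randn(d)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / n_samples

def zo_signum(f, x0, lr=1e-2, beta=0.9, mu=1e-3, n_iters=500):
    """Zeroth-order signum: momentum on the estimated gradient, step taken along its sign."""
    x, m = x0.copy(), np.zeros_like(x0)
    for _ in range(n_iters):
        g = zo_gradient_estimate(f, x, mu)
        m = beta * m + (1 - beta) * g       # momentum average of gradient estimates
        x = x - lr * np.sign(m)             # sign-based update
    return x

def sso(f, x0, n_subproblems=5):
    """Sequential stochastic optimization: re-solve subproblems with increasingly fine precision."""
    x = x0.copy()
    for k in range(n_subproblems):
        # hypothetical schedule: smaller step and smoothing radius, more iterations as k grows
        x = zo_signum(f, x, lr=1e-2 / 2**k, mu=1e-2 / 2**k, n_iters=200 * (k + 1))
    return x

# Usage on a noisy quadratic: the blackbox returns f corrupted by additive Gaussian noise.
rng = np.random.default_rng(0)
f_noisy = lambda x: np.sum((x - 1.0) ** 2) + 0.01 * rng.standard_normal()
x_star = sso(f_noisy, x0=np.zeros(10))
```

The coarse-to-fine schedule mirrors the idea stated in the abstract: early subproblems use large steps and smoothing radii to explore the space, while later ones refine the solution with finer precision.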