Sequential stochastic blackbox optimization with zeroth-order gradient estimators

Charles Audet; Jean Bigeon; Romain Couderc; Michael Kokkolaras; Charles Audet; Jean Bigeon; Romain Couderc; Michael Kokkolaras

doi:10.3934/math.20231321

AIMS Mathematics

2023, Volume 8, Issue 11: 25922-25956. doi: 10.3934/math.20231321

Previous Article Next Article

Research article Special Issues

Sequential stochastic blackbox optimization with zeroth-order gradient estimators

1.
GERAD, Department of Mathematical and Industrial Engineering, École Polytechnique de Montréal, Montréal, Québec, Canada
2.
Nantes University, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
3.
GSCOP, Department of Industrial Engineering, Grenoble-Alpes University, Grenoble, France
4.
GERAD and Department of Mechanical Engineering, McGill University, Montréal, Canada

Received: 23 May 2023 Revised: 11 August 2023 Accepted: 21 August 2023 Published: 08 September 2023
MSC : 65K05, 90C15, 90C30, 90C56, 90C90

This work considers stochastic optimization problems in which the objective function values can only be computed by a blackbox corrupted by some random noise following an unknown distribution. The proposed method is based on sequential stochastic optimization (SSO), i.e., the original problem is decomposed into a sequence of subproblems. Each subproblem is solved by using a zeroth-order version of a sign stochastic gradient descent with momentum algorithm (i.e., ZO-signum) and with increasingly fine precision. This decomposition allows a good exploration of the space while maintaining the efficiency of the algorithm once it gets close to the solution. Under the Lipschitz continuity assumption on the blackbox, a convergence rate in mean is derived for the ZO-signum algorithm. Moreover, if the blackbox is smooth and convex or locally convex around its minima, the rate of convergence to an $ \epsilon $-optimal point of the problem may be obtained for the SSO algorithm. Numerical experiments are conducted to compare the SSO algorithm with other state-of-the-art algorithms and to demonstrate its competitiveness.
- stochastic blackbox optimization,
- gradient approximation,
- sequential optimization,
- momentum-based method,
- convergence rate analysis
Citation: Charles Audet, Jean Bigeon, Romain Couderc, Michael Kokkolaras. Sequential stochastic blackbox optimization with zeroth-order gradient estimators[J]. AIMS Mathematics, 2023, 8(11): 25922-25956. doi: 10.3934/math.20231321

Related Papers:

Abstract

This work considers stochastic optimization problems in which the objective function values can only be computed by a blackbox corrupted by some random noise following an unknown distribution. The proposed method is based on sequential stochastic optimization (SSO), i.e., the original problem is decomposed into a sequence of subproblems. Each subproblem is solved by using a zeroth-order version of a sign stochastic gradient descent with momentum algorithm (i.e., ZO-signum) and with increasingly fine precision. This decomposition allows a good exploration of the space while maintaining the efficiency of the algorithm once it gets close to the solution. Under the Lipschitz continuity assumption on the blackbox, a convergence rate in mean is derived for the ZO-signum algorithm. Moreover, if the blackbox is smooth and convex or locally convex around its minima, the rate of convergence to an $ \epsilon $-optimal point of the problem may be obtained for the SSO algorithm. Numerical experiments are conducted to compare the SSO algorithm with other state-of-the-art algorithms and to demonstrate its competitiveness.

References

[1]	C. Audet, J. Dennis, Mesh adaptive direct search algorithms for constrained optimization, SIAM J. Optimiz., 17 (2006), 188–217. http://dx.doi.org/10.1137/040603371 doi: 10.1137/040603371
[2]	C. Audet, K. Dzahini, M. Kokkolaras, S. Le Digabel, Stochastic mesh adaptive direct search for blackbox optimization using probabilistic estimates, Comput. Optim. Appl., 79 (2021), 1–34. http://dx.doi.org/10.1007/s10589-020-00249-0 doi: 10.1007/s10589-020-00249-0
[3]	C. Audet, W. Hare, Derivative-free and blackbox optimization, Cham: Springer, 2017. http://dx.doi.org/10.1007/978-3-319-68913-5
[4]	C. Audet, A. Ihaddadene, S. Le Digabel, C. Tribes, Robust optimization of noisy blackbox problems using the mesh adaptive direct search algorithm, Optim. Lett., 12 (2018), 675–689. http://dx.doi.org/10.1007/s11590-017-1226-6 doi: 10.1007/s11590-017-1226-6
[5]	K. Balasubramanian, S. Ghadimi, Zeroth-order nonconvex stochastic optimization: handling constraints, high dimensionality, and saddle points, Found. Computat. Math., 22 (2022), 35–76. http://dx.doi.org/10.1007/s10208-021-09499-8 doi: 10.1007/s10208-021-09499-8
[6]	J. Bernstein, Y. Wang, K. Azizzadenesheli, A. Anandkumar, SignSGD: compressed optimisation for non-convex problems, Proceedings of International Conference on Machine Learning, 2018,560–569.
[7]	S. Bhatnagar, H. Prasad, L. Prashanth, Stochastic recursive algorithms for optimization, London: Springer, 2013. http://dx.doi.org/10.1007/978-1-4471-4285-0
[8]	J. Blank, K. Deb, Pymoo: multi-objective optimization in Python, IEEE Access, 8 (2020), 89497–89509. http://dx.doi.org/10.1109/ACCESS.2020.2990567 doi: 10.1109/ACCESS.2020.2990567
[9]	H. Cai, Y. Lou, D. McKenzie, W. Yin, A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization, Proceedings of the 38th International Conference on Machine Learning, 2021, 1193–1203.
[10]	H. Cai, D. McKenzie, W. Yin, Z. Zhang, A one-bit, comparison-based gradient estimator, Appl. Comput. Harmon. Anal., 60 (2022), 242–266. http://dx.doi.org/10.1016/j.acha.2022.03.003 doi: 10.1016/j.acha.2022.03.003
[11]	H. Cai, D. Mckenzie, W. Yin, Z. Zhang, Zeroth-order regularized optimization (zoro): approximately sparse gradients and adaptive sampling, SIAM J. Optim., 32 (2022), 687–714. http://dx.doi.org/10.1137/21M1392966 doi: 10.1137/21M1392966
[12]	N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, Proceedings of 2017 IEEE Symposium on Security and Privacy, 2017, 39–57. http://dx.doi.org/10.1109/SP.2017.49 doi: 10.1109/SP.2017.49
[13]	K. Chang, Stochastic nelder-mead simplex method-a new globally convergent direct search method for simulation optimization, Eur. J. Oper. Res., 220 (2012), 684–694. http://dx.doi.org/10.1016/j.ejor.2012.02.028 doi: 10.1016/j.ejor.2012.02.028
[14]	R. Chen, M. Menickelly, K. Scheinberg, Stochastic optimization using a trust-region method and random models, Math. Program., 169 (2018), 447–487. http://dx.doi.org/10.1007/s10107-017-1141-8 doi: 10.1007/s10107-017-1141-8
[15]	X. Chen, S. Liu, K. Xu, X. Li, X. Lin, M. Hong, et al., Zo-adamm: zeroth-order adaptive momentum method for black-box optimization, Proceedings of 33rd Conference on Neural Information Processing Systems, 2019, 1–12.
[16]	A. Conn, K. Scheinberg, L. Vicente, Introduction to derivative-free optimization, Philadelphia: SIAM, 2009. http://dx.doi.org/10.1137/1.9780898718768
[17]	F. Curtis, K. Scheinberg, R. Shi, A stochastic trust region algorithm based on careful step normalization, Informs Journal on Optimization, 1 (2019), 200–220. http://dx.doi.org/10.1287/ijoo.2018.0010 doi: 10.1287/ijoo.2018.0010
[18]	J. Deng, W. Dong, R. Socher, L. Li, K. Li, F. Li, Imagenet: a large-scale hierarchical image database, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009,248–255. http://dx.doi.org/10.1109/CVPR.2009.5206848 doi: 10.1109/CVPR.2009.5206848
[19]	M. Garneau, Modelling of a solar thermal power plant for benchmarking blackbox optimization solvers, Ph. D Thesis, École Polytechnique de Montréal, 2015.
[20]	S. Ghadimi, G. Lan, Stochastic first-and zeroth-order methods for nonconvex stochastic programming, SIAM J. Optim., 23 (2013), 2341–2368. http://dx.doi.org/10.1137/120880811 doi: 10.1137/120880811
[21]	S. Ghadimi, A. Ruszczynski, M. Wang, A single timescale stochastic approximation method for nested stochastic optimization, SIAM J. Optim., 30 (2020), 960–979. http://dx.doi.org/10.1137/18M1230542 doi: 10.1137/18M1230542
[22]	N. Hansen, The CMA evolution strategy: a comparing review, In: Towards a new evolutionary computation, Berlin: Springer, 2006, 75–102. http://dx.doi.org/10.1007/3-540-32494-1_4
[23]	S. Karimireddy, Q. Rebjock, S. Stich, M. Jaggi, Error feedback fixes signsgd and other gradient compression schemes, Proceedings of the 36th International Conference on Machine Learning, 2019, 3252–3261.
[24]	J. Kiefer, J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., 23 (1952), 462–466. http://dx.doi.org/10.1214/aoms/1177729392 doi: 10.1214/aoms/1177729392
[25]	B. Kim, H. Cai, D. McKenzie, W. Yin, Curvature-aware derivative-free optimization, arXiv:2109.13391.
[26]	D. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv:1412.6980.
[27]	M. Kokkolaras, Z. Mourelatos, P. Papalambros, Impact of uncertainty quantification on design: an engine optimisation case study, International Journal of Reliability and Safety, 1 (2006), 225–237. http://dx.doi.org/10.1504/IJRS.2006.010786 doi: 10.1504/IJRS.2006.010786
[28]	A. Krizhevsky, I. Sutskever, G. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84–90. http://dx.doi.org/10.1145/3065386 doi: 10.1145/3065386
[29]	S. Le Digabel, Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm, ACM T. Math. Software, 37 (2011), 1–15. http://dx.doi.org/10.1145/1916461.1916468 doi: 10.1145/1916461.1916468
[30]	S. Liu, P. Chen, X. Chen, M. Hong, Sign-SGD via zeroth-order oracle, Proceedings of International Conference on Learning Representations, 2019, 1–24.
[31]	S. Liu, P. Chen, B. Kailkhura, G. Zhang, A. Hero, P. Varshney, A primer on zeroth-order optimization in signal processing and machine learning: principals, recent advances, and applications, IEEE Signal Proc. Mag., 37 (2020), 43–54. http://dx.doi.org/10.1109/MSP.2020.3003837 doi: 10.1109/MSP.2020.3003837
[32]	S. Liu, B. Kailkhura, P. Chen, P. Ting, S. Chang, L. Amini, Zeroth-order stochastic variance reduction for nonconvex optimization, Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, 3731–3741.
[33]	A. Maggiar, A. Wachter, I. Dolinskaya, J. Staum, A derivative-free trust-region algorithm for the optimization of functions smoothed via gaussian convolution using adaptive multiple importance sampling, SIAM J. Optim., 28 (2018), 1478–1507. http://dx.doi.org/10.1137/15M1031679 doi: 10.1137/15M1031679
[34]	Y. Nesterov, V. Spokoiny, Random gradient-free minimization of convex functions, Found. Comput. Math., 17 (2017), 527–566. http://dx.doi.org/10.1007/s10208-015-9296-2 doi: 10.1007/s10208-015-9296-2
[35]	N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. Berkay Celik, A. Swami, Practical black-box attacks against machine learning, Proceedings of the 2017 ACM on Asia conference on computer and communications security, 2017,506–519. http://dx.doi.org/10.1145/3052973.3053009 doi: 10.1145/3052973.3053009
[36]	E. Real, S. Moore, A. Selle, S. Saxena, Y. Suematsu, J. Tan, et al., Large-scale evolution of image classifiers, Proceedings of the 34th International Conference on Machine Learning, 2017, 2902–2911.
[37]	H. Robbins, S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400–407. http://dx.doi.org/10.1214/aoms/1177729586 doi: 10.1214/aoms/1177729586
[38]	R. Rockafellar, J. Royset, Risk measures in engineering design under uncertainty, Proceedings of International Conference on Applications of Statistics and Probability, 2015, 1–8. http://dx.doi.org/10.14288/1.0076159 doi: 10.14288/1.0076159
[39]	R. Rubinstein, Simulation and the Monte Carlo method, Hoboken: John Wiley & Sons Inc., 1981. http://dx.doi.org/10.1002/9780470316511
[40]	A. Ruszczynski, W. Syski, Stochastic approximation method with gradient averaging for unconstrained problems, IEEE T. Automat. Contr., 28 (1983), 1097–1105. http://dx.doi.org/10.1109/TAC.1983.1103184 doi: 10.1109/TAC.1983.1103184
[41]	J. Spall, Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, IEEE T. Automat. Contr., 37 (1992), 332–341. http://dx.doi.org/10.1109/9.119632 doi: 10.1109/9.119632
[42]	M. Styblinski, T. Tang, Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing, Neural Networks, 3 (1990), 467–483.
[43]	C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016, 2818–2826. http://dx.doi.org/10.1109/CVPR.2016.308 doi: 10.1109/CVPR.2016.308
[44]	V. Volz, J. Schrum, J. Liu, S. Lucas, A. Smith, S. Risi, Evolving mario levels in the latent space of a deep convolutional generative adversarial network, Proceedings of the Genetic and Evolutionary Computation Conference, 2018,221–228. http://dx.doi.org/10.1145/3205455.3205517 doi: 10.1145/3205455.3205517
[45]	K. Xu, S. Liu, P. Zhao, P. Chen, H. Zhang, Q. Fan, et al., Structured adversarial attack: towards general implementation and better interpretability, Proceedings of International Conference on Learning Representations, 2019, 1–21.

Reader Comments

Your name:*

Email:*
© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)