Survey

Hyperparameter optimization: Classics, acceleration, online, multi-objective, and tools

  • Received: 31 December 2023 | Revised: 04 March 2024 | Accepted: 28 March 2024 | Published: 14 June 2024
  • Hyperparameter optimization (HPO) has evolved into a well-established research topic over the decades. With the success and wide application of deep learning, HPO has garnered increasing attention, particularly for machine learning model training and inference. Its primary objective is to mitigate the challenges of manual hyperparameter tuning, which is often ad hoc, reliant on human expertise, and consequently hinders reproducibility while inflating deployment costs. Recognizing the growing significance of HPO, this paper surveyed classical HPO methods, approaches for accelerating the optimization process, HPO in an online setting (dynamic algorithm configuration, DAC), and HPO with more than one objective to optimize (multi-objective HPO). Acceleration strategies were categorized into multi-fidelity, bandit-based, and early-stopping methods; DAC algorithms encompassed gradient-based, population-based, and reinforcement learning-based methods; multi-objective HPO can be approached via scalarization, metaheuristics, and model-based algorithms tailored to multi-objective settings. A tabulated overview of popular frameworks and tools for HPO was provided, catering to the interests of practitioners.
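
    To make the acceleration ideas in the abstract concrete, the sketch below combines random sampling of configurations (classical HPO) with successive halving, the bandit-style resource allocation scheme underlying Hyperband: many configurations are evaluated at a small budget, and only the top fraction survives to the next, larger budget. This is a minimal illustration only; the objective function, search space, and budget schedule are assumptions for the example, not taken from the survey.

```python
import random

def validation_loss(config, budget):
    """Toy stand-in for training a model under a given budget (e.g., epochs).

    Purely hypothetical: the 'true' optimum sits near lr=0.01 with a wide layer,
    and larger budgets give less noisy estimates of the loss.
    """
    lr, width = config["lr"], config["width"]
    bias = (lr - 0.01) ** 2 + 1.0 / width
    noise = random.gauss(0.0, 0.05) / budget
    return bias + noise

def sample_config():
    # Random search over a small illustrative space: log-uniform learning rate,
    # categorical layer width.
    return {"lr": 10 ** random.uniform(-4, -1), "width": random.choice([16, 32, 64, 128])}

def successive_halving(n_configs=27, min_budget=1, eta=3, rounds=3):
    """Keep the best 1/eta of configurations each round while multiplying the budget by eta."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    scores = []
    for _ in range(rounds):
        scores = sorted(((validation_loss(c, budget), c) for c in configs), key=lambda t: t[0])
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return scores[0]  # (estimated loss, configuration) of the best survivor

if __name__ == "__main__":
    best_loss, best_config = successive_halving()
    print(f"best configuration: {best_config}, estimated loss: {best_loss:.4f}")
```

    A full Hyperband run would repeat this routine over several brackets that trade off the number of starting configurations against the minimum budget; the single bracket above is enough to show how most of the evaluation cost is concentrated on configurations that already look promising.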

    Citation: Jia Mian Tan, Haoran Liao, Wei Liu, Changjun Fan, Jincai Huang, Zhong Liu, Junchi Yan. Hyperparameter optimization: Classics, acceleration, online, multi-objective, and tools[J]. Mathematical Biosciences and Engineering, 2024, 21(6): 6289-6335. doi: 10.3934/mbe.2024275

  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)