
Neural architecture search via standard machine learning methodologies

  • Received: 24 March 2021 Revised: 19 October 2021 Accepted: 13 January 2022 Published: 11 February 2022
  • In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation involving a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behaviour after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations, together with the corresponding CNN measures of performance obtained with only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such a dataset is used as a training set for Support Vector Machines for Regression and/or Random Forest techniques, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process.
Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation carried out on CNNs, together with the use of our performance predictor with NAS-Bench-101, highlight how promising the proposed methodology for the hyperparameter setting appears.

    Citation: Giorgia Franchini, Valeria Ruggiero, Federica Porta, Luca Zanni. Neural architecture search via standard machine learning methodologies[J]. Mathematics in Engineering, 2023, 5(1): 1-21. doi: 10.3934/mine.2023012




    Within numerous control systems, various measures are implemented to mitigate costs or curtail energy consumption. In some special cases, specific control values are deliberately kept at zero for designated durations using hands-off control strategies, thereby effectively minimizing the associated control costs. In [1], a novel control methodology was introduced to solve the maximum hands-off control problem characterized by the L0 norm, namely, to find the sparsest control among all permissible controls. This control methodology holds significant utility in minimizing electricity or fuel consumption, showcasing its practical relevance and applicability. For instance, in the car start-stop system considered in [1], the engine is automatically deactivated when the vehicle comes to a halt or operates at speeds below a predetermined threshold. This deliberate shutdown serves to diminish CO2 emissions and decrease fuel consumption as part of the system's efficiency measures. Hands-off control also demonstrates effectiveness within networked control systems [2].

    Sparse optimization has emerged as a prominent subject of interest among scholars in recent years. The technology of sparse optimization has garnered extensive attention and exploration across various domains, including compressed sensing [3], image and signal processing [4,5], machine learning [6,7], Alzheimer's disease biomarkers [8], and other fields. Maximum hands-off control shares a relationship with sparsity, a characteristic pertinent to numerous optimization problems. In the realm of sparse optimization, sparsity refers to a condition where a substantial portion of elements within a matrix or vector assumes zero values. Leveraging sparsity offers the potential to economize on both time and compression expenses. Moreover, it serves as a mechanism to sift through extensive datasets, extracting pertinent information and thereby streamlining the complexity of the problem [9]. Within various optimization problems, the L0 norm of a vector commonly serves as a means to quantify and define sparsity [10,11]. Due to its non-convex and non-continuous nature, the L0 norm presents inherent complexity in analysis. Consequently, numerous scholars resort to exploring the variational property of the L0 norm and employing subdifferentiation techniques to address its intricacies. In [12], a technique known as convex relaxation was introduced as a substitute for the L0 norm, employing the more tractable L1 norm. This transformation enables the formulation of the problem in a linear programming paradigm, amenable to resolution using methods such as the interior point method or the simplex method. In [13,14], the approach revolves around the non-convex relaxation method, leveraging the Lp relaxation technique, where 0 < p < 1, to yield solutions with increased sparsity. In [15,16], a focus is directed toward the Difference of Convex functions (DC) algorithm. This method entails the representation of the L0 norm as a difference between two convex functions.
Subsequently, leveraging the DC algorithm facilitates the resolution of the resultant relaxation model. In [17], by addressing the non-convex nature of the logarithmic function, a proposal was made to substitute the logarithm function for the L0 norm. In [18], the resolution involves addressing a sparse optimization problem that encapsulates two competing objectives: measurement error minimization and sparsity maximization. This problem is approached and solved through the utilization of a multi-objective evolutionary algorithm. In [19], an effective fault diagnosis framework was constructed by integrating L0 norm sparse constraint optimization with principal component analysis, aimed at mitigating the extent of sparsity. However, in [12,13,14,15,16,17,18,19], the sparse optimization problems do not take into account the constraints of the dynamical system. The focus of the study presented in [20] revolves around the L1 objective optimal control problem specifically tailored for linear discrete systems. This formulation leads to diverse sparse solutions based on the selection of distinct problem parameters. The study detailed in [21] introduces a methodical approach for synthesizing sparse optimal control structures governed by linear discrete systems. This method guarantees sparsity within solutions, enabling the specification of a predetermined count of zero elements within the control structure. In [22], the utilization of the Smith iteration and Altering-Direction-Implicit (ADI) iteration methods was explored for obtaining numerical solutions in a large sparse optimal control problem governed by linear discrete-time Riccati equations, leveraging Newton's method. In [23], provably optimal sparse solutions to overdetermined linear systems with non-negativity constraints were researched in a least-squares sense by implicit enumeration. 
    Nevertheless, in [20,21,22,23], the sparse optimal control problem governed by the linear dynamical system failed to incorporate the constraints associated with the nonlinear dynamical system. The work presented in [24] delved into the examination of the Newton-like method of undetermined equations. This exploration led to the discovery of sparse solutions pertaining to sparse optimal control problems governed by an affine nonlinear system. The analysis of optimal control using the value function was conducted employing the dynamic programming method. In [25], a dynamic programming approach was introduced to effectively approximate the sparse optimal control problem governed by an affine nonlinear system in a numerical context. The maximum hands-off control problem governed by the class of affine nonlinear systems was studied in [26]. Nonetheless, in [24,25,26], the nonlinear dynamical systems involved in the sparse optimal control problem are affine nonlinear systems, where the state variables and control inputs are separable. This paper delves into the investigation of the sparse optimal control problem governed by a general nonlinear dynamical system where the state variables and control inputs are inseparable.

    The conventional approach to optimal control involves determining the most effective control strategy within specified constraints [27]. This strategy aims to either maximize or minimize the performance index. Typically, deriving an analytical solution for the optimal control problem governed by a nonlinear dynamic system proves challenging. Therefore, the resolution of such problems often necessitates numerical methods to obtain an effective solution [28]. The numerical solutions to optimal control problems can be divided into two solution strategies: first optimize then discretize (indirect methods), and first discretize then optimize (direct methods) [29]. For problems that do not contain inequality constraints, indirect methods derive the first-order necessary conditions for the optimal control problem, namely, the Euler-Lagrange equations that include the initial and boundary value conditions. The solution to this initial-boundary value problem mainly involves two types of methods: the control vector iteration method and methods such as multiple shooting and collocation [30]. The direct method encompasses techniques such as the direct collocation method and control parameterization [31,32]. Control parameterization involves discretizing solely the control function, approximating it using fixed basis functions within specific subintervals [33]. The coefficients within this linear combination of basis functions serve as the optimal decision variables. Typically, predetermined switching times govern the transitions between values for each control component, often evenly divided within specified intervals [34]. To enhance solution accuracy, the time range of the control function is frequently subdivided more finely, leading to a greater number of decision variables. However, this denser division amplifies computational costs. 
Simultaneously, to mitigate the need for extensive time range subdivisions, one must consider incorporating the switching times as additional decision variables [35]. The traditional time-scaling transformation operates by mapping variable switching times with fixed points within a redefined time horizon. This process yields an optimization problem wherein the revised switching times remain constant. The application of the time-scaling transformation finds extensive utility across diverse domains, including mixed integer programming [36] and singular optimal control [37]. Yet, in practical scenarios, simultaneous switching of time poses notable challenges [38]. The sequential adaptive switching time optimization technique (SASTOT) presented in [39] suggests that the interval for control switching times can vary, affording the flexibility to freely select the number of segments for each approximate control component. The method initiates by applying control parameterization and the time-scaling transformation to a single control component initially, leaving the remaining components untouched. Subsequently, within the newly introduced time range, the time-scaling transformation introduces a subsequent new time range. This iterative process continues until the final control component undergoes processing in a similar manner. This modification introduces a significant level of flexibility into the control strategy. Compared with traditional time-scaling transformation techniques, the SASTOT can accurately identify sparse positions for each control.

    In this study, our focus lies on the maximum hands-off control problem governed by a nonlinear dynamical system (MHCPNDS) with a maximum hands-off control constraint characterized by the L0 norm. The optimization variables encompass the count of segments for each control component and the respective switching times for each of these components. The examination of the MHCPNDS poses challenges in obtaining an analytical solution. Hence, the adoption of a numerical solution becomes imperative. In practical scenarios, simultaneous switching of all control components is not optimal. Addressing this, this paper introduces the SASTOT, which allows for a flexible selection of the number of segments and switching times for each individual control element. The integration of control parameterization with the SASTOT serves as a transformative approach to addressing the MHCPNDS. The non-smooth term that involves the maximum operator is approximated through a smoothing function, and we further illustrate the convergence of this approximation technique. The attainment of a sparse solution for the MHCPNDS involves the utilization of a gradient-based algorithm. Empirical assessments through numerical experiments substantiate the efficacy of the algorithm put forth in the study.

    The contributions of the paper are three-fold:

    1) Distinguished from the above-mentioned sparse optimization problem, which is amenable to analytical solutions, we consider the sparse optimal control problem within a nonlinear dynamic system. Given the intrinsic complexity of the nonlinear dynamic system, pinpointing an analytical solution poses a significant challenge. Taking these into consideration, we propose a numerical solution method, based on the gradient formulae of the cost function (Theorems 5–8) and the gradient-based algorithm, without requiring the linearization of the nonlinear dynamical systems.

    2) A smoothing function has been introduced to mitigate the roughness of the non-smooth term involving the maximum operator. Several significant theorems (Theorems 1–4) have been established, illustrating that the smoothing function effectively addresses the shortcomings associated with constraint qualification noncompliance.

    3) Setting itself apart from the time-scaling transformation technique, this paper introduces the SASTOT method that enables flexible determination of the number of segments and the specific switching times for each control element individually.

    The rest of the paper is organized as follows. In Section 2, we present the MHCPNDS. In Section 3, the MHCPNDS is transformed by using control parameterization and the SASTOT. In Section 4, we deal with the nonsmoothness of the objective function by using the smoothing technique. In Section 5, the gradient-based algorithm is used to solve the resulting smooth problem. In Section 6, numerical results are presented. In Section 7, we draw some conclusions and suggest some future research directions.

    Let $I_n$ denote the set $\{1,2,\ldots,n\}$. For a continuous-time control $v:[0,T]\to\mathbb{R}$, the $L^p$ norm is defined by

    $\|v\|_p \triangleq \left(\int_0^T |v(t)|^p\,dt\right)^{1/p}, \quad p\in[1,+\infty).$ (2.1)

    The L0 norm is defined by

    $\|v\|_0 \triangleq q_M(\operatorname{supp}(v)),$ (2.2)

    where $q_M$ denotes the Lebesgue measure and $\operatorname{supp}(v)$, called the support set of $v$, is defined by

    $\operatorname{supp}(v) = \{t\in[0,T] : v(t)\neq 0\}.$ (2.3)

    In some cases, the control effort can be significantly reduced by keeping the control value at exactly zero over a time interval. Maximum hands-off control is also called sparse control: the larger the portion of the time horizon on which the control value is exactly zero, the sparser the control.
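The sparsity notions above can be made concrete numerically: for a sampled control, the Lebesgue measure of the support is approximated by counting nonzero samples and multiplying by the step size. A minimal sketch, assuming a hypothetical scalar control (not taken from the paper) that is held at zero on [1, 3]:

```python
import numpy as np

# Sample a scalar control on [0, T]: zero on [1, 3], nonzero elsewhere.
T, N = 4.0, 4000
t = np.linspace(0.0, T, N, endpoint=False)
dt = T / N
v = np.where((t < 1.0) | (t >= 3.0), np.sin(t), 0.0)

# Discretized L0 "norm": Lebesgue measure of supp(v) ~ (# nonzero samples) * dt
l0 = np.count_nonzero(v) * dt

# Discretized Lp norm for p >= 1: (integral of |v|^p dt)^(1/p)
def lp_norm(v, dt, p):
    return (np.sum(np.abs(v) ** p) * dt) ** (1.0 / p)

print(l0)  # ~2.0: the control is active on [0,1) and [3,4)
print(lp_norm(v, dt, 1))
```

The L0 value depends only on where the control is nonzero, not on its magnitude, which is exactly why it measures sparsity.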

    In order to obtain the sparse solution of maximum hands-off control, we consider a nonlinear dynamical system defined by

    $\begin{cases} \dfrac{dx(t)}{dt} = f(t,x(t),u(t)), & t\in[0,T],\\ x(0)=\xi, \quad x(T)=0, \end{cases}$ (2.4)

    where $x(t):=(x_1(t),\ldots,x_n(t))^T\in\mathbb{R}^n$ denotes the state vector at time $t$, $u(t):=(u_1(t),\ldots,u_m(t))^T\in\mathbb{R}^m$ denotes the control input vector at time $t$, $\xi$ denotes the initial state value, $T>0$ denotes a given terminal time, and $f(t,x(t),u(t))$ denotes a nonlinear function vector defined on $\mathbb{R}^n$.

    Let $x(\cdot|u)$ denote the solution of system (2.4) satisfying

    $x(t|u)\leq 0, \quad t\in[0,T].$ (2.5)

    For all $t\in[0,T]$, each component of the control input vector $u(t)$ is also subject to the constraint

    $\max_{i\in I_m} |u_i(t)| \leq 1.$ (2.6)

    A control input vector $u(t)\in\mathbb{R}^m$ that satisfies constraint (2.6) is called a candidate control input vector. Let $L[T,\xi]$ be the set consisting of all candidate control input vectors.

    Definition 1. The maximum hands-off control constraint characterized by the L0 norm is defined by

    $A_0(u) \triangleq \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\|u_i\|_0 \leq \mu,$ (2.7)

    where $\lambda_i\geq 0$, $i\in I_m$, are given weights, $u\in L[T,\xi]$ denotes the admissible control, and $\mu$ is a small positive number.

    Definition 1 is used to characterize the sparsity of maximum hands-off control. The cost functional is defined by

    $J(u) = \pi_0(x(T|u)) + \int_0^T \vartheta(x(t|u))\,dt,$ (2.8)

    where $\pi_0:\mathbb{R}^n\to\mathbb{R}$ and $\vartheta:\mathbb{R}^n\to\mathbb{R}$ are continuously differentiable functions.

    Then, our maximum hands-off control problem can be formulated as follows.

    Problem A: Given system (2.4), choose an admissible control uL[T,ξ] to minimize the cost functional defined in (2.8) subject to boundary condition (2.6) and maximum hands-off control constraint (2.7).

    The L0 norm, owing to its non-convex and discontinuous characteristics, inherently introduces complexity into analytical procedures. It can, however, be well approximated by the L1 norm [20]. Accordingly, the maximum hands-off control constraint characterized by the L0 norm is approximated as in Definition 2.

    Definition 2. The maximum hands-off control constraint characterized by the L1 norm is defined by

    $A_1(u) \triangleq \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\|u_i\|_1 = \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\int_0^T |u_i(t)|\,dt \leq \mu,$ (3.1)

    where $\lambda_i\geq 0$, $i\in I_m$, are given weights, $u\in L[T,\xi]$ denotes the admissible control, and $\mu$ is a small positive number.

    Definition 2 provides a computationally tractable characterization of the sparsity of maximum hands-off control. By Definition 2, Problem A can be well approximated by the following Problem B.

    Problem B: Given system (2.4), choose the control uL[T,ξ] to minimize the objective functional defined in (2.8) subject to boundary condition (2.6) and maximum hands-off control constraint (3.1).
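The relation between Definitions 1 and 2 can be illustrated numerically on a sampled two-component control: $A_0$ accumulates the weighted measure of each support, while $A_1$ replaces it with the weighted integral of $|u_i|$. A small sketch with hypothetical weights and bang-off levels (none of the values are from the paper):

```python
import numpy as np

T, N = 4.0, 4000
t = np.linspace(0.0, T, N, endpoint=False)
dt = T / N
lam = [1.0, 1.0]  # weights lambda_i (hypothetical values)

# Two-component control: u1 active on [0,1), u2 active on [2,4)
u = [np.where(t < 1.0, 0.5, 0.0), np.where(t >= 2.0, -0.8, 0.0)]

# A0(u) = (1/T) * sum_i lambda_i * ||u_i||_0   (measure of the support)
A0 = sum(l * np.count_nonzero(ui) * dt for l, ui in zip(lam, u)) / T

# A1(u) = (1/T) * sum_i lambda_i * integral |u_i| dt   (convex surrogate)
A1 = sum(l * np.sum(np.abs(ui)) * dt for l, ui in zip(lam, u)) / T

print(A0)  # ~ (1*1 + 1*2)/4 = 0.75
print(A1)  # ~ (0.5*1 + 0.8*2)/4 = 0.525
```

Note that $A_1 \leq A_0$ here because the control magnitudes are below one; the L1 surrogate weights the active intervals by amplitude rather than just counting them.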

    Control parameterization stands as an effective methodology for resolving the optimal control problem, commonly approached through approximation via piecewise constant functions [29,40,41,42]. Increasing the level of detail in the time level partition leads to greater accuracy as it allows for a more intricate and nuanced representation of the time-dependent processes or phenomena under consideration [43]. Employing control parameterization enables the derivation of a finite-dimensional approximation for Problem B. Ultimately, the gradient-based algorithm is employed to resolve the resultant approximation problem.

    The conventional time-scaling transformation necessitates simultaneous switching of all control components, a condition challenging to attain in practical applications [44]. Hence, the SASTOT emerges as a solution, combining the time-scaling transformation and control parameterization methodologies [39]. Within this methodology, each control component can adaptively and independently select its switching times, allowing for diverse switching points across components. Empirical validation showcases a notable reduction in computational complexity alongside an improvement in the accuracy of calculations. For a clearer exposition of this approach, this article examines two distinct control inputs. Let $\tilde u(t)=[\tilde u_1(t),\tilde u_2(t)]^T$. Then, the dynamical system is

    $\tilde f(t,\tilde x(t),\tilde u_1(t),\tilde u_2(t)) = \tilde f(t,\tilde x(t),\tilde u(t)).$ (3.2)

    The interval $[0,T]$ is divided into $q_1$ subintervals $[\delta_1^{l_1-1},\delta_1^{l_1}]$, $l_1\in I_{q_1}$, where $\sigma_1:=[\delta_1^1,\delta_1^2,\ldots,\delta_1^{q_1}]^T$ is the variable switching time vector, and

    $0=\delta_1^0\leq\delta_1^1\leq\cdots\leq\delta_1^{q_1}=T.$ (3.3)

    The control component ˜u1(t) can be approximated by

    $\tilde u_1(t)\approx \tilde u_1^{q_1}(t|\eta_1,\sigma_1) = \sum_{l_1=1}^{q_1}\eta_1^{l_1}\,\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t), \quad t\in[0,T],$ (3.4)

    where $\eta_1^{l_1}$ is the piecewise constant value of the control component $\tilde u_1(t)$ on the $l_1$th subinterval, satisfying

    $a_1\leq\eta_1^{l_1}\leq b_1, \quad l_1\in I_{q_1},$ (3.5)

    and $\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t)$ is the indicator function defined by

    $\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t)=\begin{cases}1, & \text{if } t\in[\delta_1^{l_1-1},\delta_1^{l_1}),\\ 0, & \text{otherwise}.\end{cases}$ (3.6)

    Then, we obtain the following system:

    $\begin{cases} \dfrac{d\tilde x(t)}{dt} = \sum_{l_1=1}^{q_1}\tilde f(t,\tilde x(t),\eta_1^{l_1},\tilde u_2(t))\,\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t), \quad t\in[0,T],\\ \tilde x(0)=\xi, \quad \tilde x(T)=0. \end{cases}$ (3.7)

    Let $\tilde x(\cdot|\eta_1,\tilde u_2)$ denote the solution of system (3.7).

    After applying the time-scaling transformation [45] to $\tilde u_1(t)$ and mapping the variable switching times $\{\delta_1^0,\delta_1^1,\ldots,\delta_1^{q_1}\}$ to the fixed switching times $\{0,1,\ldots,q_1\}$ in the new time horizon, we define the vector $\phi_1:=[\phi_1^1,\ldots,\phi_1^{q_1}]^T\in\mathbb{R}^{q_1}$, where $\phi_1^{l_1}=\delta_1^{l_1}-\delta_1^{l_1-1}\geq\rho$, $\rho$ is an extremely small positive number, $l_1\in I_{q_1}$, and $\phi_1^1+\phi_1^2+\cdots+\phi_1^{q_1}=T$.

    Then, we introduce a new time variable $p$ and define a time-scaling function $\nu_1(p,\phi_1)$:

    $t(p)\equiv\nu_1(p,\phi_1)=\sum_{l_1=1}^{\lfloor p\rfloor}\phi_1^{l_1}+\phi_1^{\lfloor p\rfloor+1}(p-\lfloor p\rfloor), \quad p\in[0,q_1],$ (3.8)

    where $\theta(p)=\tilde u_2(\nu_1(p,\phi_1))=\tilde u_2(t)$ and $\lfloor p\rfloor$ is the floor of the time variable $p$. With the new time variable $p$, the dynamical system is redefined on the subinterval $[l_1-1,l_1)$, $l_1\in I_{q_1}$:

    $\begin{cases} \dfrac{d\hat x(p)}{dp}=\phi_1^{l_1}\,\hat f(p,\hat x(p),\eta_1^{l_1},\theta(p)),\\ \hat x(0)=\xi, \quad \hat x(q_1)=0. \end{cases}$ (3.9)

    Let $\hat x(\cdot|\eta_1,\theta)$ denote the solution of system (3.9).
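The time-scaling function $\nu_1$ in (3.8) maps the new horizon $[0,q_1]$ back to physical time by accumulating the subinterval durations $\phi_1$, so that each integer value of $p$ lands exactly on a switching time. A small sketch under that definition (the duration values are hypothetical):

```python
import numpy as np

def nu(p, phi):
    """Time-scaling t = nu(p, phi): maps p in [0, q] to t in [0, T],
    where phi[l] = delta[l+1] - delta[l] are the subinterval durations."""
    q = len(phi)
    lp = min(int(np.floor(p)), q - 1)  # floor(p), clamped at the right endpoint
    return float(np.sum(phi[:lp]) + phi[lp] * (p - lp))

phi = np.array([0.5, 1.5, 2.0])  # hypothetical durations, sum = T = 4
print(nu(0.0, phi))  # 0.0
print(nu(1.5, phi))  # 0.5 + 1.5*0.5 = 1.25
print(nu(3.0, phi))  # 4.0: integer p maps to the switching time delta_1^p
```

Because the switching times are now the fixed integers while the durations phi become decision variables, the optimization problem no longer has variable breakpoints in its time grid.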

    The interval $[0,q_1]$ is divided into $q_2$ subintervals $[p_2^{l_2-1},p_2^{l_2}]$, $l_2\in I_{q_2}$, where $\sigma_2:=[p_2^0,p_2^1,\ldots,p_2^{q_2}]^T$ is the variable switching time vector, and

    $0=p_2^0\leq p_2^1\leq\cdots\leq p_2^{q_2}=q_1.$ (3.10)

    The control component ˜u2(t) can be approximated by

    $\tilde u_2(t)\approx\theta(p):=\sum_{l_2=1}^{q_2}\eta_2^{l_2}\,\chi_{[p_2^{l_2-1},p_2^{l_2})}(p), \quad p\in[0,q_1],$ (3.11)

    where $\eta_2^{l_2}$ is the piecewise constant value of the control component $\theta(p)$ on the $l_2$th subinterval, satisfying

    $a_2\leq\eta_2^{l_2}\leq b_2, \quad l_2\in I_{q_2}.$ (3.12)

    With the new time variable $p$, system (3.9) is redefined on the subinterval $[p_2^{l_2-1},p_2^{l_2})$, $l_2\in I_{q_2}$:

    $\begin{cases} \dfrac{d\check x(p)}{dp}=\phi_1^{\lfloor p\rfloor+1}\,\check f(p,\check x(p),\eta_1^{\lfloor p\rfloor+1},\eta_2^{l_2}),\\ \check x(0)=\xi, \quad \check x(q_1)=0. \end{cases}$ (3.13)

    After applying a time-scaling transformation to $\theta(p)$ and mapping the variable switching times $\{p_2^0,p_2^1,\ldots,p_2^{q_2}\}$ to the fixed switching times $\{0,1,\ldots,q_2\}$ in the new time horizon, we define the vector $\phi_2:=[\phi_2^1,\ldots,\phi_2^{q_2}]^T\in\mathbb{R}^{q_2}$, where $\phi_2^{l_2}=p_2^{l_2}-p_2^{l_2-1}\geq\rho$, $\rho$ is an extremely small positive number, $l_2\in I_{q_2}$, and $\phi_2^1+\phi_2^2+\cdots+\phi_2^{q_2}=q_1$.

    Then, we introduce a new time variable $w$ and define a time-scaling function $\nu_2(w,\phi_2)$:

    $p(w)\equiv\nu_2(w,\phi_2)=\sum_{l_2=1}^{\lfloor w\rfloor}\phi_2^{l_2}+\phi_2^{\lfloor w\rfloor+1}(w-\lfloor w\rfloor), \quad w\in[0,q_2],$ (3.14)

    where $\lfloor w\rfloor$ is the floor of the time variable $w$. System (3.13) is redefined on the subinterval $[l_2-1,l_2)$, $l_2\in I_{q_2}$:

    $\begin{cases} \dfrac{d\check x(w)}{dw}=\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\,\check f(w,\check x(w),\eta_1^{\lfloor\nu_2(w)\rfloor+1},\eta_2^{l_2}),\\ \check x(0)=\xi, \quad \check x(q_2)=0. \end{cases}$ (3.15)

    Let $\check x(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ be the solution of system (3.15). Then the continuous state constraint (2.5) becomes

    $\check x(w|\eta_1,\eta_2,\phi_1,\phi_2)\leq 0, \quad w\in[l_2-1,l_2),\ l_2\in I_{q_2}.$ (3.16)

    The cost function (2.8) and the maximum hands-off control constraint (3.1) become

    $\bar J(\eta_1,\eta_2,\phi_1,\phi_2)=\pi_0(\check x(q_2|\eta_1,\eta_2,\phi_1,\phi_2))+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\,\vartheta(\check x(w|\eta_1,\eta_2,\phi_1,\phi_2))\,dw,$ (3.17)
    $A_2(\eta_1,\eta_2,\phi_1,\phi_2)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1|\eta_1^{\lfloor\nu_2(w)\rfloor+1}|+\lambda_2|\eta_2^{l_2}|\right]dw\leq\mu.$ (3.18)

    With these in mind, Problem B can be approximated by the following Problem C.

    Problem C: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the objective function defined by (3.17) subject to the continuous inequality constraints (3.16), the bound constraints (3.5) and (3.12), and the maximum hands-off control constraint (3.18).

    Because $|\eta_1^{\lfloor\nu_2(w)\rfloor+1}|=2\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}$ and $|\eta_2^{l_2}|=2\max\{\eta_2^{l_2},0\}-\eta_2^{l_2}$, the maximum hands-off control constraint (3.18) can be equivalently transformed into $A_3(\eta_1,\eta_2,\phi_1,\phi_2)$ defined in (3.19).

    $A_3(\eta_1,\eta_2,\phi_1,\phi_2)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1\left(2\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}\right)+\lambda_2\left(2\max\{\eta_2^{l_2},0\}-\eta_2^{l_2}\right)\right]dw\leq\mu.$ (3.19)
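The rewriting from $A_2$ to $A_3$ rests only on the elementary identity $|a| = 2\max\{a,0\} - a$, which can be checked directly:

```python
# The identity |a| = 2*max(a, 0) - a used to rewrite A_2 as A_3:
# for a >= 0 the right side is 2a - a = a; for a < 0 it is 0 - a = -a.
for a in [-2.5, -1.0, 0.0, 0.3, 4.0]:
    assert abs(a) == 2 * max(a, 0.0) - a
print("identity holds")
```

The point of the rewriting is that the absolute value disappears, leaving only max-operators, which the smoothing function of the next section is designed to handle.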

    Define

    $J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)=\bar J(\eta_1,\eta_2,\phi_1,\phi_2)+\rho\,H(\eta_1,\eta_2,\phi_1,\phi_2),$ (3.20)

    where ρ is the penalty parameter and

    $H(\eta_1,\eta_2,\phi_1,\phi_2)=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\max\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),0\}\,dw+\sum_{l_1=1}^{q_1}\left[\max\{a_1-\eta_1^{l_1},0\}+\max\{\eta_1^{l_1}-b_1,0\}\right]+\sum_{l_2=1}^{q_2}\left[\max\{a_2-\eta_2^{l_2},0\}+\max\{\eta_2^{l_2}-b_2,0\}\right]+A_3(\eta_1,\eta_2,\phi_1,\phi_2).$

    Thus, Problem C can be equivalently transformed into Problem D.

    Problem D: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the objective function defined in (3.20).

    Remark 1. According to [29], it has been established that when the penalty parameter $\rho$ exceeds a threshold value $\rho^*$, the solution derived for Problem D is an exact solution of Problem C.

    The fundamental concept underlying the smoothing technique involves approximating the non-smooth maximum operator by a smoothing function [46], which is defined by

    $P\{G,\rho,q,\varepsilon\}=\begin{cases}0, & \text{if } G<-\dfrac{\varepsilon}{\rho q},\\ \dfrac{\rho q}{2\varepsilon}G^2+G+\dfrac{\varepsilon}{2\rho q}, & \text{if } -\dfrac{\varepsilon}{\rho q}\leq G<0,\\ G+\dfrac{\varepsilon}{2\rho q}, & \text{if } G\geq 0,\end{cases}$ (4.1)

    where $\rho$ is a penalty factor, $q$ is the number of continuous inequality constraints, and $\varepsilon>0$ is the smoothing parameter.

    Theorem 1. ([46]) For $\varepsilon>0$, we have

    $0\leq P\{G,\rho,q,\varepsilon\}-\max\{G,0\}\leq\dfrac{\varepsilon}{2\rho q}.$ (4.2)
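The smoothing function (4.1) and the bound (4.2) of Theorem 1 can be checked numerically; the sketch below implements $P$ piecewise and verifies the gap to $\max\{G,0\}$ on a grid (the parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

def P(G, rho, q, eps):
    """Smoothing approximation of max{G, 0} from (4.1):
    zero below -eps/(rho*q), quadratic blend up to 0, then linear shift."""
    c = eps / (rho * q)
    if G < -c:
        return 0.0
    if G < 0.0:
        return (rho * q) / (2.0 * eps) * G**2 + G + c / 2.0
    return G + c / 2.0

# Verify the Theorem 1 bound 0 <= P - max{G,0} <= eps/(2*rho*q) on a grid
rho, q, eps = 10.0, 5, 0.01
gap = [P(G, rho, q, eps) - max(G, 0.0) for G in np.linspace(-1, 1, 10001)]
lo, hi = min(gap), max(gap)
print(lo >= -1e-9 and hi <= eps / (2 * rho * q) + 1e-9)  # True
```

The maximal gap eps/(2*rho*q) occurs at G = 0, so shrinking eps (or increasing the penalty rho) drives the smooth surrogate uniformly toward the exact max-operator.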

    By the smoothing process, the maximum hands-off control constraint $A_3(\eta_1,\eta_2,\phi_1,\phi_2)$ defined in (3.19) can be approximated by

    $\tilde A_3(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_2,\varepsilon)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1\left(2P\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},\rho,q_2,\varepsilon\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}\right)+\lambda_2\left(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\right)\right]dw.$ (4.3)

    Based on the smoothing function (4.1), the cost function (3.20) can be approximated by

    $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)=\bar J(\eta_1,\eta_2,\phi_1,\phi_2)+\rho\,\tilde H(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon),$ (4.4)

    where

    $\tilde H(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}P\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}\,dw+\tilde A_3(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_2,\varepsilon)+\sum_{l_2=1}^{q_2}\left[P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}+P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}\right]+\sum_{l_1=1}^{q_1}\left[P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}+P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}\right].$ (4.5)

    Based on the smoothing function (4.1), Problem D can be approximated by Problem E.

    Problem E: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the cost function defined in (4.4).

    The application of smoothing techniques may introduce discrepancies between Problems D and E. This section derives error bounds between Problems D and E for the smoothing function (4.1). It is shown that, for a sufficiently small smoothing parameter $\varepsilon$, the solution of Problem D can be acquired by sequentially solving a series of Problem E while incrementing the values of the penalty factor $\rho$.

    Theorem 2. If $\rho>0$, $q_1,q_2>0$, and $\varepsilon>0$, then

    $0\leq J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$ (4.6)

    Proof. Based on Theorem 1, we have, for $\rho>0$, $q_1,q_2>0$, and $\varepsilon>0$,

    $0\leq P\{G,\rho,q,\varepsilon\}-\max\{G,0\}\leq\dfrac{\varepsilon}{2\rho q}.$

    Then,

    $0\leq J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)$
    $=\rho\Big\{\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\big(P\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}-\max\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),0\}\big)\,dw$
    $\quad+\sum_{l_1=1}^{q_1}\big[\big(P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}-\max\{a_1-\eta_1^{l_1},0\}\big)+\big(P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}-\max\{\eta_1^{l_1}-b_1,0\}\big)\big]$
    $\quad+\sum_{l_2=1}^{q_2}\big[\big(P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}-\max\{a_2-\eta_2^{l_2},0\}\big)+\big(P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}-\max\{\eta_2^{l_2}-b_2,0\}\big)\big]$
    $\quad+\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\big[2\lambda_1\big(P\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},\rho,q_2,\varepsilon\}-\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}\big)+2\lambda_2\big(P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\max\{\eta_2^{l_2},0\}\big)\big]\,dw\Big\}$
    $\leq\rho\left[\dfrac{5\varepsilon}{2\rho}+\dfrac{2(\lambda_1+\lambda_2)T^2\varepsilon}{2\rho q_2}\right]=\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$

    which completes the proof.

    Theorem 3. Let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ be the solution of Problem D, and let $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)$ be the solution of Problem E. Then,

    $0\leq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$

    Proof. Based on Theorem 2, we have

    $0\leq J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$
    $0\leq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$

    Since $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ is the solution of Problem D, we have

    $J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\geq J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho),$

    which yields

    $J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\geq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\geq 0.$ (4.7)

    Since $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)$ is the solution of Problem E, we obtain

    $J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)\geq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon),$

    which yields

$$J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\le J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho). \tag{4.8}$$

    Based on (4.7) and (4.8), we have

$$\begin{aligned}
0&\le J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho)\\
&\le J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\\
&\le J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},
\end{aligned}$$

    which completes this proof.

Remark 2. Theorem 3 states that, when the smoothing parameter $\varepsilon$ is sufficiently small, the solution of Problem E approximates the solution of Problem D.

To obtain an error estimate between the solutions of Problems E and C, as stated in Theorem 4, the notion of $\varepsilon$-feasibility for Problem D is given in Definition 3.

Definition 3. A vector $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$ is called $\varepsilon$-feasible to Problem D if

$$P\{a_2-(\eta_2^{l_2})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon,\quad P\{(\eta_2^{l_2})^\varepsilon-b_2,\rho,q_2,\varepsilon\}\le\varepsilon,$$
$$P\{a_1-(\eta_1^{l_1})^\varepsilon,\rho,q_1,\varepsilon\}\le\varepsilon,\quad P\{(\eta_1^{l_1})^\varepsilon-b_1,\rho,q_1,\varepsilon\}\le\varepsilon,$$
$$P\{\tilde{x}(w|\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon),\rho,q_2,\varepsilon\}\le\varepsilon,$$
$$P\{(\eta_1^{\nu_2(w)})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon,\quad P\{(\eta_2^{l_2})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon.$$

Theorem 4. Let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ and $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ be the solutions of Problems D and E, respectively. Furthermore, let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ be feasible to Problem D and $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ be $\varepsilon$-feasible to Problem D. Then

$$-\frac{5\varepsilon}{2}-\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}\le\bar{J}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)-\bar{J}(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}. \tag{4.9}$$

Proof. Since $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ is the solution of Problem D, we have

$$\begin{aligned}
H(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)&=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\max\{\tilde{x}(w|\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*),0\}\,dw\\
&\quad+\sum_{l_1=1}^{q_1}\Big[\max\{a_1-(\eta_1^{l_1})^*,0\}+\max\{(\eta_1^{l_1})^*-b_1,0\}\Big]\\
&\quad+\sum_{l_2=1}^{q_2}\Big[\max\{a_2-(\eta_2^{l_2})^*,0\}+\max\{(\eta_2^{l_2})^*-b_2,0\}\Big]\\
&\quad+A_3(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)=0,
\end{aligned}$$

    where

$$A_3(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)=\frac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}(\phi_2^{l_2})^*(\phi_1^{\nu_2(w)})^*\Big[\lambda_1\big(2\max\{(\eta_1^{\nu_2(w)})^*,0\}-(\eta_1^{\nu_2(w)})^*\big)+\lambda_2\big(2\max\{(\eta_2^{l_2})^*,0\}-(\eta_2^{l_2})^*\big)\Big]dw=0.$$

Since $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ is $\varepsilon$-feasible to Problem D, we obtain

$$0\le\rho\tilde{H}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$$

Based on Theorem 3, we get

$$0\le\bar{J}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)+\rho\tilde{H}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-\bar{J}(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)-\rho H(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$$

    which completes this proof.

Remark 3. In this scenario, as outlined in Theorem 4, an error estimate between the solutions of Problems E and C is provided, specifically when the penalty factor $\rho$ is sufficiently large. Consequently, the solution for the SOP can be approximately obtained by iteratively solving a sequence of Problems E.
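The continuation scheme of Remark 3 — solve a sequence of smoothed penalized problems while increasing $\rho$ and shrinking $\varepsilon$ — can be sketched on a toy scalar problem. Everything below (the hyperbolic stand-in for the smoothing function, the toy objective, the update factors, the bisection solver standing in for the gradient-based algorithm) is illustrative, not the paper's formulation:

```python
# Continuation over (rho, eps), as in Remark 3, on the toy problem
#     minimize (x - 2)^2  subject to  x <= 1.

def dJ2(x, rho, eps):
    # Derivative of (x - 2)^2 + rho * P{x - 1}, where P is a hyperbolic
    # smoothing of max{g, 0} with gap ~ eps/(2*rho); a stand-in for (3.20).
    s, g = eps / rho, x - 1.0
    return 2.0 * (x - 2.0) + rho * 0.5 * (1.0 + g / (g * g + s * s) ** 0.5)

def solve(rho, eps, lo=-10.0, hi=10.0):
    # The penalized objective is convex, so bisect on its derivative
    # (standing in for the paper's gradient-based algorithm).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if dJ2(mid, rho, eps) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)

rho, eps, x = 1.0, 1e-1, 0.0
for _ in range(8):
    x = solve(rho, eps)
    rho, eps = rho * 10.0, eps * 0.5   # tighten penalty, refine smoothing

print(round(x, 3))  # close to the constrained minimizer x = 1
```

Each subproblem is smooth and unconstrained; the sequence drives the iterate toward the constrained minimizer, mirroring the role Theorems 2–4 play for Problems D and E.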

In this section, we derive the gradient formulae of the cost function $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)$ with respect to the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$, based on Theorems 5–8, whose proofs are similar to those of Theorems 1 and 2 in [39].

Theorem 5. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\eta_1}=M_1(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2], \tag{5.1}$$

where $M_1(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{M}_1(w)=\phi_2^{l_2}\phi_1^{\nu_2(w)}\Big\{\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}M_1(w)+\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\eta_1}\Big\},\\ M_1(0)=0.\end{cases} \tag{5.2}$$

    On the basis of Theorem 5, the gradient of J2 with respect to η1 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\eta_1}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2))}{\partial\tilde{x}}M_1(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2))}{\partial\tilde{x}}M_1(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}M_1(w)\,dw\\
&\quad+\sum_{l_1=1}^{q_1}\frac{\partial\big(P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}+P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}\big)}{\partial\eta_1}\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)\big]}{\partial\eta_1}\,dw\Bigg]. \tag{5.3}
\end{aligned}$$
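The pattern behind Theorem 5 and (5.3) — integrate a variational system alongside the state to obtain the sensitivity that feeds the gradient — can be sketched on scalar toy dynamics. The $f$ below and its derivatives are illustrative; the paper's switching structure and weights are omitted:

```python
# Toy dynamics x' = f(x, eta) = -eta * x**2, x(0) = 1 (illustrative).
# Integrating M' = (df/dx) * M + df/deta with M(0) = 0 alongside the
# state yields M(w) = dx(w)/deta, the pattern of the variational
# system (5.2).
def simulate(eta, T=1.0, n=20000):
    dt = T / n
    x, M = 1.0, 0.0
    for _ in range(n):  # forward Euler on the coupled (x, M) system
        x, M = x + dt * (-eta * x * x), M + dt * ((-2.0 * eta * x) * M - x * x)
    return x, M

eta, h = 0.5, 1e-6
xT, M = simulate(eta)
fd = (simulate(eta + h)[0] - simulate(eta - h)[0]) / (2.0 * h)
assert abs(M - fd) < 1e-4   # sensitivity ODE agrees with finite differences
```

For this toy problem $x(w)=1/(1+\eta w)$, so the exact sensitivity at $w=1$, $\eta=0.5$ is $-1/(1+\eta)^2=-4/9$, which the integrated $M$ reproduces.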

Theorem 6. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$,

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\eta_2}=M_2(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $M_2(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{M}_2(w)=\phi_2^{l_2}\phi_1^{\nu_2(w)}\Big\{\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}M_2(w)+\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\eta_2}\Big\},\\ M_2(0)=0.\end{cases} \tag{5.4}$$

    On the basis of Theorem 6, the gradient of J2 with respect to η2 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\eta_2}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2))}{\partial\tilde{x}}M_2(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2))}{\partial\tilde{x}}M_2(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}M_2(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\frac{\partial\big(P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}+P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}\big)}{\partial\eta_2}\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\big[\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\big]}{\partial\eta_2}\,dw\Bigg].
\end{aligned}$$

Theorem 7. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\phi_1}=N_1(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $N_1(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{N}_1(w)=\phi_2^{l_2}\Big\{\phi_1^{\nu_2(w)}\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}N_1(w)+\dfrac{\partial\phi_1^{\nu_2(w)}f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\phi_1}\Big\},\\ N_1(0)=0.\end{cases} \tag{5.5}$$

    Based on Theorem 7, the gradient of J2 with respect to ϕ1 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\phi_1}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_1(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_1(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}N_1(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}}{q_2}\frac{\partial\Big\{\phi_1^{\nu_2(w)}\Big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)+\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\Big]\Big\}}{\partial\phi_1}\,dw\Bigg].
\end{aligned}$$

Remark 4. The gradient of $J_2$ with respect to $\phi_2$ is complicated to derive because the switching times lie in the time range of $w$. For this, we introduce the inverse of $\nu_2(w)$: $\nu_2^{-1}(p)$ is a function of $\phi_2$, and this inverse function is not differentiable at the connection points. We denote $\nu_2^{-1}(r)$, $r\in I_{q_1}$, by $w_r$; the gradient of $w_r$ with respect to $\phi_2^{l_2}$, $l_2\in I_{q_2}$, at $r\in I_{q_1-1}$ is given as follows.

Remark 5. If the point $w_r$ coincides with some connection point $w_e$, $e\in I_{q_2-1}$, then we have

$$\frac{\partial w_r^-}{\partial\phi_2^{l_2}}:=\begin{cases}0,&\text{if } l_2>e,\\[1mm]\Big(p+\sum\limits_{j=1}^{e-1}\phi_2^j\Big)\Big/\big(\phi_2^e\big)^2,&\text{if } l_2=e,\\[1mm]\dfrac{1}{\phi_2^{e+1}},&\text{if } l_2<e,\end{cases} \tag{5.6}$$

and

$$\frac{\partial w_r^+}{\partial\phi_2^{l_2}}:=\begin{cases}0,&\text{if } l_2>e+1,\\[1mm]\Big(p+\sum\limits_{j=1}^{e}\phi_2^j\Big)\Big/\big(\phi_2^{e+1}\big)^2,&\text{if } l_2=e+1,\\[1mm]\dfrac{1}{\phi_2^{e+1}},&\text{if } l_2<e+1.\end{cases} \tag{5.7}$$

Remark 6. If the point $w_r$ does not coincide with any $w_e$, $e\in I_{q_2-1}$, then we have $\dfrac{\partial w_r^+}{\partial\phi_2}=\dfrac{\partial w_r^-}{\partial\phi_2}$.

With the above discussion, the gradient of the states with respect to $\phi_2$ is given below.

Theorem 8. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\phi_2}=N_2(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $N_2(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{N}_2(w)=\phi_1^{\nu_2(w)}\Big\{\phi_2^{l_2}\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}N_2(w)+\dfrac{\partial\phi_2^{l_2}f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\phi_2}\Big\},\\ N_2(w_r^+)=N_2(w_r^-)-H_{r,f,h}^{+}+H_{r,f,h}^{-},\\ N_2(0)=0,\end{cases} \tag{5.8}$$

    with

$$H_{r,f,h}^{-}=\phi_1^{\nu_2(w_h^-)}\phi_2^{w_h^-}\Big[f(\tilde{x}(w_h^-))+h_1(\tilde{x}(w_h^-))\eta_1^{\nu_2(w_h^-)}+h_2(\tilde{x}(w_h^-))\eta_2^{w_h^-}\Big]\frac{\partial w_r^-}{\partial\phi_2},$$

and

$$H_{r,f,h}^{+}=\phi_1^{\nu_2(w_h^+)}\phi_2^{w_h^+}\Big[f(\tilde{x}(w_h^+))+h_1(\tilde{x}(w_h^+))\eta_1^{\nu_2(w_h^+)}+h_2(\tilde{x}(w_h^+))\eta_2^{w_h^+}\Big]\frac{\partial w_r^+}{\partial\phi_2}.$$

    Based on Theorem 8, the gradient of J2 with respect to ϕ2 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\phi_2}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_2(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_2(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}N_2(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\Big\{\phi_2^{l_2}\Big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)+\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\Big]\Big\}}{\partial\phi_2}\,dw-H_{r,\lambda_i}^{+}+H_{r,\lambda_i}^{-}\Bigg],
\end{aligned}$$

where $H_{r,\lambda_i}^{+}$ and $H_{r,\lambda_i}^{-}$ are defined analogously to $H_{r,f,h}^{+}$ and $H_{r,f,h}^{-}$, respectively.

Remark 7. The gradient formulae of the cost function $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)$ with respect to the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$ are obtained by the variational method (Theorems 5–8). Therefore, a gradient-based algorithm can readily be applied to obtain the optimal solution of Problem E.

    We consider a nonlinear maximum hands-off control problem with the cost functional defined by

$$\min\ h_0=\int_0^1\big\{6x_1^2(t)-12x_2(t)+3u_1(t)+u_2(t)\big\}\,dt, \tag{6.1}$$

    governed by the nonlinear dynamical system

$$\begin{cases}\dot{x}_1(t)=u_2(t),\\ \dot{x}_2(t)=x_1^2(t)+u_1(t),\quad t\in[0,1],\\ x(0)=(1,0)^{T},\end{cases} \tag{6.2}$$

    with the maximum hands-off control constraint

$$A_1(u_1,u_2)=\sum_{i=1}^{2}\lambda_i\int_0^1|u_i(t)|\,dt\le\mu=10, \tag{6.3}$$

    and the bound constraints of the control inputs

$$-1\le u_j(t)\le 1,\quad t\in[0,1],\ j\in I_2.$$
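Under control parameterization, each control is replaced by a piecewise-constant function, after which $h_0$ and $A_1$ can be evaluated by numerical integration. Below is a minimal sketch for system (6.2) with equal-length segments (an assumption: the paper's method also optimizes the switching times, and $\lambda_1=\lambda_2=1$ is taken for illustration since the weights are not specified here):

```python
# Evaluate cost (6.1) and hands-off measure (6.3) for piecewise-constant
# controls on equal-length segments (a simplification of the paper's
# sequential adaptive switching time parameterization).
def evaluate(theta1, theta2, T=1.0, n=10000, lam=(1.0, 1.0)):
    dt = T / n
    x1, x2 = 1.0, 0.0                       # initial state x(0) = (1, 0)^T
    h0 = 0.0
    for k in range(n):
        t = k * dt
        u1 = theta1[min(int(t * len(theta1) / T), len(theta1) - 1)]
        u2 = theta2[min(int(t * len(theta2) / T), len(theta2) - 1)]
        h0 += dt * (6 * x1 ** 2 - 12 * x2 + 3 * u1 + u2)   # cost (6.1)
        x1, x2 = x1 + dt * u2, x2 + dt * (x1 ** 2 + u1)    # dynamics (6.2)
    A1 = sum(lam[0] * abs(v) * T / len(theta1) for v in theta1) \
       + sum(lam[1] * abs(v) * T / len(theta2) for v in theta2)
    return h0, A1

# Ten equal segments per control; the all-zero control is maximally sparse.
h0, A1 = evaluate([0.0] * 10, [0.0] * 10)
assert A1 == 0.0   # the hands-off measure vanishes for the zero control
```

A gradient-based (or derivative-free) optimizer over the segment values `theta1`, `theta2` then yields a finite-dimensional approximation of the maximum hands-off control problem.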

    Case 1: The optimal control problem governed by nonlinear dynamical system (6.2) without the maximum hands-off control constraint (6.3).

    Case 2: The optimal control problem governed by nonlinear dynamical system (6.2) with the maximum hands-off control constraint (6.3).

For Cases 1 and 2, the numbers of segments for the two controls are 10 and 20, respectively. Using the proposed method, we obtain the optimal control strategies $u^1$ and $u^2$ plotted in Figures 1 and 2, respectively. The corresponding optimal costs and sparsity levels are given in Table 1. From Figure 2, it is observed that, for the first control, sparse control occurs in the first segment and from the fourth to the tenth segments. For the second control, sparse control occurs in the second and the sixteenth segments, from the ninth to the thirteenth segments, and from the eighteenth to the twentieth segments.

Figure 1.  The optimal control strategy $u^1$ without the maximum hands-off control constraint (6.3).
Figure 2.  The sparse optimal control strategy $u^2$ with the maximum hands-off control constraint (6.3).

Table 1.  Type, control strategies, optimal cost, and $A_1$.

    Type     Control strategy    Optimal cost         A_1
    Case 1   u^1                 h_0(u^1) = 0.3392    A_1(u^1) = 21.2052
    Case 2   u^2                 h_0(u^2) = 0.3737    A_1(u^2) = 7.303


Figure 1 shows that, without the constraint, the optimal control strategy is quite dense, whereas Figure 2 exhibits a noticeable level of sparsity. It is essential to highlight, however, that imposing the maximum hands-off control constraint (6.3) results in only a marginal increase in the cost function value compared with the unconstrained scenario.

The trend observed in Figure 2 reveals a noteworthy pattern: sparsity increases rapidly while the cost rises only slightly. Based on these findings, it can be inferred that the proposed method is capable of generating solutions of high quality.

Furthermore, from Table 1, the value of the cost function $h_0(u^2)$ slightly exceeds that of $h_0(u^1)$. The smaller the value of $A_1(u)$, the sparser $u$ is; from Table 1, it follows that $u^2$ is sparser than $u^1$. We can therefore conclude that the proposed method achieves a balance between system performance and sparsity, providing solutions that optimize both aspects effectively.

Solving the sparse optimal control problem for linear dynamical systems often admits an analytical solution. For nonlinear dynamical systems, however, obtaining an analytical solution to the maximum hands-off control problem is considerably more challenging, so this paper solves the problem numerically. We employ control parameterization in conjunction with the sequential adaptive switching time optimization technique to approximate the maximum hands-off control problem by a sequence of finite-dimensional optimization problems. This approach allows the control switching times to vary without imposing uniformity and offers flexibility in selecting the control components. The resulting problems are solved by a gradient-based algorithm. An illustrative example demonstrates the efficacy of the proposed methodology.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB3304600; in part by the National Natural Science Foundation of China under Grants 11901075, 12271307 and 12161076; in part by the China Postdoctoral Science Foundation under Grant 2019M661073; in part by the Fundamental Research Funds for the Central Universities under Grants 3132022201, 3132023206, 3132023535 and DUT22LAB305; in part by the Guangdong Province Natural Science Foundation of China under Grant 2022A1515011761; in part by the Chongqing Natural Science Foundation Innovation and Development Joint Fund (CSTB2022NSCQ-LZX0040); in part by the Ministry of Higher Education (MoHE) Malaysia through the Fundamental Research Grant Scheme (FRGS/1/2021/STG06/SYUC/03/1); in part by the Shandong Province Natural Science Foundation of China under Grant ZR2023MA054; in part by the Fundamental Scientific Research Projects of Higher Education Institutions of Liaoning Provincial Department of Education under JYTMS20230165 (General Project); and in part by the Xinghai Project of Dalian Maritime University.

    The authors declare there is no conflict of interest.



© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
