
Neural architecture search via standard machine learning methodologies

  • Received: 24 March 2021 Revised: 19 October 2021 Accepted: 13 January 2022 Published: 11 February 2022
  • In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation involving a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behaviour after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations, together with the corresponding CNN measures of performance obtained with only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such a dataset is used as a training set for Support Vector Machines for Regression and/or Random Forest techniques, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process.
Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation carried out on CNNs, together with the use of our performance predictor with NAS-Bench-101, highlight how promising the proposed methodology for the hyperparameter setting appears.

    Citation: Giorgia Franchini, Valeria Ruggiero, Federica Porta, Luca Zanni. Neural architecture search via standard machine learning methodologies[J]. Mathematics in Engineering, 2023, 5(1): 1-21. doi: 10.3934/mine.2023012




    Within numerous control systems, various measures are implemented to mitigate costs or curtail energy consumption. In some special cases, specific control values are deliberately kept at zero for designated durations using hands-off control strategies, thereby effectively minimizing the associated control costs. In [1], a novel control methodology was introduced to solve the maximum hands-off control problem characterized by the L0 norm, namely, to find the sparsest control among all permissible controls. This control methodology holds significant utility in minimizing electricity or fuel consumption, showcasing its practical relevance and applicability. For instance, in the car start-stop system considered in [1], the engine is automatically deactivated when the vehicle comes to a halt or operates at speeds below a predetermined threshold. This deliberate shutdown serves to diminish CO2 emissions and decrease fuel consumption as part of the system's efficiency measures. Hands-off control also demonstrates effectiveness within networked control systems [2].

    Sparse optimization has emerged as a prominent subject of interest among scholars in recent years. The technology of sparse optimization has garnered extensive attention and exploration across various domains, including compressed sensing [3], image and signal processing [4,5], machine learning [6,7], Alzheimer's disease biomarkers [8], and other fields. Maximum hands-off control shares a relationship with sparsity, a characteristic pertinent to numerous optimization problems. In the realm of sparse optimization, sparsity refers to a condition where a substantial portion of elements within a matrix or vector assumes zero values. Leveraging sparsity offers the potential to economize on both time and compression expenses. Moreover, it serves as a mechanism to sift through extensive datasets, extracting pertinent information and thereby streamlining the complexity of the problem [9]. Within various optimization problems, the L0 norm of a vector commonly serves as a means to quantify and define sparsity [10,11]. Due to its non-convex and non-continuous nature, the L0 norm presents inherent complexity in analysis. Consequently, numerous scholars resort to exploring the variational property of the L0 norm and employing subdifferentiation techniques to address its intricacies. In [12], a technique known as convex relaxation was introduced as a substitute for the L0 norm, employing the more tractable L1 norm. This transformation enables the formulation of the problem in a linear programming paradigm, amenable to resolution using methods such as the interior point method or the simplex method. In [13,14], the approach revolves around the non-convex relaxation method, leveraging the Lp relaxation technique, where 0 < p < 1, to yield solutions with increased sparsity. In [15,16], a focus is directed toward the Difference of Convex functions (DC) algorithm. This method entails the representation of the L0 norm as a difference between two convex functions.
Subsequently, leveraging the DC algorithm facilitates the resolution of the resultant relaxation model. In [17], by addressing the non-convex nature of the logarithmic function, a proposal was made to substitute the logarithm function for the L0 norm. In [18], the resolution involves addressing a sparse optimization problem that encapsulates two competing objectives: measurement error minimization and sparsity maximization. This problem is approached and solved through the utilization of a multi-objective evolutionary algorithm. In [19], an effective fault diagnosis framework was constructed by integrating L0 norm sparse constraint optimization with principal component analysis, aimed at mitigating the extent of sparsity. However, in [12,13,14,15,16,17,18,19], the sparse optimization problems do not take into account the constraints of the dynamical system. The focus of the study presented in [20] revolves around the L1 objective optimal control problem specifically tailored for linear discrete systems. This formulation leads to diverse sparse solutions based on the selection of distinct problem parameters. The study detailed in [21] introduces a methodical approach for synthesizing sparse optimal control structures governed by linear discrete systems. This method guarantees sparsity within solutions, enabling the specification of a predetermined count of zero elements within the control structure. In [22], the utilization of the Smith iteration and Altering-Direction-Implicit (ADI) iteration methods was explored for obtaining numerical solutions in a large sparse optimal control problem governed by linear discrete-time Riccati equations, leveraging Newton's method. In [23], provably optimal sparse solutions to overdetermined linear systems with non-negativity constraints were researched in a least-squares sense by implicit enumeration. 
    Nevertheless, in [20,21,22,23], the sparse optimal control problem governed by the linear dynamical system failed to incorporate the constraints associated with the nonlinear dynamical system. The work presented in [24] delved into the examination of the Newton-like method of undetermined equations. This exploration led to the discovery of sparse solutions pertaining to sparse optimal control problems governed by an affine nonlinear system. The analysis of optimal control using the value function was conducted employing the dynamic programming method. In [25], a dynamic programming approach was introduced to effectively approximate the sparse optimal control problem governed by an affine nonlinear system in a numerical context. The maximum hands-off control problem governed by the class of affine nonlinear systems was studied in [26]. Nonetheless, in [24,25,26], the nonlinear dynamical systems involved in the sparse optimal control problem are affine nonlinear systems, where the state variables and control inputs are separable. This paper delves into the investigation of the sparse optimal control problem governed by a general nonlinear dynamical system where the state variables and control inputs are inseparable.

    The conventional approach to optimal control involves determining the most effective control strategy within specified constraints [27]. This strategy aims to either maximize or minimize the performance index. Typically, deriving an analytical solution for the optimal control problem governed by a nonlinear dynamic system proves challenging. Therefore, the resolution of such problems often necessitates numerical methods to obtain an effective solution [28]. The numerical solutions to optimal control problems can be divided into two solution strategies: first optimize then discretize (indirect methods), and first discretize then optimize (direct methods) [29]. For problems that do not contain inequality constraints, indirect methods derive the first-order necessary conditions for the optimal control problem, namely, the Euler-Lagrange equations that include the initial and boundary value conditions. The solution to this initial-boundary value problem mainly involves two types of methods: the control vector iteration method and methods such as multiple shooting and collocation [30]. The direct method encompasses techniques such as the direct collocation method and control parameterization [31,32]. Control parameterization involves discretizing solely the control function, approximating it using fixed basis functions within specific subintervals [33]. The coefficients within this linear combination of basis functions serve as the optimal decision variables. Typically, predetermined switching times govern the transitions between values for each control component, often evenly divided within specified intervals [34]. To enhance solution accuracy, the time range of the control function is frequently subdivided more finely, leading to a greater number of decision variables. However, this denser division amplifies computational costs. 
Simultaneously, to mitigate the need for extensive time range subdivisions, one must consider incorporating the switching times as additional decision variables [35]. The traditional time-scaling transformation operates by mapping variable switching times with fixed points within a redefined time horizon. This process yields an optimization problem wherein the revised switching times remain constant. The application of the time-scaling transformation finds extensive utility across diverse domains, including mixed integer programming [36] and singular optimal control [37]. Yet, in practical scenarios, simultaneous switching of time poses notable challenges [38]. The sequential adaptive switching time optimization technique (SASTOT) presented in [39] suggests that the interval for control switching times can vary, affording the flexibility to freely select the number of segments for each approximate control component. The method initiates by applying control parameterization and the time-scaling transformation to a single control component initially, leaving the remaining components untouched. Subsequently, within the newly introduced time range, the time-scaling transformation introduces a subsequent new time range. This iterative process continues until the final control component undergoes processing in a similar manner. This modification introduces a significant level of flexibility into the control strategy. Compared with traditional time-scaling transformation techniques, the SASTOT can accurately identify sparse positions for each control.

    In this study, our focus lies on the maximum hands-off control problem governed by a nonlinear dynamical system (MHCPNDS) with a maximum hands-off control constraint characterized by the L0 norm. The optimization variables encompass the count of segments for each control component and the respective switching times for each of these components. The examination of the MHCPNDS poses challenges in obtaining an analytical solution. Hence, the adoption of a numerical solution becomes imperative. In practical scenarios, simultaneous switching of all control components is not optimal. Addressing this, this paper introduces the SASTOT, which allows for a flexible selection of the number of segments and switching times for each individual control element. The integration of control parameterization with the SASTOT serves as a transformative approach to addressing the MHCPNDS. The non-smooth term that involves the maximum operator is approximated through a smoothing function, and we further illustrate the convergence of this approximation technique. The attainment of a sparse solution for the MHCPNDS involves the utilization of a gradient-based algorithm. Empirical assessments through numerical experiments substantiate the efficacy of the algorithm put forth in the study.

    The contributions of the paper are three-fold:

    1) Distinguished from the above-mentioned sparse optimization problem, which is amenable to analytical solutions, we consider the sparse optimal control problem within a nonlinear dynamic system. Given the intrinsic complexity of the nonlinear dynamic system, pinpointing an analytical solution poses a significant challenge. Taking these into consideration, we propose a numerical solution method, based on the gradient formulae of the cost function (Theorems 5–8) and the gradient-based algorithm, without requiring the linearization of the nonlinear dynamical systems.

    2) A smoothing function has been introduced to mitigate the roughness of the non-smooth term involving the maximum operator. Several significant theorems (Theorems 1–4) have been established, illustrating that the smoothing function effectively addresses the shortcomings associated with constraint qualification noncompliance.

    3) Setting itself apart from the time-scaling transformation technique, this paper introduces the SASTOT method that enables flexible determination of the number of segments and the specific switching times for each control element individually.

    The rest of the paper is organized as follows. In Section 2, we present the MHCPNDS. In Section 3, the MHCPNDS is transformed by using control parameterization and the SASTOT. In Section 4, we deal with the nonsmoothness of the objective function by using the smoothing technique. In Section 5, the gradient-based algorithm is used to solve the resulting smooth problem. In Section 6, numerical results are presented. In Section 7, we draw some conclusions and suggest some future research directions.

    Let $I_n$ denote the set $\{1,2,\ldots,n\}$. For a continuous-time control $v:[0,T]\to\mathbb{R}$, the $L^p$ norm is defined by

    $\|v\|_p \triangleq \left(\int_0^T |v(t)|^p\,dt\right)^{1/p}, \quad p\in[1,+\infty).$ (2.1)

    The L0 norm is defined by

    $\|v\|_0 \triangleq q_M(\operatorname{supp}(v)),$ (2.2)

    where $q_M$ denotes the Lebesgue measure and $\operatorname{supp}(v)$, called the support set of $v$, is defined by

    $\operatorname{supp}(v) = \{t\in[0,T] : v(t)\neq 0\}.$ (2.3)

    In some cases, the control effort can be significantly reduced by keeping the control value at exactly zero over a time interval. Maximum hands-off control is also called sparse control: the larger the portion of the time horizon on which the control value is exactly zero, the sparser the control.
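The sparsity notions above can be made concrete numerically: for a sampled control, the Lebesgue measure of the support is approximated by counting nonzero samples and multiplying by the step size. A minimal sketch, assuming a hypothetical scalar control (not taken from the paper) that is held at zero on [1, 3]:

```python
import numpy as np

# Sample a scalar control on [0, T]: zero on [1, 3], nonzero elsewhere.
T, N = 4.0, 4000
t = np.linspace(0.0, T, N, endpoint=False)
dt = T / N
v = np.where((t < 1.0) | (t >= 3.0), np.sin(t), 0.0)

# Discretized L0 "norm": Lebesgue measure of supp(v) ~ (# nonzero samples) * dt
l0 = np.count_nonzero(v) * dt

# Discretized Lp norm for p >= 1: (integral of |v|^p dt)^(1/p)
def lp_norm(v, dt, p):
    return (np.sum(np.abs(v) ** p) * dt) ** (1.0 / p)

print(l0)  # ~2.0: the control is active on [0,1) and [3,4)
print(lp_norm(v, dt, 1))
```

The L0 value depends only on where the control is nonzero, not on its magnitude, which is exactly why it measures sparsity.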

    In order to obtain the sparse solution of maximum hands-off control, we consider a nonlinear dynamical system defined by

    $\begin{cases} \dfrac{dx(t)}{dt} = f(t,x(t),u(t)), & t\in[0,T],\\ x(0)=\xi, \quad x(T)=0, \end{cases}$ (2.4)

    where $x(t):=(x_1(t),\ldots,x_n(t))^T\in\mathbb{R}^n$ denotes the state vector at time $t$, $u(t):=(u_1(t),\ldots,u_m(t))^T\in\mathbb{R}^m$ denotes the control input vector at time $t$, $\xi$ denotes the initial state value, $T>0$ denotes a given terminal time, and $f(t,x(t),u(t))$ denotes a nonlinear function vector defined on $\mathbb{R}^n$.

    Let $x(\cdot|u)$ denote the solution of system (2.4) satisfying

    $x(t|u)\leq 0, \quad t\in[0,T].$ (2.5)

    For all $t\in[0,T]$, each component of the control input vector $u(t)$ is also subject to the constraint

    $\max_{i\in I_m} |u_i(t)| \leq 1.$ (2.6)

    A control input vector $u(t)\in\mathbb{R}^m$ that satisfies constraint (2.6) is called a candidate control input vector. Let $L[T,\xi]$ be the set consisting of all candidate control input vectors.

    Definition 1. The maximum hands-off control constraint characterized by the L0 norm is defined by

    $A_0(u) \triangleq \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\|u_i\|_0 \leq \mu,$ (2.7)

    where $\lambda_i\geq 0$, $i\in I_m$, are given weights, $u\in L[T,\xi]$ denotes the admissible control, and $\mu$ is a small positive number.

    Definition 1 is used to characterize the sparsity of maximum hands-off control. The cost functional is defined by

    $J(u) = \pi_0(x(T|u)) + \int_0^T \vartheta(x(t|u))\,dt,$ (2.8)

    where $\pi_0:\mathbb{R}^n\to\mathbb{R}$ and $\vartheta:\mathbb{R}^n\to\mathbb{R}$ are continuously differentiable functions.

    Then, our maximum hands-off control problem can be formulated as follows.

    Problem A: Given system (2.4), choose an admissible control uL[T,ξ] to minimize the cost functional defined in (2.8) subject to boundary condition (2.6) and maximum hands-off control constraint (2.7).

    The L0 norm, owing to its non-convex and discontinuous characteristics, inherently introduces complexity into analytical procedures. It can, however, be well approximated by the L1 norm [20]. Accordingly, the maximum hands-off control constraint characterized by the L0 norm is approximated as in Definition 2.

    Definition 2. The maximum hands-off control constraint characterized by the L1 norm is defined by

    $A_1(u) \triangleq \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\|u_i\|_1 = \dfrac{1}{T}\sum_{i=1}^{m}\lambda_i\int_0^T |u_i(t)|\,dt \leq \mu,$ (3.1)

    where $\lambda_i\geq 0$, $i\in I_m$, are given weights, $u\in L[T,\xi]$ denotes the admissible control, and $\mu$ is a small positive number.

    Definition 2 provides a computationally tractable characterization of the sparsity of maximum hands-off control. By Definition 2, Problem A can be well approximated by the following Problem B.

    Problem B: Given system (2.4), choose the control uL[T,ξ] to minimize the objective functional defined in (2.8) subject to boundary condition (2.6) and maximum hands-off control constraint (3.1).
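The relation between Definitions 1 and 2 can be illustrated numerically on a sampled two-component control: $A_0$ accumulates the weighted measure of each support, while $A_1$ replaces it with the weighted integral of $|u_i|$. A small sketch with hypothetical weights and bang-off levels (none of the values are from the paper):

```python
import numpy as np

T, N = 4.0, 4000
t = np.linspace(0.0, T, N, endpoint=False)
dt = T / N
lam = [1.0, 1.0]  # weights lambda_i (hypothetical values)

# Two-component control: u1 active on [0,1), u2 active on [2,4)
u = [np.where(t < 1.0, 0.5, 0.0), np.where(t >= 2.0, -0.8, 0.0)]

# A0(u) = (1/T) * sum_i lambda_i * ||u_i||_0   (measure of the support)
A0 = sum(l * np.count_nonzero(ui) * dt for l, ui in zip(lam, u)) / T

# A1(u) = (1/T) * sum_i lambda_i * integral |u_i| dt   (convex surrogate)
A1 = sum(l * np.sum(np.abs(ui)) * dt for l, ui in zip(lam, u)) / T

print(A0)  # ~ (1*1 + 1*2)/4 = 0.75
print(A1)  # ~ (0.5*1 + 0.8*2)/4 = 0.525
```

Note that $A_1 \leq A_0$ here because the control magnitudes are below one; the L1 surrogate weights the active intervals by amplitude rather than just counting them.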

    Control parameterization stands as an effective methodology for resolving the optimal control problem, commonly approached through approximation via piecewise constant functions [29,40,41,42]. Increasing the level of detail in the time level partition leads to greater accuracy as it allows for a more intricate and nuanced representation of the time-dependent processes or phenomena under consideration [43]. Employing control parameterization enables the derivation of a finite-dimensional approximation for Problem B. Ultimately, the gradient-based algorithm is employed to resolve the resultant approximation problem.

    The conventional time-scaling transformation necessitates simultaneous switching of all control components, a condition challenging to attain in practical applications [44]. Hence, the SASTOT emerges as a solution, combining the time-scaling transformation and control parameterization methodologies [39]. Within this methodology, each control component can adaptively and independently select its switching times, allowing for diverse switching points across components. Empirical validation showcases a notable reduction in computational complexity alongside an improvement in the accuracy of calculations. For a clearer exposition of this approach, this article examines two distinct control inputs. Let $\tilde u(t)=[\tilde u_1(t),\tilde u_2(t)]^T$. Then, the dynamical system is

    $\tilde f(t,\tilde x(t),\tilde u_1(t),\tilde u_2(t)) = \tilde f(t,\tilde x(t),\tilde u(t)).$ (3.2)

    The interval $[0,T]$ is divided into $q_1$ subintervals $[\delta_1^{l_1-1},\delta_1^{l_1}]$, $l_1\in I_{q_1}$, where $\sigma_1:=[\delta_1^1,\delta_1^2,\ldots,\delta_1^{q_1}]^T$ is the variable switching time vector, and

    $0=\delta_1^0\leq\delta_1^1\leq\cdots\leq\delta_1^{q_1}=T.$ (3.3)

    The control component ˜u1(t) can be approximated by

    $\tilde u_1(t)\approx \tilde u_1^{q_1}(t|\eta_1,\sigma_1) = \sum_{l_1=1}^{q_1}\eta_1^{l_1}\,\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t), \quad t\in[0,T],$ (3.4)

    where $\eta_1^{l_1}$ is the piecewise constant value of the control component $\tilde u_1(t)$ on the $l_1$th subinterval, satisfying

    $a_1\leq\eta_1^{l_1}\leq b_1, \quad l_1\in I_{q_1},$ (3.5)

    and $\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t)$ is the indicator function defined by

    $\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t)=\begin{cases}1, & \text{if } t\in[\delta_1^{l_1-1},\delta_1^{l_1}),\\ 0, & \text{otherwise}.\end{cases}$ (3.6)

    Then, we obtain the following system:

    $\begin{cases} \dfrac{d\tilde x(t)}{dt} = \sum_{l_1=1}^{q_1}\tilde f(t,\tilde x(t),\eta_1^{l_1},\tilde u_2(t))\,\chi_{[\delta_1^{l_1-1},\delta_1^{l_1})}(t), \quad t\in[0,T],\\ \tilde x(0)=\xi, \quad \tilde x(T)=0. \end{cases}$ (3.7)

    Let $\tilde x(\cdot|\eta_1,\tilde u_2)$ denote the solution of system (3.7).

    After applying the time-scaling transformation [45] to $\tilde u_1(t)$ and mapping the variable switching times $\{\delta_1^0,\delta_1^1,\ldots,\delta_1^{q_1}\}$ to the fixed switching times $\{0,1,\ldots,q_1\}$ in the new time horizon, we define the vector $\phi_1:=[\phi_1^1,\ldots,\phi_1^{q_1}]^T\in\mathbb{R}^{q_1}$, where $\phi_1^{l_1}=\delta_1^{l_1}-\delta_1^{l_1-1}\geq\rho$, $\rho$ is an extremely small positive number, $l_1\in I_{q_1}$, and $\phi_1^1+\phi_1^2+\cdots+\phi_1^{q_1}=T$.

    Then, we introduce a new time variable $p$ and define a time-scaling function $\nu_1(p,\phi_1)$:

    $t(p)\equiv\nu_1(p,\phi_1)=\sum_{l_1=1}^{\lfloor p\rfloor}\phi_1^{l_1}+\phi_1^{\lfloor p\rfloor+1}(p-\lfloor p\rfloor), \quad p\in[0,q_1],$ (3.8)

    where $\theta(p)=\tilde u_2(\nu_1(p,\phi_1))=\tilde u_2(t)$ and $\lfloor p\rfloor$ is the floor of the time variable $p$. With the new time variable $p$, the dynamical system is redefined on the subinterval $[l_1-1,l_1)$, $l_1\in I_{q_1}$:

    $\begin{cases} \dfrac{d\hat x(p)}{dp}=\phi_1^{l_1}\,\hat f(p,\hat x(p),\eta_1^{l_1},\theta(p)),\\ \hat x(0)=\xi, \quad \hat x(q_1)=0. \end{cases}$ (3.9)

    Let $\hat x(\cdot|\eta_1,\theta)$ denote the solution of system (3.9).
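The time-scaling function $\nu_1$ in (3.8) maps the new horizon $[0,q_1]$ back to physical time by accumulating the subinterval durations $\phi_1$, so that each integer value of $p$ lands exactly on a switching time. A small sketch under that definition (the duration values are hypothetical):

```python
import numpy as np

def nu(p, phi):
    """Time-scaling t = nu(p, phi): maps p in [0, q] to t in [0, T],
    where phi[l] = delta[l+1] - delta[l] are the subinterval durations."""
    q = len(phi)
    lp = min(int(np.floor(p)), q - 1)  # floor(p), clamped at the right endpoint
    return float(np.sum(phi[:lp]) + phi[lp] * (p - lp))

phi = np.array([0.5, 1.5, 2.0])  # hypothetical durations, sum = T = 4
print(nu(0.0, phi))  # 0.0
print(nu(1.5, phi))  # 0.5 + 1.5*0.5 = 1.25
print(nu(3.0, phi))  # 4.0: integer p maps to the switching time delta_1^p
```

Because the switching times are now the fixed integers while the durations phi become decision variables, the optimization problem no longer has variable breakpoints in its time grid.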

    The interval $[0,q_1]$ is divided into $q_2$ subintervals $[p_2^{l_2-1},p_2^{l_2}]$, $l_2\in I_{q_2}$, where $\sigma_2:=[p_2^0,p_2^1,\ldots,p_2^{q_2}]^T$ is the variable switching time vector, and

    $0=p_2^0\leq p_2^1\leq\cdots\leq p_2^{q_2}=q_1.$ (3.10)

    The control component ˜u2(t) can be approximated by

    $\tilde u_2(t)\approx\theta(p):=\sum_{l_2=1}^{q_2}\eta_2^{l_2}\,\chi_{[p_2^{l_2-1},p_2^{l_2})}(p), \quad p\in[0,q_1],$ (3.11)

    where $\eta_2^{l_2}$ is the piecewise constant value of the control component $\theta(p)$ on the $l_2$th subinterval, satisfying

    $a_2\leq\eta_2^{l_2}\leq b_2, \quad l_2\in I_{q_2}.$ (3.12)

    With the new time variable $p$, system (3.9) is redefined on the subinterval $[p_2^{l_2-1},p_2^{l_2})$, $l_2\in I_{q_2}$:

    $\begin{cases} \dfrac{d\check x(p)}{dp}=\phi_1^{\lfloor p\rfloor+1}\,\check f(p,\check x(p),\eta_1^{\lfloor p\rfloor+1},\eta_2^{l_2}),\\ \check x(0)=\xi, \quad \check x(q_1)=0. \end{cases}$ (3.13)

    After applying a time-scaling transformation to $\theta(p)$ and mapping the variable switching times $\{p_2^0,p_2^1,\ldots,p_2^{q_2}\}$ to the fixed switching times $\{0,1,\ldots,q_2\}$ in the new time horizon, we define the vector $\phi_2:=[\phi_2^1,\ldots,\phi_2^{q_2}]^T\in\mathbb{R}^{q_2}$, where $\phi_2^{l_2}=p_2^{l_2}-p_2^{l_2-1}\geq\rho$, $\rho$ is an extremely small positive number, $l_2\in I_{q_2}$, and $\phi_2^1+\phi_2^2+\cdots+\phi_2^{q_2}=q_1$.

    Then, we introduce a new time variable $w$ and define a time-scaling function $\nu_2(w,\phi_2)$:

    $p(w)\equiv\nu_2(w,\phi_2)=\sum_{l_2=1}^{\lfloor w\rfloor}\phi_2^{l_2}+\phi_2^{\lfloor w\rfloor+1}(w-\lfloor w\rfloor), \quad w\in[0,q_2],$ (3.14)

    where $\lfloor w\rfloor$ is the floor of the time variable $w$. System (3.13) is redefined on the subinterval $[l_2-1,l_2)$, $l_2\in I_{q_2}$:

    $\begin{cases} \dfrac{d\check x(w)}{dw}=\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\,\check f(w,\check x(w),\eta_1^{\lfloor\nu_2(w)\rfloor+1},\eta_2^{l_2}),\\ \check x(0)=\xi, \quad \check x(q_2)=0. \end{cases}$ (3.15)

    Let $\check x(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ be the solution of system (3.15). Then the continuous state constraint (2.5) becomes

    $\check x(w|\eta_1,\eta_2,\phi_1,\phi_2)\leq 0, \quad w\in[l_2-1,l_2),\ l_2\in I_{q_2}.$ (3.16)

    The cost function (2.8) and the maximum hands-off control constraint (3.1) become

    $\bar J(\eta_1,\eta_2,\phi_1,\phi_2)=\pi_0(\check x(q_2|\eta_1,\eta_2,\phi_1,\phi_2))+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\,\vartheta(\check x(w|\eta_1,\eta_2,\phi_1,\phi_2))\,dw,$ (3.17)
    $A_2(\eta_1,\eta_2,\phi_1,\phi_2)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1|\eta_1^{\lfloor\nu_2(w)\rfloor+1}|+\lambda_2|\eta_2^{l_2}|\right]dw\leq\mu.$ (3.18)

    With these in mind, Problem B can be approximated by the following Problem C.

    Problem C: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the objective function defined by (3.17) subject to the continuous inequality constraints (3.16), the bound constraints (3.5) and (3.12), and the maximum hands-off control constraint (3.18).

    Because $|\eta_1^{\lfloor\nu_2(w)\rfloor+1}|=2\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}$ and $|\eta_2^{l_2}|=2\max\{\eta_2^{l_2},0\}-\eta_2^{l_2}$, the maximum hands-off control constraint (3.18) can be equivalently transformed into $A_3(\eta_1,\eta_2,\phi_1,\phi_2)$ defined in (3.19).

    $A_3(\eta_1,\eta_2,\phi_1,\phi_2)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1\left(2\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}\right)+\lambda_2\left(2\max\{\eta_2^{l_2},0\}-\eta_2^{l_2}\right)\right]dw\leq\mu.$ (3.19)
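The rewriting from $A_2$ to $A_3$ rests only on the elementary identity $|a| = 2\max\{a,0\} - a$, which can be checked directly:

```python
# The identity |a| = 2*max(a, 0) - a used to rewrite A_2 as A_3:
# for a >= 0 the right side is 2a - a = a; for a < 0 it is 0 - a = -a.
for a in [-2.5, -1.0, 0.0, 0.3, 4.0]:
    assert abs(a) == 2 * max(a, 0.0) - a
print("identity holds")
```

The point of the rewriting is that the absolute value disappears, leaving only max-operators, which the smoothing function of the next section is designed to handle.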

    Define

    $J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)=\bar J(\eta_1,\eta_2,\phi_1,\phi_2)+\rho\,H(\eta_1,\eta_2,\phi_1,\phi_2),$ (3.20)

    where ρ is the penalty parameter and

    $H(\eta_1,\eta_2,\phi_1,\phi_2)=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\max\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),0\}\,dw+\sum_{l_1=1}^{q_1}\left[\max\{a_1-\eta_1^{l_1},0\}+\max\{\eta_1^{l_1}-b_1,0\}\right]+\sum_{l_2=1}^{q_2}\left[\max\{a_2-\eta_2^{l_2},0\}+\max\{\eta_2^{l_2}-b_2,0\}\right]+A_3(\eta_1,\eta_2,\phi_1,\phi_2).$

    Thus, Problem C can be equivalently transformed into Problem D.

    Problem D: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the objective function defined in (3.20).

    Remark 1. According to [29], it has been established that when the penalty parameter $\rho$ exceeds a threshold value $\rho^*$, the solution derived for Problem D is an exact solution of Problem C.

    The fundamental concept underlying the smoothing technique involves approximating the non-smooth maximum operator by a smoothing function [46], which is defined by

    $P\{G,\rho,q,\varepsilon\}=\begin{cases}0, & \text{if } G<-\dfrac{\varepsilon}{\rho q},\\ \dfrac{\rho q}{2\varepsilon}G^2+G+\dfrac{\varepsilon}{2\rho q}, & \text{if } -\dfrac{\varepsilon}{\rho q}\leq G<0,\\ G+\dfrac{\varepsilon}{2\rho q}, & \text{if } G\geq 0,\end{cases}$ (4.1)

    where $\rho$ is a penalty factor, $q$ is the number of continuous inequality constraints, and $\varepsilon>0$ is the smoothing parameter.

    Theorem 1. ([46]) For $\varepsilon>0$, we have

    $0\leq P\{G,\rho,q,\varepsilon\}-\max\{G,0\}\leq\dfrac{\varepsilon}{2\rho q}.$ (4.2)
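The smoothing function (4.1) and the bound (4.2) of Theorem 1 can be checked numerically; the sketch below implements $P$ piecewise and verifies the gap to $\max\{G,0\}$ on a grid (the parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

def P(G, rho, q, eps):
    """Smoothing approximation of max{G, 0} from (4.1):
    zero below -eps/(rho*q), quadratic blend up to 0, then linear shift."""
    c = eps / (rho * q)
    if G < -c:
        return 0.0
    if G < 0.0:
        return (rho * q) / (2.0 * eps) * G**2 + G + c / 2.0
    return G + c / 2.0

# Verify the Theorem 1 bound 0 <= P - max{G,0} <= eps/(2*rho*q) on a grid
rho, q, eps = 10.0, 5, 0.01
gap = [P(G, rho, q, eps) - max(G, 0.0) for G in np.linspace(-1, 1, 10001)]
lo, hi = min(gap), max(gap)
print(lo >= -1e-9 and hi <= eps / (2 * rho * q) + 1e-9)  # True
```

The maximal gap eps/(2*rho*q) occurs at G = 0, so shrinking eps (or increasing the penalty rho) drives the smooth surrogate uniformly toward the exact max-operator.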

    By the smoothing process, the maximum hands-off control constraint $A_3(\eta_1,\eta_2,\phi_1,\phi_2)$ defined in (3.19) can be approximated by

    $\tilde A_3(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_2,\varepsilon)=\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\left[\lambda_1\left(2P\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},\rho,q_2,\varepsilon\}-\eta_1^{\lfloor\nu_2(w)\rfloor+1}\right)+\lambda_2\left(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\right)\right]dw.$ (4.3)

    Based on the smoothing function (4.1), the cost function (3.20) can be approximated by

    $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)=\bar J(\eta_1,\eta_2,\phi_1,\phi_2)+\rho\,\tilde H(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon),$ (4.4)

    where

    $\tilde H(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}P\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}\,dw+\tilde A_3(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_2,\varepsilon)+\sum_{l_2=1}^{q_2}\left[P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}+P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}\right]+\sum_{l_1=1}^{q_1}\left[P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}+P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}\right].$ (4.5)

    Based on the smoothing function (4.1), Problem D can be approximated by Problem E.

    Problem E: Given system (3.15), choose the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}$ to minimize the cost function defined in (4.4).

    The application of smoothing techniques may introduce discrepancies between Problems D and E. This section derives error bounds between Problems D and E for the smoothing function (4.1). It is shown that, for a sufficiently small smoothing parameter $\varepsilon$, the solution of Problem D can be acquired by sequentially solving a series of Problem E while incrementing the values of the penalty factor $\rho$.

    Theorem 2. If $\rho>0$, $q_1,q_2>0$, and $\varepsilon>0$, then

    $0\leq J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$ (4.6)

    Proof. Based on Theorem 1, we have, for $\rho>0$, $q_1,q_2>0$, and $\varepsilon>0$,

    $0\leq P\{G,\rho,q,\varepsilon\}-\max\{G,0\}\leq\dfrac{\varepsilon}{2\rho q}.$

    Then,

    $0\leq J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1,\eta_2,\phi_1,\phi_2,\rho)$
    $=\rho\Big\{\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\big(P\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}-\max\{\check x(w|\eta_1,\eta_2,\phi_1,\phi_2),0\}\big)\,dw$
    $\quad+\sum_{l_1=1}^{q_1}\big[\big(P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}-\max\{a_1-\eta_1^{l_1},0\}\big)+\big(P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}-\max\{\eta_1^{l_1}-b_1,0\}\big)\big]$
    $\quad+\sum_{l_2=1}^{q_2}\big[\big(P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}-\max\{a_2-\eta_2^{l_2},0\}\big)+\big(P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}-\max\{\eta_2^{l_2}-b_2,0\}\big)\big]$
    $\quad+\dfrac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\,\phi_1^{\lfloor\nu_2(w)\rfloor+1}\big[2\lambda_1\big(P\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},\rho,q_2,\varepsilon\}-\max\{\eta_1^{\lfloor\nu_2(w)\rfloor+1},0\}\big)+2\lambda_2\big(P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\max\{\eta_2^{l_2},0\}\big)\big]\,dw\Big\}$
    $\leq\rho\left[\dfrac{5\varepsilon}{2\rho}+\dfrac{2(\lambda_1+\lambda_2)T^2\varepsilon}{2\rho q_2}\right]=\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$

    which completes the proof.

    Theorem 3. Let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ be the solution of Problem D, and let $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)$ be the solution of Problem E. Then,

    $0\leq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$

    Proof. Based on Theorem 2, we have

    $0\leq J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$
    $0\leq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\leq\dfrac{5\varepsilon}{2}+\dfrac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$

    Since $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ is the solution of Problem D, we have

    $J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\geq J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho),$

    which yields

    $J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\geq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho)\geq 0.$ (4.7)

    Since $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)$ is the solution of Problem E, we obtain

    $J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)\geq J_2(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon,\rho,q_1,q_2,\varepsilon),$

    which yields

$$J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\le J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho). \tag{4.8}$$

    Based on (4.7) and (4.8), we have

$$\begin{aligned}
0&\le J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho)\\
&\le J_2(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\\
&\le J_2(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho,q_1,q_2,\varepsilon)-J_1(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*,\rho)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},
\end{aligned}$$

    which completes this proof.

Remark 2. Theorem 3 states that, when the smoothing parameter $\varepsilon$ is sufficiently small, the solution of Problem E approximates the solution of Problem D.

To obtain an error estimate between the solutions of Problems E and C, as stated in Theorem 4, the notion of $\varepsilon$-feasibility for Problem D is given in Definition 3.

Definition 3. A vector $(\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$ is called $\varepsilon$-feasible to Problem D if

$$P\{a_2-(\eta_2^{l_2})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon,\quad P\{(\eta_2^{l_2})^\varepsilon-b_2,\rho,q_2,\varepsilon\}\le\varepsilon,$$
$$P\{a_1-(\eta_1^{l_1})^\varepsilon,\rho,q_1,\varepsilon\}\le\varepsilon,\quad P\{(\eta_1^{l_1})^\varepsilon-b_1,\rho,q_1,\varepsilon\}\le\varepsilon,$$
$$P\{\tilde{x}(w|\eta_1^\varepsilon,\eta_2^\varepsilon,\phi_1^\varepsilon,\phi_2^\varepsilon),\rho,q_2,\varepsilon\}\le\varepsilon,$$
$$P\{(\eta_1^{\nu_2(w)})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon,\quad P\{(\eta_2^{l_2})^\varepsilon,\rho,q_2,\varepsilon\}\le\varepsilon.$$

Theorem 4. Let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ and $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ be the solutions of Problems D and E, respectively. Furthermore, let $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ be feasible to Problem D and $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ be $\varepsilon$-feasible to Problem D. Then

$$-\frac{5\varepsilon}{2}-\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}\le\bar{J}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)-\bar{J}(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}. \tag{4.9}$$

Proof. Since $(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)$ is the solution of Problem D, we have

$$\begin{aligned}
H(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)&=\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\max\{\tilde{x}(w|\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*),0\}\,dw\\
&\quad+\sum_{l_1=1}^{q_1}\Big[\max\{a_1-(\eta_1^{l_1})^*,0\}+\max\{(\eta_1^{l_1})^*-b_1,0\}\Big]\\
&\quad+\sum_{l_2=1}^{q_2}\Big[\max\{a_2-(\eta_2^{l_2})^*,0\}+\max\{(\eta_2^{l_2})^*-b_2,0\}\Big]\\
&\quad+A_3(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)=0,
\end{aligned}$$

    where

$$A_3(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)=\frac{1}{q_2}\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}(\phi_2^{l_2})^*(\phi_1^{\nu_2(w)})^*\Big[\lambda_1\big(2\max\{(\eta_1^{\nu_2(w)})^*,0\}-(\eta_1^{\nu_2(w)})^*\big)+\lambda_2\big(2\max\{(\eta_2^{l_2})^*,0\}-(\eta_2^{l_2})^*\big)\Big]dw=0.$$

Since $(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)$ is $\varepsilon$-feasible to Problem D, we obtain

$$0\le\rho\tilde{H}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2}.$$

Based on Theorem 3, we get

$$0\le\bar{J}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2)+\rho\tilde{H}(\bar{\eta}_1,\bar{\eta}_2,\bar{\phi}_1,\bar{\phi}_2,\rho,q_1,q_2,\varepsilon)-\bar{J}(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)-\rho H(\eta_1^*,\eta_2^*,\phi_1^*,\phi_2^*)\le\frac{5\varepsilon}{2}+\frac{(\lambda_1+\lambda_2)T^2\varepsilon}{q_2},$$

    which completes this proof.

Remark 3. In this scenario, as outlined in Theorem 4, an error estimate between the solutions of Problems E and C is provided, specifically when the penalty factor $\rho$ is sufficiently large. Consequently, the solution for the SOP can be approximately obtained by iteratively solving a sequence of Problems E.
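The continuation scheme of Remark 3 — solve a sequence of smoothed penalized problems while increasing $\rho$ and shrinking $\varepsilon$ — can be sketched on a toy scalar problem. Everything below (the hyperbolic stand-in for the smoothing function, the toy objective, the update factors, the bisection solver standing in for the gradient-based algorithm) is illustrative, not the paper's formulation:

```python
# Continuation over (rho, eps), as in Remark 3, on the toy problem
#     minimize (x - 2)^2  subject to  x <= 1.

def dJ2(x, rho, eps):
    # Derivative of (x - 2)^2 + rho * P{x - 1}, where P is a hyperbolic
    # smoothing of max{g, 0} with gap ~ eps/(2*rho); a stand-in for (3.20).
    s, g = eps / rho, x - 1.0
    return 2.0 * (x - 2.0) + rho * 0.5 * (1.0 + g / (g * g + s * s) ** 0.5)

def solve(rho, eps, lo=-10.0, hi=10.0):
    # The penalized objective is convex, so bisect on its derivative
    # (standing in for the paper's gradient-based algorithm).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if dJ2(mid, rho, eps) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)

rho, eps, x = 1.0, 1e-1, 0.0
for _ in range(8):
    x = solve(rho, eps)
    rho, eps = rho * 10.0, eps * 0.5   # tighten penalty, refine smoothing

print(round(x, 3))  # close to the constrained minimizer x = 1
```

Each subproblem is smooth and unconstrained; the sequence drives the iterate toward the constrained minimizer, mirroring the role Theorems 2–4 play for Problems D and E.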

In this section, we derive the gradient formulae of the cost function $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)$ with respect to the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$, based on Theorems 5–8, whose proofs are similar to those of Theorems 1 and 2 in [39].

Theorem 5. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\eta_1}=M_1(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2], \tag{5.1}$$

where $M_1(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{M}_1(w)=\phi_2^{l_2}\phi_1^{\nu_2(w)}\Big\{\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}M_1(w)+\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\eta_1}\Big\},\\ M_1(0)=0.\end{cases} \tag{5.2}$$

    On the basis of Theorem 5, the gradient of J2 with respect to η1 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\eta_1}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2))}{\partial\tilde{x}}M_1(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2))}{\partial\tilde{x}}M_1(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}M_1(w)\,dw\\
&\quad+\sum_{l_1=1}^{q_1}\frac{\partial\big(P\{a_1-\eta_1^{l_1},\rho,q_1,\varepsilon\}+P\{\eta_1^{l_1}-b_1,\rho,q_1,\varepsilon\}\big)}{\partial\eta_1}\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)\big]}{\partial\eta_1}\,dw\Bigg]. \tag{5.3}
\end{aligned}$$
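The pattern behind Theorem 5 and (5.3) — integrate a variational system alongside the state to obtain the sensitivity that feeds the gradient — can be sketched on scalar toy dynamics. The $f$ below and its derivatives are illustrative; the paper's switching structure and weights are omitted:

```python
# Toy dynamics x' = f(x, eta) = -eta * x**2, x(0) = 1 (illustrative).
# Integrating M' = (df/dx) * M + df/deta with M(0) = 0 alongside the
# state yields M(w) = dx(w)/deta, the pattern of the variational
# system (5.2).
def simulate(eta, T=1.0, n=20000):
    dt = T / n
    x, M = 1.0, 0.0
    for _ in range(n):  # forward Euler on the coupled (x, M) system
        x, M = x + dt * (-eta * x * x), M + dt * ((-2.0 * eta * x) * M - x * x)
    return x, M

eta, h = 0.5, 1e-6
xT, M = simulate(eta)
fd = (simulate(eta + h)[0] - simulate(eta - h)[0]) / (2.0 * h)
assert abs(M - fd) < 1e-4   # sensitivity ODE agrees with finite differences
```

For this toy problem $x(w)=1/(1+\eta w)$, so the exact sensitivity at $w=1$, $\eta=0.5$ is $-1/(1+\eta)^2=-4/9$, which the integrated $M$ reproduces.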

Theorem 6. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$,

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\eta_2}=M_2(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $M_2(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{M}_2(w)=\phi_2^{l_2}\phi_1^{\nu_2(w)}\Big\{\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}M_2(w)+\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\eta_2}\Big\},\\ M_2(0)=0.\end{cases} \tag{5.4}$$

    On the basis of Theorem 6, the gradient of J2 with respect to η2 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\eta_2}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2))}{\partial\tilde{x}}M_2(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2))}{\partial\tilde{x}}M_2(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}M_2(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\frac{\partial\big(P\{a_2-\eta_2^{l_2},\rho,q_2,\varepsilon\}+P\{\eta_2^{l_2}-b_2,\rho,q_2,\varepsilon\}\big)}{\partial\eta_2}\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\big[\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\big]}{\partial\eta_2}\,dw\Bigg].
\end{aligned}$$

Theorem 7. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\phi_1}=N_1(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $N_1(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{N}_1(w)=\phi_2^{l_2}\Big\{\phi_1^{\nu_2(w)}\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}N_1(w)+\dfrac{\partial\phi_1^{\nu_2(w)}f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\phi_1}\Big\},\\ N_1(0)=0.\end{cases} \tag{5.5}$$

    Based on Theorem 7, the gradient of J2 with respect to ϕ1 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\phi_1}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_1(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_1(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}N_1(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_2^{l_2}}{q_2}\frac{\partial\Big\{\phi_1^{\nu_2(w)}\Big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)+\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\Big]\Big\}}{\partial\phi_1}\,dw\Bigg].
\end{aligned}$$

Remark 4. The gradient of $J_2$ with respect to $\phi_2$ is complicated to derive because the switching times lie in the time range of $w$. For this, we introduce the inverse of $\nu_2(w)$: $\nu_2^{-1}(p)$ is a function of $\phi_2$, and this inverse function is not differentiable at the connection points. We denote $\nu_2^{-1}(r)$, $r\in I_{q_1}$, by $w_r$; the gradient of $w_r$ with respect to $\phi_2^{l_2}$, $l_2\in I_{q_2}$, at $r\in I_{q_1-1}$ is given as follows.

Remark 5. If the point $w_r$ coincides with some connection point $w_e$, $e\in I_{q_2-1}$, then we have

$$\frac{\partial w_r^-}{\partial\phi_2^{l_2}}:=\begin{cases}0,&\text{if } l_2>e,\\[1mm]\Big(p+\sum\limits_{j=1}^{e-1}\phi_2^j\Big)\Big/\big(\phi_2^e\big)^2,&\text{if } l_2=e,\\[1mm]\dfrac{1}{\phi_2^{e+1}},&\text{if } l_2<e,\end{cases} \tag{5.6}$$

and

$$\frac{\partial w_r^+}{\partial\phi_2^{l_2}}:=\begin{cases}0,&\text{if } l_2>e+1,\\[1mm]\Big(p+\sum\limits_{j=1}^{e}\phi_2^j\Big)\Big/\big(\phi_2^{e+1}\big)^2,&\text{if } l_2=e+1,\\[1mm]\dfrac{1}{\phi_2^{e+1}},&\text{if } l_2<e+1.\end{cases} \tag{5.7}$$

Remark 6. If the point $w_r$ does not coincide with any $w_e$, $e\in I_{q_2-1}$, then we have $\dfrac{\partial w_r^+}{\partial\phi_2}=\dfrac{\partial w_r^-}{\partial\phi_2}$.

With the above discussion, the gradient of the states with respect to $\phi_2$ is given below.

Theorem 8. For each $(\eta_1,\eta_2,\phi_1,\phi_2)$, we have

$$\frac{\partial\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2)}{\partial\phi_2}=N_2(w|\eta_1,\eta_2,\phi_1,\phi_2),\quad w\in[0,q_2],$$

where $N_2(\cdot|\eta_1,\eta_2,\phi_1,\phi_2)$ is the solution to the following system on $w\in[l_2-1,l_2]$, $l_2\in I_{q_2}$:

$$\begin{cases}\dot{N}_2(w)=\phi_1^{\nu_2(w)}\Big\{\phi_2^{l_2}\dfrac{\partial f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\tilde{x}}N_2(w)+\dfrac{\partial\phi_2^{l_2}f(w,\hat{x}(w),\eta_1^{\nu_2(w)},\eta_2^{l_2})}{\partial\phi_2}\Big\},\\ N_2(w_r^+)=N_2(w_r^-)-H_{r,f,h}^{+}+H_{r,f,h}^{-},\\ N_2(0)=0,\end{cases} \tag{5.8}$$

    with

$$H_{r,f,h}^{-}=\phi_1^{\nu_2(w_h^-)}\phi_2^{w_h^-}\Big[f(\tilde{x}(w_h^-))+h_1(\tilde{x}(w_h^-))\eta_1^{\nu_2(w_h^-)}+h_2(\tilde{x}(w_h^-))\eta_2^{w_h^-}\Big]\frac{\partial w_r^-}{\partial\phi_2},$$

and

$$H_{r,f,h}^{+}=\phi_1^{\nu_2(w_h^+)}\phi_2^{w_h^+}\Big[f(\tilde{x}(w_h^+))+h_1(\tilde{x}(w_h^+))\eta_1^{\nu_2(w_h^+)}+h_2(\tilde{x}(w_h^+))\eta_2^{w_h^+}\Big]\frac{\partial w_r^+}{\partial\phi_2}.$$

    Based on Theorem 8, the gradient of J2 with respect to ϕ2 is given as follows:

$$\begin{aligned}
\frac{\partial J_2}{\partial\phi_2}&=\frac{\partial\pi_0(\tilde{x}(q_2|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_2(q_2)+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\phi_2^{l_2}\phi_1^{\nu_2(w)}\frac{\partial\vartheta(\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2))}{\partial\tilde{x}}N_2(w)\,dw\\
&\quad+\rho\Bigg[\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\partial P\{\tilde{x}(w|\eta_1,\eta_2,\phi_1,\phi_2),\rho,q_2,\varepsilon\}}{\partial\tilde{x}}N_2(w)\,dw\\
&\quad+\sum_{l_2=1}^{q_2}\int_{l_2-1}^{l_2}\frac{\phi_1^{\nu_2(w)}}{q_2}\frac{\partial\Big\{\phi_2^{l_2}\Big[\lambda_1\big(2P\{\eta_1^{\nu_2(w)},\rho,q_2,\varepsilon\}-\eta_1^{\nu_2(w)}\big)+\lambda_2\big(2P\{\eta_2^{l_2},\rho,q_2,\varepsilon\}-\eta_2^{l_2}\big)\Big]\Big\}}{\partial\phi_2}\,dw-H_{r,\lambda_i}^{+}+H_{r,\lambda_i}^{-}\Bigg],
\end{aligned}$$

where $H_{r,\lambda_i}^{+}$ and $H_{r,\lambda_i}^{-}$ are defined analogously to $H_{r,f,h}^{+}$ and $H_{r,f,h}^{-}$, respectively.

Remark 7. The gradient formulae of the cost function $J_2(\eta_1,\eta_2,\phi_1,\phi_2,\rho,q_1,q_2,\varepsilon)$ with respect to the quadruple $(\eta_1,\eta_2,\phi_1,\phi_2)\in\mathbb{R}^{q_1}\times\mathbb{R}^{q_1}\times\mathbb{R}^{q_2}\times\mathbb{R}^{q_2}$ are obtained by the variational method (Theorems 5–8). Therefore, a gradient-based algorithm can readily be applied to obtain the optimal solution of Problem E.

    We consider a nonlinear maximum hands-off control problem with the cost functional defined by

$$\min\ h_0=\int_0^1\big\{6x_1^2(t)-12x_2(t)+3u_1(t)+u_2(t)\big\}\,dt, \tag{6.1}$$

    governed by the nonlinear dynamical system

$$\begin{cases}\dot{x}_1(t)=u_2(t),\\ \dot{x}_2(t)=x_1^2(t)+u_1(t),\quad t\in[0,1],\\ x(0)=(1,0)^{T},\end{cases} \tag{6.2}$$

    with the maximum hands-off control constraint

$$A_1(u_1,u_2)=\sum_{i=1}^{2}\lambda_i\int_0^1|u_i(t)|\,dt\le\mu=10, \tag{6.3}$$

    and the bound constraints of the control inputs

$$-1\le u_j(t)\le 1,\quad t\in[0,1],\ j\in I_2.$$
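Under control parameterization, each control is replaced by a piecewise-constant function, after which $h_0$ and $A_1$ can be evaluated by numerical integration. Below is a minimal sketch for system (6.2) with equal-length segments (an assumption: the paper's method also optimizes the switching times, and $\lambda_1=\lambda_2=1$ is taken for illustration since the weights are not specified here):

```python
# Evaluate cost (6.1) and hands-off measure (6.3) for piecewise-constant
# controls on equal-length segments (a simplification of the paper's
# sequential adaptive switching time parameterization).
def evaluate(theta1, theta2, T=1.0, n=10000, lam=(1.0, 1.0)):
    dt = T / n
    x1, x2 = 1.0, 0.0                       # initial state x(0) = (1, 0)^T
    h0 = 0.0
    for k in range(n):
        t = k * dt
        u1 = theta1[min(int(t * len(theta1) / T), len(theta1) - 1)]
        u2 = theta2[min(int(t * len(theta2) / T), len(theta2) - 1)]
        h0 += dt * (6 * x1 ** 2 - 12 * x2 + 3 * u1 + u2)   # cost (6.1)
        x1, x2 = x1 + dt * u2, x2 + dt * (x1 ** 2 + u1)    # dynamics (6.2)
    A1 = sum(lam[0] * abs(v) * T / len(theta1) for v in theta1) \
       + sum(lam[1] * abs(v) * T / len(theta2) for v in theta2)
    return h0, A1

# Ten equal segments per control; the all-zero control is maximally sparse.
h0, A1 = evaluate([0.0] * 10, [0.0] * 10)
assert A1 == 0.0   # the hands-off measure vanishes for the zero control
```

A gradient-based (or derivative-free) optimizer over the segment values `theta1`, `theta2` then yields a finite-dimensional approximation of the maximum hands-off control problem.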

    Case 1: The optimal control problem governed by nonlinear dynamical system (6.2) without the maximum hands-off control constraint (6.3).

    Case 2: The optimal control problem governed by nonlinear dynamical system (6.2) with the maximum hands-off control constraint (6.3).

For Cases 1 and 2, the numbers of segments for the two controls are 10 and 20, respectively. Using the proposed method, we obtain the optimal control strategies $u^1$ and $u^2$ plotted in Figures 1 and 2, respectively. The corresponding optimal costs and sparsity levels are given in Table 1. From Figure 2, it is observed that, for the first control, sparse control occurs in the first segment and from the fourth to the tenth segments. For the second control, sparse control occurs in the second and the sixteenth segments, from the ninth to the thirteenth segments, and from the eighteenth to the twentieth segments.

Figure 1.  The optimal control strategy $u^1$ without the maximum hands-off control constraint (6.3).
Figure 2.  The sparse optimal control strategy $u^2$ with the maximum hands-off control constraint (6.3).

Table 1.  Type, control strategies, optimal cost, and $A_1$.

    Type     Control strategy    Optimal cost         A_1
    Case 1   u^1                 h_0(u^1) = 0.3392    A_1(u^1) = 21.2052
    Case 2   u^2                 h_0(u^2) = 0.3737    A_1(u^2) = 7.303


Figure 1 shows that, without the constraint, the optimal control strategy is quite dense, whereas Figure 2 exhibits a noticeable level of sparsity. It is essential to highlight, however, that imposing the maximum hands-off control constraint (6.3) results in only a marginal increase in the cost function value compared with the unconstrained scenario.

The trend observed in Figure 2 reveals a noteworthy pattern: sparsity increases rapidly while the cost rises only slightly. Based on these findings, it can be inferred that the proposed method is capable of generating solutions of high quality.

Furthermore, from Table 1, the value of the cost function $h_0(u^2)$ slightly exceeds that of $h_0(u^1)$. The smaller the value of $A_1(u)$, the sparser $u$ is; from Table 1, it follows that $u^2$ is sparser than $u^1$. We can therefore conclude that the proposed method achieves a balance between system performance and sparsity, providing solutions that optimize both aspects effectively.

Solving the sparse optimal control problem for linear dynamical systems often admits an analytical solution. For nonlinear dynamical systems, however, obtaining an analytical solution to the maximum hands-off control problem is considerably more challenging, so this paper solves the problem numerically. We employ control parameterization in conjunction with the sequential adaptive switching time optimization technique to approximate the maximum hands-off control problem by a sequence of finite-dimensional optimization problems. This approach allows the control switching times to vary without imposing uniformity and offers flexibility in selecting the control components. The resulting problems are solved by a gradient-based algorithm. An illustrative example demonstrates the efficacy of the proposed methodology.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB3304600; in part by the National Natural Science Foundation of China under Grants 11901075, 12271307 and 12161076; in part by the China Postdoctoral Science Foundation under Grant 2019M661073; in part by the Fundamental Research Funds for the Central Universities under Grants 3132022201, 3132023206, 3132023535 and DUT22LAB305; in part by the Guangdong Province Natural Science Foundation of China under Grant 2022A1515011761; in part by the Chongqing Natural Science Foundation Innovation and Development Joint Fund (CSTB2022NSCQ-LZX0040); in part by the Ministry of Higher Education (MoHE) Malaysia through the Fundamental Research Grant Scheme (FRGS/1/2021/STG06/SYUC/03/1); in part by the Shandong Province Natural Science Foundation of China under Grant ZR2023MA054; in part by the Fundamental Scientific Research Projects of Higher Education Institutions of Liaoning Provincial Department of Education under JYTMS20230165 (General Project); and in part by the Xinghai Project of Dalian Maritime University.

    The authors declare there is no conflict of interest.



© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
