Research article

Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach


  • Received: 02 December 2022 Revised: 31 December 2022 Accepted: 17 January 2023 Published: 01 February 2023
  • This paper focuses on the adaptive reinforcement learning-based optimal control problem for standard nonstrict-feedback nonlinear systems with actuator faults and an unknown dead zone. To simultaneously reduce the computational complexity and eliminate the local optimum problem, a novel neural network weight-updating algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, an actor-critic-based reinforcement learning control strategy is developed for high-order nonstrict-feedback nonlinear systems. In addition, two auxiliary parameters are presented to deal with the input dead zone and the actuator fault, respectively. All signals in the system are proven to be semi-globally uniformly ultimately bounded via Lyapunov analysis. At the end of the paper, simulation results are presented to illustrate the effectiveness of the proposed approach.

    Citation: Zichen Wang, Xin Wang. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6334-6357. doi: 10.3934/mbe.2023274




    Optimal control theory originated in the 1960s and has become an important part of automatic control theory, primarily owing to its spirit of seeking the best solution among all admissible control schemes [1,2]. Optimal control problems for linear systems can generally be settled by solving Riccati equations. For nonlinear systems, however, few effective methods exist, since the Hamilton-Jacobi-Bellman (HJB) equation must be addressed and is generally difficult to solve analytically. To overcome this bottleneck, a number of remarkable approaches have been developed, such as adaptive dynamic programming [3,4], actor-critic neural networks (ACNNs) [5,6], policy iteration, and so on. Reinforcement learning (RL), which can solve optimal control problems for nonlinear systems while avoiding a direct solution of the HJB equation, has received widespread attention in the past decades. In 1974, Werbos first applied the idea of RL to optimal control theory [7]. Since then, many outstanding results have been reported [8,9,10]. At present, RL is usually implemented with ACNNs, where the critic neural network (CNN) provides policy evaluation and the actor neural network (ANN) updates the current policy. The RL algorithm can reduce energy consumption beyond that of other algorithms while still guaranteeing system stability, and it has become a significant method of modern control theory. In recent years, more and more RL approaches have been presented in various fields, including online RL [11,12], integral RL [13,14], off-policy RL [15,16], etc.

    General RL strategies utilize the gradient descent method to obtain the ideal weights of the neural networks, which often falls into local optima [17], so that the neural network estimation errors cannot easily meet the requirements. To overcome this bottleneck, Bai et al. proposed the multigradient recursive (MGR) algorithm in [17] to obtain the globally optimal solution. By updating pseudo-gradients, this state-of-the-art technique can resolve the local optimum problem and accelerate the convergence rate of the neural network weights. However, the MGR algorithm suffers from a heavy computational burden. It is therefore worth mentioning the minimal learning parameter (MLP) scheme, which can reduce the number of update laws without noticeably reducing the estimation accuracy. Many studies have validated the effectiveness of the MLP. Nevertheless, the aforementioned papers adopt only one of the two algorithms and fail to combine the MLP and MGR algorithms so as to exploit both of their advantages.

    The optimal control problem of strict-feedback nonlinear systems has been widely studied. However, none of these strategies can be extended to the field of nonstrict-feedback systems [18,19]. To this end, several studies have been proposed to overcome this bottleneck. For example, Tong et al. [20] presented a novel fuzzy tracking control design for nonstrict-feedback SISO systems. Bai et al. [21] utilized MLP-based RL theory to solve optimal control problems for a class of nonstrict-feedback systems. However, all of the above results omitted the influence of actuator faults and dead zone inputs. These are common factors that affect the stability of a system; neglecting them can lead to severe damage, so they must be taken seriously. Many works have thus been presented to offset their influence. However, the situation in which the dead zone and the actuator fault occur simultaneously has not been considered. Thus, how to obtain an optimal controller for a nonstrict-feedback system with actuator faults and an input dead zone, with minimal computation and sufficient accuracy, is an important task, and it is the motivation for the current investigation.

    Combined with the backstepping technique, the tracking control problem of strict-feedback nonlinear systems has been thoroughly investigated [14,15,16], and the backstepping approach has been introduced into the analysis of high-order nonlinear systems. Li et al. [22] investigated the optimal control problem of a class of SISO strict-feedback systems via the fuzzy control method. Modares et al. [23] developed an integral RL approach for strict-feedback systems with input constraints. Wang et al. [24] proposed an optimal fault-tolerant control strategy for nonlinear strict-feedback systems via adaptive critic design.

    Besides, many papers have studied novel neural network weight-updating algorithms. Li et al. [25] utilized the MLP technique to solve the fault-tolerant problem of a class of multiagent systems. Liu et al. [26] designed an RL controller by applying an MLP scheme to classic MIMO systems with external disturbance. Bai et al. [27] developed an event-triggered control scheme for multiagent systems based on the MLP technique.

    Furthermore, many scholars are committed to investigating tolerance strategies for the input dead zone and actuator fault. For instance, Wang and Yang [28] studied the fault detection problem for linear systems with disturbance. Tan et al. [29] developed a compensation control scheme for a class of discrete-time systems subject to actuator failures. In addition, Na et al. [30] provided an adaptive dynamic control approach for systems with an unknown dead zone.

    Based on the above discussion, an RL optimal controller is built in this paper to deal with the fault-tolerant control problem for a class of nonstrict-feedback nonlinear systems in discrete time with an unknown dead zone input and actuator fault. To deal with the dead zone and actuator fault, we propose two auxiliary systems to offset their influence. The ANN and CNN are utilized to approximate the unknown terms and the long-term utility function, respectively. We propose a novel approach to update the neural network weights. The stability of all signals in the closed loop is rigorously proved, and the tracking errors converge to a small compact set. The novelties of this paper are summarized as follows:

    1) We propose a novel neural network weight-updating algorithm to eliminate the local optimum problem and reduce the computational burden. Besides, compared with the ordinary gradient descent algorithm [11,26], the proposed approach achieves a faster weight convergence rate.

    2) We formulate a modified backstepping method with additional parameters to offset the influence of the input dead zone, the actuator fault, and the algebraic loop problem. In addition, unified fault-tolerant control algorithms are developed based on the RL strategy.

    The organization of this paper is given below. In Section 2, descriptions of the system and radial basis function neural network (RBF NN) theory are given. In Subsection 3.1, the CNN and our novel update law are presented. In Subsection 3.2, the design procedure of the adaptive RL controller is provided. In Section 4, we propose some simulation results to show the contributions of the scheme presented in this paper. The conclusion is provided in Section 5.

    The dynamics of a standard n-order strict-feedback nonlinear system [31,32,33,34] can be described as follows:

$$
\begin{cases}
x_i(k+1)=\varphi_i(\bar{x}_i(k))+\phi_i(\bar{x}_i(k))\,x_{i+1}(k)\\
x_n(k+1)=\varphi_n(\bar{x}_n(k))+\phi_n(\bar{x}_n(k))\,U(k)+d(k)\\
y(k)=x_1(k)
\end{cases} \tag{2.1}
$$

    where $x_i(k)\in\mathbb{R}$ for $i=1,\ldots,n$ represents the state variables of the system. The notation $\bar{x}_n(k)=[x_1(k),x_2(k),\ldots,x_n(k)]^T\in\mathbb{R}^n$ denotes the state vector. The notations $U(k)\in\mathbb{R}$ and $y(k)\in\mathbb{R}$ are the input and output signals, respectively. Notation $d(k)$ stands for the external disturbance. Notations $\varphi_i(\cdot)$ and $\phi_i(\cdot)$ represent unknown smooth nonlinear functions.

    Motivated by the transformation proposed in [18,19], system (2.1) can be further expressed in the following nonstrict-feedback form:

$$
\begin{cases}
x_i(k+n-i+1)=\varphi_i(\bar{x}_n(k+n-i))+\phi_i(\bar{x}_n(k+n-i))\,x_{i+1}(k+n-i)\\
x_n(k+1)=\varphi_n(\bar{x}_n(k))+\phi_n(\bar{x}_n(k))\,U(k)+d(k)\\
y(k)=x_1(k).
\end{cases} \tag{2.2}
$$

    To proceed smoothly, the following assumption is introduced.

    Assumption 1: According to the contributions in [34,35], the functions $\varphi_i(\bar{x}(k))$ and $\phi_i(\bar{x}(k))$ satisfy $0<\underline{\varphi}<\varphi_i(\bar{x}(k))<\bar{\varphi}$ and $0<\underline{\phi}<\phi_i(\bar{x}(k))<\bar{\phi}$, where $\bar{\varphi}$ and $\underline{\varphi}$ are the unknown upper and lower bounds of $\varphi_i(\bar{x}(k))$, and $\bar{\phi}$ and $\underline{\phi}$ are the unknown upper and lower bounds of $\phi_i(\bar{x}(k))$, respectively. The external disturbance is bounded and satisfies $|d(k)|\le\bar{d}$, with $\bar{d}$ being an unknown positive constant.

    The control signal with the actuator fault and input dead zone can be described as $U(k)=\psi(k)u(k)+\delta(k)$, where $\psi(k)$ and $\delta(k)$ denote the efficiency factor and the unknown drift fault of the actuator, respectively. We assume that $\psi(k)$ is a positive constant with $\psi(k)<\bar{\psi}<1$, where $\bar{\psi}$ is an unknown constant. Further, $\delta(k)$ satisfies $\delta(k)<\bar{\delta}$, with $\bar{\delta}$ being its upper bound. The dead zone is defined as $u(k)=D(v(k))$, where $v(k)$ represents the input of the dead zone and $D(\cdot)$ is a function of $v(k)$ representing the output of the dead zone. According to the propositions in [31], the dead zone is expressed as

$$
D(v(k))=\begin{cases}
b_r\,(v(k)-f_r), & v(k)\ge f_r\\
0, & -f_l<v(k)<f_r\\
b_l\,(v(k)+f_l), & v(k)\le -f_l
\end{cases} \tag{2.3}
$$

    where $b_r$ and $b_l$ denote the right and left slopes of the dead zone, respectively, and $f_r$ and $f_l$ are the breakpoints of the input. To simplify the following calculation, $D(v(k))$ can be converted into the new form

$$
D(v(k))=b(k)v(k)+f(k) \tag{2.4}
$$

    where $b(k)$ and $f(k)$ can be described as

$$
b(k)=\begin{cases} b_r, & v(k)>0\\ b_l, & v(k)\le 0 \end{cases}
\qquad
f(k)=\begin{cases} -b_rf_r, & v(k)\ge f_r\\ -b(k)v(k), & -f_l<v(k)<f_r\\ b_lf_l, & v(k)\le -f_l. \end{cases} \tag{2.5}
$$

    We suppose that $b(k)$ and $f(k)$ satisfy $0<\underline{b}<|b(k)|<\bar{b}$ and $0<\underline{f}<|f(k)|<\bar{f}$, respectively. The control signal $U(k)$ can then be reorganized as

$$
U(k)=\psi(k)\,(b(k)v(k)+f(k))+\delta(k). \tag{2.6}
$$
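    For illustration, the following minimal Python sketch shows how (2.3)–(2.6) map a commanded input $v(k)$ into the signal $U(k)$ that actually reaches the plant. The slope and breakpoint defaults follow the first scenario of Section 4, while the efficiency factor and drift fault values are placeholder assumptions.

```python
def dead_zone(v, b_r=0.5, b_l=0.5, f_r=0.3, f_l=0.3):
    """Dead zone D(v) from (2.3): zero on (-f_l, f_r), affine outside."""
    if v >= f_r:
        return b_r * (v - f_r)
    if v <= -f_l:
        return b_l * (v + f_l)
    return 0.0

def faulty_actuator(v, psi=0.6, delta=0.02):
    """Applied control U(k) = psi(k) * D(v(k)) + delta(k), as in (2.6)."""
    return psi * dead_zone(v) + delta

print(faulty_actuator(0.8))   # command 0.8 -> 0.6 * 0.25 + 0.02 = 0.17
```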

    In this paper, the RL controller is developed for the nonstrict-feedback nonlinear system (2.2), ensuring that all signals in the closed-loop system are semi-globally uniformly ultimately bounded (SGUUB). Based on the ACNNs, the tracking error $\xi_1(k)$ is required to converge to a neighborhood of zero that will be specified subsequently.

    Note that the RBF NN can approximate any smooth nonlinear function over a compact set. That is to say, considering an unknown nonlinear function $F(N)$, there exists an RBF NN $W^TS(N)$ such that $F(N)=W^TS(N)+\sigma(N)$, where $W=[w_1,\ldots,w_l]^T\in\mathbb{R}^l$ denotes the ideal weight vector, $l$ represents the number of nodes in the hidden layer and $\sigma(N)$ is the estimation error. Both $W$ and $\sigma(N)$ satisfy $\|W\|<\bar{W}$ and $\|\sigma(N)\|<\bar{\sigma}$, with $\bar{W}$ and $\bar{\sigma}$ as unknown upper bounds. The notation $S(N)=[s_1(N),\ldots,s_l(N)]^T$ is the vector of basis functions, and $s_i(N)$ takes the Gaussian form $s_i(N)=\exp\left[-\frac{(N-c_i)^T(N-c_i)}{\eta_i^2}\right]$, where $c_i$ represents the kernel of the receptive field and $\eta_i$ denotes the width of the function. Because $0<s_i(N)<1$, we can further derive that $0<\sum_{i=1}^{l}s_i(N)s_i(N)=S(N)^TS(N)<l$.
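    As a concrete illustration of the approximator just described, the sketch below evaluates a Gaussian RBF NN output $W^TS(N)$ in Python; the centers, widths and weights are arbitrary placeholders.

```python
import numpy as np

def rbf_basis(N, centers, eta):
    """Gaussian basis s_i(N) = exp(-(N - c_i)^T (N - c_i) / eta_i^2)."""
    diff = N - centers                                 # shape (l, dim)
    return np.exp(-np.sum(diff ** 2, axis=1) / eta ** 2)

l, dim = 10, 2
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(l, dim))        # kernels c_i
eta = np.full(l, 2.0)                                  # widths eta_i
W = rng.normal(0.0, 0.1, size=l)                       # ideal weight vector

N = np.array([0.2, -0.5])
S = rbf_basis(N, centers, eta)
assert 0.0 < S @ S < l                                 # property S(N)^T S(N) < l
print(W @ S)                                           # NN output W^T S(N)
```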

    The utility function [35] can be chosen as

$$
\rho(k)=\begin{cases} 0, & |\xi_1(k)|\le\varpi\\ 1, & |\xi_1(k)|>\varpi \end{cases} \tag{3.1}
$$

    where $\varpi$ is a positive constant that denotes the threshold value of the tracking performance. The tracking error is written as $\xi_1(k)=y(k)-x_d(k)$, where $x_d(k)$ indicates the reference signal. The long-term strategic utility function [21,27] is given by

$$
M(k)=\rho(k+1)+a\,\rho(k+2)+a^2\rho(k+3)+\cdots \tag{3.2}
$$

    where $a$ is a predefined positive parameter satisfying $a<1$. According to RBF NN theory, the long-term utility function $M(k)$ can be represented as

$$
M(k)=W_M^TS_M(k)+\delta_M(k) \tag{3.3}
$$

    where $W_M$ and $\delta_M(k)$ indicate the ideal weight vector and the approximation error, respectively. Let $S_M(k)$ be the RBF NN basis function vector. We define $\hat{M}(k)=\hat{W}_M^T(k)S_M(k)$, which denotes the estimate of $M(k)$, with $\hat{W}_M(k)$ being the estimate of the ideal weight $W_M$.
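    To make (3.1) and (3.2) concrete, the sketch below computes the binary utility $\rho(k)$ and a truncated version of the discounted long-term utility $M(k)$; the finite horizon is an assumption made only for illustration, since (3.2) is an infinite series.

```python
def utility(xi1, varpi=0.05):
    """rho(k) in (3.1): 0 when |xi_1(k)| <= varpi, 1 otherwise."""
    return 0.0 if abs(xi1) <= varpi else 1.0

def long_term_utility(xi1_future, a=0.9):
    """Truncated M(k) = rho(k+1) + a*rho(k+2) + a^2*rho(k+3) + ...;
    xi1_future holds xi_1(k+1), xi_1(k+2), ..."""
    return sum(a ** j * utility(e) for j, e in enumerate(xi1_future))

print(long_term_utility([0.2, 0.1, 0.04, 0.01]))  # 1 + 0.9 + 0 + 0 = 1.9
```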

    On the basis of the MLP scheme, ˆM(k) can be written in the form

$$
\hat{M}(k)=\hat{\Psi}_M(k)\,\|S_M(k)\| \tag{3.4}
$$

    where $\|\cdot\|$ indicates the Euclidean norm, and $\hat{\Psi}_M(k)=\|\hat{W}_M(k)\|$ satisfies $\hat{\Psi}_M\le\bar{\Psi}$, with $\bar{\Psi}$ a positive unknown constant.

    According to the scheme in [36], the equation of the Bellman error is designed as

$$
E_M(k)=a\hat{M}(k)-[\hat{M}(k-1)-\rho(k)]. \tag{3.5}
$$

    Adopting the cost function in its quadratic form $\beta_M(k)=\frac12 E_M^2(k)$, the gradient with respect to $\hat{W}_M$ is obtained as

$$
\Delta\hat{W}_M(k)=aS_M(k)\,[a\hat{M}(k)-\hat{M}(k-1)+\rho(k)]. \tag{3.6}
$$

    Defining $\omega_M(k-j+1)=\hat{\Psi}_M(k)\,\|S_M(k-j+1)\|$, we can further get

$$
g(\iota,\beta_M(k))=\sum_{j=1}^{\iota}a\,\|S_M(k-j+1)\|\,\left[a\,\omega_M(k-j+1)-\omega_M(k-j)+\rho(k-j+1)\right] \tag{3.7}
$$

    where $\iota\ge 1$ is a positive predefined constant that indicates the step length of the multigradient.

    Together with (3.7), the update law of $\hat{\Psi}_M$ is obtained as

$$
\hat{\Psi}_M(k+1)=\hat{\Psi}_M(k)-\mu_M\sum_{j=1}^{\iota}a\,\|S_M(k-j+1)\|\,\left[a\,\omega_M(k-j+1)-\omega_M(k-j)+\rho(k-j+1)\right] \tag{3.8}
$$

    where $\mu_M$ is the selected learning rate. The structure of the CNN is shown in Figure 1.

    Figure 1.  Structure of the CNN.
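    The critic update (3.8) can be read as one pseudo-gradient step that pools the $\iota$ most recent Bellman-type errors. A minimal Python sketch, assuming history buffers of basis-function norms and utilities are available:

```python
def critic_update(psi_M, S_hist, rho_hist, a=0.9, mu_M=0.005, iota=10):
    """MGR critic update (3.8). S_hist[j-1] = ||S_M(k-j+1)|| for j = 1..iota+1
    (so S_hist[j] = ||S_M(k-j)||); rho_hist[j-1] = rho(k-j+1)."""
    grad = 0.0
    for j in range(1, iota + 1):
        omega_new = psi_M * S_hist[j - 1]          # omega_M(k-j+1)
        omega_old = psi_M * S_hist[j]              # omega_M(k-j)
        grad += a * S_hist[j - 1] * (a * omega_new - omega_old + rho_hist[j - 1])
    return psi_M - mu_M * grad                     # Psi_M_hat(k+1)
```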

    Remark 1: The neural networks in this paper are updated by our weight-updating algorithm. Compared to the classic gradient descent method, our algorithm has the following advantages: 1) it reduces the computational complexity; 2) it eliminates the local optimum problem; 3) it accelerates the convergence speed of the neural network weights.

    In this section, an ANN will be utilized to implement the n-step backstepping RL control strategy. Specifically, two auxiliary signals are introduced in the n-th step to eliminate the impact of the dead zone and actuator fault.

    Step 1: The tracking errors are defined as $\xi_1(k+n)=x_1(k+n)-x_d(k+n)$ and $\xi_2(k+n-1)=x_2(k+n-1)-\alpha_1(k)$. According to system (2.2), the tracking error $\xi_1(k+n)$ can be further deduced as

$$
\xi_1(k+n)=\phi_1(\bar{x}_n(k+n-1))\left[\left(\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))}-\frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))}+x_d(k+n)\right)+\alpha_1(k)-x_d(k+n)+\xi_2(k+n-1)\right]
$$

    where $\alpha_1(k)$ denotes the virtual controller. Let

$$
\gamma_1(k)=\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))}+x_d(k+n)-\frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))}. \tag{3.9}
$$

    With the universal approximation capability of the RBF NN, $\gamma_1(k)$ can be approximated as $\gamma_1(k)=W_1^TS_1(N_1(k))+\sigma_1(k)$, where $W_1$ represents the ideal weight vector. We define $N_1(k)=[\bar{x}_n(k+n-1),x_d(k+n)]^T$, and $\sigma_1(k)$ indicates the approximation error. Suppose that $W_1$ and $\sigma_1(k)$ satisfy $\|W_1\|<\bar{W}_1$ and $\|\sigma_1(k)\|<\bar{\sigma}_1$, respectively, where $\bar{W}_1$ and $\bar{\sigma}_1$ are the corresponding upper bounds.

    Combining (3.9) and the approximation of $\gamma_1(k)$, $\xi_1(k+n)$ can be further expressed as

$$
\xi_1(k+n)=\phi_1(\bar{x}_n(k+n-1))\left[W_1^TS_1(N_1(k))+\sigma_1(k)+\alpha_1(k)+\xi_2(k+n-1)-x_d(k+n)\right]. \tag{3.10}
$$

    To solve the algebraic loop problem, which will be discussed later, the term $o_1(k)$ is proposed:

$$
o_1(k)=\Psi_1\,\|S_1(\epsilon_1(k))\| \tag{3.11}
$$

    where $\epsilon_1(k)=[\bar{x}_1(k+n-1),x_d(k+n)]^T$ and $\Psi_1=\|W_1\|$.

    Adding and subtracting (3.11) in (3.10), one can easily derive

$$
\xi_1(k+n)=\phi_1(\bar{x}_n(k+n-1))\left[W_1^TS_1(N_1(k))+\sigma_1(k)+\alpha_1(k)+\xi_2(k+n-1)-x_d(k+n)+\Psi_1\|S_1(\epsilon_1(k))\|-\Psi_1\|S_1(\epsilon_1(k))\|\right]. \tag{3.12}
$$

    In order to further simplify (3.12), the virtual controller is designed as

$$
\alpha_1(k)=-\hat{\Psi}_1(k)\,\|S_1(\epsilon_1(k))\|+x_d(k+n) \tag{3.13}
$$

    where $\hat{\Psi}_1(k)=\|\hat{W}_1(k)\|$ and $\hat{W}_1$ is the estimate of $W_1$.

    Substituting (3.13) into (3.12), we get

$$
\xi_1(k+n)=\phi_1(\bar{x}_n(k+n-1))\left[W_1^TS_1(N_1(k))+\sigma_1(k)+\xi_2(k+n-1)-\tilde{\Psi}_1(k)\|S_1(\epsilon_1(k))\|-\Psi_1\|S_1(\epsilon_1(k))\|\right] \tag{3.14}
$$

    where $\tilde{\Psi}_1(k)=\hat{\Psi}_1(k)-\Psi_1$.

    Shifting (3.14) to the $k+1$ time instant, one has

$$
\xi_1(k+1)=\phi_1(\bar{x}_n(k))\left[W_1^TS_1(N_1(k_1))+\sigma_1(k_1)+\xi_2(k)-\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|-\Psi_1\|S_1(\epsilon_1(k_1))\|\right] \tag{3.15}
$$

    where $k_1=k-n+1$ represents the shifted time instant.

    On the basis of the RL control scheme, the strategic utility function can be defined as

$$
E_1(k)=\hat{\Psi}_1(k_1)\,\|S_1(\epsilon_1(k_1))\|+(\hat{M}(k)-M_d(k)) \tag{3.16}
$$

    where $M_d(k)$ represents the ideal strategic utility function, which is usually defined as "0" [37].

    The cost function is derived as $\beta_1(k)=\frac12 E_1^2(k)$, and the gradient with respect to $\hat{\Psi}_1(k_1)$ is deduced as

$$
\Delta\hat{\Psi}_1(k)=\frac{\partial\beta_1(k)}{\partial\hat{\Psi}_1(k_1)}=\|S_1(\epsilon_1(k_1))\|\left[\hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|+\hat{M}(k)\right]. \tag{3.17}
$$

    Defining $\omega_1(k_1-j+1)=\hat{\Psi}_1(k_1)\,\|S_1(\epsilon_1(k_1-j+1))\|$, the multigradient can be further obtained as

$$
g(\iota,\beta_1(k))=\sum_{j=1}^{\iota}\|S_1(\epsilon_1(k_1-j+1))\|\left[\omega_1(k_1-j+1)+\omega_M(k-j+1)\right]. \tag{3.18}
$$

    Similar to (3.8), the MGR update law of $\hat{\Psi}_1(k)$ is derived as

$$
\hat{\Psi}_1(k+1)=\hat{\Psi}_1(k_1)-\mu_1 g(\iota,\beta_1(k))=\hat{\Psi}_1(k_1)-\mu_1\sum_{j=1}^{\iota}\|S_1(\epsilon_1(k_1-j+1))\|\left[\omega_1(k_1-j+1)+\omega_M(k-j+1)\right] \tag{3.19}
$$

    where $\mu_1$ stands for the chosen learning rate. The structure of the ANN is shown in Figure 2.

    Figure 2.  Structure of the ANN.
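    The actor update (3.19) has the same multigradient structure, with the critic terms $\omega_M$ acting as the reinforcement signal. A sketch under the same buffer assumptions as before:

```python
def actor_update(psi_1, S1_hist, omega_M_hist, mu_1=0.008, iota=10):
    """MGR actor update (3.19). S1_hist[j-1] = ||S_1(eps_1(k_1-j+1))|| and
    omega_M_hist[j-1] = omega_M(k-j+1), for j = 1, ..., iota."""
    grad = sum(S1_hist[j] * (psi_1 * S1_hist[j] + omega_M_hist[j])
               for j in range(iota))
    return psi_1 - mu_1 * grad                     # Psi_1_hat(k+1)
```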

    Remark 2: It is necessary to emphasize that previous works usually designed the ANN basis function in the form $S_1(N_1(k))$, where $N_1(k)=[\bar{x}_n(k+n-1),x_d(k+n)]^T$ contains the whole state vector rather than $x_1(k)$ only. Under such a design, $\alpha_1(k)$ and $\hat{\Psi}_1(k+1)$ are both built up as functions of $N_1(k)$, which results in the algebraic loop problem described in [38]. To settle this conundrum, the term $o_1(k)$ is introduced in this paper, and we construct $\alpha_1(k)$ and $\hat{\Psi}_1(k+1)$ as functions of $\epsilon_1(k)=[\bar{x}_1(k+n-1),x_d(k+n)]^T$.

    Step i: Define the tracking errors $\xi_i(k+n-i+1)=x_i(k+n-i+1)-\alpha_{i-1}(k)$ and $\xi_{i+1}(k+n-i)=x_{i+1}(k+n-i)-\alpha_i(k)$, where $\alpha_{i-1}(k)$ and $\alpha_i(k)$ indicate the virtual controllers at Step $i-1$ and Step $i$, respectively. Similar to the process in (3.9), one has

$$
\xi_i(k+n-i+1)=\phi_i(\bar{x}_n(k+n-i))\left[\left(\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))}-\frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))}+\alpha_{i-1}(k)\right)+\alpha_i(k)+\xi_{i+1}(k+n-i)-\alpha_{i-1}(k)\right]. \tag{3.20}
$$

    According to the definition of $\gamma_1(k)$, one has $\gamma_i(k)=\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))}+\alpha_{i-1}(k)-\frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))}$. This unknown function can be approximated by the RBF NN as $\gamma_i(k)=W_i^TS_i(N_i(k))+\sigma_i(k)$, where $W_i$ and $\sigma_i(k)$ are the ideal weight vector and the approximation error, respectively. Furthermore, we let $N_i(k)=[x_1(k+n-i),\ldots,x_n(k+n-i),x_d(k+n)]^T$.

    Substituting $\gamma_i(k)$ into (3.20), one has

$$
\xi_i(k+n-i+1)=\phi_i(\bar{x}_n(k+n-i))\left[W_i^TS_i(N_i(k))+\sigma_i(k)+\alpha_i(k)+\xi_{i+1}(k+n-i)-\alpha_{i-1}(k)\right]. \tag{3.21}
$$

    The term $o_i(k)$ is given in the form below:

$$
o_i(k)=\Psi_i\,\|S_i(\epsilon_i(k))\| \tag{3.22}
$$

    where $\epsilon_i(k)=[\bar{x}_i(k+n-i),x_d(k+n)]^T$ and $\Psi_i$ denotes the Euclidean norm of the weight vector $W_i$.

    Substituting (3.22) into (3.21) yields

$$
\xi_i(k+n-i+1)=\phi_i(\bar{x}_n(k+n-i))\left[W_i^TS_i(N_i(k))+\sigma_i(k)+\alpha_i(k)+\xi_{i+1}(k+n-i)-\alpha_{i-1}(k)+\Psi_i\|S_i(\epsilon_i(k))\|-\Psi_i\|S_i(\epsilon_i(k))\|\right]. \tag{3.23}
$$

    Following the same process as before, the virtual controller is designed as

$$
\alpha_i(k)=-\hat{\Psi}_i(k)\,\|S_i(\epsilon_i(k))\|+\alpha_{i-1}(k) \tag{3.24}
$$

    where $\hat{W}_i(k)$ is the estimate of $W_i$ and $\hat{\Psi}_i(k)=\|\hat{W}_i(k)\|$.

    Substituting (3.24) into (3.23), $\xi_i(k+n-i+1)$ can be expressed as

$$
\xi_i(k+n-i+1)=\phi_i(\bar{x}_n(k+n-i))\left[W_i^TS_i(N_i(k))+\sigma_i(k)+\xi_{i+1}(k+n-i)-\tilde{\Psi}_i(k)\|S_i(\epsilon_i(k))\|-\Psi_i\|S_i(\epsilon_i(k))\|\right]. \tag{3.25}
$$

    Resembling (3.15), (3.25) can be further described as

$$
\xi_i(k+1)=\phi_i(\bar{x}_n(k))\left[W_i^TS_i(N_i(k_i))+\sigma_i(k_i)+\xi_{i+1}(k)-\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|-\Psi_i\|S_i(\epsilon_i(k_i))\|\right] \tag{3.26}
$$

    where $k_i=k-n+i$.

    Let the prediction error be $E_i(k)=\hat{\Psi}_i(k_i)\,\|S_i(\epsilon_i(k_i))\|+\hat{M}(k)$. According to $E_i(k)$, the cost function is described in its quadratic form $\beta_i(k)=\frac12 E_i^2(k)$, and the gradient with respect to $\hat{\Psi}_i$ is obtained as

$$
\Delta\hat{\Psi}_i(k)=\frac{\partial\beta_i(k)}{\partial\hat{\Psi}_i(k_i)}=\|S_i(\epsilon_i(k_i))\|\left[\hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|+\hat{M}(k)\right]. \tag{3.27}
$$

    On the basis of the MGR algorithm definition, the multigradient is expressed as

$$
g(\iota,\beta_i(k))=\sum_{j=1}^{\iota}\|S_i(\epsilon_i(k_i-j+1))\|\left[\omega_i(k_i-j+1)+\omega_M(k-j+1)\right]. \tag{3.28}
$$

    The update law of $\hat{\Psi}_i(k)$ is deduced according to (3.28):

$$
\hat{\Psi}_i(k+1)=\hat{\Psi}_i(k_i)-\mu_i g(\iota,\beta_i(k))=\hat{\Psi}_i(k_i)-\mu_i\sum_{j=1}^{\iota}\|S_i(\epsilon_i(k_i-j+1))\|\left[\omega_i(k_i-j+1)+\omega_M(k-j+1)\right] \tag{3.29}
$$

    where $\mu_i$ is the learning rate of the $i$-th step and $\omega_i(k_i-j+1)=\hat{\Psi}_i(k_i)\,\|S_i(\epsilon_i(k_i-j+1))\|$.

    Step n: The tracking error of the $n$-th subsystem is $\xi_n(k+1)=x_n(k+1)-\alpha_{n-1}(k)$. Substituting (2.2) and (2.6) into this tracking error yields

$$
\xi_n(k+1)=\varphi_n(\bar{x}_n(k))+\phi_n(\bar{x}_n(k))\left(\psi(k)(b(k)v(k)+f(k))+\delta(k)\right)+d(k)-\alpha_{n-1}(k). \tag{3.30}
$$

    To simplify (3.30), $\pi(k)$ is defined as

$$
\pi(k)=\frac{1}{\phi_n(\bar{x}_n(k))\psi(k)b(k)}\left(\varphi_n(\bar{x}_n(k))-\alpha_{n-1}(k)\right). \tag{3.31}
$$

    Using RBF NN theory to approximate (3.31), one gets

$$
\pi(k)=W_n^TS_n(N_n(k))+\sigma_n(k) \tag{3.32}
$$

    where the definitions of $W_n$ and $\sigma_n(k)$ are the same as in Steps 1 to $n-1$, and $N_n(k)=[\bar{x}_n(k),x_d(k+n)]^T$.

    Combining (3.32) and (3.30), we derive

$$
\xi_n(k+1)=\phi_n(\bar{x}_n(k))\psi(k)b(k)\left(v(k)+\frac{f(k)}{b(k)}+\frac{\delta(k)}{\psi(k)b(k)}+\pi(k)\right)+d(k). \tag{3.33}
$$

    In (3.33), the dynamics of the actuator fault and the dead zone given in (2.3) and (2.6) appear explicitly, and it is easy to deduce the following properties:

$$
\frac{f(k)}{b(k)}\le\frac{\bar{f}}{\underline{b}}=\tau, \qquad \frac{\delta(k)}{\psi(k)b(k)}\le\frac{\bar{\delta}}{\underline{\psi}\,\underline{b}}=\vartheta \tag{3.34}
$$

    where $\vartheta$ and $\tau$ are both unknown parameters. Define the estimates of the two parameters as $\hat{\vartheta}$ and $\hat{\tau}$; the estimation errors $\tilde{\vartheta}$ and $\tilde{\tau}$ are then given by

$$
\tilde{\vartheta}(k)=\hat{\vartheta}(k)-\vartheta, \qquad \tilde{\tau}(k)=\hat{\tau}(k)-\tau. \tag{3.35}
$$

    Based on the estimation errors in (3.35), the actual controller is designed as

$$
v(k)=-\hat{\Psi}_n(k_n)\,\|S_n(N_n(k_n))\|-\hat{\tau}(k)-\hat{\vartheta}(k) \tag{3.36}
$$

    where $\hat{\Psi}_n(k)=\|\hat{W}_n(k)\|$, $\hat{W}_n(k)$ stands for the estimate of $W_n$, and the time index $k_n=k$.
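    Under the sign convention of the reconstruction above, the actual controller (3.36) is a single line; a sketch with placeholder arguments:

```python
def actual_control(psi_n_hat, S_n_norm, tau_hat, vartheta_hat):
    """v(k) = -Psi_n_hat(k_n)*||S_n(N_n(k_n))|| - tau_hat(k) - vartheta_hat(k)."""
    return -psi_n_hat * S_n_norm - tau_hat - vartheta_hat
```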

    With (3.35) and (3.36), the error dynamics (3.33) can be further written as

$$
\xi_n(k+1)=-\phi_n(\bar{x}_n(k))\psi(k)b(k)\,\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\|-\phi_n(\bar{x}_n(k))\psi(k)b(k)\left(\tilde{\vartheta}(k)+\vartheta+\tilde{\tau}(k)+\tau-\frac{f(k)}{b(k)}-\frac{\delta(k)}{\psi(k)b(k)}\right)-\phi_n(\bar{x}_n(k))\psi(k)b(k)\,q(k_n)+d(k) \tag{3.37}
$$

    where $\tilde{\Psi}_n(k)=\hat{\Psi}_n(k)-\Psi_n$ and $q(k)=\Psi_n\|S_n(N_n(k_n))\|-W_n^TS_n(N_n(k))-\sigma_n(k)$. Similar to Step $i$, the strategic utility function is defined as

$$
E_n(k)=\hat{\Psi}_n(k_n)\,\|S_n(N_n(k_n))\|+\hat{M}(k). \tag{3.38}
$$

    Further, define the cost function as $\beta_n(k)=\frac12 E_n^2(k)$; the gradient with respect to $\hat{\Psi}_n(k)$ is

$$
\Delta\hat{\Psi}_n(k)=\frac{\partial\beta_n(k)}{\partial\hat{\Psi}_n(k_n)}=\|S_n(N_n(k_n))\|\left[\hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\|+\hat{M}(k)\right]. \tag{3.39}
$$

    The multigradient yields

$$
g(\iota,\beta_n(k))=\sum_{j=1}^{\iota}\|S_n(N_n(k_n-j+1))\|\left[\omega_n(k_n-j+1)+\omega_M(k-j+1)\right] \tag{3.40}
$$

    where $\omega_n(k_n-j+1)=\hat{\Psi}_n(k_n)\,\|S_n(N_n(k_n-j+1))\|$.

    The weight update law for $\hat{\Psi}_n(k)$ can be further obtained as

$$
\hat{\Psi}_n(k+1)=\hat{\Psi}_n(k_n)-\mu_n g(\iota,\beta_n(k))=\hat{\Psi}_n(k_n)-\mu_n\sum_{j=1}^{\iota}\|S_n(N_n(k_n-j+1))\|\left[\omega_n(k_n-j+1)+\omega_M(k-j+1)\right] \tag{3.41}
$$

    where $\mu_n$ is a chosen positive learning rate. Let

$$
E_\vartheta(k)=\hat{\vartheta}(k)+\hat{M}(k), \qquad E_\tau(k)=\hat{\tau}(k)+\hat{M}(k). \tag{3.42}
$$

    The cost functions of the two auxiliary signals are chosen in the same quadratic form as $\beta_i(k)$:

$$
\beta_\vartheta(k)=\tfrac12 E_\vartheta^2(k), \qquad \beta_\tau(k)=\tfrac12 E_\tau^2(k). \tag{3.43}
$$

    The gradients are deduced as

$$
\Delta\hat{\vartheta}(k)=\frac{\partial\beta_\vartheta(k)}{\partial\hat{\vartheta}(k)}=\hat{\vartheta}(k)+\hat{M}(k), \qquad \Delta\hat{\tau}(k)=\frac{\partial\beta_\tau(k)}{\partial\hat{\tau}(k)}=\hat{\tau}(k)+\hat{M}(k). \tag{3.44}
$$

    Two MGR update laws are obtained:

$$
\hat{\vartheta}(k+1)=\hat{\vartheta}(k)-\mu_\vartheta\sum_{j=1}^{\iota}\left(\hat{\vartheta}(k-j+1)+\hat{M}(k-j+1)\right), \qquad
\hat{\tau}(k+1)=\hat{\tau}(k)-\mu_\tau\sum_{j=1}^{\iota}\left(\hat{\tau}(k-j+1)+\hat{M}(k-j+1)\right) \tag{3.45}
$$

    where $\mu_\vartheta$ and $\mu_\tau$ are positive learning factors.
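    Both auxiliary update laws in (3.45) share one template; a sketch assuming buffers of the last $\iota$ estimates and critic outputs are kept:

```python
def aux_update(est_hist, M_hat_hist, mu=0.05, iota=10):
    """One step of (3.45) for either vartheta_hat or tau_hat.
    est_hist[j-1] = estimate at time k-j+1, M_hat_hist[j-1] = M_hat(k-j+1)."""
    grad = sum(est_hist[j] + M_hat_hist[j] for j in range(iota))
    return est_hist[0] - mu * grad                 # estimate at time k+1
```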

    Figure 3 shows the structure of the proposed control strategy. The analysis of the stability and tracking performance is presented in the following theorem.

    Figure 3.  Control system structure for the proposed strategy.

    Theorem 1: Consider the nonstrict-feedback nonlinear system (2.2) with the adaptive RL strategy comprising the update laws (3.8), (3.19), (3.29) and (3.41), the virtual controllers (3.13) and (3.24), and the actual controller (3.36). If the parameters are selected such that

$$
0<\mu_M<\frac{1}{l\iota a^2}, \quad 0<\mu_i<\frac{1}{l\iota}, \quad 0<\mu_\vartheta<\frac{1}{\iota}, \quad 0<\mu_\tau<\frac{1}{\iota} \tag{3.46}
$$

    and Assumption 1 holds, then the proposed control strategy ensures that all signals are SGUUB and the tracking error remains within a tolerable bound. The proof of Theorem 1 is given in the Appendix.

    In this section, some simulation results are presented to illustrate the effectiveness of the proposed approach.

    The nonstrict-feedback nonlinear discrete-time system is chosen as

$$
\begin{cases}
x_1(k+1)=\varphi_1(\bar{x}_2(k))+\phi_1(\bar{x}_2(k))\,x_2(k)\\
x_2(k+1)=\varphi_2(\bar{x}_2(k))+\phi_2(\bar{x}_2(k))\,U(k)+d(k)\\
y(k)=x_1(k)
\end{cases} \tag{4.1}
$$

    where $x_1(k)$ and $x_2(k)$ are the states, $U(k)$ is the input and $y(k)$ is the output. The functions $\varphi_1(\bar{x}_2(k))$ and $\varphi_2(\bar{x}_2(k))$ are chosen as $x_1(k)$ and $x_2(k)$, respectively. We choose $\phi_1(\bar{x}_2(k))$ and $\phi_2(\bar{x}_2(k))$ as $[x_1(k)+0.019(1.5-x_1(k))\exp(-4x_2(k)/(3.4+x_2(k)))]/20$ and $x_2(k)+3.1(0.4-x_1(k))\exp(-1.5x_2(k)/(3.4+x_2(k)))-4(x_2(k)-U(k))$, respectively. The external disturbance is $d(k)=0.1\cos(0.05k)\cos(x_1(k))$. The desired signal is $x_d(k)=0.013\sin(\pi/8+0.6k\pi/38)$.
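    A hedged sketch of one simulation step of (4.1): the reference signal and disturbance follow the expressions above, while the gain functions $\phi_1$ and $\phi_2$ are passed in as callables, since their exact sign placement did not survive extraction and should be checked against the original source.

```python
import numpy as np

def reference(k):
    """Desired signal x_d(k) as given in the text."""
    return 0.013 * np.sin(np.pi / 8 + 0.6 * k * np.pi / 38)

def plant_step(x1, x2, U, k, phi1, phi2):
    """One step of (4.1) with varphi_1 = x1(k) and varphi_2 = x2(k)."""
    d = 0.1 * np.cos(0.05 * k) * np.cos(x1)        # external disturbance d(k)
    return x1 + phi1(x1, x2) * x2, x2 + phi2(x1, x2) * U + d
```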

    The parameters are selected as follows: $\hat{\Psi}_2(0)=0.001$, $\mu_M=0.005$, $\mu_1=0.008$, $\mu_2=0.006$, $\mu_\tau=0.05$, $\mu_\vartheta=0.05$, $a=0.00001$, $\hat{\tau}(0)=0.2$ and $\hat{\vartheta}(0)=0.3$. The hidden-layer node numbers of the ACNNs are set as $l_{W_1}=10$, $l_{W_2}=10$ and $l_{W_M}=10$. The step length of the multigradient is chosen as $\iota=10$.
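    It can be checked that these choices respect the learning-rate conditions (3.46) of Theorem 1 (here $l=10$ and $\iota=10$):

```python
l, iota, a = 10, 10, 1e-5
assert 0.005 < 1 / (l * iota * a ** 2)   # mu_M < 1/(l*iota*a^2) = 1e8
assert 0.008 < 1 / (l * iota)            # mu_1 < 0.01
assert 0.006 < 1 / (l * iota)            # mu_2 < 0.01
assert 0.05 < 1 / iota                   # mu_vartheta = mu_tau < 0.1
```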

    Figures 4 and 5 illustrate the trajectories of the signals and the tracking error when $b_r=0.5$, $b_l=0.5$, $f_r=0.3$ and $f_l=0.3$. The control output achieves precise tracking of the reference signal, and the tracking error stays close to zero (about 0.001). Figures 6 and 7 illustrate the trajectories of the signals and the tracking error when $b_r=0.4$, $b_l=0.4$, $f_r=0.25$ and $f_l=0.25$. The tracking performance in this scenario is also satisfactory (a tracking error of about 0.002). These results confirm that the proposed fault-tolerant approach can offset the influence of dead zones with different parameters.

    Figure 4.  Tracking trajectory for the proposed scheme when $b_r=0.5$, $b_l=0.5$, $f_r=0.3$ and $f_l=0.3$.
    Figure 5.  Tracking error for the proposed scheme when $b_r=0.5$, $b_l=0.5$, $f_r=0.3$ and $f_l=0.3$.
    Figure 6.  Tracking trajectory for the proposed scheme when $b_r=0.4$, $b_l=0.4$, $f_r=0.25$ and $f_l=0.25$.
    Figure 7.  Tracking error for the proposed scheme when $b_r=0.4$, $b_l=0.4$, $f_r=0.25$ and $f_l=0.25$.

    Figure 8 describes the trajectories of the ANN weight, the CNN weight and the control input, comparing the proposed scheme with the MLP-based strategy. The results show that the proposed scheme achieves a faster convergence rate of the weight parameters than the ordinary MLP scheme. Figures 9 and 10 show the tracking trajectory and the tracking error without the two auxiliary systems. Affected by the input dead zone and the actuator fault, tracking becomes extremely inaccurate, with a tracking error of up to 0.01. Comparing Figures 4 and 9, it is obvious that our approach successfully offsets the influence of the input dead zone and the actuator fault.

    Figure 8.  Comparison of the MLP-based control strategy and the proposed scheme.
    Figure 9.  Tracking performance for the scheme proposed in [21].
    Figure 10.  Tracking error for the scheme proposed in [21].

    To verify that our novel updating algorithm can reduce the computational burden, the following experiment was conducted. We take the total computation time as a measure of the computational burden. The total sample number was 1000. All results were obtained in the same environment on a computer with a 3.6 GHz CPU and 16 GB of RAM.
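    A sketch of the kind of timing harness behind Table 1; `run_gradient_descent` and `run_proposed` are hypothetical stand-ins for the two weight-update loops.

```python
import time

def time_updates(update_fn, samples=1000):
    """Total wall-clock time of `samples` weight updates, per the protocol above."""
    start = time.perf_counter()
    for _ in range(samples):
        update_fn()
    return time.perf_counter() - start

# print(time_updates(run_gradient_descent), time_updates(run_proposed))
```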

    From Table 1, we can see that the computation time of the ordinary gradient descent algorithm is 0.5445 s, whereas the proposed updating algorithm needs only 0.2862 s; the computation time is reduced by 47.44%. Combined with the previous analysis, our approach not only alleviates the computational burden of the MGR algorithm but also achieves a faster convergence rate of the weight parameters.

    Table 1.  Simulation results.
    Approach | Computation time (s)
    Gradient descent algorithm in [17] | 0.5445
    The proposed approach | 0.2862


    The aim of this paper was to build a fault-tolerant controller for a class of nonstrict-feedback systems with an input dead zone. We proposed a novel neural network weight-updating algorithm to achieve a faster computation speed and eliminate the local optimum problem. Two auxiliary parameters were presented to offset the influence of the dead zone and actuator fault, and an auxiliary term was introduced to eliminate the algebraic loop problem. According to Lyapunov theory, all signals in the closed-loop system were proven to be SGUUB, and the tracking error converges to a neighborhood of zero. Finally, simulation results were presented to illustrate the effectiveness of our approach.

    There remain other open problems in this area; for instance, tracking control of stochastic systems via RL will be the topic of our future work, building on the current investigation.

    This work was supported by the Natural Science Foundation Project of Chongqing under Grant cstc2019jcyj-msxmX036, and also in part by the open funding project of the Guangdong Provincial Key Laboratory of Intelligent Decision and Coordination Control under Grant F2021098.

    The authors declare that they have no conflict of interest.

    Step 1: Define $\theta_{\xi_1}$, $\theta_{\Psi_1}$ and $\theta_M$ as positive constants. The Lyapunov function is chosen as

$$
V_1(k)=V_{11}(k)+V_{12}(k)+V_{13}(k)+V_{14}(k) \tag{A.1}
$$

    where $V_{11}(k)=\frac{\theta_{\xi_1}}{4}\xi_1^2(k)$, $V_{12}(k)=\frac{\theta_{\Psi_1}}{\mu_1}\sum_{s=0}^{n-1}\tilde{\Psi}_1^2(k_1+s)$, $V_{13}(k)=\frac{\theta_M}{\mu_M}\tilde{\Psi}_M^2(k)$ and $V_{14}(k)=2\theta_M\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j)\|\right]^2$.

    The Cauchy-Schwarz inequality is expressed as

$$
(a_1+a_2+\cdots+a_n)^2\le n\,(a_1^2+a_2^2+\cdots+a_n^2). \tag{A.2}
$$

    Young's inequality is also given below

$$
\tilde{a}^T\tilde{b}\le\tfrac12\tilde{a}^T\tilde{a}+\tfrac12\tilde{b}^T\tilde{b} \tag{A.3}
$$

    where $\tilde{a}$ and $\tilde{b}$ are arbitrary vectors.

    By utilizing inequality (A.3) and the property $0<S(N)^TS(N)<l$, the two terms appearing in (3.12) satisfy

$$
W_i^TS_i(N_i(k_i))-\Psi_i\|S_i(\epsilon_i(k_i))\|\le\left|W_i^TS_i(N_i(k_i))\right|+\Psi_i\|S_i(\epsilon_i(k_i))\|\le\tfrac12 W_i^TW_i+\tfrac12 S_i(N_i(k_i))^TS_i(N_i(k_i))+\tfrac12\Psi_i^2+\tfrac12\|S_i(\epsilon_i(k_i))\|^2\le\bar{\Psi}_i^2+l \tag{A.4}
$$

    where $i=1,\ldots,n-1$.

    According to (A.2) and (3.15), the first-order difference of $V_{11}(k)$ is described as

$$
\Delta V_{11}(k)\le\theta_{\xi_1}\bar{\phi}_1^2\left(\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\right)^2+\theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2+l)^2+\theta_{\xi_1}\bar{\phi}_1^2\,\xi_2^2(k)+\theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2-\tfrac14\theta_{\xi_1}\xi_1^2(k). \tag{A.5}
$$

    Based on (A.2) and (3.19), the first-order difference of $V_{12}(k)$ can be derived as

$$
\Delta V_{12}(k)\le-\theta_{\Psi_1}(1-\iota l\mu_1)\sum_{j=1}^{\iota}\left[\omega_1(k_1-j+1)+\omega_M(k-j+1)\right]^2+2\theta_{\Psi_1}\iota(\bar{\Psi}_1+\bar{\Psi}_M)^2+2\theta_{\Psi_1}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-\theta_{\Psi_1}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\right]^2. \tag{A.6}
$$

    Similar to the process in (A.6), the first-order difference of $V_{13}(k)$ is calculated as

$$
\Delta V_{13}(k)\le-\theta_M(1-\iota l\mu_M a^2)\sum_{j=1}^{\iota}\left[a\,\omega_M(k-j+1)+\rho(k-j+1)-\omega_M(k-1)\right]^2-\theta_M a^2\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2+2\theta_M\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j)\|\right]^2+2\theta_M\iota\left[\bar{\Psi}_M(1+a)+1\right]^2. \tag{A.7}
$$

    Then, considering $\Delta V_{14}(k)$, we can obtain

$$
\Delta V_{14}(k)=2\theta_M\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-2\theta_M\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j)\|\right]^2. \tag{A.8}
$$

    Combining (A.5)–(A.8), the first-order difference of $V_1(k)$ is derived as

$$
\begin{aligned}
\Delta V_1(k)\le{}&-\theta_{\Psi_1}(1-\iota l\mu_1)\sum_{j=1}^{\iota}\left[\omega_1(k_1-j+1)+\omega_M(k-j+1)\right]^2\\
&-\theta_M(1-\iota l\mu_M a^2)\sum_{j=1}^{\iota}\left[a\,\omega_M(k-j+1)+\rho(k-j+1)-\omega_M(k-1)\right]^2\\
&-(\theta_{\Psi_1}-\theta_{\xi_1}\bar{\phi}_1^2)\left[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\right]^2+\theta_{\xi_1}\bar{\phi}_1^2\,\xi_2^2(k)\\
&-\left(\theta_M a^2-2\theta_{\Psi_1}-2\theta_M\right)\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2\\
&-\theta_{\Psi_1}\sum_{j=2}^{\iota}\left[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\right]^2-\tfrac{\theta_{\xi_1}}{4}\xi_1^2(k)+B_1
\end{aligned} \tag{A.9}
$$

    where $B_1=2\theta_{\Psi_1}\iota(\bar{\Psi}_1+\bar{\Psi}_M)^2+\theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2+\theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2+l)^2+2\theta_M\iota\left(\bar{\Psi}_M(1+a)+1\right)^2$.

    Step i: The Lyapunov function in Steps 2 to $n-1$ is designed as

$$
V_i(k)=V_{i1}(k)+V_{i2}(k) \tag{A.10}
$$

    where $V_{i1}(k)=\frac{\theta_{\xi_i}}{4}\xi_i^2(k)$, $V_{i2}(k)=\frac{\theta_{\Psi_i}}{\mu_i}\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s)$, and $\theta_{\xi_i}$ and $\theta_{\Psi_i}$ are both positive constants.

    According to (3.26), the first-order difference of $V_{i1}(k)$ can be deduced as

$$
\Delta V_{i1}(k)\le\theta_{\xi_i}\bar{\phi}_i^2\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\right]^2+\theta_{\xi_i}\bar{\phi}_i^2\,\xi_{i+1}^2(k)+\theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2+\theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2+l)^2-\tfrac14\theta_{\xi_i}\xi_i^2(k). \tag{A.11}
$$

    Similar to (A.6), one deduces $\Delta V_{i2}(k)$ as

$$
\Delta V_{i2}(k)\le-\theta_{\Psi_i}(1-\iota l\mu_i)\sum_{j=1}^{\iota}\left[\omega_i(k_i-j+1)+\omega_M(k-j+1)\right]^2+2\theta_{\Psi_i}\iota(\bar{\Psi}_i+\bar{\Psi}_M)^2+2\theta_{\Psi_i}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-\theta_{\Psi_i}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\right]^2. \tag{A.12}
$$

    Combining (A.11) and (A.12), $\Delta V_i(k)$ is derived as

$$
\begin{aligned}
\Delta V_i(k)\le{}&-\theta_{\Psi_i}(1-\iota l\mu_i)\sum_{j=1}^{\iota}\left[\omega_i(k_i-j+1)+\omega_M(k-j+1)\right]^2-\tfrac14\theta_{\xi_i}\xi_i^2(k)+\theta_{\xi_i}\bar{\phi}_i^2\,\xi_{i+1}^2(k)\\
&+2\theta_{\Psi_i}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-(\theta_{\Psi_i}-\theta_{\xi_i}\bar{\phi}_i^2)\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\right]^2\\
&-\theta_{\Psi_i}\sum_{j=2}^{\iota}\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\right]^2+B_i
\end{aligned} \tag{A.13}
$$

    where $B_i=\theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2+l)^2+2\theta_{\Psi_i}\iota(\bar{\Psi}_i+\bar{\Psi}_M)^2+\theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2$.

    Step n: The Lyapunov function in the $n$-th step is

$$
V_n(k)=V_{n1}(k)+V_{n2}(k)+V_{n3}(k)+V_{n4}(k) \tag{A.14}
$$

    where $V_{n1}(k)=\frac{\theta_{\xi_n}}{3}\xi_n^2(k)$, $V_{n2}(k)=\frac{\theta_{\Psi_n}}{\mu_n}\tilde{\Psi}_n^2(k)$, $V_{n3}(k)=\frac{\theta_\vartheta}{\mu_\vartheta}\tilde{\vartheta}^2(k)$ and $V_{n4}(k)=\frac{\theta_\tau}{\mu_\tau}\tilde{\tau}^2(k)$.

    Using the above inequalities, (3.37) leads to

$$
\Delta V_{n1}(k)\le\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\left[2\tau^2+\tilde{\tau}^2(k)+2\vartheta^2+\tilde{\vartheta}^2(k)+\left(\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\|\right)^2+\bar{q}\right]+\frac23\theta_{\xi_n}\bar{d}^2-\frac{\theta_{\xi_n}}{3}\xi_n^2(k) \tag{A.15}
$$

    where $q^2(k_n)\le\left(\bar{\sigma}_n+2l^{1/2}\bar{\Psi}_n\right)^2=\bar{q}$.

    Based on (3.41) and (A.2), $\Delta V_{n2}(k)$ is given as

$$
\Delta V_{n2}(k)\le-\theta_{\Psi_n}(1-\iota l\mu_n)\sum_{j=1}^{\iota}\left[\omega_n(k_n-j+1)+\omega_M(k-j+1)\right]^2+2\theta_{\Psi_n}\iota(\bar{\Psi}_n+\bar{\Psi}_M)^2+2\theta_{\Psi_n}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-\theta_{\Psi_n}\sum_{j=1}^{\iota}\left[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n-j+1))\|\right]^2. \tag{A.16}
$$

    Obviously, similar to (A.6), the first-order differences of $V_{n3}(k)$ and $V_{n4}(k)$ are

$$
\begin{aligned}
\Delta V_{n3}(k)\le{}&-\theta_\vartheta(1-\mu_\vartheta\iota)\sum_{j=1}^{\iota}\left[\hat{\vartheta}(k-j+1)+\omega_M(k-j+1)\right]^2-\theta_\vartheta\sum_{j=1}^{\iota}\tilde{\vartheta}^2(k-j+1)\\
&+2\theta_\vartheta\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2+2\theta_\vartheta\iota(\vartheta+\bar{\Psi}_M)^2\\
\Delta V_{n4}(k)\le{}&-\theta_\tau(1-\mu_\tau\iota)\sum_{j=1}^{\iota}\left[\hat{\tau}(k-j+1)+\omega_M(k-j+1)\right]^2-\theta_\tau\sum_{j=1}^{\iota}\tilde{\tau}^2(k-j+1)\\
&+2\theta_\tau\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2+2\theta_\tau\iota(\tau+\bar{\Psi}_M)^2.
\end{aligned} \tag{A.17}
$$

    Combining (A.15)–(A.17), one has

$$
\begin{aligned}
\Delta V_n(k)={}&\Delta V_{n1}(k)+\Delta V_{n2}(k)+\Delta V_{n3}(k)+\Delta V_{n4}(k)\\
\le{}&-\theta_{\Psi_n}(1-\iota l\mu_n)\sum_{j=1}^{\iota}\left[\omega_n(k_n-j+1)+\omega_M(k-j+1)\right]^2\\
&-\theta_\vartheta(1-\mu_\vartheta\iota)\sum_{j=1}^{\iota}\left[\hat{\vartheta}(k-j+1)+\omega_M(k-j+1)\right]^2-\theta_\tau(1-\mu_\tau\iota)\sum_{j=1}^{\iota}\left[\hat{\tau}(k-j+1)+\omega_M(k-j+1)\right]^2\\
&-\left(\theta_{\Psi_n}-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\left[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n))\|\right]^2-\theta_{\Psi_n}\sum_{j=2}^{\iota}\left[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n-j+1))\|\right]^2\\
&-\frac{\theta_{\xi_n}}{3}\xi_n^2(k)+(2\theta_{\Psi_n}+2\theta_\vartheta+2\theta_\tau)\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2\\
&-\left(\theta_\vartheta-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\vartheta}^2(k)-\left(\theta_\tau-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\tau}^2(k)\\
&-\theta_\vartheta\sum_{j=2}^{\iota}\tilde{\vartheta}^2(k-j+1)-\theta_\tau\sum_{j=2}^{\iota}\tilde{\tau}^2(k-j+1)+B_n
\end{aligned} \tag{A.18}
$$

    where $B_n=\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\left(2\tau^2+2\vartheta^2+\bar{q}\right)+\frac23\theta_{\xi_n}\bar{d}^2+2\theta_\tau\iota(\tau+\bar{\Psi}_M)^2+2\theta_{\Psi_n}\iota(\bar{\Psi}_n+\bar{\Psi}_M)^2+2\theta_\vartheta\iota(\vartheta+\bar{\Psi}_M)^2$.

    Combining the Lyapunov functions from Step 1 to Step $n$, we obtain

$$
V(k)=\sum_{i=1}^{n}V_i(k)=\sum_{i=1}^{n}\frac{\theta_{\Psi_i}}{\mu_i}\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s)+\sum_{i=1}^{n-1}\frac{\theta_{\xi_i}}{4}\xi_i^2(k)+\frac{\theta_{\xi_n}}{3}\xi_n^2(k)+\frac{\theta_M}{\mu_M}\tilde{\Psi}_M^2(k)+2\theta_M\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j)\|\right]^2+\frac{\theta_\vartheta}{\mu_\vartheta}\tilde{\vartheta}^2(k)+\frac{\theta_\tau}{\mu_\tau}\tilde{\tau}^2(k). \tag{A.19}
$$

    Combining (A.9), (A.13) and (A.18), we finally get

$$
\begin{aligned}
\Delta V(k)\le{}&-\left(\frac{\theta_{\xi_n}}{3}-\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\right)\xi_n^2(k)-\sum_{i=1}^{n}\theta_{\Psi_i}(1-\iota l\mu_i)\sum_{j=1}^{\iota}\left[\omega_i(k_i-j+1)+\omega_M(k-j+1)\right]^2\\
&-\theta_\vartheta(1-\mu_\vartheta\iota)\sum_{j=1}^{\iota}\left[\hat{\vartheta}(k-j+1)+\omega_M(k-j+1)\right]^2-\theta_\tau(1-\mu_\tau\iota)\sum_{j=1}^{\iota}\left[\hat{\tau}(k-j+1)+\omega_M(k-j+1)\right]^2\\
&-\theta_M(1-\iota l\mu_M a^2)\sum_{j=1}^{\iota}\left[a\,\omega_M(k-j+1)+\rho(k-j+1)-\omega_M(k-1)\right]^2\\
&-\left(\theta_\vartheta-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\vartheta}^2(k)-\left(\theta_\tau-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\tau}^2(k)\\
&-\left(\theta_M a^2-2\theta_M-2\sum_{i=1}^{n}\theta_{\Psi_i}-2\theta_\vartheta-2\theta_\tau\right)\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2\\
&-\sum_{i=2}^{n-1}\left(\frac{\theta_{\xi_i}}{4}-\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\right)\xi_i^2(k)-\frac14\theta_{\xi_1}\xi_1^2(k)-\sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\right]^2\\
&-\sum_{i=1}^{n-1}\left(\theta_{\Psi_i}-\theta_{\xi_i}\bar{\phi}_i^2\right)\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\right]^2-\left(\theta_{\Psi_n}-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\left[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\right]^2+B
\end{aligned} \tag{A.20}
$$

    where $B=\sum_{i=1}^{n}B_i$.

    Select the parameters as $0<\mu_M<1/(l\iota a^2)$, $0<\mu_i<1/(l\iota)$, $0<\mu_\vartheta<1/\iota$ and $0<\mu_\tau<1/\iota$. Then the first-order difference of $V(k)$ can be simplified as

$$
\begin{aligned}
\Delta V(k)\le{}&-\left(\theta_\vartheta-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\vartheta}^2(k)-\left(\theta_\tau-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\tilde{\tau}^2(k)\\
&-\left(\theta_M a^2-2\theta_M-2\sum_{i=1}^{n}\theta_{\Psi_i}-2\theta_\vartheta-2\theta_\tau\right)\sum_{j=1}^{\iota}\left[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right]^2-\frac14\theta_{\xi_1}\xi_1^2(k)+B\\
&-\sum_{i=2}^{n-1}\left(\frac{\theta_{\xi_i}}{4}-\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\right)\xi_i^2(k)-\left(\frac{\theta_{\xi_n}}{3}-\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\right)\xi_n^2(k)\\
&-\sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\right]^2-\sum_{i=1}^{n-1}\left(\theta_{\Psi_i}-\theta_{\xi_i}\bar{\phi}_i^2\right)\left[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\right]^2\\
&-\left(\theta_{\Psi_n}-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\right)\left[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\right]^2.
\end{aligned} \tag{A.21}
$$

    In this paper, the parameters $\theta_{\Psi_i}$, $\theta_{\Psi_n}$, $\theta_{\xi_i}$, $\theta_{\xi_n}$, $\theta_M$, $\theta_\vartheta$ and $\theta_\tau$ are designed such that $\theta_\vartheta>\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_\tau>\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_M>\frac{1}{a^2}\left(2\theta_M+2\sum_{i=1}^{n}\theta_{\Psi_i}+2\theta_\vartheta+2\theta_\tau\right)$, $\theta_{\Psi_i}>\theta_{\xi_i}\bar{\phi}_i^2$, $\theta_{\Psi_n}>\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_{\xi_i}>4\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2$ and $\theta_{\xi_n}>3\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2$. With these choices, $\Delta V(k)<0$ if the following inequalities hold:

$$
\begin{aligned}
&|\tilde{\vartheta}(k)|>\sqrt{\frac{B}{\theta_\vartheta-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}, \qquad |\tilde{\tau}(k)|>\sqrt{\frac{B}{\theta_\tau-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}},\\
&\left|\sum_{j=1}^{\iota}\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\right|>\sqrt{\frac{B}{\theta_M a^2-2\theta_M-2\sum_{i=1}^{n}\theta_{\Psi_i}-2\theta_\vartheta-2\theta_\tau}},\\
&|\xi_1(k)|>\sqrt{\frac{B}{\frac14\theta_{\xi_1}}}, \qquad |\xi_i(k)|>\sqrt{\frac{B}{\frac{\theta_{\xi_i}}{4}-\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2}}, \qquad |\xi_n(k)|>\sqrt{\frac{B}{\frac{\theta_{\xi_n}}{3}-\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2}},\\
&\left|\sum_{j=2}^{\iota}\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\right|>\sqrt{\frac{B}{\theta_{\Psi_i}}}, \qquad \left|\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\right|>\sqrt{\frac{B}{\theta_{\Psi_i}-\theta_{\xi_i}\bar{\phi}_i^2}},\\
&\left|\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\right|>\sqrt{\frac{B}{\theta_{\Psi_n}-\frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}.
\end{aligned} \tag{A.22}
$$

    In this way, all signals in the closed-loop system are proven to be SGUUB.



    [1] J. B. Du, W. J. Cheng, G. Y. Lu, H. T. Gao, X. L. Chu, Z. C. Zhang, et al., Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach, IEEE Trans. Network Sci. Eng., 9 (2022), 33–44. doi: 10.1109/TNSE.2021.3068340
    [2] H. X. Peng, X. M. Shen, Deep reinforcement learning based resource management for multi-access edge computing in vehicular networks, IEEE Trans. Network Sci. Eng., 7 (2021), 2416–2428. doi: 10.1109/TNSE.2020.2978856
    [3] D. C. Chen, X. L. Liu, W. W. Yu, Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems, IEEE Trans. Network Sci. Eng., 7 (2021), 3057–3066. doi: 10.1109/TNSE.2020.3013528
    [4] J. Wang, Q. Wang, H. Wu, T. Huang, Finite-time consensus and finite-time H∞ consensus of multi-agent systems under directed topology, IEEE Trans. Network Sci. Eng., 7 (2020), 1619–1632. doi: 10.1109/TNSE.2019.2943023
    [5] T. Gao, T. Li, Y. J. Liu, S. Tong, IBLF-based adaptive neural control of state-constrained uncertain stochastic nonlinear systems, IEEE Trans. Neural Networks Learn. Syst., 33 (2022), 7345–7356. doi: 10.1109/TNNLS.2021.3084820
    [6] T. T. Gao, Y. J. Liu, D. P. Li, S. C. Tong, T. S. Li, Adaptive neural control using tangent time-varying BLFs for a class of uncertain stochastic nonlinear systems with full state constraints, IEEE Trans. Cybern., 51 (2021), 1943–1953. doi: 10.1109/TCYB.2019.2906118
    [7] P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. dissertation, Harvard University, 1974.
    [8] Y. Tang, D. D. Zhang, P. Shi, W. B. Zhang, F. Qian, Event-based formation control for nonlinear multiagent systems under DoS attacks, IEEE Trans. Autom. Control, 66 (2021), 452–459. doi: 10.1109/TAC.2020.2979936
    [9] Y. Tang, X. T. Wu, P. Shi, F. Qian, Input-to-state stability for nonlinear systems with stochastic impulses, Automatica, 113 (2020), 108766. doi: 10.1016/j.automatica.2019.108766
    [10] X. T. Wu, Y. Tang, J. D. Cao, X. R. Mao, Stability analysis for continuous-time switched systems with stochastic switching signals, IEEE Trans. Autom. Control, 63 (2018), 3083–3090. doi: 10.1109/TAC.2017.2779882
    [11] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, F. L. Lewis, Optimal and autonomous control using reinforcement learning: A survey, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2042–2062. doi: 10.1109/TNNLS.2017.2773458
    [12] V. Narayanan, S. Jagannathan, Event-triggered distributed control of nonlinear interconnected systems using online reinforcement learning with exploration, IEEE Trans. Cybern., 48 (2018), 2510–2519. doi: 10.1109/TCYB.2017.2741342
    [13] B. Luo, H. N. Wu, T. Huang, Off-policy reinforcement learning for H∞ control design, IEEE Trans. Cybern., 45 (2015), 65–76. doi: 10.1109/TCYB.2014.2319577
    [14] R. Song, F. L. Lewis, Q. Wei, Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero sum games, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 704–713. doi: 10.1109/TNNLS.2016.2582849
    [15] X. Yang, D. Liu, B. Luo, C. Li, Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning, Inf. Sci., 369 (2016), 731–747. doi: 10.1016/j.ins.2016.07.051
    [16] H. Zhang, K. Zhang, Y. Cai, J. Han, Adaptive fuzzy fault-tolerant tracking control for partially unknown systems with actuator faults via integral reinforcement learning method, IEEE Trans. Fuzzy Syst., 27 (2019), 1986–1998. doi: 10.1109/TFUZZ.2019.2893211
    [17] W. Bai, Q. Zhou, T. Li, H. Li, Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation, IEEE Trans. Cybern., 50 (2020), 3433–3443. doi: 10.1109/TCYB.2019.2921057
    [18] Y. Li, S. Tong, Adaptive neural networks decentralized FTC design for nonstrict-feedback nonlinear interconnected large-scale systems against actuator faults, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 2541–2554. doi: 10.1109/TNNLS.2016.2598580
    [19] Q. Chen, H. Shi, M. Sun, Echo state network-based backstepping adaptive iterative learning control for strict-feedback systems: An error-tracking approach, IEEE Trans. Cybern., 50 (2020), 3009–3022. doi: 10.1109/TCYB.2019.2931877
    [20] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058
    [21] W. Bai, T. Li, S. Tong, NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems, IEEE Trans. Cybern., 50 (2020), 4573–4584. doi: 10.1109/TCYB.2020.2963849
    [22] Y. Li, K. Sun, S. Tong, Observer-based adaptive fuzzy fault-tolerant optimal control for SISO nonlinear systems, IEEE Trans. Cybern., 49 (2019), 649–661. doi: 10.1109/TCYB.2017.2785801
    [23] H. Modares, F. L. Lewis, M. B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50 (2014), 193–202. doi: 10.1016/j.automatica.2013.09.043
    [24] Z. Wang, L. Liu, Y. Wu, H. Zhang, Optimal fault-tolerant control for discrete-time nonlinear strict-feedback systems based on adaptive critic design, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2179–2191. doi: 10.1109/TNNLS.2018.2810138
    [25] H. Li, Y. Wu, M. Chen, Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm, IEEE Trans. Cybern., 51 (2021), 1163–1174. doi: 10.1109/TCYB.2020.2982168
    [26] Y. J. Liu, L. Tang, S. Tong, C. L. P. Chen, D. J. Li, Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Trans. Neural Networks Learn. Syst., 26 (2015), 165–176. doi: 10.1109/TNNLS.2014.2360724
    [27] W. Bai, T. Li, Y. Long, C. L. P. Chen, Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems, IEEE Trans. Neural Networks Learn. Syst., 34 (2023), 355–379. doi: 10.1109/TNNLS.2021.3094901
    [28] H. Wang, G. H. Yang, A finite frequency domain approach to fault detection for linear discrete-time systems, Int. J. Control, 81 (2008), 1162–1171. doi: 10.1080/00207170701691513
    [29] C. Tan, G. Tao, R. Qi, A discrete-time parameter estimation based adaptive actuator failure compensation control scheme, Int. J. Control, 86 (2013), 276–289. doi: 10.1080/00207179.2012.723828
    [30] J. Na, X. Ren, G. Herrmann, Z. Qiao, Adaptive neural dynamic surface control for servo systems with unknown dead-zone, Control Eng. Pract., 19 (2011), 1328–1343. doi: 10.1016/j.conengprac.2011.07.005
    [31] Y. J. Liu, S. Li, S. Tong, C. L. P. Chen, Adaptive reinforcement learning control based on neural approximation for nonlinear discrete-time systems with unknown nonaffine dead-zone input, IEEE Trans. Neural Networks Learn. Syst., 30 (2019), 295–305. doi: 10.1109/TNNLS.2018.2844165
    [32] S. S. Ge, J. Zhang, T. H. Lee, Adaptive neural network control for a class of MIMO nonlinear systems with disturbances in discrete time, IEEE Trans. Syst., Man, Cybern. B, Cybern., 34 (2004), 1630–1645. doi: 10.1109/TSMCB.2004.826827
    [33] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
    [34] S. S. Ge, G. Y. Li, T. H. Lee, Adaptive NN control for a class of strict-feedback discrete-time nonlinear systems, Automatica, 39 (2003), 807–819. doi: 10.1016/S0005-1098(03)00032-3
    [35] Q. Yang, S. Jagannathan, Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Trans. Syst., Man, Cybern. B, Cybern., 42 (2012), 377–390. doi: 10.1109/TSMCB.2011.2166384
    [36] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
    [37] S. Ferrari, J. E. Steck, R. Chandramohan, Adaptive feedback control by constrained approximate dynamic programming, IEEE Trans. Syst., Man, Cybern. B, Cybern., 38 (2008), 982–987. doi: 10.1109/TSMCB.2008.924140
    [38] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)