
This paper focuses on the adaptive reinforcement learning-based optimal control problem for a class of nonstrict-feedback nonlinear systems subject to an actuator fault and an unknown input dead zone. To simultaneously reduce the computational complexity and avoid the local-optimum problem, a novel neural network weight-update algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, an actor-critic-based reinforcement learning control strategy is developed for high-order nonlinear nonstrict-feedback systems. In addition, two auxiliary parameters are introduced to deal with the input dead zone and the actuator fault, respectively. All signals in the system are proven to be semi-globally uniformly ultimately bounded via Lyapunov analysis. Finally, simulation results are presented to illustrate the effectiveness of the proposed approach.
Citation: Zichen Wang, Xin Wang. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6334-6357. doi: 10.3934/mbe.2023274
Optimal control theory originated in the 1960s and has become an important part of automatic control theory, primarily owing to its spirit of seeking the optimal solution among all possible control schemes [1,2]. Optimal control problems for linear systems can generally be settled by solving Riccati equations. However, for nonlinear systems there are few effective methods, since the Hamilton-Jacobi-Bellman (HJB) equation must be addressed, and the HJB equation is generally difficult to solve analytically. To overcome this bottleneck, many remarkable approaches have been developed, such as adaptive dynamic programming [3,4], actor-critic neural networks (ACNNs) [5,6], policy iteration, and so on. Reinforcement learning (RL), as a method that solves optimal control problems for nonlinear systems while avoiding the direct solution of the HJB equation, has received widespread attention in the past decades. In 1974, Werbos first applied the idea of RL to optimal control theory [7]. Since then, many outstanding results have been reported [8,9,10]. At present, RL is usually implemented with ACNNs, where the critic neural network (CNN) provides policy evaluation and the actor neural network (ANN) updates the present policy. The RL algorithm can reduce energy consumption beyond that of other algorithms while guaranteeing system stability, and it has become a significant method of modern control theory. In recent years, more and more RL approaches have been presented in various fields, including online RL [11,12], integral RL [13,14], off-policy RL [15,16], etc.
General RL strategies utilize the gradient descent method to obtain the ideal weights of the neural networks, which often falls into local optima [17], so that the neural network estimation error cannot easily meet the requirements. To overcome this bottleneck, Bai et al. proposed the multigradient recursive (MGR) algorithm in [17] to obtain the globally optimal solution. By updating pseudo-gradients, this state-of-the-art technique resolves the local-optimum problem and accelerates the convergence rate of the neural network weights. However, the MGR algorithm suffers from a heavy computational burden. It is therefore necessary to mention the minimal learning parameter (MLP) scheme, which reduces the number of update laws without noticeably degrading the estimation accuracy. Many studies have validated the effectiveness of the MLP. Nevertheless, the aforementioned papers adopt only one of the two algorithms and fail to combine the MLP and MGR algorithms so as to exploit both of their advantages.
The optimal control problem of strict-feedback nonlinear systems has been widely studied. However, none of these strategies can be extended to nonstrict-feedback systems [18,19]. For this reason, some scholars have proposed studies to overcome this bottleneck. For example, Tong et al. [20] presented a novel fuzzy tracking control design for a nonstrict-feedback SISO system. Bai et al. [21] utilized MLP-based RL theory to solve optimal control problems for a class of nonstrict-feedback systems. However, all of the above results omitted the influence of the actuator fault and the dead-zone input, which are common factors that affect the stability of a system. Such negligence can lead to severe damage and must be taken seriously. Thus, many works have been presented to offset their influence. However, none has considered the situation in which the dead zone and the actuator fault occur simultaneously. Hence, how to obtain an optimal controller for a nonstrict-feedback system with an actuator fault and an input dead zone, with minimal computation and sufficient accuracy, is an important task, and it is the motivation for the current investigation.
Combined with the backstepping technique, thorough investigations have been made of the tracking control problem for strict-feedback nonlinear systems [14,15,16]. At present, the backstepping approach has been introduced into the analysis of high-order nonlinear systems. Li et al. [22] investigated the optimal control problem of a class of SISO strict-feedback systems via the fuzzy control method. Modares et al. [23] developed an integral RL approach for strict-feedback systems with input constraints. Wang et al. [24] proposed an optimal fault-tolerant control strategy for a nonlinear strict-feedback system via adaptive critic design.
Besides, many papers have studied novel neural network weight-update algorithms. Li et al. [25] utilized the MLP technique to overcome the fault-tolerant problem of a class of multiagent systems. Liu et al. [26] designed an RL controller by applying an MLP scheme to classic MIMO systems with external disturbance. Bai et al. [27] developed an event-triggered control scheme for multiagent systems based on the MLP technique.
Furthermore, many scholars are committed to investigating tolerance strategies for the input dead zone and actuator fault. For instance, Wang and Yang [28] studied the fault detection problem for linear systems with disturbance. Tan et al. [29] developed a compensation control scheme for a class of discrete-time systems with actuator failures. In addition, Na et al. [30] provided an adaptive dynamic control approach for a system with an unknown dead zone.
Based on the above discussion, an RL optimal controller is built in this paper to deal with the fault-tolerant control problem for a class of nonstrict-feedback nonlinear systems in discrete time with an unknown dead-zone input and actuator fault. To deal with the dead zone and actuator fault, we propose two auxiliary systems to offset their influence. The ANN and CNN are utilized to approximate the unknown terms and the long-term utility function, respectively, and a novel approach is proposed to update the neural network weights. The boundedness of all signals in the closed loop is rigorously proved, and the tracking error converges to a small compact set. The novelties of this paper are summarized as follows:
1) We propose a novel neural network weight-update algorithm to avoid local optima and reduce the computational burden. Besides, compared with the ordinary gradient descent algorithm [11,26], the proposed approach achieves a faster weight convergence rate.
2) We formulate a modified backstepping method with additional parameters to offset the influence of the input dead zone, the actuator fault and the algebraic loop problem. In addition, unified fault-tolerant control algorithms are developed based on the RL strategy.
The organization of this paper is given below. In Section 2, descriptions of the system and of radial basis function neural network (RBF NN) theory are given. In Subsection 3.1, the CNN and our novel update law are presented. In Subsection 3.2, the design procedure of the adaptive RL controller is provided. In Section 4, we present some simulation results to show the contributions of the proposed scheme. The conclusion is provided in Section 5.
The dynamics of a standard n-order nonstrict-feedback nonlinear system [31,32,33,34] can be described as follows:

$$
\begin{cases}
x_i(k+1) = \varphi_i(\bar{x}_n(k)) + \phi_i(\bar{x}_n(k))\,x_{i+1}(k), & i = 1, \dots, n-1 \\
x_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\,U(k) + d(k) \\
y(k) = x_1(k)
\end{cases} \tag{2.1}
$$
where $x_i(k) \in \mathbb{R}$ for $i = 1, \dots, n$ represents the state variables of the system, and $\bar{x}_n(k) = [x_1(k), x_2(k), \dots, x_n(k)]^\top \in \mathbb{R}^n$ denotes the state vector. The notations $U(k) \in \mathbb{R}$ and $y(k) \in \mathbb{R}$ are the input and output signals, respectively, $d(k)$ stands for the external disturbance, and $\varphi_i(\cdot)$ and $\phi_i(\cdot)$ represent unknown smooth nonlinear functions.
Motivated by the transformation proposed in [18,19], the nonstrict-feedback system (2.1) can be further expressed as
$$
\begin{cases}
x_i(k+n-i+1) = \varphi_i(\bar{x}_n(k+n-i)) + \phi_i(\bar{x}_n(k+n-i))\,x_{i+1}(k+n-i), & i = 1, \dots, n-1 \\
x_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\,U(k) + d(k) \\
y(k) = x_1(k).
\end{cases} \tag{2.2}
$$
To proceed smoothly, an assumption is introduced in the following sequel.
Assumption 1: According to the contributions in [34,35], the functions $\varphi_i(\bar{x}(k))$ and $\phi_i(\bar{x}(k))$ satisfy $0 < \underline{\varphi} < \varphi_i(\bar{x}(k)) < \bar{\varphi}$ and $0 < \underline{\phi} < \phi_i(\bar{x}(k)) < \bar{\phi}$, where $\bar{\varphi}$ and $\underline{\varphi}$ are the unknown upper and lower bounds of $\varphi_i(\bar{x}(k))$, and $\bar{\phi}$ and $\underline{\phi}$ are the unknown upper and lower bounds of $\phi_i(\bar{x}(k))$, respectively. The external disturbance is bounded and satisfies $|d(k)| \le \bar{d}$, with $\bar{d}$ being an unknown positive constant.
The control signal subject to the actuator fault and input dead zone can be described as $U(k) = \psi(k)u(k) + \delta(k)$, where $\psi(k)$ and $\delta(k)$ denote the efficiency factor and the unknown drift fault of the actuator, respectively. We assume that $\psi(k)$ is positive and satisfies $0 < \underline{\psi} < \psi(k) < \bar{\psi} < 1$, where $\underline{\psi}$ and $\bar{\psi}$ are unknown constants. Further, $\delta(k)$ satisfies $\delta(k) < \bar{\delta}$, with $\bar{\delta}$ being its upper bound. The dead zone is defined as $u(k) = D(v(k))$, where $v(k)$ represents the input of the dead zone and $D(\cdot)$ is the dead-zone function. According to the propositions in [31], the dead zone is expressed as
$$
D(v(k)) = \begin{cases}
b_r(v(k) - f_r), & v(k) \ge f_r \\
0, & -f_l < v(k) < f_r \\
b_l(v(k) + f_l), & v(k) \le -f_l
\end{cases} \tag{2.3}
$$
where $b_r$ and $b_l$ denote the right and left slopes of the dead zone, respectively, and $f_r$ and $f_l$ are the breakpoints of the input. To simplify the following calculations, $D(v(k))$ can be converted into a new form as follows
$$
D(v(k)) = b(k)v(k) + f(k) \tag{2.4}
$$
where b(k) and f(k) can be described as
$$
b(k) = \begin{cases} b_r, & v(k) > 0 \\ b_l, & v(k) \le 0 \end{cases} \qquad
f(k) = \begin{cases} -b_r f_r, & v(k) \ge f_r \\ -b(k)v(k), & -f_l < v(k) < f_r \\ b_l f_l, & v(k) \le -f_l. \end{cases} \tag{2.5}
$$
We suppose that $b(k)$ and $f(k)$ satisfy $0 < \underline{b} < |b(k)| < \bar{b}$ and $0 < \underline{f} < |f(k)| < \bar{f}$, respectively. The control signal $U(k)$ can then be reorganized as
$$
U(k) = \psi(k)\big(b(k)v(k) + f(k)\big) + \delta(k). \tag{2.6}
$$
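To make the input model concrete, the following is a minimal sketch, in Python, of the dead zone (2.3) and the faulty actuator (2.6); the slopes, breakpoints, efficiency factor and drift fault used here are illustrative assumptions, not values prescribed by the paper.

```python
# A minimal sketch of the input model (2.3)-(2.6); all numeric parameter
# values below are illustrative placeholders, not values from the paper.
def dead_zone(v, b_r=0.5, b_l=0.5, f_r=0.3, f_l=0.3):
    """Dead zone D(v(k)) of (2.3): zero on (-f_l, f_r), affine outside."""
    if v >= f_r:
        return b_r * (v - f_r)
    if v <= -f_l:
        return b_l * (v + f_l)
    return 0.0

def actuator_output(v, psi=0.8, delta=0.05):
    """Actual control U(k) = psi(k) * u(k) + delta(k), u(k) = D(v(k)), as in (2.6)."""
    return psi * dead_zone(v) + delta
```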
In this paper, the RL controller is developed for the nonstrict-feedback nonlinear system (2.2), ensuring that all signals in the closed-loop system are semi-globally uniformly ultimately bounded (SGUUB). Based on the ACNNs, the tracking error ξ1(k) is required to converge to a neighborhood of zero that will be specified subsequently.
Note that an RBF NN can approximate any smooth nonlinear function over a compact set. That is, for an unknown nonlinear function $F(N)$, there exists an RBF NN $W^{*T}S(N)$ such that $F(N) = W^{*T}S(N) + \sigma(N)$, where $W^* = [w_1, \dots, w_l]^T \in \mathbb{R}^l$ denotes the ideal weight vector, $l$ represents the number of nodes in the hidden layer and $\sigma(N)$ is the estimation error. Both $W^*$ and $\sigma(N)$ satisfy $\|W^*\| < \bar{W}$ and $\|\sigma(N)\| < \bar{\sigma}$, with $\bar{W}$ and $\bar{\sigma}$ being unknown upper bounds. The notation $S(N) = [s_1(N), \dots, s_l(N)]^T$ is the vector of basis functions, where each $s_i(N)$ takes the Gaussian form $s_i(N) = \exp\left[-\frac{(N-c_i)^T(N-c_i)}{\eta_i^2}\right]$, in which $c_i$ represents the center of the receptive field and $\eta_i$ denotes the width of the function. Because $0 < s_i(N) < 1$, we can further derive that $0 < \sum_{i=1}^{l} s_i(N)s_i(N) = S(N)^T S(N) < l$.
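As an illustration of the approximator just described, the sketch below evaluates the Gaussian basis vector S(N); the centers and widths are placeholder choices introduced only for this example.

```python
import numpy as np

# A minimal sketch of the Gaussian RBF basis used for approximation,
# F(N) ~= W^T S(N) + sigma(N); the centers c_i and widths eta_i below are
# illustrative assumptions, not values from the paper.
def rbf_basis(N, centers, widths):
    """S(N) = [s_1(N), ..., s_l(N)]^T with s_i(N) = exp(-(N-c_i)^T (N-c_i) / eta_i^2)."""
    N = np.asarray(N, dtype=float)
    sq_dist = np.sum((centers - N) ** 2, axis=1)  # (N - c_i)^T (N - c_i)
    return np.exp(-sq_dist / widths ** 2)          # each s_i lies in (0, 1)

# Example with l = 10 nodes on a two-dimensional input, so S^T S < l = 10
centers = np.linspace(-1.0, 1.0, 10).reshape(-1, 1).repeat(2, axis=1)
widths = np.full(10, 2.0)
S = rbf_basis([0.1, -0.2], centers, widths)
```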
The utility function [35] can be chosen as
$$
\rho(k) = \begin{cases} 0, & |\xi_1(k)| < \varpi \\ 1, & |\xi_1(k)| \ge \varpi \end{cases} \tag{3.1}
$$
where $\varpi$ is a positive constant that denotes the threshold value of the tracking performance. The tracking error is written as $\xi_1(k) = y(k) - x_d(k)$, where $x_d(k)$ indicates the reference signal. The long-term strategic utility function [21,27] is given by
$$
M(k) = \rho(k+1) + a\rho(k+2) + a^2\rho(k+3) + \cdots \tag{3.2}
$$
where a is a predefined positive parameter satisfying a<1. According to RBF NN theory, the long-term utility function M(k) can be expressed as
$$
M(k) = W_M^T S_M(k) + \delta_M(k) \tag{3.3}
$$
where $W_M$ and $\delta_M(k)$ indicate the ideal weight vector and the approximation error, respectively, and $S_M(k)$ is the RBF NN basis function. We define $\hat{M}(k) = \hat{W}_M^T(k)S_M(k)$ as the estimation of $M(k)$, with $\hat{W}_M(k)$ being the estimation of the ideal weight $W_M$.
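For intuition, the sketch below evaluates the utility (3.1) and a truncated version of the discounted sum (3.2); the truncation horizon T is our assumption, since in the actual scheme M(k) is not computed directly but estimated by the CNN.

```python
# A minimal sketch of the utility (3.1) and a truncated evaluation of the
# long-term utility (3.2); the horizon T is an illustrative truncation only.
def utility(xi1, threshold):
    """rho(k): 0 when |xi_1(k)| < threshold (good tracking), 1 otherwise."""
    return 0.0 if abs(xi1) < threshold else 1.0

def long_term_utility(future_errors, threshold, a=1e-5, T=50):
    """M(k) ~= sum_{j=1..T} a^(j-1) * rho(k+j), with discount a < 1 as in (3.2)."""
    horizon = min(T, len(future_errors))
    return sum(a ** (j - 1) * utility(future_errors[j - 1], threshold)
               for j in range(1, horizon + 1))
```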
On the basis of the MLP scheme, ˆM(k) can be written in the form
$$
\hat{M}(k) = \hat{\Psi}_M(k)\|S_M(k)\| \tag{3.4}
$$
where $\|\cdot\|$ indicates the Euclidean norm, and $\hat{\Psi}_M(k) = \|\hat{W}_M(k)\|$ satisfies $\hat{\Psi}_M \le \bar{\Psi}$, with $\bar{\Psi}$ being an unknown positive constant.
According to the scheme in [36], and noting from (3.2) that $M(k-1) = \rho(k) + aM(k)$, the Bellman error is designed as
$$
E_M(k) = a\hat{M}(k) - \big[\hat{M}(k-1) - \rho(k)\big]. \tag{3.5}
$$
Adopting the cost function in the quadratic form $\beta_M(k) = \frac{1}{2}E_M^2(k)$, the gradient of $\hat{\Psi}_M$ is obtained as
$$
\Delta\hat{\Psi}_M(k) = a\|S_M(k)\|\big[a\hat{M}(k) - \hat{M}(k-1) + \rho(k)\big]. \tag{3.6}
$$
Defining $\omega_M(k-j+1) = \hat{\Psi}_M(k)\|S_M(k-j+1)\|$, we can further get
$$
g(\iota, \beta_M(k)) = \sum_{j=1}^{\iota} a\|S_M(k-j+1)\|\big[a\,\omega_M(k-j+1) - \omega_M(k-j) + \rho(k-j+1)\big] \tag{3.7}
$$
where ι≥1 is a positive predefined constant that indicates the step length of the gradient.
Together with (3.7), the update law of $\hat{\Psi}_M$ is obtained as
$$
\hat{\Psi}_M(k+1) = \hat{\Psi}_M(k) - \mu_M \sum_{j=1}^{\iota} a\|S_M(k-j+1)\|\big[a\,\omega_M(k-j+1) - \omega_M(k-j) + \rho(k-j+1)\big] \tag{3.8}
$$
where μM is the selected learning rate. The structure of the CNN is shown in Figure 1.
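The CNN update just derived can be summarized in a short sketch. The class below is a hedged illustration of (3.4)–(3.8), not the authors' implementation: only the scalar norm of the weight vector is adapted (the MLP idea) and the gradient is summed over the last ι steps (the MGR idea); the buffering of past basis norms and utilities is our assumption.

```python
from collections import deque

# A hedged sketch of the MLP/MGR critic update (3.4)-(3.8).
class Critic:
    def __init__(self, mu_M=0.005, a=1e-5, iota=10):
        self.Psi_M = 0.0                        # estimate hat Psi_M of ||W_M||
        self.mu_M, self.a = mu_M, a
        self.S_norms = deque(maxlen=iota + 1)   # ||S_M(k)||, ..., ||S_M(k-iota)||
        self.rhos = deque(maxlen=iota)          # rho(k), ..., rho(k-iota+1)

    def estimate(self, S_norm):
        """hat M(k) = hat Psi_M(k) ||S_M(k)||, the MLP form (3.4)."""
        return self.Psi_M * S_norm

    def update(self, S_norm, rho):
        self.S_norms.append(S_norm)
        self.rhos.append(rho)
        grad = 0.0
        for j in range(1, len(self.S_norms)):          # multigradient (3.7)
            omega_new = self.Psi_M * self.S_norms[-j]      # omega_M(k-j+1)
            omega_old = self.Psi_M * self.S_norms[-j - 1]  # omega_M(k-j)
            grad += self.a * self.S_norms[-j] * (
                self.a * omega_new - omega_old + self.rhos[-j])
        self.Psi_M -= self.mu_M * grad                 # update law (3.8)
```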
Remark 1: The neural networks in this paper are updated by the proposed weight-update algorithm. Compared with the classic gradient descent method, the proposed algorithm has the following advantages: 1) it reduces the computational complexity; 2) it avoids the local-optimum problem; 3) it accelerates the convergence of the neural network weights.
In this section, an ANN will be utilized to implement the n-step backstepping RL control strategy. Specifically, two auxiliary signals are introduced in the n-th step to eliminate the impact of the dead zone and the actuator fault.
Step 1: Define the tracking errors ξ1(k+n)=x1(k+n)−xd(k+n) and ξ2(k+n−1)=x2(k+n−1)−α1(k). According to system (2.2), the tracking error ξ1(k+n) can be further deduced as
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\left[\left(\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))} - \frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))} + x_d(k+n)\right) + \alpha_1(k) - x_d(k+n) + \xi_2(k+n-1)\right]
$$
where α1(k) denotes the virtual controller. Let
$$
\gamma_1(k) = -\left(\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))} + x_d(k+n) - \frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))}\right). \tag{3.9}
$$
By the universal approximation capability of the RBF NN, $\gamma_1(k)$ can be approximated as $\gamma_1(k) = W_1^T S_1(N_1(k)) + \sigma_1(k)$, where $W_1$ represents the ideal weight vector, $N_1(k) = [\bar{x}_n(k+n-1), x_d(k+n)]^T$ and $\sigma_1(k)$ indicates the approximation error. Suppose that $W_1$ and $\sigma_1(k)$ satisfy $\|W_1\| < \bar{W}_1$ and $\|\sigma_1(k)\| < \bar{\sigma}_1$, respectively, where $\bar{W}_1$ and $\bar{\sigma}_1$ are the corresponding upper bounds.
Combining (3.9) and γ1(k), ξ1(k+n) can be further expressed as
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \alpha_1(k) + \xi_2(k+n-1) - x_d(k+n)\big]. \tag{3.10}
$$
For the purpose of solving the algebraic loop problem, which will be discussed later, the term o1(k) is proposed as
$$
o_1(k) = \Psi_1\|S_1(\epsilon_1(k))\| \tag{3.11}
$$
where $\epsilon_1(k) = [\bar{x}_1(k+n-1), x_d(k+n)]^T$ and $\Psi_1 = \|W_1\|$.
Adding and subtracting (3.11) in (3.10), one can easily derive
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \alpha_1(k) + \xi_2(k+n-1) - x_d(k+n) + \Psi_1\|S_1(\epsilon_1(k))\| - \Psi_1\|S_1(\epsilon_1(k))\|\big]. \tag{3.12}
$$
In order to further simplify (3.12), the virtual controller is designed as
$$
\alpha_1(k) = -\hat{\Psi}_1(k)\|S_1(\epsilon_1(k))\| + x_d(k+n) \tag{3.13}
$$
where $\hat{\Psi}_1(k) = \|\hat{W}_1(k)\|$ and $\hat{W}_1(k)$ is the estimation of $W_1$.
Substituting (3.13) into (3.12), we get
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \xi_2(k+n-1) - \tilde{\Psi}_1(k)\|S_1(\epsilon_1(k))\| - \Psi_1\|S_1(\epsilon_1(k))\|\big] \tag{3.14}
$$
where $\tilde{\Psi}_1(k) = \hat{\Psi}_1(k) - \Psi_1$.
Shifting (3.14) to the k+1 time instant, one has
$$
\xi_1(k+1) = \phi_1(\bar{x}_n(k))\big[-W_1^T S_1(N_1(k_1)) - \sigma_1(k_1) + \xi_2(k) - \tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| - \Psi_1\|S_1(\epsilon_1(k_1))\|\big] \tag{3.15}
$$
where $k_1 = k - n + 1$ represents the shifted time instant.
On the basis of the RL control scheme, the prediction error is defined as
$$
E_1(k) = \hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| + \big(\hat{M}(k) - M_d(k)\big) \tag{3.16}
$$
where $M_d(k)$ represents the ideal long-term utility function, which is usually taken as zero [37].
The cost function is taken in the quadratic form $\beta_1(k) = \frac{1}{2}E_1^2(k)$, and the gradient of $\hat{\Psi}_1(k)$ is deduced as
$$
\Delta\hat{\Psi}_1(k) = \frac{\partial \beta_1(k)}{\partial \hat{\Psi}_1(k_1)} = \|S_1(\epsilon_1(k_1))\|\big[\hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| + \hat{M}(k)\big]. \tag{3.17}
$$
The multigradient can be further obtained as
$$
g(\iota, \beta_1(k)) = \sum_{j=1}^{\iota} \|S_1(\epsilon_1(k_1-j+1))\|\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]. \tag{3.18}
$$
where $\omega_1(k_1-j+1) = \hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|$. Similar to (3.8), the MGR update law of $\hat{\Psi}_1(k)$ is derived as
$$
\hat{\Psi}_1(k+1) = \hat{\Psi}_1(k_1) - \mu_1 g(\iota, \beta_1(k)) = \hat{\Psi}_1(k_1) - \mu_1 \sum_{j=1}^{\iota} \|S_1(\epsilon_1(k_1-j+1))\|\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big] \tag{3.19}
$$
where μ1 stands for the chosen learning rate. The structure of the ANN is shown in Figure 2.
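Analogously, the first-step actor admits a short sketch illustrating the virtual control (3.13) and the MGR update (3.19); the ω_M values are supplied by the critic sketch above, and the buffering scheme is an illustrative assumption.

```python
from collections import deque

# A hedged sketch of the first-step actor: virtual control (3.13) and MGR
# update (3.19). The buffer handling is our assumption, not the paper's.
class ActorStep1:
    def __init__(self, mu_1=0.008, iota=10):
        self.Psi_1 = 0.0                      # estimate hat Psi_1 of ||W_1||
        self.mu_1 = mu_1
        self.S_norms = deque(maxlen=iota)     # past ||S_1(eps_1(.))||
        self.omega_Ms = deque(maxlen=iota)    # past omega_M(.) from the critic

    def virtual_control(self, S_norm, xd_ahead):
        """alpha_1(k) = -hat Psi_1(k) ||S_1(eps_1(k))|| + x_d(k+n), as in (3.13)."""
        return -self.Psi_1 * S_norm + xd_ahead

    def update(self, S_norm, omega_M):
        self.S_norms.append(S_norm)
        self.omega_Ms.append(omega_M)
        grad = 0.0
        for j in range(1, len(self.S_norms) + 1):   # multigradient (3.18)
            omega_1 = self.Psi_1 * self.S_norms[-j]     # omega_1(k_1-j+1)
            grad += self.S_norms[-j] * (omega_1 + self.omega_Ms[-j])
        self.Psi_1 -= self.mu_1 * grad              # update law (3.19)
```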
Remark 2: It is necessary to emphasize that previous works usually designed the ANN basis function in the form S1(N1(k)), which depends on the whole state vector rather than only on x1(k). Under this design, α1(k) and ˆΨ1(k+1) are both constructed as functions of $N_1(k) = [\bar{x}_n(k+n-1), x_d(k+n)]^T$, which results in the algebraic loop problem described in [38]. To settle this conundrum, the term o1(k) is presented in this paper, and α1(k) and ˆΨ1(k+1) are instead constructed as functions of $\epsilon_1(k) = [\bar{x}_1(k+n-1), x_d(k+n)]^T$.
Step i: Define the tracking errors ξi(k+n−i+1)=xi(k+n−i+1)−αi−1(k) and ξi+1(k+n−i)=xi+1(k+n−i)−αi(k), where αi−1(k) and αi(k) denote the virtual controllers at Steps i−1 and i, respectively. Similar to the derivation of (3.9), one has
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\left[\left(\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))} - \frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))} + \alpha_{i-1}(k)\right) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k)\right]. \tag{3.20}
$$
According to the definition of $\gamma_1(k)$, one has $\gamma_i(k) = -\left(\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))} + \alpha_{i-1}(k) - \frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))}\right)$. This unknown function can be approximated by the RBF NN as $\gamma_i(k) = W_i^T S_i(N_i(k)) + \sigma_i(k)$, where $W_i$ and $\sigma_i(k)$ are the ideal weight vector and the approximation error, respectively. Furthermore, we let $N_i(k) = [x_1(k+n-i), \dots, x_n(k+n-i), x_d(k+n)]^T$.
Substituting γi(k) into (3.20), one has
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k)\big]. \tag{3.21}
$$
The term oi(k) is given in the form below
$$
o_i(k) = \Psi_i\|S_i(\epsilon_i(k))\| \tag{3.22}
$$
where $\epsilon_i(k) = [\bar{x}_i(k+n-i), x_d(k+n)]^T$ and $\Psi_i$ denotes the Euclidean norm of the ideal weight vector $W_i$.
Adding and subtracting (3.22) in (3.21) yields
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k) + \Psi_i\|S_i(\epsilon_i(k))\| - \Psi_i\|S_i(\epsilon_i(k))\|\big]. \tag{3.23}
$$
As in the previous step, the virtual controller is designed as
$$
\alpha_i(k) = -\hat{\Psi}_i(k)\|S_i(\epsilon_i(k))\| + \alpha_{i-1}(k) \tag{3.24}
$$
where $\hat{W}_i(k)$ is the estimation of $W_i$ and $\hat{\Psi}_i(k) = \|\hat{W}_i(k)\|$.
Substituting (3.24) into (3.23), ξi(k+n−i+1) becomes
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \xi_{i+1}(k+n-i) - \tilde{\Psi}_i(k)\|S_i(\epsilon_i(k))\| - \Psi_i\|S_i(\epsilon_i(k))\|\big]. \tag{3.25}
$$
Analogously to (3.15), (3.25) can be further described as
$$
\xi_i(k+1) = \phi_i(\bar{x}_n(k))\big[-W_i^T S_i(N_i(k_i)) - \sigma_i(k_i) + \xi_{i+1}(k) - \tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| - \Psi_i\|S_i(\epsilon_i(k_i))\|\big] \tag{3.26}
$$
where ki=k−n+i.
Let the prediction error be $E_i(k) = \hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| + \hat{M}(k)$. According to $E_i(k)$, the cost function is described in its quadratic form $\beta_i(k) = \frac{1}{2}E_i^2(k)$, and the gradient of $\hat{\Psi}_i$ is obtained as:
$$
\Delta\hat{\Psi}_i(k) = \frac{\partial \beta_i(k)}{\partial \hat{\Psi}_i(k_i)} = \|S_i(\epsilon_i(k_i))\|\big[\hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| + \hat{M}(k)\big]. \tag{3.27}
$$
On the basis of the MGR algorithm definition, the multigradient is expressed as
$$
g(\iota, \beta_i(k)) = \sum_{j=1}^{\iota} \|S_i(\epsilon_i(k_i-j+1))\|\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]. \tag{3.28}
$$
where $\omega_i(k_i-j+1) = \hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|$. The update law of $\hat{\Psi}_i(k)$ is deduced according to (3.28):
$$
\hat{\Psi}_i(k+1) = \hat{\Psi}_i(k_i) - \mu_i g(\iota, \beta_i(k)) = \hat{\Psi}_i(k_i) - \mu_i \sum_{j=1}^{\iota} \|S_i(\epsilon_i(k_i-j+1))\|\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big] \tag{3.29}
$$
where μi is the learning rate of the i-th step.
Step n: The tracking error in the n-th subsystem can be described as ξn(k+1)=xn(k+1)−αn−1(k). Substitute (2.2) and (2.6) into the n-th tracking error equation:
$$
\xi_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\big(\psi(k)(b(k)v(k) + f(k)) + \delta(k)\big) + d(k) - \alpha_{n-1}(k). \tag{3.30}
$$
For the purpose of simplifying (3.30), π(k) is defined as
$$
\pi(k) = -\frac{1}{\phi_n(\bar{x}_n(k))\psi(k)b(k)}\big(\varphi_n(\bar{x}_n(k)) - \alpha_{n-1}(k)\big). \tag{3.31}
$$
Using the theory of the RBF NN to approximate (3.31), one gets
$$
\pi(k) = W_n^T S_n(N_n(k)) + \sigma_n(k) \tag{3.32}
$$
where the definitions of $W_n$ and $\sigma_n(k)$ are the same as those in Steps 1 to n−1 and $N_n(k) = [\bar{x}_n(k), x_d(k+n)]^T$.
Combining (3.32) and (3.30), we derive
$$
\xi_n(k+1) = \phi_n(\bar{x}_n(k))\psi(k)b(k)\left(v(k) + \frac{f(k)}{b(k)} + \frac{\delta(k)}{\psi(k)b(k)} - \pi(k)\right) + d(k). \tag{3.33}
$$
In (3.33), the dynamics of the dead zone and the actuator fault introduced in (2.3) and (2.6) appear explicitly. It is easy to deduce that they have the following properties
$$
\frac{f(k)}{b(k)} \le \frac{\bar{f}}{\underline{b}} = \tau, \qquad \frac{\delta(k)}{\psi(k)b(k)} \le \frac{\bar{\delta}}{\underline{\psi}\,\underline{b}} = \vartheta \tag{3.34}
$$
where $\vartheta$ and $\tau$ are both unknown parameters. Define the estimations of these parameters as $\hat{\vartheta}$ and $\hat{\tau}$; the estimation errors $\tilde{\vartheta}$ and $\tilde{\tau}$ are then given by
$$
\tilde{\vartheta}(k) = \hat{\vartheta}(k) - \vartheta, \qquad \tilde{\tau}(k) = \hat{\tau}(k) - \tau. \tag{3.35}
$$
Based on the estimation errors in (3.35), the actual controller is designed as
$$
v(k) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{\tau}(k) + \hat{\vartheta}(k) \tag{3.36}
$$
where $\hat{\Psi}_n(k) = \|\hat{W}_n(k)\|$, $\hat{W}_n(k)$ stands for the estimation of $W_n$, and the time index $k_n = k$.
With (3.35) and (3.36), the error dynamics (3.33) can be further written as
$$
\begin{aligned}
\xi_n(k+1) ={}& \phi_n(\bar{x}_n(k))\psi(k)b(k)\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\| \\
&+ \phi_n(\bar{x}_n(k))\psi(k)b(k)\left(\tilde{\vartheta}(k) + \vartheta + \tilde{\tau}(k) + \tau + \frac{f(k)}{b(k)} + \frac{\delta(k)}{\psi(k)b(k)}\right) \\
&+ \phi_n(\bar{x}_n(k))\psi(k)b(k)\,q(k_n) + d(k)
\end{aligned} \tag{3.37}
$$
where $\tilde{\Psi}_n(k) = \hat{\Psi}_n(k) - \Psi_n$ and $q(k) = \Psi_n\|S_n(N_n(k_n))\| - W_n^T S_n(N_n(k)) - \sigma_n(k)$. Similar to Step i, the prediction error is defined as
$$
E_n(k) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{M}(k). \tag{3.38}
$$
Further define the cost function as $\beta_n(k) = \frac{1}{2}E_n^2(k)$; the gradient of $\hat{\Psi}_n(k)$ is then
$$
\Delta\hat{\Psi}_n(k) = \frac{\partial \beta_n(k)}{\partial \hat{\Psi}_n(k_n)} = \|S_n(N_n(k_n))\|\big[\hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{M}(k)\big]. \tag{3.39}
$$
The multigradient is given by
$$
g(\iota, \beta_n(k)) = \sum_{j=1}^{\iota} \|S_n(N_n(k_n-j+1))\|\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big] \tag{3.40}
$$
where $\omega_n(k_n-j+1) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n-j+1))\|$.
The weight update law for $\hat{\Psi}_n(k)$ can be further obtained as
$$
\hat{\Psi}_n(k+1) = \hat{\Psi}_n(k_n) - \mu_n g(\iota, \beta_n(k)) = \hat{\Psi}_n(k_n) - \mu_n \sum_{j=1}^{\iota} \|S_n(N_n(k_n-j+1))\|\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big] \tag{3.41}
$$
where μn is a chosen positive learning rate. Let
$$
E_\vartheta(k) = \hat{\vartheta}(k) + \hat{M}(k), \qquad E_\tau(k) = \hat{\tau}(k) + \hat{M}(k). \tag{3.42}
$$
The cost functions of the two auxiliary signals are chosen in the same quadratic form as βi(k):
$$
\beta_\vartheta(k) = \tfrac{1}{2}E_\vartheta^2(k), \qquad \beta_\tau(k) = \tfrac{1}{2}E_\tau^2(k). \tag{3.43}
$$
The gradients are deduced as
$$
\Delta\hat{\vartheta}(k) = \frac{\partial \beta_\vartheta(k)}{\partial \hat{\vartheta}(k)} = \hat{\vartheta}(k) + \hat{M}(k), \qquad
\Delta\hat{\tau}(k) = \frac{\partial \beta_\tau(k)}{\partial \hat{\tau}(k)} = \hat{\tau}(k) + \hat{M}(k). \tag{3.44}
$$
Two MGR update laws are then obtained:
$$
\hat{\vartheta}(k+1) = \hat{\vartheta}(k) - \mu_\vartheta \sum_{j=1}^{\iota}\big(\hat{\vartheta}(k-j+1) + \hat{M}(k-j+1)\big), \qquad
\hat{\tau}(k+1) = \hat{\tau}(k) - \mu_\tau \sum_{j=1}^{\iota}\big(\hat{\tau}(k-j+1) + \hat{M}(k-j+1)\big) \tag{3.45}
$$
where μϑ and μτ are positive learning factors.
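The actual control law and the two auxiliary update laws can likewise be sketched. The class below is a hedged illustration of (3.36) and (3.45); the buffer of past estimates is an implementation assumption made for this sketch.

```python
from collections import deque

# A hedged sketch of the actual controller (3.36) and the auxiliary MGR
# updates (3.45) compensating the dead zone and the actuator fault.
class FaultCompensator:
    def __init__(self, mu_theta=0.05, mu_tau=0.05, iota=10):
        self.theta_hat, self.tau_hat = 0.3, 0.2   # hat vartheta(0), hat tau(0)
        self.mu_theta, self.mu_tau = mu_theta, mu_tau
        self.history = deque(maxlen=iota)  # past (hat vartheta, hat tau, hat M)

    def control(self, Psi_n, S_norm):
        """v(k) = hat Psi_n(k)||S_n(N_n(k))|| + hat tau(k) + hat vartheta(k), (3.36)."""
        return Psi_n * S_norm + self.tau_hat + self.theta_hat

    def update(self, M_hat):
        self.history.append((self.theta_hat, self.tau_hat, M_hat))
        g_theta = sum(th + M for th, _, M in self.history)  # first law in (3.45)
        g_tau = sum(ta + M for _, ta, M in self.history)    # second law in (3.45)
        self.theta_hat -= self.mu_theta * g_theta
        self.tau_hat -= self.mu_tau * g_tau
```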
Figure 3 shows the structure of the proposed control strategy. The stability and tracking performance are analyzed in the following theorem.
Theorem 1: Consider the nonstrict-feedback nonlinear system (2.2) under the adaptive RL strategy consisting of the update laws (3.8), (3.19), (3.29) and (3.41), the virtual controllers (3.13) and (3.24) and the actual controller (3.36). If the parameters are selected such that
$$
0 < \mu_M < \frac{1}{l\iota a^2}, \quad 0 < \mu_i < \frac{1}{l\iota}, \quad 0 < \mu_\vartheta < \frac{1}{\iota}, \quad 0 < \mu_\tau < \frac{1}{\iota} \tag{3.46}
$$
and Assumption 1 holds, the proposed control strategy ensures that all signals are SGUUB and that the tracking error converges to a small neighborhood of zero. The proof of Theorem 1 is given in the appendix.
In this section, some simulation results are presented to illustrate the effectiveness of the proposed approach.
The nonstrict-feedback nonlinear discrete time system is chosen as
$$
\begin{cases}
x_1(k+1) = \varphi_1(\bar{x}_2(k)) + \phi_1(\bar{x}_2(k))\,x_2(k) \\
x_2(k+1) = \varphi_2(\bar{x}_2(k)) + \phi_2(\bar{x}_2(k))\,U(k) + d(k) \\
y(k) = x_1(k)
\end{cases} \tag{4.1}
$$
where x1(k) and x2(k) are the states, U(k) is the input and y(k) is the output. The functions φ1(ˉx2(k)) and φ2(ˉx2(k)) are chosen as x1(k) and x2(k), respectively. We choose ϕ1(ˉx2(k)) and ϕ2(ˉx2(k)) as [−x1(k)+0.019(1.5−x1(k))exp(4x2(k)/(3.4+x2(k)))]/20 and [−x2(k)+3.1(0.4−x1(k))exp(1.5x2(k)/(3.4+x2(k)))−4(x2(k)−U(k))], respectively. The external disturbance d(k) is chosen as 0.1cos(0.05k)cos(x1(k)), and the desired signal is xd(k)=0.013sin(π/8+0.6kπ/38).
The parameters are selected as follows: Ψ2(0)=0.001, μM=0.005, μ1=0.008, μ2=0.006, μτ=0.05, μϑ=0.05, a=0.00001, ˆτ(0)=0.2 and ˆϑ(0)=0.3. The hidden layer node numbers of the ACNNs are set as lW1=10, lW2=10 and lWM=10, and the step length of the multigradient is chosen as ι=10.
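For reference, a minimal sketch of the simulation plant (4.1) with the functions and signals above is given below; the control input here is a zero placeholder, since the full ACNN controller is only sketched in the preceding sections.

```python
import numpy as np

# A minimal sketch of the plant (4.1) with the functions specified above.
# Note that phi_2 takes U(k) as an argument, exactly as stated in the text.
def phi1(x1, x2):
    return (-x1 + 0.019 * (1.5 - x1) * np.exp(4 * x2 / (3.4 + x2))) / 20

def phi2(x1, x2, U):
    return -x2 + 3.1 * (0.4 - x1) * np.exp(1.5 * x2 / (3.4 + x2)) - 4 * (x2 - U)

def plant_step(x1, x2, U, k):
    d = 0.1 * np.cos(0.05 * k) * np.cos(x1)      # disturbance d(k)
    x1_next = x1 + phi1(x1, x2) * x2             # varphi_1 = x_1(k)
    x2_next = x2 + phi2(x1, x2, U) * U + d       # varphi_2 = x_2(k)
    return x1_next, x2_next

x_d = lambda k: 0.013 * np.sin(np.pi / 8 + 0.6 * k * np.pi / 38)  # reference
x1, x2 = 0.0, 0.0
for k in range(1000):
    U = 0.0  # placeholder; in closed loop U(k) = psi(k) * D(v(k)) + delta(k)
    x1, x2 = plant_step(x1, x2, U, k)
```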
Figures 4 and 5 illustrate the trajectories of the signals and the tracking error when br=0.5, bl=−0.5, fr=0.3 and fl=−0.3, respectively. The control output achieved precise tracking of the reference signal, and the tracking error stayed close to zero (within about 0.001). Figures 6 and 7 illustrate the trajectories of the signals and the tracking error when br=0.4, bl=−0.4, fr=0.25 and fl=−0.25, respectively. The tracking performance in this scenario was also satisfactory (the tracking error remained within about 0.002). These results show that the proposed fault-tolerant approach can offset the influence of dead zones with different parameters.
Figure 8 describes the trajectories of the ANN weight, the CNN weight and the control input, and compares the proposed scheme with the MLP-based strategy. According to the results, it is clear that the proposed scheme achieved a faster convergence rate of the weight parameters than the ordinary MLP scheme. Figures 9 and 10 show the tracking trajectory and the tracking error obtained without the two auxiliary systems. Affected by the input dead zone and the actuator fault, tracking became markedly inaccurate and the tracking error grew to about 0.01. Comparing Figures 4 and 9, it is obvious that our approach can successfully offset the influence of the input dead zone and the actuator fault.
To verify that the proposed update algorithm reduces the computational burden, the following experiment was conducted, taking computational time as the measure of computational burden. The total sample number was 1000, and all results were obtained in the same environment on a computer with a 3.6 GHz CPU and 16 GB of RAM.
From Table 1, it can be seen that the computational time of the ordinary gradient descent algorithm is 0.5445 s, whereas the proposed update algorithm needs only 0.2862 s; the computational time is thus reduced by 47.44%. Combined with the previous analysis, our approach not only alleviates the computational burden of the MGR algorithm but also achieves a faster convergence rate of the weight parameters.
Table 1. Comparison of computational time.

| Approach | Computational time (s) |
| Gradient descent algorithm in [17] | 0.5445 |
| The proposed approach | 0.2862 |
The aim of this paper was to design a fault-tolerant controller for a class of nonstrict-feedback systems with an input dead zone. We proposed a novel neural network weight-update algorithm to achieve a faster computational speed and avoid local optima. Two auxiliary parameters were presented to offset the influence of the dead zone and the actuator fault, and an auxiliary term was introduced to eliminate the algebraic loop problem. By Lyapunov theory, all signals in the closed-loop system were proven to be SGUUB, and the tracking error converged to a neighborhood of zero. Finally, some simulation results were presented to illustrate the effectiveness of our approach.
There remain other open problems in the control area; for instance, RL-based tracking control for stochastic systems will be the topic of our future work, building on the current investigation.
This work was supported by the Natural Science Foundation Project of Chongqing under Grant cstc2019jcyj-msxmX036, and also in part by the open funding project of the Guangdong Provincial Key Laboratory of Intelligent Decision and Coordination Control under Grant F2021098.
The authors declare that they have no conflict of interest.
Step 1: Define $\theta_{\xi_1}$, $\theta_{\Psi_1}$ and $\theta_M$ as positive constants. The Lyapunov function is chosen as
$$
V_1(k) = V_{11}(k) + V_{12}(k) + V_{13}(k) + V_{14}(k) \tag{A.1}
$$
where $V_{11}(k) = (\theta_{\xi_1}/4)\xi_1^2(k)$, $V_{12}(k) = (\theta_{\Psi_1}/\mu_1)\sum_{s=0}^{n-1}\tilde{\Psi}_1^2(k_1+s)$, $V_{13}(k) = (\theta_M/\mu_M)\tilde{\Psi}_M^2(k)$ and $V_{14}(k) = 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2$.
The Cauchy-Schwarz inequality is expressed as
$$
(a_1 + a_2 + \cdots + a_n)^2 \le n(a_1^2 + a_2^2 + \cdots + a_n^2). \tag{A.2}
$$
Young's inequality is also given below
$$
\tilde{a}^T\tilde{b} \le \tfrac{1}{2}\tilde{a}^T\tilde{a} + \tfrac{1}{2}\tilde{b}^T\tilde{b} \tag{A.3}
$$

where $\tilde{a}$ and $\tilde{b}$ are arbitrary vectors.
By utilizing inequality (A.3) and the property that $0 < S(N)^T S(N) < l$, the two terms appearing in (3.12) satisfy the following property
$$
\begin{aligned}
-W_i^T S_i(N_i(k_i)) - \Psi_i\|S_i(\epsilon_i(k_i))\| &\le |W_i^T S_i(N_i(k_i))| + \Psi_i\|S_i(\epsilon_i(k_i))\| \\
&\le \tfrac{1}{2}W_i^T W_i + \tfrac{1}{2}S_i(N_i(k_i))^T S_i(N_i(k_i)) + \tfrac{1}{2}\Psi_i^2 + \tfrac{1}{2}\|S_i(\epsilon_i(k_i))\|^2 \\
&\le \bar{\Psi}_i^2 + l
\end{aligned} \tag{A.4}
$$
where i=1,...,n−1.
According to (A.2) and (3.15), the first-order difference of $V_{11}(k)$ is bounded as

$$
\Delta V_{11}(k) \le \theta_{\xi_1}\bar{\phi}_1^2\big(\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\big)^2 + \theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2 + l)^2 + \theta_{\xi_1}\bar{\phi}_1^2\xi_2^2(k) + \theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2 - \tfrac{1}{4}\theta_{\xi_1}\xi_1^2(k). \tag{A.5}
$$
Based on (A.2) and (3.19), the first-order difference of $V_{12}(k)$ can be derived as

$$
\begin{aligned}
\Delta V_{12}(k) \le{}& -\theta_{\Psi_1}(1 - \iota l\mu_1)\sum_{j=1}^{\iota}\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_1}\iota(\bar{\Psi}_1 + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_1}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_1}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\big]^2.
\end{aligned} \tag{A.6}
$$
Similar to the process in (A.6), the first-order difference of $V_{13}(k)$ is calculated as

$$
\begin{aligned}
\Delta V_{13}(k) \le{}& -\theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- \theta_M a^2\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j+1)\|\big]^2 + 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j)\|\big]^2 + 2\theta_M\iota\big[\bar{\Psi}_M(1+a) + 1\big]^2.
\end{aligned} \tag{A.7}
$$
Then, considering $\Delta V_{14}(k)$, we can obtain

$$
\Delta V_{14}(k) = 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2. \tag{A.8}
$$
Combining (A.5)–(A.8), the first-order difference of $V_1(k)$ is derived as
$$
\begin{aligned}
\Delta V_1(k) \le{}& -\theta_{\Psi_1}(1 - \iota l\mu_1)\sum_{j=1}^{\iota}\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- (\theta_{\Psi_1} - \theta_{\xi_1}\bar{\phi}_1^2)\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\big]^2 + \theta_{\xi_1}\bar{\phi}_1^2\xi_2^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_{\Psi_1} - 2\theta_M\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 \\
&- \theta_{\Psi_1}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\big]^2 - \frac{\theta_{\xi_1}}{4}\xi_1^2(k) + B_1
\end{aligned} \tag{A.9}
$$
where $B_1 = 2\theta_{\Psi_1}\iota(\bar{\Psi}_1 + \bar{\Psi}_M)^2 + \theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2 + \theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2 + l)^2 + 2\theta_M\iota\big[\bar{\Psi}_M(1+a) + 1\big]^2$.
Step i: The Lyapunov function in Steps 2 to n−1 is designed as
$$
V_i(k) = V_{i1}(k) + V_{i2}(k) \tag{A.10}
$$
where $V_{i1}(k) = (\theta_{\xi_i}/4)\xi_i^2(k)$, $V_{i2}(k) = (\theta_{\Psi_i}/\mu_i)\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s)$, and $\theta_{\xi_i}$ and $\theta_{\Psi_i}$ are both positive constants.
According to (3.26), the first-order difference of $V_{i1}(k)$ can be deduced as

$$
\Delta V_{i1}(k) \le \theta_{\xi_i}\bar{\phi}_i^2\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 + \theta_{\xi_i}\bar{\phi}_i^2\xi_{i+1}^2(k) + \theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2 + \theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2 + l)^2 - \tfrac{1}{4}\theta_{\xi_i}\xi_i^2(k). \tag{A.11}
$$
Similar to (A.6), one deduces $\Delta V_{i2}(k)$ as

$$
\begin{aligned}
\Delta V_{i2}(k) \le{}& -\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_i}\iota(\bar{\Psi}_i + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2.
\end{aligned} \tag{A.12}
$$
Combining (A.11) and (A.12), $\Delta V_i(k)$ is derived as
$$
\begin{aligned}
\Delta V_i(k) \le{}& -\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 - \tfrac{1}{4}\theta_{\xi_i}\xi_i^2(k) + \theta_{\xi_i}\bar{\phi}_i^2\xi_{i+1}^2(k) \\
&+ 2\theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - (\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 \\
&- \theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 + B_i
\end{aligned} \tag{A.13}
$$
where $B_i = \theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2 + l)^2 + 2\theta_{\Psi_i}\iota(\bar{\Psi}_i + \bar{\Psi}_M)^2 + \theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2$.
Step n: The Lyapunov function in the n-th step is
$$
V_n(k) = V_{n1}(k) + V_{n2}(k) + V_{n3}(k) + V_{n4}(k) \tag{A.14}
$$
where $V_{n1}(k) = (\theta_{\xi_n}/3)\xi_n^2(k)$, $V_{n2}(k) = (\theta_{\Psi_n}/\mu_n)\tilde{\Psi}_n^2(k)$, $V_{n3}(k) = (\theta_\vartheta/\mu_\vartheta)\tilde{\vartheta}^2(k)$ and $V_{n4}(k) = (\theta_\tau/\mu_\tau)\tilde{\tau}^2(k)$.
Using the above inequalities and (3.37), it follows that
$$
\Delta V_{n1}(k) \le \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big[2\tau^2 + \tilde{\tau}^2(k) + 2\vartheta^2 + \tilde{\vartheta}^2(k) + \big(\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\|\big)^2 + \bar{q}\Big] + \frac{2}{3}\theta_{\xi_n}\bar{d}^2 - \frac{\theta_{\xi_n}}{3}\xi_n^2(k) \tag{A.15}
$$
where $q^2(k_n) < (\bar{\sigma}_n + 2l^{1/2}\bar{\Psi}_n)^2 = \bar{q}$.
Based on (3.41) and (A.2), $\Delta V_{n2}(k)$ is given as

$$
\begin{aligned}
\Delta V_{n2}(k) \le{}& -\theta_{\Psi_n}(1 - \iota l\mu_n)\sum_{j=1}^{\iota}\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_n}\iota(\bar{\Psi}_n + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_n}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_n}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n-j+1))\|\big]^2.
\end{aligned} \tag{A.16}
$$
Obviously, similar to (A.6), the first-order differences of $V_{n3}(k)$ and $V_{n4}(k)$ are:

$$
\begin{aligned}
\Delta V_{n3}(k) \le{}& -\theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\vartheta\sum_{j=1}^{\iota}\tilde{\vartheta}^2(k-j+1) \\
&+ 2\theta_\vartheta\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 + 2\theta_\vartheta\iota(\vartheta + \bar{\Psi}_M)^2 \\
\Delta V_{n4}(k) \le{}& -\theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau\sum_{j=1}^{\iota}\tilde{\tau}^2(k-j+1) \\
&+ 2\theta_\tau\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 + 2\theta_\tau\iota(\tau + \bar{\Psi}_M)^2.
\end{aligned} \tag{A.17}
$$
Combining (A.15)–(A.17), one has
$$
\begin{aligned}
\Delta V_n(k) ={}& \Delta V_{n1}(k) + \Delta V_{n2}(k) + \Delta V_{n3}(k) + \Delta V_{n4}(k) \\
\le{}& -\theta_{\Psi_n}(1 - \iota l\mu_n)\sum_{j=1}^{\iota}\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n))\|\big]^2 - \theta_{\Psi_n}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n-j+1))\|\big]^2 \\
&- \frac{\theta_{\xi_n}}{3}\xi_n^2(k) + (2\theta_{\Psi_n} + 2\theta_\vartheta + 2\theta_\tau)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 \\
&- \Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \theta_\vartheta\sum_{j=2}^{\iota}\tilde{\vartheta}^2(k-j+1) - \theta_\tau\sum_{j=2}^{\iota}\tilde{\tau}^2(k-j+1) + B_n
\end{aligned} \tag{A.18}
$$
where $B_n = \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2(2\tau^2 + 2\vartheta^2 + \bar{q}) + \frac{2}{3}\theta_{\xi_n}\bar{d}^2 + 2\theta_\tau\iota(\tau + \bar{\Psi}_M)^2 + 2\theta_{\Psi_n}\iota(\bar{\Psi}_n + \bar{\Psi}_M)^2 + 2\theta_\vartheta\iota(\vartheta + \bar{\Psi}_M)^2$.
Combining the Lyapunov function from Step 1 to Step n, we can obtain
$$
V(k) = \sum_{i=1}^{n} V_i(k) = \sum_{i=1}^{n}\frac{\theta_{\Psi_i}}{\mu_i}\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s) + \sum_{i=1}^{n-1}\frac{1}{4}\theta_{\xi_i}\xi_i^2(k) + \frac{1}{3}\theta_{\xi_n}\xi_n^2(k) + \frac{\theta_M}{\mu_M}\tilde{\Psi}_M^2(k) + 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2 + \frac{\theta_\vartheta}{\mu_\vartheta}\tilde{\vartheta}^2(k) + \frac{\theta_\tau}{\mu_\tau}\tilde{\tau}^2(k). \tag{A.19}
$$
Combining (A.9), (A.13) and (A.18), we finally get
$$
\begin{aligned}
\Delta V(k) \le{}& -\Big(\frac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\Big)\xi_n^2(k) - \sum_{i=1}^{n}\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- \Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M\|S_M(k-j+1)\|\big]^2 \\
&- \sum_{i=2}^{n-1}\Big(\frac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\Big)\xi_i^2(k) - \frac{1}{4}\theta_{\xi_1}\xi_1^2(k) - \sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 \\
&- \sum_{i=1}^{n-1}(\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 - \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big]^2 + B
\end{aligned} \tag{A.20}
$$
where $B = \sum_{i=1}^{n} B_i$.
Select the parameters such that $0 < \mu_M < 1/(l\iota a^2)$, $0 < \mu_i < 1/(l\iota)$, $0 < \mu_\vartheta < 1/\iota$ and $0 < \mu_\tau < 1/\iota$. Then the first-order difference of $V(k)$ can be simplified as
$$
\begin{aligned}
\Delta V(k) \le{}& -\Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M\|S_M(k-j+1)\|\big]^2 - \frac{1}{4}\theta_{\xi_1}\xi_1^2(k) + B \\
&- \sum_{i=2}^{n-1}\Big(\frac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\Big)\xi_i^2(k) - \Big(\frac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\Big)\xi_n^2(k) \\
&- \sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 - \sum_{i=1}^{n-1}(\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 \\
&- \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big]^2.
\end{aligned} \tag{A.21}
$$
In this paper, the parameters $\theta_{\Psi_i}$, $\theta_{\Psi_n}$, $\theta_{\xi_i}$, $\theta_{\xi_n}$, $\theta_M$, $\theta_\vartheta$ and $\theta_\tau$ are respectively designed such that $\theta_\vartheta > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_\tau > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_M > \frac{1}{a^2}\big(2\theta_M + 2\sum_{i=1}^{n}\theta_{\Psi_i} + 2\theta_\vartheta + 2\theta_\tau\big)$, $\theta_{\Psi_i} > \theta_{\xi_i}\bar{\phi}_i^2$, $\theta_{\Psi_n} > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_{\xi_i} > 4\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2$ and $\theta_{\xi_n} > 3\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2$. On this basis, $\Delta V(k) < 0$ if the following inequalities hold
$$
\begin{aligned}
|\tilde{\vartheta}(k)| &> \frac{\sqrt{B}}{\sqrt{\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}, \qquad
|\tilde{\tau}(k)| > \frac{\sqrt{B}}{\sqrt{\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}, \\
\Big|\sum_{j=1}^{\iota}\tilde{\Psi}_M\|S_M(k-j+1)\|\Big| &> \frac{\sqrt{B}}{\sqrt{\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau}}, \\
|\xi_1(k)| &> \frac{\sqrt{B}}{\sqrt{\tfrac{1}{4}\theta_{\xi_1}}}, \qquad
|\xi_i(k)| > \frac{\sqrt{B}}{\sqrt{\tfrac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2}}, \qquad
|\xi_n(k)| > \frac{\sqrt{B}}{\sqrt{\tfrac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2}}, \\
\Big|\sum_{j=2}^{\iota}\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\Big| &> \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_i}}}, \qquad
\big|\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big| > \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2}}, \\
\big|\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big| &> \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}.
\end{aligned} \tag{A.22}
$$
In this way, all signals in the closed-loop system are proven to be SGUUB.
[1] J. B. Du, W. J. Cheng, G. Y. Lu, H. T. Gao, X. L. Chu, Z. C. Zhang, et al., Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach, IEEE Trans. Network Sci. Eng., 9 (2022), 33–44. doi: 10.1109/TNSE.2021.3068340
[2] H. X. Peng, X. M. Shen, Deep reinforcement learning based resource management for multi-access edge computing in vehicular networks, IEEE Trans. Network Sci. Eng., 7 (2021), 2416–2428. doi: 10.1109/TNSE.2020.2978856
[3] D. C. Chen, X. L. Liu, W. W. Yu, Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems, IEEE Trans. Network Sci. Eng., 7 (2021), 3057–3066. doi: 10.1109/TNSE.2020.3013528
[4] J. Wang, Q. Wang, H. Wu, T. Huang, Finite-time consensus and finite-time H∞ consensus of multi-agent systems under directed topology, IEEE Trans. Network Sci. Eng., 7 (2020), 1619–1632. doi: 10.1109/TNSE.2019.2943023
[5] T. Gao, T. Li, Y. J. Liu, S. Tong, IBLF-based adaptive neural control of state-constrained uncertain stochastic nonlinear systems, IEEE Trans. Neural Networks Learn. Syst., 33 (2022), 7345–7356. doi: 10.1109/TNNLS.2021.3084820
[6] T. T. Gao, Y. J. Liu, D. P. Li, S. C. Tong, T. S. Li, Adaptive neural control using tangent time-varying BLFs for a class of uncertain stochastic nonlinear systems with full state constraints, IEEE Trans. Cybern., 51 (2021), 1943–1953. doi: 10.1109/TCYB.2019.2906118
[7] P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. dissertation, Harvard University, 1974.
[8] Y. Tang, D. D. Zhang, P. Shi, W. B. Zhang, F. Qian, Event-based formation control for nonlinear multiagent systems under DoS attacks, IEEE Trans. Autom. Control, 66 (2021), 452–459. doi: 10.1109/TAC.2020.2979936
[9] Y. Tang, X. T. Wu, P. Shi, F. Qian, Input-to-state stability for nonlinear systems with stochastic impulses, Automatica, 113 (2020), 108766. doi: 10.1016/j.automatica.2019.108766
[10] X. T. Wu, Y. Tang, J. D. Cao, X. R. Mao, Stability analysis for continuous-time switched systems with stochastic switching signals, IEEE Trans. Autom. Control, 63 (2018), 3083–3090. doi: 10.1109/TAC.2017.2779882
[11] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, F. L. Lewis, Optimal and autonomous control using reinforcement learning: A survey, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2042–2062. doi: 10.1109/TNNLS.2017.2773458
[12] V. Narayanan, S. Jagannathan, Event-triggered distributed control of nonlinear interconnected systems using online reinforcement learning with exploration, IEEE Trans. Cybern., 48 (2018), 2510–2519. doi: 10.1109/TCYB.2017.2741342
[13] B. Luo, H. N. Wu, T. Huang, Off-policy reinforcement learning for H∞ control design, IEEE Trans. Cybern., 45 (2015), 65–76. doi: 10.1109/TCYB.2014.2319577
[14] R. Song, F. L. Lewis, Q. Wei, Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero sum games, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 704–713. doi: 10.1109/TNNLS.2016.2582849
[15] X. Yang, D. Liu, B. Luo, C. Li, Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning, Inf. Sci., 369 (2016), 731–747. doi: 10.1016/j.ins.2016.07.051
[16] H. Zhang, K. Zhang, Y. Cai, J. Han, Adaptive fuzzy fault-tolerant tracking control for partially unknown systems with actuator faults via integral reinforcement learning method, IEEE Trans. Fuzzy Syst., 27 (2019), 1986–1998. doi: 10.1109/TFUZZ.2019.2893211
[17] W. Bai, Q. Zhou, T. Li, H. Li, Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation, IEEE Trans. Cybern., 50 (2020), 3433–3443. doi: 10.1109/TCYB.2019.2921057
[18] Y. Li, S. Tong, Adaptive neural networks decentralized FTC design for nonstrict-feedback nonlinear interconnected large-scale systems against actuator faults, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 2541–2554. doi: 10.1109/TNNLS.2016.2598580
[19] Q. Chen, H. Shi, M. Sun, Echo state network-based backstepping adaptive iterative learning control for strict-feedback systems: An error-tracking approach, IEEE Trans. Cybern., 50 (2020), 3009–3022. doi: 10.1109/TCYB.2019.2931877
[20] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058
[21] W. Bai, T. Li, S. Tong, NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems, IEEE Trans. Cybern., 50 (2020), 4573–4584. doi: 10.1109/TCYB.2020.2963849
[22] Y. Li, K. Sun, S. Tong, Observer-based adaptive fuzzy fault-tolerant optimal control for SISO nonlinear systems, IEEE Trans. Cybern., 49 (2019), 649–661. doi: 10.1109/TCYB.2017.2785801
[23] H. Modares, F. L. Lewis, M. B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50 (2014), 193–202. doi: 10.1016/j.automatica.2013.09.043
[24] Z. Wang, L. Liu, Y. Wu, H. Zhang, Optimal fault-tolerant control for discrete-time nonlinear strict-feedback systems based on adaptive critic design, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2179–2191. doi: 10.1109/TNNLS.2018.2810138
[25] H. Li, Y. Wu, M. Chen, Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm, IEEE Trans. Cybern., 51 (2021), 1163–1174. doi: 10.1109/TCYB.2020.2982168
[26] Y. J. Liu, L. Tang, S. Tong, C. L. P. Chen, D. J. Li, Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Trans. Neural Networks Learn. Syst., 26 (2015), 165–176. doi: 10.1109/TNNLS.2014.2360724
[27] W. Bai, T. Li, Y. Long, C. L. P. Chen, Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems, IEEE Trans. Neural Networks Learn. Syst., 34 (2023), 355–379. doi: 10.1109/TNNLS.2021.3094901
[28] H. Wang, G. H. Yang, A finite frequency domain approach to fault detection for linear discrete-time systems, Int. J. Control, 81 (2008), 1162–1171. doi: 10.1080/00207170701691513
[29] C. Tan, G. Tao, R. Qi, A discrete-time parameter estimation based adaptive actuator failure compensation control scheme, Int. J. Control, 86 (2013), 276–289. doi: 10.1080/00207179.2012.723828
[30] J. Na, X. Ren, G. Herrmann, Z. Qiao, Adaptive neural dynamic surface control for servo systems with unknown dead-zone, Control Eng. Pract., 19 (2011), 1328–1343. doi: 10.1016/j.conengprac.2011.07.005
[31] Y. J. Liu, S. Li, S. Tong, C. L. P. Chen, Adaptive reinforcement learning control based on neural approximation for nonlinear discrete-time systems with unknown nonaffine dead-zone input, IEEE Trans. Neural Networks Learn. Syst., 30 (2019), 295–305. doi: 10.1109/TNNLS.2018.2844165
[32] S. S. Ge, J. Zhang, T. H. Lee, Adaptive neural network control for a class of MIMO nonlinear systems with disturbances in discrete time, IEEE Trans. Syst. Man Cybern. B Cybern., 34 (2004), 1630–1645. doi: 10.1109/TSMCB.2004.826827
[33] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
[34] S. S. Ge, G. Y. Li, T. H. Lee, Adaptive NN control for a class of strict-feedback discrete-time nonlinear systems, Automatica, 39 (2003), 807–819. doi: 10.1016/S0005-1098(03)00032-3
[35] Q. Yang, S. Jagannathan, Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Trans. Syst. Man Cybern. B Cybern., 42 (2012), 377–390. doi: 10.1109/TSMCB.2011.2166384
[36] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
[37] S. Ferrari, J. E. Steck, R. Chandramohan, Adaptive feedback control by constrained approximate dynamic programming, IEEE Trans. Syst. Man Cybern. B Cybern., 38 (2008), 982–987. doi: 10.1109/TSMCB.2008.924140
[38] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058