
This paper focuses on the adaptive reinforcement learning-based optimal control problem for a class of nonstrict-feedback nonlinear systems subject to an actuator fault and an unknown input dead zone. To simultaneously reduce the computational complexity and avoid the local-optimum problem, a novel neural network weight-update algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, an actor-critic-based reinforcement learning control strategy is developed for high-order nonlinear nonstrict-feedback systems. In addition, two auxiliary parameters are introduced to deal with the input dead zone and the actuator fault, respectively. All signals in the system are proven to be semi-globally uniformly ultimately bounded via Lyapunov analysis. Finally, simulation results are presented to illustrate the effectiveness of the proposed approach.
Citation: Zichen Wang, Xin Wang. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6334-6357. doi: 10.3934/mbe.2023274
Optimal control theory originated in the 1960s and has become an important part of automatic control theory, primarily owing to its spirit of seeking the optimal solution among all possible control schemes [1,2]. Optimal control problems for linear systems can generally be settled by solving Riccati equations. However, for nonlinear systems there are few effective methods, since the Hamilton-Jacobi-Bellman (HJB) equation must be addressed, and the HJB equation is generally difficult to solve analytically. To overcome this bottleneck, many remarkable approaches have been developed, such as adaptive dynamic programming [3,4], actor-critic neural networks (ACNNs) [5,6], policy iteration, and so on. Reinforcement learning (RL), as a method that solves optimal control problems for nonlinear systems while avoiding the direct solution of the HJB equation, has received widespread attention in the past decades. In 1974, Werbos first applied the idea of RL to optimal control theory [7]. Since then, many outstanding results have been reported [8,9,10]. At present, RL is usually implemented with ACNNs, where the critic neural network (CNN) provides policy evaluation and the actor neural network (ANN) updates the present policy. The RL algorithm can reduce energy consumption beyond that of other algorithms while guaranteeing system stability, and it has become a significant method of modern control theory. In recent years, more and more RL approaches have been presented in various fields, including online RL [11,12], integral RL [13,14], off-policy RL [15,16], etc.
General RL strategies utilize the gradient descent method to obtain the ideal weights of the neural networks, which often falls into local optima [17], so that the neural network estimation error cannot easily meet the requirements. To overcome this bottleneck, Bai et al. proposed the multigradient recursive (MGR) algorithm in [17] to obtain the globally optimal solution. By updating pseudo-gradients, this state-of-the-art technique resolves the local-optimum problem and accelerates the convergence rate of the neural network weights. However, the MGR algorithm suffers from a heavy computational burden. It is therefore necessary to mention the minimal learning parameter (MLP) scheme, which reduces the number of update laws without noticeably degrading the estimation accuracy. Many studies have validated the effectiveness of the MLP. Nevertheless, the aforementioned papers adopt only one of the two algorithms and fail to combine the MLP and MGR algorithms so as to exploit both of their advantages.
The optimal control problem of strict-feedback nonlinear systems has been widely studied. However, none of these strategies can be extended to nonstrict-feedback systems [18,19]. For this reason, some scholars have proposed studies to overcome this bottleneck. For example, Tong et al. [20] presented a novel fuzzy tracking control design for a nonstrict-feedback SISO system. Bai et al. [21] utilized MLP-based RL theory to solve optimal control problems for a class of nonstrict-feedback systems. However, all of the above results omitted the influence of the actuator fault and the dead-zone input, which are common factors that affect the stability of a system. Such negligence can lead to severe damage and must be taken seriously. Thus, many works have been presented to offset their influence. However, none has considered the situation in which the dead zone and the actuator fault occur simultaneously. Hence, how to obtain an optimal controller for a nonstrict-feedback system with an actuator fault and an input dead zone, with minimal computation and sufficient accuracy, is an important task, and it is the motivation for the current investigation.
Combined with the backstepping technique, thorough investigations have been made of the tracking control problem for strict-feedback nonlinear systems [14,15,16]. At present, the backstepping approach has been introduced into the analysis of high-order nonlinear systems. Li et al. [22] investigated the optimal control problem of a class of SISO strict-feedback systems via the fuzzy control method. Modares et al. [23] developed an integral RL approach for strict-feedback systems with input constraints. Wang et al. [24] proposed an optimal fault-tolerant control strategy for a nonlinear strict-feedback system via adaptive critic design.
Besides, many papers have studied novel neural network weight-update algorithms. Li et al. [25] utilized the MLP technique to overcome the fault-tolerant problem of a class of multiagent systems. Liu et al. [26] designed an RL controller by applying an MLP scheme to classic MIMO systems with external disturbance. Bai et al. [27] developed an event-triggered control scheme for multiagent systems based on the MLP technique.
Furthermore, many scholars are committed to investigating tolerance strategies for the input dead zone and actuator fault. For instance, Wang and Yang [28] studied the fault detection problem for linear systems with disturbance. Tan et al. [29] developed a compensation control scheme for a class of discrete-time systems with actuator failures. In addition, Na et al. [30] provided an adaptive dynamic control approach for a system with an unknown dead zone.
Based on the above discussion, an RL optimal controller is built in this paper to deal with the fault-tolerant control problem for a class of nonstrict-feedback nonlinear systems in discrete time with an unknown dead-zone input and actuator fault. To deal with the dead zone and actuator fault, we propose two auxiliary systems to offset their influence. The ANN and CNN are utilized to approximate the unknown terms and the long-term utility function, respectively, and a novel approach is proposed to update the neural network weights. The boundedness of all signals in the closed loop is rigorously proved, and the tracking error converges to a small compact set. The novelties of this paper are summarized as follows:
1) We propose a novel neural network weight-update algorithm to avoid local optima and reduce the computational burden. Besides, compared with the ordinary gradient descent algorithm [11,26], the proposed approach achieves a faster weight convergence rate.
2) We formulate a modified backstepping method with additional parameters to offset the influence of the input dead zone, the actuator fault and the algebraic loop problem. In addition, unified fault-tolerant control algorithms are developed based on the RL strategy.
The organization of this paper is given below. In Section 2, descriptions of the system and of radial basis function neural network (RBF NN) theory are given. In Subsection 3.1, the CNN and our novel update law are presented. In Subsection 3.2, the design procedure of the adaptive RL controller is provided. In Section 4, we present some simulation results to show the contributions of the proposed scheme. The conclusion is provided in Section 5.
The dynamics of a standard n-order nonstrict-feedback nonlinear system [31,32,33,34] can be described as follows:

$$
\begin{cases}
x_i(k+1) = \varphi_i(\bar{x}_n(k)) + \phi_i(\bar{x}_n(k))\,x_{i+1}(k), & i = 1, \dots, n-1 \\
x_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\,U(k) + d(k) \\
y(k) = x_1(k)
\end{cases} \tag{2.1}
$$
where $x_i(k) \in \mathbb{R}$ for $i = 1, \dots, n$ represents the state variables of the system, and $\bar{x}_n(k) = [x_1(k), x_2(k), \dots, x_n(k)]^\top \in \mathbb{R}^n$ denotes the state vector. The notations $U(k) \in \mathbb{R}$ and $y(k) \in \mathbb{R}$ are the input and output signals, respectively, $d(k)$ stands for the external disturbance, and $\varphi_i(\cdot)$ and $\phi_i(\cdot)$ represent unknown smooth nonlinear functions.
Motivated by the transformation proposed in [18,19], the nonstrict-feedback system (2.1) can be further expressed as
$$
\begin{cases}
x_i(k+n-i+1) = \varphi_i(\bar{x}_n(k+n-i)) + \phi_i(\bar{x}_n(k+n-i))\,x_{i+1}(k+n-i), & i = 1, \dots, n-1 \\
x_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\,U(k) + d(k) \\
y(k) = x_1(k).
\end{cases} \tag{2.2}
$$
To proceed smoothly, an assumption is introduced in the following sequel.
Assumption 1: According to the contributions in [34,35], the functions $\varphi_i(\bar{x}(k))$ and $\phi_i(\bar{x}(k))$ satisfy $0 < \underline{\varphi} < \varphi_i(\bar{x}(k)) < \bar{\varphi}$ and $0 < \underline{\phi} < \phi_i(\bar{x}(k)) < \bar{\phi}$, where $\bar{\varphi}$ and $\underline{\varphi}$ are the unknown upper and lower bounds of $\varphi_i(\bar{x}(k))$, and $\bar{\phi}$ and $\underline{\phi}$ are the unknown upper and lower bounds of $\phi_i(\bar{x}(k))$, respectively. The external disturbance is bounded and satisfies $|d(k)| \le \bar{d}$, with $\bar{d}$ being an unknown positive constant.
The control signal subject to the actuator fault and input dead zone can be described as $U(k) = \psi(k)u(k) + \delta(k)$, where $\psi(k)$ and $\delta(k)$ denote the efficiency factor and the unknown drift fault of the actuator, respectively. We assume that $\psi(k)$ is positive and satisfies $0 < \underline{\psi} < \psi(k) < \bar{\psi} < 1$, where $\underline{\psi}$ and $\bar{\psi}$ are unknown constants. Further, $\delta(k)$ satisfies $\delta(k) < \bar{\delta}$, with $\bar{\delta}$ being its upper bound. The dead zone is defined as $u(k) = D(v(k))$, where $v(k)$ represents the input of the dead zone and $D(\cdot)$ is the dead-zone function. According to the propositions in [31], the dead zone is expressed as
$$
D(v(k)) = \begin{cases}
b_r(v(k) - f_r), & v(k) \ge f_r \\
0, & -f_l < v(k) < f_r \\
b_l(v(k) + f_l), & v(k) \le -f_l
\end{cases} \tag{2.3}
$$
where $b_r$ and $b_l$ denote the right and left slopes of the dead zone, respectively, and $f_r$ and $f_l$ are the breakpoints of the input. To simplify the following calculations, $D(v(k))$ can be converted into a new form as follows
$$
D(v(k)) = b(k)v(k) + f(k) \tag{2.4}
$$
where b(k) and f(k) can be described as
$$
b(k) = \begin{cases} b_r, & v(k) > 0 \\ b_l, & v(k) \le 0 \end{cases} \qquad
f(k) = \begin{cases} -b_r f_r, & v(k) \ge f_r \\ -b(k)v(k), & -f_l < v(k) < f_r \\ b_l f_l, & v(k) \le -f_l. \end{cases} \tag{2.5}
$$
We suppose that $b(k)$ and $f(k)$ satisfy $0 < \underline{b} < |b(k)| < \bar{b}$ and $0 < \underline{f} < |f(k)| < \bar{f}$, respectively. The control signal $U(k)$ can then be reorganized as
$$
U(k) = \psi(k)\big(b(k)v(k) + f(k)\big) + \delta(k). \tag{2.6}
$$
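To make the input model concrete, the following is a minimal sketch, in Python, of the dead zone (2.3) and the faulty actuator (2.6); the slopes, breakpoints, efficiency factor and drift fault used here are illustrative assumptions, not values prescribed by the paper.

```python
# A minimal sketch of the input model (2.3)-(2.6); all numeric parameter
# values below are illustrative placeholders, not values from the paper.
def dead_zone(v, b_r=0.5, b_l=0.5, f_r=0.3, f_l=0.3):
    """Dead zone D(v(k)) of (2.3): zero on (-f_l, f_r), affine outside."""
    if v >= f_r:
        return b_r * (v - f_r)
    if v <= -f_l:
        return b_l * (v + f_l)
    return 0.0

def actuator_output(v, psi=0.8, delta=0.05):
    """Actual control U(k) = psi(k) * u(k) + delta(k), u(k) = D(v(k)), as in (2.6)."""
    return psi * dead_zone(v) + delta
```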
In this paper, the RL controller is developed for the nonstrict-feedback nonlinear system (2.2), ensuring that all signals in the closed-loop system are semi-globally uniformly ultimately bounded (SGUUB). Based on the ACNNs, the tracking error ξ1(k) is required to converge to a neighborhood of zero that will be specified subsequently.
Note that an RBF NN can approximate any smooth nonlinear function over a compact set. That is, for an unknown nonlinear function $F(N)$, there exists an RBF NN $W^{*T}S(N)$ such that $F(N) = W^{*T}S(N) + \sigma(N)$, where $W^* = [w_1, \dots, w_l]^T \in \mathbb{R}^l$ denotes the ideal weight vector, $l$ represents the number of nodes in the hidden layer and $\sigma(N)$ is the estimation error. Both $W^*$ and $\sigma(N)$ satisfy $\|W^*\| < \bar{W}$ and $\|\sigma(N)\| < \bar{\sigma}$, with $\bar{W}$ and $\bar{\sigma}$ being unknown upper bounds. The notation $S(N) = [s_1(N), \dots, s_l(N)]^T$ is the vector of basis functions, where each $s_i(N)$ takes the Gaussian form $s_i(N) = \exp\left[-\frac{(N-c_i)^T(N-c_i)}{\eta_i^2}\right]$, in which $c_i$ represents the center of the receptive field and $\eta_i$ denotes the width of the function. Because $0 < s_i(N) < 1$, we can further derive that $0 < \sum_{i=1}^{l} s_i(N)s_i(N) = S(N)^T S(N) < l$.
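As an illustration of the approximator just described, the sketch below evaluates the Gaussian basis vector S(N); the centers and widths are placeholder choices introduced only for this example.

```python
import numpy as np

# A minimal sketch of the Gaussian RBF basis used for approximation,
# F(N) ~= W^T S(N) + sigma(N); the centers c_i and widths eta_i below are
# illustrative assumptions, not values from the paper.
def rbf_basis(N, centers, widths):
    """S(N) = [s_1(N), ..., s_l(N)]^T with s_i(N) = exp(-(N-c_i)^T (N-c_i) / eta_i^2)."""
    N = np.asarray(N, dtype=float)
    sq_dist = np.sum((centers - N) ** 2, axis=1)  # (N - c_i)^T (N - c_i)
    return np.exp(-sq_dist / widths ** 2)          # each s_i lies in (0, 1)

# Example with l = 10 nodes on a two-dimensional input, so S^T S < l = 10
centers = np.linspace(-1.0, 1.0, 10).reshape(-1, 1).repeat(2, axis=1)
widths = np.full(10, 2.0)
S = rbf_basis([0.1, -0.2], centers, widths)
```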
The utility function [35] can be chosen as
$$
\rho(k) = \begin{cases} 0, & |\xi_1(k)| < \varpi \\ 1, & |\xi_1(k)| \ge \varpi \end{cases} \tag{3.1}
$$
where $\varpi$ is a positive constant that denotes the threshold value of the tracking performance. The tracking error is written as $\xi_1(k) = y(k) - x_d(k)$, where $x_d(k)$ indicates the reference signal. The long-term strategic utility function [21,27] is given by
$$
M(k) = \rho(k+1) + a\rho(k+2) + a^2\rho(k+3) + \cdots \tag{3.2}
$$
where a is a predefined positive parameter satisfying a<1. According to RBF NN theory, the long-term utility function M(k) can be expressed as
$$
M(k) = W_M^T S_M(k) + \delta_M(k) \tag{3.3}
$$
where $W_M$ and $\delta_M(k)$ indicate the ideal weight vector and the approximation error, respectively, and $S_M(k)$ is the RBF NN basis function. We define $\hat{M}(k) = \hat{W}_M^T(k)S_M(k)$ as the estimation of $M(k)$, with $\hat{W}_M(k)$ being the estimation of the ideal weight $W_M$.
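For intuition, the sketch below evaluates the utility (3.1) and a truncated version of the discounted sum (3.2); the truncation horizon T is our assumption, since in the actual scheme M(k) is not computed directly but estimated by the CNN.

```python
# A minimal sketch of the utility (3.1) and a truncated evaluation of the
# long-term utility (3.2); the horizon T is an illustrative truncation only.
def utility(xi1, threshold):
    """rho(k): 0 when |xi_1(k)| < threshold (good tracking), 1 otherwise."""
    return 0.0 if abs(xi1) < threshold else 1.0

def long_term_utility(future_errors, threshold, a=1e-5, T=50):
    """M(k) ~= sum_{j=1..T} a^(j-1) * rho(k+j), with discount a < 1 as in (3.2)."""
    horizon = min(T, len(future_errors))
    return sum(a ** (j - 1) * utility(future_errors[j - 1], threshold)
               for j in range(1, horizon + 1))
```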
On the basis of the MLP scheme, ˆM(k) can be written in the form
$$
\hat{M}(k) = \hat{\Psi}_M(k)\|S_M(k)\| \tag{3.4}
$$
where $\|\cdot\|$ indicates the Euclidean norm, and $\hat{\Psi}_M(k) = \|\hat{W}_M(k)\|$ satisfies $\hat{\Psi}_M \le \bar{\Psi}$, with $\bar{\Psi}$ being an unknown positive constant.
According to the scheme in [36], and noting from (3.2) that $M(k-1) = \rho(k) + aM(k)$, the Bellman error is designed as
$$
E_M(k) = a\hat{M}(k) - \big[\hat{M}(k-1) - \rho(k)\big]. \tag{3.5}
$$
Adopting the cost function in the quadratic form $\beta_M(k) = \frac{1}{2}E_M^2(k)$, the gradient of $\hat{\Psi}_M$ is obtained as
$$
\Delta\hat{\Psi}_M(k) = a\|S_M(k)\|\big[a\hat{M}(k) - \hat{M}(k-1) + \rho(k)\big]. \tag{3.6}
$$
Defining $\omega_M(k-j+1) = \hat{\Psi}_M(k)\|S_M(k-j+1)\|$, we can further get
$$
g(\iota, \beta_M(k)) = \sum_{j=1}^{\iota} a\|S_M(k-j+1)\|\big[a\,\omega_M(k-j+1) - \omega_M(k-j) + \rho(k-j+1)\big] \tag{3.7}
$$
where ι≥1 is a positive predefined constant that indicates the step length of the gradient.
Together with (3.7), the update law of $\hat{\Psi}_M$ is obtained as
$$
\hat{\Psi}_M(k+1) = \hat{\Psi}_M(k) - \mu_M \sum_{j=1}^{\iota} a\|S_M(k-j+1)\|\big[a\,\omega_M(k-j+1) - \omega_M(k-j) + \rho(k-j+1)\big] \tag{3.8}
$$
where μM is the selected learning rate. The structure of the CNN is shown in Figure 1.
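The CNN update just derived can be summarized in a short sketch. The class below is a hedged illustration of (3.4)–(3.8), not the authors' implementation: only the scalar norm of the weight vector is adapted (the MLP idea) and the gradient is summed over the last ι steps (the MGR idea); the buffering of past basis norms and utilities is our assumption.

```python
from collections import deque

# A hedged sketch of the MLP/MGR critic update (3.4)-(3.8).
class Critic:
    def __init__(self, mu_M=0.005, a=1e-5, iota=10):
        self.Psi_M = 0.0                        # estimate hat Psi_M of ||W_M||
        self.mu_M, self.a = mu_M, a
        self.S_norms = deque(maxlen=iota + 1)   # ||S_M(k)||, ..., ||S_M(k-iota)||
        self.rhos = deque(maxlen=iota)          # rho(k), ..., rho(k-iota+1)

    def estimate(self, S_norm):
        """hat M(k) = hat Psi_M(k) ||S_M(k)||, the MLP form (3.4)."""
        return self.Psi_M * S_norm

    def update(self, S_norm, rho):
        self.S_norms.append(S_norm)
        self.rhos.append(rho)
        grad = 0.0
        for j in range(1, len(self.S_norms)):          # multigradient (3.7)
            omega_new = self.Psi_M * self.S_norms[-j]      # omega_M(k-j+1)
            omega_old = self.Psi_M * self.S_norms[-j - 1]  # omega_M(k-j)
            grad += self.a * self.S_norms[-j] * (
                self.a * omega_new - omega_old + self.rhos[-j])
        self.Psi_M -= self.mu_M * grad                 # update law (3.8)
```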
Remark 1: The neural networks in this paper are updated by the proposed weight-update algorithm. Compared with the classic gradient descent method, the proposed algorithm has the following advantages: 1) it reduces the computational complexity; 2) it avoids the local-optimum problem; 3) it accelerates the convergence of the neural network weights.
In this section, an ANN will be utilized to implement the n-step backstepping RL control strategy. Specifically, two auxiliary signals are introduced in the n-th step to eliminate the impact of the dead zone and the actuator fault.
Step 1: Define the tracking errors ξ1(k+n)=x1(k+n)−xd(k+n) and ξ2(k+n−1)=x2(k+n−1)−α1(k). According to system (2.2), the tracking error ξ1(k+n) can be further deduced as
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\left[\left(\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))} - \frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))} + x_d(k+n)\right) + \alpha_1(k) - x_d(k+n) + \xi_2(k+n-1)\right]
$$
where α1(k) denotes the virtual controller. Let
$$
\gamma_1(k) = -\left(\frac{\varphi_1(\bar{x}_n(k+n-1))}{\phi_1(\bar{x}_n(k+n-1))} + x_d(k+n) - \frac{x_d(k+n)}{\phi_1(\bar{x}_n(k+n-1))}\right). \tag{3.9}
$$
By the universal approximation capability of the RBF NN, $\gamma_1(k)$ can be approximated as $\gamma_1(k) = W_1^T S_1(N_1(k)) + \sigma_1(k)$, where $W_1$ represents the ideal weight vector, $N_1(k) = [\bar{x}_n(k+n-1), x_d(k+n)]^T$ and $\sigma_1(k)$ indicates the approximation error. Suppose that $W_1$ and $\sigma_1(k)$ satisfy $\|W_1\| < \bar{W}_1$ and $\|\sigma_1(k)\| < \bar{\sigma}_1$, respectively, where $\bar{W}_1$ and $\bar{\sigma}_1$ are the corresponding upper bounds.
Combining (3.9) and γ1(k), ξ1(k+n) can be further expressed as
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \alpha_1(k) + \xi_2(k+n-1) - x_d(k+n)\big]. \tag{3.10}
$$
For the purpose of solving the algebraic loop problem, which will be discussed later, the term o1(k) is proposed as
$$
o_1(k) = \Psi_1\|S_1(\epsilon_1(k))\| \tag{3.11}
$$
where $\epsilon_1(k) = [\bar{x}_1(k+n-1), x_d(k+n)]^T$ and $\Psi_1 = \|W_1\|$.
Adding and subtracting (3.11) in (3.10), one can easily derive
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \alpha_1(k) + \xi_2(k+n-1) - x_d(k+n) + \Psi_1\|S_1(\epsilon_1(k))\| - \Psi_1\|S_1(\epsilon_1(k))\|\big]. \tag{3.12}
$$
In order to further simplify (3.12), the virtual controller is designed as
$$
\alpha_1(k) = -\hat{\Psi}_1(k)\|S_1(\epsilon_1(k))\| + x_d(k+n) \tag{3.13}
$$
where $\hat{\Psi}_1(k) = \|\hat{W}_1(k)\|$ and $\hat{W}_1(k)$ is the estimation of $W_1$.
Substituting (3.13) into (3.12), we get
$$
\xi_1(k+n) = \phi_1(\bar{x}_n(k+n-1))\big[-W_1^T S_1(N_1(k)) - \sigma_1(k) + \xi_2(k+n-1) - \tilde{\Psi}_1(k)\|S_1(\epsilon_1(k))\| - \Psi_1\|S_1(\epsilon_1(k))\|\big] \tag{3.14}
$$
where $\tilde{\Psi}_1(k) = \hat{\Psi}_1(k) - \Psi_1$.
Shifting (3.14) to the k+1 time instant, one has
$$
\xi_1(k+1) = \phi_1(\bar{x}_n(k))\big[-W_1^T S_1(N_1(k_1)) - \sigma_1(k_1) + \xi_2(k) - \tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| - \Psi_1\|S_1(\epsilon_1(k_1))\|\big] \tag{3.15}
$$
where $k_1 = k - n + 1$ represents the shifted time instant.
On the basis of the RL control scheme, the prediction error is defined as
$$
E_1(k) = \hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| + \big(\hat{M}(k) - M_d(k)\big) \tag{3.16}
$$
where $M_d(k)$ represents the ideal long-term utility function, which is usually taken as zero [37].
The cost function is taken in the quadratic form $\beta_1(k) = \frac{1}{2}E_1^2(k)$, and the gradient of $\hat{\Psi}_1(k)$ is deduced as
$$
\Delta\hat{\Psi}_1(k) = \frac{\partial \beta_1(k)}{\partial \hat{\Psi}_1(k_1)} = \|S_1(\epsilon_1(k_1))\|\big[\hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\| + \hat{M}(k)\big]. \tag{3.17}
$$
The multigradient can be further obtained as
$$
g(\iota, \beta_1(k)) = \sum_{j=1}^{\iota} \|S_1(\epsilon_1(k_1-j+1))\|\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]. \tag{3.18}
$$
where $\omega_1(k_1-j+1) = \hat{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|$. Similar to (3.8), the MGR update law of $\hat{\Psi}_1(k)$ is derived as
$$
\hat{\Psi}_1(k+1) = \hat{\Psi}_1(k_1) - \mu_1 g(\iota, \beta_1(k)) = \hat{\Psi}_1(k_1) - \mu_1 \sum_{j=1}^{\iota} \|S_1(\epsilon_1(k_1-j+1))\|\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big] \tag{3.19}
$$
where μ1 stands for the chosen learning rate. The structure of the ANN is shown in Figure 2.
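Analogously, the first-step actor admits a short sketch illustrating the virtual control (3.13) and the MGR update (3.19); the ω_M values are supplied by the critic sketch above, and the buffering scheme is an illustrative assumption.

```python
from collections import deque

# A hedged sketch of the first-step actor: virtual control (3.13) and MGR
# update (3.19). The buffer handling is our assumption, not the paper's.
class ActorStep1:
    def __init__(self, mu_1=0.008, iota=10):
        self.Psi_1 = 0.0                      # estimate hat Psi_1 of ||W_1||
        self.mu_1 = mu_1
        self.S_norms = deque(maxlen=iota)     # past ||S_1(eps_1(.))||
        self.omega_Ms = deque(maxlen=iota)    # past omega_M(.) from the critic

    def virtual_control(self, S_norm, xd_ahead):
        """alpha_1(k) = -hat Psi_1(k) ||S_1(eps_1(k))|| + x_d(k+n), as in (3.13)."""
        return -self.Psi_1 * S_norm + xd_ahead

    def update(self, S_norm, omega_M):
        self.S_norms.append(S_norm)
        self.omega_Ms.append(omega_M)
        grad = 0.0
        for j in range(1, len(self.S_norms) + 1):   # multigradient (3.18)
            omega_1 = self.Psi_1 * self.S_norms[-j]     # omega_1(k_1-j+1)
            grad += self.S_norms[-j] * (omega_1 + self.omega_Ms[-j])
        self.Psi_1 -= self.mu_1 * grad              # update law (3.19)
```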
Remark 2: It is necessary to emphasize that previous works usually designed the ANN basis function in the form S1(N1(k)), which depends on the whole state vector rather than only on x1(k). Under this design, α1(k) and ˆΨ1(k+1) are both constructed as functions of $N_1(k) = [\bar{x}_n(k+n-1), x_d(k+n)]^T$, which results in the algebraic loop problem described in [38]. To settle this conundrum, the term o1(k) is presented in this paper, and α1(k) and ˆΨ1(k+1) are instead constructed as functions of $\epsilon_1(k) = [\bar{x}_1(k+n-1), x_d(k+n)]^T$.
Step i: Define the tracking errors ξi(k+n−i+1)=xi(k+n−i+1)−αi−1(k) and ξi+1(k+n−i)=xi+1(k+n−i)−αi(k), where αi−1(k) and αi(k) denote the virtual controllers at Steps i−1 and i, respectively. Similar to the derivation of (3.9), one has
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\left[\left(\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))} - \frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))} + \alpha_{i-1}(k)\right) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k)\right]. \tag{3.20}
$$
According to the definition of $\gamma_1(k)$, one has $\gamma_i(k) = -\left(\frac{\varphi_i(\bar{x}_n(k+n-i))}{\phi_i(\bar{x}_n(k+n-i))} + \alpha_{i-1}(k) - \frac{\alpha_{i-1}(k)}{\phi_i(\bar{x}_n(k+n-i))}\right)$. This unknown function can be approximated by the RBF NN as $\gamma_i(k) = W_i^T S_i(N_i(k)) + \sigma_i(k)$, where $W_i$ and $\sigma_i(k)$ are the ideal weight vector and the approximation error, respectively. Furthermore, we let $N_i(k) = [x_1(k+n-i), \dots, x_n(k+n-i), x_d(k+n)]^T$.
Substituting γi(k) into (3.20), one has
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k)\big]. \tag{3.21}
$$
The term oi(k) is given in the form below
$$
o_i(k) = \Psi_i\|S_i(\epsilon_i(k))\| \tag{3.22}
$$
where $\epsilon_i(k) = [\bar{x}_i(k+n-i), x_d(k+n)]^T$ and $\Psi_i$ denotes the Euclidean norm of the ideal weight vector $W_i$.
Adding and subtracting (3.22) in (3.21) yields
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \alpha_i(k) + \xi_{i+1}(k+n-i) - \alpha_{i-1}(k) + \Psi_i\|S_i(\epsilon_i(k))\| - \Psi_i\|S_i(\epsilon_i(k))\|\big]. \tag{3.23}
$$
As in the previous step, the virtual controller is designed as
$$
\alpha_i(k) = -\hat{\Psi}_i(k)\|S_i(\epsilon_i(k))\| + \alpha_{i-1}(k) \tag{3.24}
$$
where $\hat{W}_i(k)$ is the estimation of $W_i$ and $\hat{\Psi}_i(k) = \|\hat{W}_i(k)\|$.
Substituting (3.24) into (3.23), ξi(k+n−i+1) becomes
$$
\xi_i(k+n-i+1) = \phi_i(\bar{x}_n(k+n-i))\big[-W_i^T S_i(N_i(k)) - \sigma_i(k) + \xi_{i+1}(k+n-i) - \tilde{\Psi}_i(k)\|S_i(\epsilon_i(k))\| - \Psi_i\|S_i(\epsilon_i(k))\|\big]. \tag{3.25}
$$
Analogously to (3.15), (3.25) can be further described as
$$
\xi_i(k+1) = \phi_i(\bar{x}_n(k))\big[-W_i^T S_i(N_i(k_i)) - \sigma_i(k_i) + \xi_{i+1}(k) - \tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| - \Psi_i\|S_i(\epsilon_i(k_i))\|\big] \tag{3.26}
$$
where ki=k−n+i.
Let the prediction error be $E_i(k) = \hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| + \hat{M}(k)$. According to $E_i(k)$, the cost function is described in its quadratic form $\beta_i(k) = \frac{1}{2}E_i^2(k)$, and the gradient of $\hat{\Psi}_i$ is obtained as:
$$
\Delta\hat{\Psi}_i(k) = \frac{\partial \beta_i(k)}{\partial \hat{\Psi}_i(k_i)} = \|S_i(\epsilon_i(k_i))\|\big[\hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\| + \hat{M}(k)\big]. \tag{3.27}
$$
On the basis of the MGR algorithm definition, the multigradient is expressed as
$$
g(\iota, \beta_i(k)) = \sum_{j=1}^{\iota} \|S_i(\epsilon_i(k_i-j+1))\|\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]. \tag{3.28}
$$
where $\omega_i(k_i-j+1) = \hat{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|$. The update law of $\hat{\Psi}_i(k)$ is deduced according to (3.28):
$$
\hat{\Psi}_i(k+1) = \hat{\Psi}_i(k_i) - \mu_i g(\iota, \beta_i(k)) = \hat{\Psi}_i(k_i) - \mu_i \sum_{j=1}^{\iota} \|S_i(\epsilon_i(k_i-j+1))\|\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big] \tag{3.29}
$$
where μi is the learning rate of the i-th step.
Step n: The tracking error in the n-th subsystem can be described as ξn(k+1)=xn(k+1)−αn−1(k). Substitute (2.2) and (2.6) into the n-th tracking error equation:
$$
\xi_n(k+1) = \varphi_n(\bar{x}_n(k)) + \phi_n(\bar{x}_n(k))\big(\psi(k)(b(k)v(k) + f(k)) + \delta(k)\big) + d(k) - \alpha_{n-1}(k). \tag{3.30}
$$
For the purpose of simplifying (3.30), π(k) is defined as
$$
\pi(k) = -\frac{1}{\phi_n(\bar{x}_n(k))\psi(k)b(k)}\big(\varphi_n(\bar{x}_n(k)) - \alpha_{n-1}(k)\big). \tag{3.31}
$$
Using the theory of the RBF NN to approximate (3.31), one gets
$$
\pi(k) = W_n^T S_n(N_n(k)) + \sigma_n(k) \tag{3.32}
$$
where the definitions of $W_n$ and $\sigma_n(k)$ are the same as those in Steps 1 to n−1 and $N_n(k) = [\bar{x}_n(k), x_d(k+n)]^T$.
Combining (3.32) and (3.30), we derive
$$
\xi_n(k+1) = \phi_n(\bar{x}_n(k))\psi(k)b(k)\left(v(k) + \frac{f(k)}{b(k)} + \frac{\delta(k)}{\psi(k)b(k)} - \pi(k)\right) + d(k). \tag{3.33}
$$
In (3.33), the dynamics of the dead zone and the actuator fault introduced in (2.3) and (2.6) appear explicitly. It is easy to deduce that they have the following properties
$$
\frac{f(k)}{b(k)} \le \frac{\bar{f}}{\underline{b}} = \tau, \qquad \frac{\delta(k)}{\psi(k)b(k)} \le \frac{\bar{\delta}}{\underline{\psi}\,\underline{b}} = \vartheta \tag{3.34}
$$
where $\vartheta$ and $\tau$ are both unknown parameters. Define the estimations of these parameters as $\hat{\vartheta}$ and $\hat{\tau}$; the estimation errors $\tilde{\vartheta}$ and $\tilde{\tau}$ are then given by
$$
\tilde{\vartheta}(k) = \hat{\vartheta}(k) - \vartheta, \qquad \tilde{\tau}(k) = \hat{\tau}(k) - \tau. \tag{3.35}
$$
Based on the estimation errors in (3.35), the actual controller is designed as
$$
v(k) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{\tau}(k) + \hat{\vartheta}(k) \tag{3.36}
$$
where $\hat{\Psi}_n(k) = \|\hat{W}_n(k)\|$, $\hat{W}_n(k)$ stands for the estimation of $W_n$, and the time index $k_n = k$.
With (3.35) and (3.36), the error dynamics (3.33) can be further written as
$$
\begin{aligned}
\xi_n(k+1) ={}& \phi_n(\bar{x}_n(k))\psi(k)b(k)\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\| \\
&+ \phi_n(\bar{x}_n(k))\psi(k)b(k)\left(\tilde{\vartheta}(k) + \vartheta + \tilde{\tau}(k) + \tau + \frac{f(k)}{b(k)} + \frac{\delta(k)}{\psi(k)b(k)}\right) \\
&+ \phi_n(\bar{x}_n(k))\psi(k)b(k)\,q(k_n) + d(k)
\end{aligned} \tag{3.37}
$$
where $\tilde{\Psi}_n(k) = \hat{\Psi}_n(k) - \Psi_n$ and $q(k) = \Psi_n\|S_n(N_n(k_n))\| - W_n^T S_n(N_n(k)) - \sigma_n(k)$. Similar to Step i, the prediction error is defined as
$$
E_n(k) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{M}(k). \tag{3.38}
$$
Further define the cost function as $\beta_n(k) = \frac{1}{2}E_n^2(k)$; the gradient of $\hat{\Psi}_n(k)$ is then
$$
\Delta\hat{\Psi}_n(k) = \frac{\partial \beta_n(k)}{\partial \hat{\Psi}_n(k_n)} = \|S_n(N_n(k_n))\|\big[\hat{\Psi}_n(k_n)\|S_n(N_n(k_n))\| + \hat{M}(k)\big]. \tag{3.39}
$$
The multigradient is given by
$$
g(\iota, \beta_n(k)) = \sum_{j=1}^{\iota} \|S_n(N_n(k_n-j+1))\|\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big] \tag{3.40}
$$
where $\omega_n(k_n-j+1) = \hat{\Psi}_n(k_n)\|S_n(N_n(k_n-j+1))\|$.
The weight update law for $\hat{\Psi}_n(k)$ can be further obtained as
$$
\hat{\Psi}_n(k+1) = \hat{\Psi}_n(k_n) - \mu_n g(\iota, \beta_n(k)) = \hat{\Psi}_n(k_n) - \mu_n \sum_{j=1}^{\iota} \|S_n(N_n(k_n-j+1))\|\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big] \tag{3.41}
$$
where μn is a chosen positive learning rate. Let
$$
E_\vartheta(k) = \hat{\vartheta}(k) + \hat{M}(k), \qquad E_\tau(k) = \hat{\tau}(k) + \hat{M}(k). \tag{3.42}
$$
The cost functions of the two auxiliary signals are chosen in the same quadratic form as βi(k):
$$
\beta_\vartheta(k) = \tfrac{1}{2}E_\vartheta^2(k), \qquad \beta_\tau(k) = \tfrac{1}{2}E_\tau^2(k). \tag{3.43}
$$
The gradients are deduced as
$$
\Delta\hat{\vartheta}(k) = \frac{\partial \beta_\vartheta(k)}{\partial \hat{\vartheta}(k)} = \hat{\vartheta}(k) + \hat{M}(k), \qquad
\Delta\hat{\tau}(k) = \frac{\partial \beta_\tau(k)}{\partial \hat{\tau}(k)} = \hat{\tau}(k) + \hat{M}(k). \tag{3.44}
$$
Two MGR update laws are then obtained:
$$
\hat{\vartheta}(k+1) = \hat{\vartheta}(k) - \mu_\vartheta \sum_{j=1}^{\iota}\big(\hat{\vartheta}(k-j+1) + \hat{M}(k-j+1)\big), \qquad
\hat{\tau}(k+1) = \hat{\tau}(k) - \mu_\tau \sum_{j=1}^{\iota}\big(\hat{\tau}(k-j+1) + \hat{M}(k-j+1)\big) \tag{3.45}
$$
where μϑ and μτ are positive learning factors.
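The actual control law and the two auxiliary update laws can likewise be sketched. The class below is a hedged illustration of (3.36) and (3.45); the buffer of past estimates is an implementation assumption made for this sketch.

```python
from collections import deque

# A hedged sketch of the actual controller (3.36) and the auxiliary MGR
# updates (3.45) compensating the dead zone and the actuator fault.
class FaultCompensator:
    def __init__(self, mu_theta=0.05, mu_tau=0.05, iota=10):
        self.theta_hat, self.tau_hat = 0.3, 0.2   # hat vartheta(0), hat tau(0)
        self.mu_theta, self.mu_tau = mu_theta, mu_tau
        self.history = deque(maxlen=iota)  # past (hat vartheta, hat tau, hat M)

    def control(self, Psi_n, S_norm):
        """v(k) = hat Psi_n(k)||S_n(N_n(k))|| + hat tau(k) + hat vartheta(k), (3.36)."""
        return Psi_n * S_norm + self.tau_hat + self.theta_hat

    def update(self, M_hat):
        self.history.append((self.theta_hat, self.tau_hat, M_hat))
        g_theta = sum(th + M for th, _, M in self.history)  # first law in (3.45)
        g_tau = sum(ta + M for _, ta, M in self.history)    # second law in (3.45)
        self.theta_hat -= self.mu_theta * g_theta
        self.tau_hat -= self.mu_tau * g_tau
```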
Figure 3 shows the structure of the proposed control strategy. The stability and tracking performance are analyzed in the following theorem.
Theorem 1: Consider the nonstrict-feedback nonlinear system (2.2) under the adaptive RL strategy consisting of the update laws (3.8), (3.19), (3.29) and (3.41), the virtual controllers (3.13) and (3.24) and the actual controller (3.36). If the parameters are selected such that
$$
0 < \mu_M < \frac{1}{l\iota a^2}, \quad 0 < \mu_i < \frac{1}{l\iota}, \quad 0 < \mu_\vartheta < \frac{1}{\iota}, \quad 0 < \mu_\tau < \frac{1}{\iota} \tag{3.46}
$$
and Assumption 1 holds, the proposed control strategy ensures that all signals are SGUUB and that the tracking error converges to a small neighborhood of zero. The proof of Theorem 1 is given in the appendix.
In this section, some simulation results are presented to illustrate the effectiveness of the proposed approach.
The nonstrict-feedback nonlinear discrete time system is chosen as
$$
\begin{cases}
x_1(k+1) = \varphi_1(\bar{x}_2(k)) + \phi_1(\bar{x}_2(k))\,x_2(k) \\
x_2(k+1) = \varphi_2(\bar{x}_2(k)) + \phi_2(\bar{x}_2(k))\,U(k) + d(k) \\
y(k) = x_1(k)
\end{cases} \tag{4.1}
$$
where x1(k) and x2(k) are the states, U(k) is the input and y(k) is the output. The functions φ1(ˉx2(k)) and φ2(ˉx2(k)) are chosen as x1(k) and x2(k), respectively. We choose ϕ1(ˉx2(k)) and ϕ2(ˉx2(k)) as [−x1(k)+0.019(1.5−x1(k))exp(4x2(k)/(3.4+x2(k)))]/20 and [−x2(k)+3.1(0.4−x1(k))exp(1.5x2(k)/(3.4+x2(k)))−4(x2(k)−U(k))], respectively. The external disturbance d(k) is chosen as 0.1cos(0.05k)cos(x1(k)), and the desired signal is xd(k)=0.013sin(π/8+0.6kπ/38).
The parameters are selected as follows: Ψ2(0)=0.001, μM=0.005, μ1=0.008, μ2=0.006, μτ=0.05, μϑ=0.05, a=0.00001, ˆτ(0)=0.2 and ˆϑ(0)=0.3. The hidden layer node numbers of the ACNNs are set as lW1=10, lW2=10 and lWM=10, and the step length of the multigradient is chosen as ι=10.
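For reference, a minimal sketch of the simulation plant (4.1) with the functions and signals above is given below; the control input here is a zero placeholder, since the full ACNN controller is only sketched in the preceding sections.

```python
import numpy as np

# A minimal sketch of the plant (4.1) with the functions specified above.
# Note that phi_2 takes U(k) as an argument, exactly as stated in the text.
def phi1(x1, x2):
    return (-x1 + 0.019 * (1.5 - x1) * np.exp(4 * x2 / (3.4 + x2))) / 20

def phi2(x1, x2, U):
    return -x2 + 3.1 * (0.4 - x1) * np.exp(1.5 * x2 / (3.4 + x2)) - 4 * (x2 - U)

def plant_step(x1, x2, U, k):
    d = 0.1 * np.cos(0.05 * k) * np.cos(x1)      # disturbance d(k)
    x1_next = x1 + phi1(x1, x2) * x2             # varphi_1 = x_1(k)
    x2_next = x2 + phi2(x1, x2, U) * U + d       # varphi_2 = x_2(k)
    return x1_next, x2_next

x_d = lambda k: 0.013 * np.sin(np.pi / 8 + 0.6 * k * np.pi / 38)  # reference
x1, x2 = 0.0, 0.0
for k in range(1000):
    U = 0.0  # placeholder; in closed loop U(k) = psi(k) * D(v(k)) + delta(k)
    x1, x2 = plant_step(x1, x2, U, k)
```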
Figures 4 and 5 illustrate the trajectories of the signals and the tracking error when br=0.5, bl=−0.5, fr=0.3 and fl=−0.3, respectively. The control output achieved precise tracking of the reference signal, and the tracking error stayed close to zero (within about 0.001). Figures 6 and 7 illustrate the trajectories of the signals and the tracking error when br=0.4, bl=−0.4, fr=0.25 and fl=−0.25, respectively. The tracking performance in this scenario was also satisfactory (the tracking error remained within about 0.002). These results show that the proposed fault-tolerant approach can offset the influence of dead zones with different parameters.
Figure 8 describes the trajectories of the ANN weight, the CNN weight and the control input, and compares the proposed scheme with the MLP-based strategy. According to the results, it is clear that the proposed scheme achieved a faster convergence rate of the weight parameters than the ordinary MLP scheme. Figures 9 and 10 show the tracking trajectory and the tracking error obtained without the two auxiliary systems. Affected by the input dead zone and the actuator fault, tracking became markedly inaccurate and the tracking error grew to about 0.01. Comparing Figures 4 and 9, it is obvious that our approach can successfully offset the influence of the input dead zone and the actuator fault.
To verify that the proposed update algorithm reduces the computational burden, the following experiment was conducted, taking computational time as the measure of computational burden. The total sample number was 1000, and all results were obtained in the same environment on a computer with a 3.6 GHz CPU and 16 GB of RAM.
From Table 1, it can be seen that the computational time of the ordinary gradient descent algorithm is 0.5445 s, whereas the proposed update algorithm needs only 0.2862 s; the computational time is thus reduced by 47.44%. Combined with the previous analysis, our approach not only alleviates the computational burden of the MGR algorithm but also achieves a faster convergence rate of the weight parameters.
Table 1. Comparison of computational time.

| Approach | Computational time (s) |
| Gradient descent algorithm in [17] | 0.5445 |
| The proposed approach | 0.2862 |
The aim of this paper was to design a fault-tolerant controller for a class of nonstrict-feedback systems with an input dead zone. We proposed a novel neural network weight-update algorithm to achieve a faster computational speed and avoid local optima. Two auxiliary parameters were presented to offset the influence of the dead zone and the actuator fault, and an auxiliary term was introduced to eliminate the algebraic loop problem. By Lyapunov theory, all signals in the closed-loop system were proven to be SGUUB, and the tracking error converged to a neighborhood of zero. Finally, some simulation results were presented to illustrate the effectiveness of our approach.
There remain other open problems in the control area; for instance, RL-based tracking control for stochastic systems will be the topic of our future work, building on the current investigation.
This work was supported by the Natural Science Foundation Project of Chongqing under Grant cstc2019jcyj-msxmX036, and also in part by the open funding project of the Guangdong Provincial Key Laboratory of Intelligent Decision and Coordination Control under Grant F2021098.
The authors declare that they have no conflict of interest.
Step 1: Define $\theta_{\xi_1}$, $\theta_{\Psi_1}$ and $\theta_M$ as positive constants. The Lyapunov function is chosen as
$$
V_1(k) = V_{11}(k) + V_{12}(k) + V_{13}(k) + V_{14}(k) \tag{A.1}
$$
where $V_{11}(k) = (\theta_{\xi_1}/4)\xi_1^2(k)$, $V_{12}(k) = (\theta_{\Psi_1}/\mu_1)\sum_{s=0}^{n-1}\tilde{\Psi}_1^2(k_1+s)$, $V_{13}(k) = (\theta_M/\mu_M)\tilde{\Psi}_M^2(k)$ and $V_{14}(k) = 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2$.
The Cauchy-Schwarz inequality is expressed as
$$
(a_1 + a_2 + \cdots + a_n)^2 \le n(a_1^2 + a_2^2 + \cdots + a_n^2). \tag{A.2}
$$
Young's inequality is also given below
$$
\tilde{a}^T\tilde{b} \le \tfrac{1}{2}\tilde{a}^T\tilde{a} + \tfrac{1}{2}\tilde{b}^T\tilde{b} \tag{A.3}
$$

where $\tilde{a}$ and $\tilde{b}$ are arbitrary vectors.
By utilizing inequality (A.3) and the property that $0 < S(N)^T S(N) < l$, the two terms appearing in (3.12) satisfy the following property
$$
\begin{aligned}
-W_i^T S_i(N_i(k_i)) - \Psi_i\|S_i(\epsilon_i(k_i))\| &\le |W_i^T S_i(N_i(k_i))| + \Psi_i\|S_i(\epsilon_i(k_i))\| \\
&\le \tfrac{1}{2}W_i^T W_i + \tfrac{1}{2}S_i(N_i(k_i))^T S_i(N_i(k_i)) + \tfrac{1}{2}\Psi_i^2 + \tfrac{1}{2}\|S_i(\epsilon_i(k_i))\|^2 \\
&\le \bar{\Psi}_i^2 + l
\end{aligned} \tag{A.4}
$$
where i=1,...,n−1.
According to (A.2) and (3.15), the first-order difference of $V_{11}(k)$ is bounded as

$$
\Delta V_{11}(k) \le \theta_{\xi_1}\bar{\phi}_1^2\big(\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\big)^2 + \theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2 + l)^2 + \theta_{\xi_1}\bar{\phi}_1^2\xi_2^2(k) + \theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2 - \tfrac{1}{4}\theta_{\xi_1}\xi_1^2(k). \tag{A.5}
$$
Based on (A.2) and (3.19), the first-order difference of $V_{12}(k)$ can be derived as

$$
\begin{aligned}
\Delta V_{12}(k) \le{}& -\theta_{\Psi_1}(1 - \iota l\mu_1)\sum_{j=1}^{\iota}\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_1}\iota(\bar{\Psi}_1 + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_1}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_1}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\big]^2.
\end{aligned} \tag{A.6}
$$
Similar to the process in (A.6), the first-order difference of $V_{13}(k)$ is calculated as

$$
\begin{aligned}
\Delta V_{13}(k) \le{}& -\theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- \theta_M a^2\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j+1)\|\big]^2 + 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k_1)\|S_M(k-j)\|\big]^2 + 2\theta_M\iota\big[\bar{\Psi}_M(1+a) + 1\big]^2.
\end{aligned} \tag{A.7}
$$
Then, considering $\Delta V_{14}(k)$, we can obtain

$$
\Delta V_{14}(k) = 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2. \tag{A.8}
$$
Combining (A.5)–(A.8), the first-order difference of $V_1(k)$ is derived as
$$
\begin{aligned}
\Delta V_1(k) \le{}& -\theta_{\Psi_1}(1 - \iota l\mu_1)\sum_{j=1}^{\iota}\big[\omega_1(k_1-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- (\theta_{\Psi_1} - \theta_{\xi_1}\bar{\phi}_1^2)\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1))\|\big]^2 + \theta_{\xi_1}\bar{\phi}_1^2\xi_2^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_{\Psi_1} - 2\theta_M\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 \\
&- \theta_{\Psi_1}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_1(k_1)\|S_1(\epsilon_1(k_1-j+1))\|\big]^2 - \frac{\theta_{\xi_1}}{4}\xi_1^2(k) + B_1
\end{aligned} \tag{A.9}
$$
where $B_1 = 2\theta_{\Psi_1}\iota(\bar{\Psi}_1 + \bar{\Psi}_M)^2 + \theta_{\xi_1}\bar{\phi}_1^2\bar{\sigma}_1^2 + \theta_{\xi_1}\bar{\phi}_1^2(\bar{\Psi}_1^2 + l)^2 + 2\theta_M\iota\big[\bar{\Psi}_M(1+a) + 1\big]^2$.
Step i: The Lyapunov function in Steps 2 to n−1 is designed as
$$
V_i(k) = V_{i1}(k) + V_{i2}(k) \tag{A.10}
$$
where $V_{i1}(k) = (\theta_{\xi_i}/4)\xi_i^2(k)$, $V_{i2}(k) = (\theta_{\Psi_i}/\mu_i)\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s)$, and $\theta_{\xi_i}$ and $\theta_{\Psi_i}$ are both positive constants.
According to (3.26), the first-order difference of $V_{i1}(k)$ can be deduced as

$$
\Delta V_{i1}(k) \le \theta_{\xi_i}\bar{\phi}_i^2\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 + \theta_{\xi_i}\bar{\phi}_i^2\xi_{i+1}^2(k) + \theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2 + \theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2 + l)^2 - \tfrac{1}{4}\theta_{\xi_i}\xi_i^2(k). \tag{A.11}
$$
Similar to (A.6), one deduces $\Delta V_{i2}(k)$ as

$$
\begin{aligned}
\Delta V_{i2}(k) \le{}& -\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_i}\iota(\bar{\Psi}_i + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2.
\end{aligned} \tag{A.12}
$$
Combining (A.11) and (A.12), $\Delta V_i(k)$ is derived as
$$
\begin{aligned}
\Delta V_i(k) \le{}& -\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 - \tfrac{1}{4}\theta_{\xi_i}\xi_i^2(k) + \theta_{\xi_i}\bar{\phi}_i^2\xi_{i+1}^2(k) \\
&+ 2\theta_{\Psi_i}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - (\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 \\
&- \theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 + B_i
\end{aligned} \tag{A.13}
$$
where $B_i = \theta_{\xi_i}\bar{\phi}_i^2(\bar{\Psi}_i^2 + l)^2 + 2\theta_{\Psi_i}\iota(\bar{\Psi}_i + \bar{\Psi}_M)^2 + \theta_{\xi_i}\bar{\phi}_i^2\bar{\sigma}_i^2$.
Step n: The Lyapunov function in the n-th step is
$$
V_n(k) = V_{n1}(k) + V_{n2}(k) + V_{n3}(k) + V_{n4}(k) \tag{A.14}
$$
where $V_{n1}(k) = (\theta_{\xi_n}/3)\xi_n^2(k)$, $V_{n2}(k) = (\theta_{\Psi_n}/\mu_n)\tilde{\Psi}_n^2(k)$, $V_{n3}(k) = (\theta_\vartheta/\mu_\vartheta)\tilde{\vartheta}^2(k)$ and $V_{n4}(k) = (\theta_\tau/\mu_\tau)\tilde{\tau}^2(k)$.
Using the above inequalities and (3.37), it follows that
$$
\Delta V_{n1}(k) \le \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big[2\tau^2 + \tilde{\tau}^2(k) + 2\vartheta^2 + \tilde{\vartheta}^2(k) + \big(\tilde{\Psi}_n(k)\|S_n(N_n(k_n))\|\big)^2 + \bar{q}\Big] + \frac{2}{3}\theta_{\xi_n}\bar{d}^2 - \frac{\theta_{\xi_n}}{3}\xi_n^2(k) \tag{A.15}
$$
where $q^2(k_n) < (\bar{\sigma}_n + 2l^{1/2}\bar{\Psi}_n)^2 = \bar{q}$.
Based on (3.41) and (A.2), $\Delta V_{n2}(k)$ is given as

$$
\begin{aligned}
\Delta V_{n2}(k) \le{}& -\theta_{\Psi_n}(1 - \iota l\mu_n)\sum_{j=1}^{\iota}\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big]^2 + 2\theta_{\Psi_n}\iota(\bar{\Psi}_n + \bar{\Psi}_M)^2 \\
&+ 2\theta_{\Psi_n}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 - \theta_{\Psi_n}\sum_{j=1}^{\iota}\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n-j+1))\|\big]^2.
\end{aligned} \tag{A.16}
$$
Obviously, similar to (A.6), the first-order differences of $V_{n3}(k)$ and $V_{n4}(k)$ are:

$$
\begin{aligned}
\Delta V_{n3}(k) \le{}& -\theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\vartheta\sum_{j=1}^{\iota}\tilde{\vartheta}^2(k-j+1) \\
&+ 2\theta_\vartheta\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 + 2\theta_\vartheta\iota(\vartheta + \bar{\Psi}_M)^2 \\
\Delta V_{n4}(k) \le{}& -\theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau\sum_{j=1}^{\iota}\tilde{\tau}^2(k-j+1) \\
&+ 2\theta_\tau\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 + 2\theta_\tau\iota(\tau + \bar{\Psi}_M)^2.
\end{aligned} \tag{A.17}
$$
Combining (A.15)–(A.17), one has
$$
\begin{aligned}
\Delta V_n(k) ={}& \Delta V_{n1}(k) + \Delta V_{n2}(k) + \Delta V_{n3}(k) + \Delta V_{n4}(k) \\
\le{}& -\theta_{\Psi_n}(1 - \iota l\mu_n)\sum_{j=1}^{\iota}\big[\omega_n(k_n-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n))\|\big]^2 - \theta_{\Psi_n}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_n(k)\|S_n(\epsilon_n(k_n-j+1))\|\big]^2 \\
&- \frac{\theta_{\xi_n}}{3}\xi_n^2(k) + (2\theta_{\Psi_n} + 2\theta_\vartheta + 2\theta_\tau)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j+1)\|\big]^2 \\
&- \Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \theta_\vartheta\sum_{j=2}^{\iota}\tilde{\vartheta}^2(k-j+1) - \theta_\tau\sum_{j=2}^{\iota}\tilde{\tau}^2(k-j+1) + B_n
\end{aligned} \tag{A.18}
$$
where $B_n = \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2(2\tau^2 + 2\vartheta^2 + \bar{q}) + \frac{2}{3}\theta_{\xi_n}\bar{d}^2 + 2\theta_\tau\iota(\tau + \bar{\Psi}_M)^2 + 2\theta_{\Psi_n}\iota(\bar{\Psi}_n + \bar{\Psi}_M)^2 + 2\theta_\vartheta\iota(\vartheta + \bar{\Psi}_M)^2$.
Combining the Lyapunov function from Step 1 to Step n, we can obtain
$$
V(k) = \sum_{i=1}^{n} V_i(k) = \sum_{i=1}^{n}\frac{\theta_{\Psi_i}}{\mu_i}\sum_{s=0}^{n-i}\tilde{\Psi}_i^2(k_i+s) + \sum_{i=1}^{n-1}\frac{1}{4}\theta_{\xi_i}\xi_i^2(k) + \frac{1}{3}\theta_{\xi_n}\xi_n^2(k) + \frac{\theta_M}{\mu_M}\tilde{\Psi}_M^2(k) + 2\theta_M\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M(k)\|S_M(k-j)\|\big]^2 + \frac{\theta_\vartheta}{\mu_\vartheta}\tilde{\vartheta}^2(k) + \frac{\theta_\tau}{\mu_\tau}\tilde{\tau}^2(k). \tag{A.19}
$$
Combining (A.9), (A.13) and (A.18), we finally get
$$
\begin{aligned}
\Delta V(k) \le{}& -\Big(\frac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\Big)\xi_n^2(k) - \sum_{i=1}^{n}\theta_{\Psi_i}(1 - \iota l\mu_i)\sum_{j=1}^{\iota}\big[\omega_i(k_i-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_\vartheta(1 - \mu_\vartheta\iota)\sum_{j=1}^{\iota}\big[\hat{\vartheta}(k-j+1) + \omega_M(k-j+1)\big]^2 - \theta_\tau(1 - \mu_\tau\iota)\sum_{j=1}^{\iota}\big[\hat{\tau}(k-j+1) + \omega_M(k-j+1)\big]^2 \\
&- \theta_M(1 - \iota l\mu_M a^2)\sum_{j=1}^{\iota}\big[a\,\omega_M(k-j+1) + \rho(k-j+1) - \omega_M(k-1)\big]^2 \\
&- \Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M\|S_M(k-j+1)\|\big]^2 \\
&- \sum_{i=2}^{n-1}\Big(\frac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\Big)\xi_i^2(k) - \frac{1}{4}\theta_{\xi_1}\xi_1^2(k) - \sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 \\
&- \sum_{i=1}^{n-1}(\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 - \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big]^2 + B
\end{aligned} \tag{A.20}
$$
where $B = \sum_{i=1}^{n} B_i$.
Select the parameters such that $0 < \mu_M < 1/(l\iota a^2)$, $0 < \mu_i < 1/(l\iota)$, $0 < \mu_\vartheta < 1/\iota$ and $0 < \mu_\tau < 1/\iota$. Then the first-order difference of $V(k)$ can be simplified as
$$
\begin{aligned}
\Delta V(k) \le{}& -\Big(\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\vartheta}^2(k) - \Big(\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\tilde{\tau}^2(k) \\
&- \Big(\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau\Big)\sum_{j=1}^{\iota}\big[\tilde{\Psi}_M\|S_M(k-j+1)\|\big]^2 - \frac{1}{4}\theta_{\xi_1}\xi_1^2(k) + B \\
&- \sum_{i=2}^{n-1}\Big(\frac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2\Big)\xi_i^2(k) - \Big(\frac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2\Big)\xi_n^2(k) \\
&- \sum_{i=1}^{n}\theta_{\Psi_i}\sum_{j=2}^{\iota}\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\big]^2 - \sum_{i=1}^{n-1}(\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2)\big[\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big]^2 \\
&- \Big(\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2\Big)\big[\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big]^2.
\end{aligned} \tag{A.21}
$$
In this paper, the parameters $\theta_{\Psi_i}$, $\theta_{\Psi_n}$, $\theta_{\xi_i}$, $\theta_{\xi_n}$, $\theta_M$, $\theta_\vartheta$ and $\theta_\tau$ are respectively designed such that $\theta_\vartheta > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_\tau > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_M > \frac{1}{a^2}\big(2\theta_M + 2\sum_{i=1}^{n}\theta_{\Psi_i} + 2\theta_\vartheta + 2\theta_\tau\big)$, $\theta_{\Psi_i} > \theta_{\xi_i}\bar{\phi}_i^2$, $\theta_{\Psi_n} > \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2$, $\theta_{\xi_i} > 4\theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2$ and $\theta_{\xi_n} > 3\theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2$. On this basis, $\Delta V(k) < 0$ if the following inequalities hold
$$
\begin{aligned}
|\tilde{\vartheta}(k)| &> \frac{\sqrt{B}}{\sqrt{\theta_\vartheta - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}, \qquad
|\tilde{\tau}(k)| > \frac{\sqrt{B}}{\sqrt{\theta_\tau - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}, \\
\Big|\sum_{j=1}^{\iota}\tilde{\Psi}_M\|S_M(k-j+1)\|\Big| &> \frac{\sqrt{B}}{\sqrt{\theta_M a^2 - 2\theta_M - 2\sum_{i=1}^{n}\theta_{\Psi_i} - 2\theta_\vartheta - 2\theta_\tau}}, \\
|\xi_1(k)| &> \frac{\sqrt{B}}{\sqrt{\tfrac{1}{4}\theta_{\xi_1}}}, \qquad
|\xi_i(k)| > \frac{\sqrt{B}}{\sqrt{\tfrac{\theta_{\xi_i}}{4} - \theta_{\xi_{i-1}}\bar{\phi}_{i-1}^2}}, \qquad
|\xi_n(k)| > \frac{\sqrt{B}}{\sqrt{\tfrac{\theta_{\xi_n}}{3} - \theta_{\xi_{n-1}}\bar{\phi}_{n-1}^2}}, \\
\Big|\sum_{j=2}^{\iota}\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i-j+1))\|\Big| &> \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_i}}}, \qquad
\big|\tilde{\Psi}_i(k_i)\|S_i(\epsilon_i(k_i))\|\big| > \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_i} - \theta_{\xi_i}\bar{\phi}_i^2}}, \\
\big|\tilde{\Psi}_n(k_n)\|S_n(\epsilon_n(k_n))\|\big| &> \frac{\sqrt{B}}{\sqrt{\theta_{\Psi_n} - \frac{16}{3}\theta_{\xi_n}\bar{\phi}_n^2\bar{\psi}^2\bar{b}^2}}.
\end{aligned} \tag{A.22}
$$
In this way, all signals in the closed-loop system are proven to be SGUUB.
[1] J. B. Du, W. J. Cheng, G. Y. Lu, H. T. Gao, X. L. Chu, Z. C. Zhang, et al., Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach, IEEE Trans. Network Sci. Eng., 9 (2022), 33–44. doi: 10.1109/TNSE.2021.3068340
[2] H. X. Peng, X. M. Shen, Deep reinforcement learning based resource management for multi-access edge computing in vehicular networks, IEEE Trans. Network Sci. Eng., 7 (2021), 2416–2428. doi: 10.1109/TNSE.2020.2978856
[3] D. C. Chen, X. L. Liu, W. W. Yu, Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems, IEEE Trans. Network Sci. Eng., 7 (2021), 3057–3066. doi: 10.1109/TNSE.2020.3013528
[4] J. Wang, Q. Wang, H. Wu, T. Huang, Finite-time consensus and finite-time H∞ consensus of multi-agent systems under directed topology, IEEE Trans. Network Sci. Eng., 7 (2020), 1619–1632. doi: 10.1109/TNSE.2019.2943023
[5] T. Gao, T. Li, Y. J. Liu, S. Tong, IBLF-based adaptive neural control of state-constrained uncertain stochastic nonlinear systems, IEEE Trans. Neural Networks Learn. Syst., 33 (2022), 7345–7356. doi: 10.1109/TNNLS.2021.3084820
[6] T. T. Gao, Y. J. Liu, D. P. Li, S. C. Tong, T. S. Li, Adaptive neural control using tangent time-varying BLFs for a class of uncertain stochastic nonlinear systems with full state constraints, IEEE Trans. Cybern., 51 (2021), 1943–1953. doi: 10.1109/TCYB.2019.2906118
[7] P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. dissertation, Harvard University, 1974.
[8] Y. Tang, D. D. Zhang, P. Shi, W. B. Zhang, F. Qian, Event-based formation control for nonlinear multiagent systems under DoS attacks, IEEE Trans. Autom. Control, 66 (2021), 452–459. doi: 10.1109/TAC.2020.2979936
[9] Y. Tang, X. T. Wu, P. Shi, F. Qian, Input-to-state stability for nonlinear systems with stochastic impulses, Automatica, 113 (2020), 108766. doi: 10.1016/j.automatica.2019.108766
[10] X. T. Wu, Y. Tang, J. D. Cao, X. R. Mao, Stability analysis for continuous-time switched systems with stochastic switching signals, IEEE Trans. Autom. Control, 63 (2018), 3083–3090. doi: 10.1109/TAC.2017.2779882
[11] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, F. L. Lewis, Optimal and autonomous control using reinforcement learning: A survey, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2042–2062. doi: 10.1109/TNNLS.2017.2773458
[12] V. Narayanan, S. Jagannathan, Event-triggered distributed control of nonlinear interconnected systems using online reinforcement learning with exploration, IEEE Trans. Cybern., 48 (2018), 2510–2519. doi: 10.1109/TCYB.2017.2741342
[13] B. Luo, H. N. Wu, T. Huang, Off-policy reinforcement learning for H∞ control design, IEEE Trans. Cybern., 45 (2015), 65–76. doi: 10.1109/TCYB.2014.2319577
[14] R. Song, F. L. Lewis, Q. Wei, Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero sum games, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 704–713. doi: 10.1109/TNNLS.2016.2582849
[15] X. Yang, D. Liu, B. Luo, C. Li, Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning, Inf. Sci., 369 (2016), 731–747. doi: 10.1016/j.ins.2016.07.051
[16] H. Zhang, K. Zhang, Y. Cai, J. Han, Adaptive fuzzy fault-tolerant tracking control for partially unknown systems with actuator faults via integral reinforcement learning method, IEEE Trans. Fuzzy Syst., 27 (2019), 1986–1998. doi: 10.1109/TFUZZ.2019.2893211
[17] W. Bai, Q. Zhou, T. Li, H. Li, Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation, IEEE Trans. Cybern., 50 (2020), 3433–3443. doi: 10.1109/TCYB.2019.2921057
[18] Y. Li, S. Tong, Adaptive neural networks decentralized FTC design for nonstrict-feedback nonlinear interconnected large-scale systems against actuator faults, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 2541–2554. doi: 10.1109/TNNLS.2016.2598580
[19] Q. Chen, H. Shi, M. Sun, Echo state network-based backstepping adaptive iterative learning control for strict-feedback systems: An error-tracking approach, IEEE Trans. Cybern., 50 (2020), 3009–3022. doi: 10.1109/TCYB.2019.2931877
[20] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058
[21] W. Bai, T. Li, S. Tong, NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems, IEEE Trans. Cybern., 50 (2020), 4573–4584. doi: 10.1109/TCYB.2020.2963849
[22] Y. Li, K. Sun, S. Tong, Observer-based adaptive fuzzy fault-tolerant optimal control for SISO nonlinear systems, IEEE Trans. Cybern., 49 (2019), 649–661. doi: 10.1109/TCYB.2017.2785801
[23] H. Modares, F. L. Lewis, M. B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50 (2014), 193–202. doi: 10.1016/j.automatica.2013.09.043
[24] Z. Wang, L. Liu, Y. Wu, H. Zhang, Optimal fault-tolerant control for discrete-time nonlinear strict-feedback systems based on adaptive critic design, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2179–2191. doi: 10.1109/TNNLS.2018.2810138
[25] H. Li, Y. Wu, M. Chen, Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm, IEEE Trans. Cybern., 51 (2021), 1163–1174. doi: 10.1109/TCYB.2020.2982168
[26] Y. J. Liu, L. Tang, S. Tong, C. L. P. Chen, D. J. Li, Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Trans. Neural Networks Learn. Syst., 26 (2015), 165–176. doi: 10.1109/TNNLS.2014.2360724
[27] W. Bai, T. Li, Y. Long, C. L. P. Chen, Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems, IEEE Trans. Neural Networks Learn. Syst., 34 (2023), 355–379. doi: 10.1109/TNNLS.2021.3094901
[28] H. Wang, G. H. Yang, A finite frequency domain approach to fault detection for linear discrete-time systems, Int. J. Control, 81 (2008), 1162–1171. doi: 10.1080/00207170701691513
[29] C. Tan, G. Tao, R. Qi, A discrete-time parameter estimation based adaptive actuator failure compensation control scheme, Int. J. Control, 86 (2013), 276–289. doi: 10.1080/00207179.2012.723828
[30] J. Na, X. Ren, G. Herrmann, Z. Qiao, Adaptive neural dynamic surface control for servo systems with unknown dead-zone, Control Eng. Pract., 19 (2011), 1328–1343. doi: 10.1016/j.conengprac.2011.07.005
[31] Y. J. Liu, S. Li, S. Tong, C. L. P. Chen, Adaptive reinforcement learning control based on neural approximation for nonlinear discrete-time systems with unknown nonaffine dead-zone input, IEEE Trans. Neural Networks Learn. Syst., 30 (2019), 295–305. doi: 10.1109/TNNLS.2018.2844165
[32] S. S. Ge, J. Zhang, T. H. Lee, Adaptive neural network control for a class of MIMO nonlinear systems with disturbances in discrete time, IEEE Trans. Syst. Man Cybern. B Cybern., 34 (2004), 1630–1645. doi: 10.1109/TSMCB.2004.826827
[33] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
[34] S. S. Ge, G. Y. Li, T. H. Lee, Adaptive NN control for a class of strict-feedback discrete-time nonlinear systems, Automatica, 39 (2003), 807–819. doi: 10.1016/S0005-1098(03)00032-3
[35] Q. Yang, S. Jagannathan, Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Trans. Syst. Man Cybern. B Cybern., 42 (2012), 377–390. doi: 10.1109/TSMCB.2011.2166384
[36] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. doi: 10.1109/TFUZZ.2015.2418000
[37] S. Ferrari, J. E. Steck, R. Chandramohan, Adaptive feedback control by constrained approximate dynamic programming, IEEE Trans. Syst. Man Cybern. B Cybern., 38 (2008), 982–987. doi: 10.1109/TSMCB.2008.924140
[38] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. doi: 10.1109/TFUZZ.2016.2540058