
In this paper, we investigated the optimal tracking control problem of flexible-joint robotic manipulators, aiming to achieve trajectory tracking while reducing the energy consumption of the feedback controller. Technically, optimization strategies were integrated into the backstepping recursive design so that a series of optimized controllers, one for each subsystem, could be constructed to improve the closed-loop performance; in addition, a reinforcement learning strategy based on a neural network actor-critic architecture was adopted to approximate the unknown terms in the control design, making the Hamilton-Jacobi-Bellman equation solvable in the sense of optimal control. With our scheme, the closed-loop stability and the convergence of the output tracking error can be proved rigorously. Beyond the theoretical analysis, the effectiveness of the scheme was also illustrated by simulation results.
Citation: Huihui Zhong, Weijian Wen, Jianjun Fan, Weijun Yang. Reinforcement learning-based adaptive tracking control for flexible-joint robotic manipulators[J]. AIMS Mathematics, 2024, 9(10): 27330-27360. doi: 10.3934/math.20241328
In recent decades, automation has flourished, leading to the widespread integration of robots across various sectors, including industrial production [1], healthcare [2], defense [3], aerospace engineering [4], and numerous other domains [5,6,7]. Robots used in industrial production are typically made of rigid materials, which results in high manufacturing costs and limited degrees of freedom. Furthermore, because of their relatively rigid structure, they are not well-suited for complex environments and may struggle to efficiently complete tasks that involve interacting with unpredictable environments or objects. Therefore, the control problem of flexible-joint robotic manipulators, which offer high adaptability and an extensive range of motion, has received much attention, and various approaches have been developed (e.g., [8,9,10,11,12]), among which backstepping-based strategies are the most commonly used owing to their advantages in handling nonlinearities [13,14,15,16,17,18,19].
A backstepping controller utilizing a sampled-data extended state observer (SD-ESO) was proposed in [17] to improve the transient response of a flexible-joint robotic manipulator; the methodology is devised to minimize estimation inaccuracies and other constraints, thereby enhancing the overall performance of the robotic system. In [18], an explicit state feedback controller was designed to solve the practical tracking control problem of a flexible-joint robotic manipulator in the presence of actuator saturation by combining a backstepping scheme, an adaptive technique, a command filter, and an actuator saturation auxiliary system. In [19], an adaptive control scheme is introduced to ensure the convergence of the tracking deviation of a flexible-joint robotic manipulator; a backstepping control strategy guarantees that the deviation converges within a specified time to a predetermined range. While tracking accuracy and convergence rate can be improved with existing backstepping-based control schemes such as those mentioned above, these schemes overlook the energy consumption of the controller. Considering that flexible manipulators require more energy for deformation and adjustment than rigid manipulators, optimizing energy consumption becomes crucial to enhance system performance and reduce operational costs. Therefore, it is important to adopt control methods that optimize energy consumption.
Optimal control was pioneered by Bellman [20] and Pontryagin [21]. This control approach seeks control strategies for dynamical systems that optimize a structured cost metric, thus achieving a balance between the available resources and the required performance. However, since the optimal control is typically determined by solving the Hamilton-Jacobi-Bellman (HJB) equation [22], its inherent nonlinearity and complexity make it challenging to solve directly using analytical methods. Fortunately, adaptive dynamic programming (ADP), or reinforcement learning (RL), proposed by Werbos et al. [23,24,25], provides an efficient technique for learning solutions to the HJB equation. The fundamental idea of this methodology is to modify the action step by step through feedback from the environment. This is generally achieved through the interactive learning of two neural networks (NNs): the actor and the critic. The critic evaluates the actor's actions and provides feedback that guides the actor's policy optimization and subsequent action execution. Therefore, the energy consumption problem of the flexible-joint robotic manipulator can be managed by incorporating RL-based optimal control into the backstepping design. It should be pointed out that integrating optimized control into the backstepping control of a flexible-joint robotic manipulator remains challenging due to the complexity of the controller construction and the convergence analysis.
In this paper, we propose a trajectory tracking control approach for flexible-joint robotic manipulators. By integrating optimization techniques into the backstepping control framework, we formulate each controller as an optimal solution tailored for its respective subsystem. This approach enhances the overall control efficacy of the flexible-joint robotic manipulator system. Concurrently, we employ RL grounded in the NN-based actor-critic architecture to tackle the intricate challenge posed by the HJB equation. In summary, the contributions of this paper are as follows:
(1) By constructing the performance index function with an error term and controller input, the controller is designed to minimize energy consumption and achieve the desired trajectory tracking task of the flexible-joint robotic manipulator.
(2) In the optimal backstepping control of a flexible-joint robotic manipulator, RL based on an NN actor-critic architecture is utilized. In this setup, the critic evaluates performance and provides feedback to the actor, which then executes the control action. This simplifies the design of the controller for the higher-order nonlinear flexible-joint robotic manipulator model.
The rest of this paper is organized as follows. In Section 2, we formulate the control problem, and give some fundamentals for design and analysis. In Section 3, a complete procedure is presented to show how an optimized controller is constructed, and the closed-loop stability is established. In Section 4, simulation results are collected to illustrate the effectiveness of our scheme. The whole paper is concluded in Section 5.
Disregarding the viscous damping effects, as referenced in [26], we obtain the dynamic equations for the single-link flexible-joint robotic manipulator depicted in Figure 1.
$$ I\ddot{q}_1 + Mgl\sin(q_1) + k(q_1 - q_2) = 0, \qquad J\ddot{q}_2 + k(q_2 - q_1) = u, \tag{2.1} $$
where $q_1$ and $q_2$ are the angular positions of the link and the motor shaft, respectively, and $u$ is the torque generated by the driving motor. The link inertia $I$, the actuator inertia $J$, the link mass $M$, the gravitational acceleration $g$, the position $l$ of the link's center of gravity, and the spring stiffness $k$ can be obtained by system identification, so all of them are regarded as known parameters.
By selecting the state variables $x_1=q_1$, $x_2=\dot{q}_1$, $x_3=q_2$, $x_4=\dot{q}_2$, the dynamic equation of system (2.1) becomes
$$\begin{aligned} \dot{x}_1(t) &= x_2(t), \\ \dot{x}_2(t) &= -\frac{Mgl}{I}\sin(x_1(t)) - \frac{k}{I}\big(x_1(t)-x_3(t)\big), \\ \dot{x}_3(t) &= x_4(t), \\ \dot{x}_4(t) &= \frac{k}{J}\big(x_3(t)-x_1(t)\big) + \frac{1}{J}u(t). \end{aligned} \tag{2.2}$$
System (2.2) is equivalent to the following nonlinear model
$$\begin{aligned} \dot{x}_1(t) &= x_2(t), \\ \dot{x}_2(t) &= f_2(\bar{x}_2(t)) + g_2 x_3(t), \\ \dot{x}_3(t) &= x_4(t), \\ \dot{x}_4(t) &= f_4(\bar{x}_4(t)) + g_4 u(t), \\ y(t) &= x_1(t), \end{aligned} \tag{2.3}$$
where $\bar{x}_i(t)=[x_1(t),\ldots,x_i(t)]^T$, $f_2(\bar{x}_2(t)) = -\frac{Mgl}{I}\sin(x_1(t)) - \frac{k}{I}x_1(t)$, $g_2 = \frac{k}{I}$, $f_4(\bar{x}_4(t)) = \frac{k}{J}\big(x_3(t)-x_1(t)\big)$, and $g_4 = \frac{1}{J}$. Here $y(t)\in\mathbb{R}$ is the system output, $u(t)\in\mathbb{R}$ is the control input, $f_i(\bar{x}_i(t))\in\mathbb{R}$ is a known and bounded continuous function, the subsystems are assumed to be stabilizable on compact sets containing the origin, and $\dot{x}_i(t)$, $i=1,\ldots,4$, are assumed to be Lipschitz continuous.
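For readers who wish to reproduce the model numerically, the short Python sketch below implements the right-hand side of (2.2)/(2.3); the helper names (`f2`, `f4`, `state_derivative`) are ours, and the numerical values simply echo Table 1 in Section 4.

```python
import numpy as np

# Illustrative parameters (same values as Table 1 in Section 4)
I, J, M, g, l, k = 20.0, 0.1, 0.1, 9.8, 0.1, 100.0

def f2(x):
    # f2(x_bar2) = -(M*g*l/I)*sin(x1) - (k/I)*x1
    return -(M * g * l / I) * np.sin(x[0]) - (k / I) * x[0]

def f4(x):
    # f4(x_bar4) = (k/J)*(x3 - x1)
    return (k / J) * (x[2] - x[0])

g2, g4 = k / I, 1.0 / J

def state_derivative(x, u):
    """Right-hand side of system (2.3), with state x = [q1, dq1, q2, dq2]."""
    return np.array([x[1],
                     f2(x) + g2 * x[2],
                     x[3],
                     f4(x) + g4 * u])
```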
Remark 2.1. The assumption that $\dot{x}_i$ is Lipschitz continuous is made here to ensure that the system evolves smoothly over time, preventing sudden changes that could lead to instability or suboptimal performance, and thereby facilitating optimal control. Moreover, the system's evolution is confined to a defined region by the Lipschitz condition: the rate of change of the state variables is bounded by a Lipschitz constant.
Definition 2.1. (Semi-globally uniformly ultimately bounded (SGUUB) [27]). Consider a nonlinear system with state vector $x(t)\in\mathbb{R}^n$,
$$\dot{x}(t) = f(x,t).$$
Its solution is said to be SGUUB if, for any initial condition $x(0)\in\Omega_x$, where $\Omega_x\subset\mathbb{R}^n$ is a compact set, there exist positive constants $\sigma$ and $T(\sigma,x(0))$ such that $\|x(t)\|\le\sigma$ holds for all $t>t_0+T(\sigma,x(0))$.
Lemma 2.1. Given $G(t)\in\mathbb{R}$ with $G(0)$ bounded, if $\dot{G}(t)\le -aG(t)+c$ for some constants $a,c>0$, then $G(t)\le e^{-at}G(0)+\frac{c}{a}\big(1-e^{-at}\big)$.
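For completeness, we recall the standard comparison argument behind Lemma 2.1 (our own one-line derivation): multiplying $\dot{G}(t)\le -aG(t)+c$ by the integrating factor $e^{at}$ and integrating over $[0,t]$ gives
$$\frac{d}{dt}\big(e^{at}G(t)\big) = e^{at}\big(\dot{G}(t)+aG(t)\big) \le c\,e^{at} \;\Longrightarrow\; e^{at}G(t) - G(0) \le \frac{c}{a}\big(e^{at}-1\big) \;\Longrightarrow\; G(t) \le e^{-at}G(0) + \frac{c}{a}\big(1-e^{-at}\big).$$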
Control objectives: In developing a critic-actor RL-based optimal control strategy for the single-link manipulator system (2.3), our objective is to ensure the following:
P1) Within the closed-loop control framework, all error signals $z_i(t)$, $i=1,\ldots,4$, and the weight estimation errors $\tilde{W}_{ci}(t)$ and $\tilde{W}_{ai}(t)$, $i=1,\ldots,4$, are guaranteed to be SGUUB;
P2) The single-link manipulator joint angular position $q_1(t)$ tracks the desired trajectory $y_r$ with a small, adjustable residual error.
To describe the optimal control strategy, consider the following nonlinear continuous-time dynamic system:
$$\dot{x}(t) = f(x) + g(x)u(x), \tag{2.4}$$
where $x(t)\in\mathbb{R}^n$ is the state, $f(x)\in\mathbb{R}^n$ is a continuous function, $u(x)\in\mathbb{R}^m$ is the input signal, and $g(x)\in\mathbb{R}^{n\times m}$ is a continuous gain function. Assuming that $\dot{x}(t)$ is Lipschitz continuous on a set $\Omega$ containing the origin ensures the uniqueness of the solution of the nonlinear system (2.4) for bounded initial values. Furthermore, the stabilizability of system (2.4) implies the existence of a continuous control $u$ that asymptotically stabilizes it, as discussed in [28].
Define the performance index of the dynamic system (2.4) as follows
$$V(x) = \int_t^{\infty} r\big(x(\tau), u(x(\tau))\big)\,d\tau,$$
where $r(x,u) = x^T P_1 x + u^T P_2 u$ is the cost function, $P_1 = P_1^T \in \mathbb{R}^{n\times n}$ is positive semi-definite, $P_2 = P_2^T \in \mathbb{R}^{m\times m}$ is positive definite (so that $P_2^{-1}$ exists), and $P_2$ weights the impact of the control effort on the total cost.
Definition 2.2. The control strategy $u(x)$ is said to be admissible on $\Omega$, denoted $u(x)\in\Psi(\Omega)$, if $u(x)$ is continuous, $u(0)=0$, $u$ stabilizes system (2.4) on $\Omega$, and $V(x)$ is finite.
When addressing the optimization of control strategies related to system (2.4), the primary objective is to determine a suitable control strategy, denoted as u(x) and belonging to the set Ψ(Ω), that enables the minimization of the value function V(x). Define the HJB function for system (2.4) as follows
$$H(x,u,V_x) = r(x,u) + V_x^T(x)\dot{x}(t) = x^T P_1 x + u^T P_2 u + V_x^T(x)\big(f(x)+g(x)u(x)\big),$$
where Vx(x)=∂V(x)/∂x is the partial differentiation of the performance index function V(x) with respect to the variable x.
To obtain optimal control, define the optimal function V∗(x) for the dynamic system (2.4) mentioned above with the optimal input u∗(x) as follows:
$$V^*(x) = \min_{u\in\Psi(\Omega)}\left(\int_t^{\infty} r\big(x(\tau),u(x(\tau))\big)\,d\tau\right) = \int_t^{\infty} r\big(x(\tau),u^*(x(\tau))\big)\,d\tau.$$
The HJB function is then obtained as follows:
$$H(x,u^*,V_x^*) = r(x,u^*) + V_x^{*T}(x)\dot{x}(t) = x^T P_1 x + u^{*T}P_2 u^* + V_x^{*T}(x)\big(f(x)+g(x)u^*\big) = 0, \tag{2.5}$$
where V∗x(x)=∂V∗(x)/∂x denotes the partial derivative of the optimal performance index function V∗(x) with respect to x.
Assuming that (2.5) admits a unique solution, solving the equation $\partial H(x,u^*,V_x^*)/\partial u^* = 0$ yields the expression for $u^*(x)$:
$$u^*(x) = -\frac{1}{2}P_2^{-1}g^T(x)V_x^*(x). \tag{2.6}$$
Substituting (2.6) into (2.5) gives the following result as
$$H(x,u^*,V_x^*) = x^T P_1 x + V_x^{*T}(x) f(x) - \frac{1}{4}V_x^{*T}(x) g(x) P_2^{-1} g^T(x) V_x^*(x) = 0. \tag{2.7}$$
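As a quick sanity check of (2.5)-(2.7) (an illustrative scalar example of ours, not taken from the paper), consider $\dot{x}=u$ with $P_1=P_2=1$. Then (2.7) reads $x^2-\frac{1}{4}V_x^{*2}=0$, which is solved by $V^*(x)=x^2$, and (2.6) gives $u^*(x)=-\frac{1}{2}V_x^*(x)=-x$; the resulting closed loop $\dot{x}=-x$ indeed accumulates the cost $\int_0^\infty\big(x^2+u^{*2}\big)\,dt = x_0^2 = V^*(x_0)$.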
The optimal control policy $u^*(x)$ in (2.6) is unknown because the term $V_x^*(x)$ is unknown; it could be obtained by solving (2.7) for the gradient $V_x^*(x)$ and then substituting $V_x^*(x)$ into (2.6). However, solving (2.7) is difficult or even impossible, especially for high-order systems. To tackle this problem, the prevalent approach in the literature is to employ RL with an actor-critic architecture; see [29].
Numerous applications have demonstrated the strong function approximation and adaptive learning capabilities of NNs. In particular, for any nonlinear continuous function $F(z):\mathbb{R}^n\to\mathbb{R}^m$ defined over a compact domain $\Omega_z$, an NN of suitable configuration can serve as an approximation
$$F_{NN}(z) = W^T\Gamma(z),$$
where W∈Rp×m is the weight of the NN, Γ(z)=[γ1(z),γ2(z),…,γp(z)]T∈Rp represents the Gaussian basis function vector, and p signifies the total number of neurons. Specifically, the expression for γi where i=1,…,p is given as follows:
$$\gamma_i(z) = \exp\!\left[-\frac{(z-v_i)^T(z-v_i)}{\varphi_i^2}\right],$$
where vi=[vi1,vi2,…,vin] are centers of the respective field, and φi is the width of the Gaussian function.
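A minimal Python sketch of this Gaussian basis and of the NN output $W^T\Gamma(z)$ is given below; the function names and array layout are ours and only illustrate the formulas above.

```python
import numpy as np

def gaussian_basis(z, centers, widths):
    """Gaussian basis vector Gamma(z); centers has shape (p, n), widths shape (p,)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    diff = np.atleast_2d(centers) - z              # broadcast z over the p centers v_i
    return np.exp(-np.sum(diff ** 2, axis=1) / np.asarray(widths) ** 2)

def nn_output(W, z, centers, widths):
    """Approximation F_NN(z) = W^T Gamma(z), with W of shape (p, m)."""
    return W.T @ gaussian_basis(z, centers, widths)
```

For instance, with $p=36$ scalar centers uniformly spaced on $[-6,6]$ and widths $\varphi_i=2$ (the configuration later used in Section 4), one would take `centers = np.linspace(-6, 6, 36).reshape(-1, 1)` and `widths = 2.0 * np.ones(36)`.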
By the universal approximation property, there exists an ideal weight matrix, denoted $W^*$, that enables an accurate representation of $F(z)$ as follows
$$F(z) = W^{*T}\Gamma(z) + \varepsilon(z),$$
where $\varepsilon(z)\in\mathbb{R}^m$ denotes the approximation error, which satisfies $\|\varepsilon(z)\|\le\delta$ for an arbitrarily small positive constant $\delta$ when the number of neurons $p$ is large enough, and $W^*$ is the ideal weight, used only for the stability analysis and defined as
$$W^* \triangleq \arg\min_{W\in\mathbb{R}^{p\times m}}\left\{\sup_{z\in\Omega_z}\big\|F(z) - W^T\Gamma(z)\big\|\right\}.$$
Step 1: In this step, the tracking error is defined as $z_1(t) = x_1(t) - y_r(t)$. From (2.3), its derivative is
$$\dot{z}_1(t) = x_2(t) - \dot{y}_r(t). \tag{3.1}$$
The optimal virtual control for the first step is denoted by α∗1(z1), with the optimal value function being defined accordingly,
$$V_1^*(z_1) = \min_{\alpha_1\in\Psi(\Omega_{z_1})}\left(\int_t^{\infty} r_1\big(z_1(\tau),\alpha_1(z_1(\tau))\big)\,d\tau\right) = \int_t^{\infty} r_1\big(z_1(\tau),\alpha_1^*(z_1(\tau))\big)\,d\tau, \tag{3.2}$$
where α1(z1) is the virtual control, Ωz1 is the admissible set of α∗1, and r1=z21(t)+α21(z1) is the cost function in the first step. The optimal performance index function V∗1(z1) is divided into two components as shown below to facilitate the construction of optimal tracking control,
$$V_1^*(z_1) = \beta_1 z_1^2(t) + V_1^o(z_1), \tag{3.3}$$
where β1>0 is a designable constant, and Vo1(z1)=−β1z21(t)+V∗1(z1). By viewing x2(t) as α∗1, the HJB function can be obtained from tracking error (3.1) and the optimal function (3.3) as follows
$$H_1\!\left(z_1,\alpha_1^*,\frac{\partial V_1^*}{\partial z_1}\right) = r_1 + \frac{\partial V_1^*(z_1)}{\partial z_1}\dot{z}_1(t) = z_1^2(t) + \alpha_1^{*2}(z_1) + \left(2\beta_1 z_1(t) + \frac{\partial V_1^o(z_1)}{\partial z_1}\right)\big(\alpha_1^*(z_1) - \dot{y}_r(t)\big) = 0. \tag{3.4}$$
The optimal virtual control α∗1 can be derived by solving ∂H1/∂α∗1=0 as
$$\alpha_1^*(z_1) = -\beta_1 z_1(t) - \frac{1}{2}\frac{\partial V_1^o(z_1)}{\partial z_1}. \tag{3.5}$$
Since $\partial V_1^o(z_1)/\partial z_1$ is difficult to solve for directly but is continuous on $\Omega_{z_1}$, it can be approximated with an NN as
$$\frac{\partial V_1^o(z_1)}{\partial z_1} = W_1^{*T}\Gamma_1(z_1) + \varepsilon_1(z_1), \tag{3.6}$$
where W∗T1∈Rm1 represents the ideal weight in the NN, and the item Γ1(z1)∈Rm1 signifies the basis function in the NN, and ε1(z1)∈R is the bounded approximation error.
Remark 3.1. Note that both NNs and fuzzy logic systems (FLSs) can be used to approximate uncertain functions; see [30,31,32] for examples. Nevertheless, compared with an FLS, an NN approximator has the following advantages: 1) NNs eliminate the need to formulate a rule base, as they automatically learn the input-output mapping through training, making the process less complex; and 2) NNs can effectively handle anomalous samples through an adaptive mechanism.
With the aid of (3.6), it can be derived from (3.3) and (3.5) that
$$\frac{\partial V_1^*(z_1)}{\partial z_1} = 2\beta_1 z_1(t) + W_1^{*T}\Gamma_1(z_1) + \varepsilon_1(z_1), \tag{3.7}$$
$$\alpha_1^*(z_1) = -\beta_1 z_1(t) - \frac{1}{2}\big(W_1^{*T}\Gamma_1(z_1) + \varepsilon_1(z_1)\big). \tag{3.8}$$
Substituting (3.6) and (3.8) into (3.4), we can get the following expression:
$$H_1(z_1,\alpha_1^*,W_1^*) = -(\beta_1^2-1)z_1^2(t) - 2\beta_1\dot{y}_r(t)z_1(t) + W_1^{*T}\Gamma_1(z_1)\big(-\dot{y}_r(t)-\beta_1 z_1(t)\big) - \frac{1}{4}W_1^{*T}\Gamma_1(z_1)\Gamma_1^T(z_1)W_1^* + \epsilon_1(t) = 0, \tag{3.9}$$
where $\epsilon_1(t) = \varepsilon_1(z_1)\big(-\dot{y}_r(t)+\alpha_1^*\big) + \frac{1}{4}\varepsilon_1^2(z_1)$ is bounded.
Due to the uncertainty in the ideal weight $W_1^*$, the optimal virtual control in (3.8) is not available. Therefore, to achieve the desired tracking control, we employ an RL algorithm with an actor-critic framework, in which the critic assesses the effectiveness of the control while the actor generates the virtual control signal:
$$\frac{\partial\hat{V}_1^*(z_1)}{\partial z_1} = 2\beta_1 z_1(t) + \hat{W}_{c1}^T(t)\Gamma_1(z_1), \tag{3.10}$$
$$\hat{\alpha}_1(z_1) = -\beta_1 z_1(t) - \frac{1}{2}\hat{W}_{a1}^T(t)\Gamma_1(z_1), \tag{3.11}$$
where ˆV∗1 is the estimation of V∗1, ˆWc1∈Rm1 represents the weight of critic NN, and ˆWa1∈Rm1 is the actor NN weight.
Remark 3.2. It is worth noting that, unlike the single-NN approach for approximating unknown functions discussed in [31] and other works, this paper employs RL based on actor-critic NNs. In this framework, the critic evaluates performance and provides feedback to the actor, which then executes the suggested action. Since the critic offers direct feedback on the policy, the actor can focus on optimizing the policy, resulting in more stable and effective updates. In contrast, a single NN typically adjusts its policy based on direct returns, which can result in greater variance and negatively impact the efficiency and stability of the learning process.
By incorporating Eqs (3.10) and (3.11) into the framework of (3.4), the HJB equation is derived as
$$H_1(z_1,\hat{\alpha}_1,\hat{W}_{c1}) = z_1^2(t) + \left(-\beta_1 z_1(t) - \frac{1}{2}\hat{W}_{a1}^T(t)\Gamma_1(z_1)\right)^2 + \left(2\beta_1 z_1(t) + \hat{W}_{c1}^T(t)\Gamma_1(z_1)\right)\left(-\beta_1 z_1(t) - \frac{1}{2}\hat{W}_{a1}^T(t)\Gamma_1(z_1) - \dot{y}_r(t)\right). \tag{3.12}$$
Bellman residual error e1(t) can be derived from (3.9) and (3.12) as
$$e_1(t) = H_1(z_1,\hat{\alpha}_1,\hat{W}_{c1}) - H_1(z_1,\alpha_1^*,W_1^*) = H_1(z_1,\hat{\alpha}_1,\hat{W}_{c1}). \tag{3.13}$$
Define the positive definite function of the Bellman residual error (3.13) as
$$E_1(t) = \frac{1}{2}e_1^2(t). \tag{3.14}$$
To achieve the minimization of E1(t), the update law for the critic NN is derived by employing the method of gradient descent,
$$\dot{\hat{W}}_{c1}(t) = -\frac{\mu_{c1}}{\|\omega_1\|^2+1}\frac{\partial E_1(t)}{\partial\hat{W}_{c1}} = -\frac{\mu_{c1}}{\|\omega_1\|^2+1}\omega_1(t)\left(\omega_1^T(t)\hat{W}_{c1}(t) - (\beta_1^2-1)z_1^2(t) + 2\beta_1 z_1\big(-\dot{y}_r\big) + \frac{1}{4}\hat{W}_{a1}^T\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}\right), \tag{3.15}$$
where $\mu_{c1}>0$ is the learning rate of the critic NN and $\omega_1 = \Gamma_1(z_1)\big(-\beta_1 z_1(t) - \frac{1}{2}\hat{W}_{a1}^T\Gamma_1(z_1) - \dot{y}_r\big)\in\mathbb{R}^{m_1}$.
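The normalized gradient step in (3.15) can be verified directly (our own intermediate step): from (3.12)-(3.14),
$$\frac{\partial E_1(t)}{\partial\hat{W}_{c1}} = e_1(t)\,\frac{\partial H_1(z_1,\hat{\alpha}_1,\hat{W}_{c1})}{\partial\hat{W}_{c1}} = e_1(t)\,\Gamma_1(z_1)\left(-\beta_1 z_1(t) - \frac{1}{2}\hat{W}_{a1}^T(t)\Gamma_1(z_1) - \dot{y}_r(t)\right) = e_1(t)\,\omega_1(t),$$
so (3.15) is simply $-\mu_{c1}\,\omega_1(t)e_1(t)/(\|\omega_1\|^2+1)$, the factor $1/(\|\omega_1\|^2+1)$ normalizing the step size.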
Remark 3.3. The regressor $\omega_i(t)$ needs to satisfy the following persistent excitation condition: for every $t\ge 0$ and some interval length $\bar{t}_i>0$,
$$\Lambda_i I_{m_i} \le \int_t^{t+\bar{t}_i}\omega_i(\tau)\omega_i^T(\tau)\,d\tau \le \eta_i I_{m_i}, \qquad i=1,\ldots,4, \tag{3.16}$$
where $\Lambda_i$, $\eta_i$, and $\bar{t}_i$ are positive constants and $I_{m_i}\in\mathbb{R}^{m_i\times m_i}$ is the identity matrix. Satisfying this persistent excitation condition enhances the robustness and adaptability of the adaptation, which further ensures the stability and performance of the flexible-joint robotic manipulator system.
The actor NN weight is updated by the following law
$$\dot{\hat{W}}_{a1}(t) = \frac{1}{2}\Gamma_1(z_1)z_1(t) - \mu_{a1}\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}(t) + \frac{\mu_{c1}}{4(\|\omega_1\|^2+1)}\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}(t)\omega_1^T(t)\hat{W}_{c1}(t), \tag{3.17}$$
where μa1>0 is the actor learning rate.
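As an illustration of how (3.11), (3.15), and (3.17) would be evaluated in code, the following Python sketch computes the step-1 virtual control and the two weight derivatives; the function name and argument conventions are ours, and the caller is responsible for integrating the returned derivatives (e.g., by forward Euler).

```python
import numpy as np

def step1_learning(z1, dyr, W_c1, W_a1, Gamma1, beta1, mu_c1, mu_a1):
    """One evaluation of the step-1 laws (3.11), (3.15), and (3.17).

    z1, dyr : tracking error and reference derivative (scalars)
    W_c1, W_a1, Gamma1 : critic weights, actor weights, basis vector (shape (m1,))
    Returns the virtual control alpha1_hat and the weight time-derivatives.
    """
    a_out = W_a1 @ Gamma1                         # scalar W_a1^T Gamma_1(z_1)
    alpha1_hat = -beta1 * z1 - 0.5 * a_out        # virtual control (3.11)

    omega1 = Gamma1 * (-beta1 * z1 - 0.5 * a_out - dyr)   # omega_1 in (3.15)
    denom = omega1 @ omega1 + 1.0

    # Bellman-residual term inside the critic law (3.15)
    resid = (omega1 @ W_c1 - (beta1 ** 2 - 1.0) * z1 ** 2
             - 2.0 * beta1 * z1 * dyr + 0.25 * a_out ** 2)
    dW_c1 = -(mu_c1 / denom) * omega1 * resid

    # Actor law (3.17); note Gamma1 Gamma1^T W_a1 = Gamma1 * (Gamma1^T W_a1)
    dW_a1 = (0.5 * Gamma1 * z1 - mu_a1 * Gamma1 * a_out
             + (mu_c1 / (4.0 * denom)) * Gamma1 * a_out * (omega1 @ W_c1))
    return alpha1_hat, dW_c1, dW_a1
```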
Designate the tracking error for the second step as $z_2(t) = x_2(t) - \hat{\alpha}_1(z_1)$. Replacing $x_2(t)$ with $z_2(t) + \hat{\alpha}_1(z_1)$, (3.1) becomes
$$\dot{z}_1(t) = z_2(t) + \hat{\alpha}_1(z_1) - \dot{y}_r(t). \tag{3.18}$$
Taking into account the scalar quadratic Lyapunov function pertaining to the first step, its formulation is presented as follows:
$$L_1(t) = \frac{1}{2}z_1^2(t) + \frac{1}{2}\tilde{W}_{c1}^T(t)\tilde{W}_{c1}(t) + \frac{1}{2}\tilde{W}_{a1}^T(t)\tilde{W}_{a1}(t), \tag{3.19}$$
where ˜Wc1(t)=ˆWc1(t)−W∗1 is the critic NN weight error, and ˜Wa1(t)=ˆWa1(t)−W∗1 is the NN weight error of the actor. The derivative of (3.19) is
$$\dot{L}_1(t) = z_1(t)\dot{z}_1(t) + \tilde{W}_{c1}^T(t)\dot{\hat{W}}_{c1}(t) + \tilde{W}_{a1}^T(t)\dot{\hat{W}}_{a1}(t). \tag{3.20}$$
Then, recalling the tracking error (3.18), the updating law (3.15) (3.17), and the virtual control (3.11), we have
˙L1(t)=z1(t)(z2(t)+ˆa1(z1)−˙yr(t))−μc1‖ω1‖2+1˜WTc1(t)ω1(ωT1ˆWc1(t)−(β21−1)z21(t)−2β1z1(t)˙yr(t)+14ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t))+˜WTa1(t)(12Γ1(z1)z1(t)−μa1Γ1(z1)ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1(t)ˆWc1(t)). | (3.21) |
By collating Eq (3.21), the following expression can be obtained:
˙L1(t)=z1(t)z2(t)−β1z21(t)−z1(t)˙yr−12z1(t)ˆWTa1(t)Γ1(z1)+12˜WTa1(t)Γ1(z1)z1(t)−μa1˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc1‖ω1‖2+1˜WTc1(t)ω1(ωT1ˆWc1(t)−(β21−1)z21(t)+2β1z1(t)(−˙yr)+14ˆWTa1(t)Γ1(z1)ΓT1(z1)׈Wa1(t)). | (3.22) |
The following results can be deduced because of the equation ˜Wa1(t)=ˆWa1(t)−W∗1:
$$\tilde{W}_{a1}^T(t)\Gamma_1(z_1)z_1 - z_1\hat{W}_{a1}^T(t)\Gamma_1(z_1) = -z_1(t)W_1^{*T}\Gamma_1(z_1), \tag{3.23}$$
$$\mu_{a1}\tilde{W}_{a1}^T(t)\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}(t) = \frac{\mu_{a1}}{2}\tilde{W}_{a1}^T(t)\Gamma_1(z_1)\Gamma_1^T(z_1)\tilde{W}_{a1}(t) + \frac{\mu_{a1}}{2}\hat{W}_{a1}^T(t)\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}(t) - \frac{\mu_{a1}}{2}W_1^{*T}\Gamma_1(z_1)\Gamma_1^T(z_1)W_1^*. \tag{3.24}$$
By inserting (3.23) and (3.24) into (3.22), ˙L1(t) is rewritten as
˙L1(t)=z1(t)z2(t)−β1z21(t)−z1(t)˙yr−12z1(t)W∗T1Γ1(z1)−μa12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−μa12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+μa12W∗T1Γ1(z1)ΓT1(z1)W∗1+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc1‖ω1‖2+1˜WTc1(t)ω1(ωT1ˆWc1(t)−(β21−1)z21(t)+2β1z1(t)(−˙yr)+14ˆWTa1Γ1(z1)ΓT1(z1)ˆWa1). | (3.25) |
Utilizing Young's inequality $ab \le \frac{a^2}{2} + \frac{b^2}{2}$, the following results are derived
$$-z_1(t)\dot{y}_r(t) \le \frac{1}{2}z_1^2(t) + \frac{1}{2}\dot{y}_r^2(t), \tag{3.26}$$
$$z_1(t)z_2(t) \le z_1^2(t) + z_2^2(t), \tag{3.27}$$
$$-\frac{1}{2}z_1(t)W_1^{*T}\Gamma_1(z_1) \le \frac{1}{2}z_1^2(t) + \frac{1}{2}\big(W_1^{*T}\Gamma_1(z_1)\big)^2. \tag{3.28}$$
By substituting (3.26), (3.27), and (3.28) into (3.25), we can get the following derivation:
˙L1(t)≤z22(t)−(β1−2)z21(t)+12˙y2r+μa1+12(W∗T1Γ1(z1))2−μa12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−μa12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc1‖ω1‖2+1˜WTc1(t)ω1(ωT1ˆWc1(t)−(β21−1)z21(t)+2β1z1(−˙yr)+14ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)). | (3.29) |
There is the following fact:
−(β21−1)z21+2β1z1(−˙yr)=−W∗T1Γ1(z1)(−˙yr(t)−β1z1(t))+14W∗T1Γ1(z1)ΓT1(z1)W∗1−ϵ1(t)=−ωT1W∗1−12ˆWTa1(t)Γ1(z1)ΓT1(z1)W∗1+14W∗T1Γ1(z1)ΓT1(z1)W∗1−ϵ1(t), | (3.30) |
then we can rewrite the inequality (3.29) as
˙L1(t)≤z22(t)−(β1−2)z21(t)+μa1+12(W∗T1Γ1(z1))2+12˙y2r−μa12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−μa12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc1‖ω1‖2+1˜WTc1(t)ω1(ωT1(t)˜Wc1(t)−12ˆWTa1(t)Γ1(z1)ΓT1(z1)W∗1+14W∗T1Γ1(z1)ΓT1(z1)W∗1+14ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)−ϵ1(t)). | (3.31) |
Given the equation ˜Wa1(t)=ˆWa1(t)−W∗1, it leads to the following equations:
−12ˆWTa1(t)Γ1(z1)ΓT1(z1)W∗1+14W∗T1Γ1(z1)ΓT1(z1)W∗1+14ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)=14˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)−14W∗T1Γ1(z1)ΓT1(z1)˜Wa1(t). | (3.32) |
Pursuant to Young's inequality, the subsequent consequence can be deduced
$$\frac{\mu_{c1}}{\|\omega_1\|^2+1}\tilde{W}_{c1}^T(t)\omega_1(t)\epsilon_1(t) \le \frac{\mu_{c1}}{2(\|\omega_1\|^2+1)}\tilde{W}_{c1}^T(t)\omega_1(t)\omega_1^T(t)\tilde{W}_{c1}(t) + \frac{\mu_{c1}}{2}\epsilon_1^2(t). \tag{3.33}$$
Adding (3.32) and (3.33) into (3.31) yields
˙L1(t)≤z22(t)−(β1−2)z21(t)++μa1+12(W∗T1Γ1(z1))2+12˙y2r−μa12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−μa12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)−μc12(‖ω1‖2+1)˜WTc1(t)ω1ωT1˜Wc1(t)+μc12ϵ21(t)+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc14(‖ω1‖2+1)˜WTc1(t)ω1˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)˜WTc1(t)ω1W∗T1Γ1(z1)ΓT1(z1)˜Wa1. | (3.34) |
Substituting the following equation
μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)ωT1ˆWc1(t)−μc14(‖ω1‖2+1)˜WTc1(t)ω1˜WTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)=μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)W∗T1ω1ΓT1(z1)ˆWa1(t), | (3.35) |
into (3.34), we have
˙L1(t)≤z22(t)−(β1−2)z21(t)+μa1+12(W∗T1Γ1(z1))2+12˙y2r−μa12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−μa12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)−μc12(‖ω1‖2+1)˜WTc1(t)ω1ωT1˜Wc1(t)+μc12ϵ21(t)+μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)W∗T1ω1ΓT1(z1)ˆWa1(t)+μc14(‖ω1‖2+1)˜WTc1(t)ω1W∗T1Γ1(z1)ΓT1(z1)˜Wa1(t). | (3.36) |
Employing the principles of Young's inequality in conjunction with Cauchy's inequality, a series of inequalities can be formulated as follows:
μc14(‖ω1‖2+1)˜WTa1(t)Γ1(z1)W∗T1ω1(t)ΓT1(z1)ˆWa1(t)≤132˜WTa1(t)Γ1(z1)W∗T1ω1ωT1W∗1ΓT1(z1)˜Wa1(t)+μ2c12ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t), | (3.37) |
μ2c14(‖ω1‖2+1)˜WTc1(t)ω1(t)W∗T1Γ1(z1)ΓT1(z1)˜Wa1(t)≤132(‖ω1‖2+1)˜WTc1(t)Γ1(z1)W∗T1ω1ωT1W∗1ΓT1(z1)˜Wc1(t)+μ2c12˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t). | (3.38) |
Incorporating the above inequalities into (3.36), we obtain
˙L1(t)≤z22(t)−(β1−2)z21(t)−(μa12−μ2c12−132W∗T1ω1ωT1W∗1)˜WTa1(t)Γ1(z1)ΓT1(z1)˜Wa1(t)−1‖ω1‖2+1(μc12−132W∗T1Γ1(z1)ΓT1(z1)W∗1)˜WTc1(t)ω1ωT1˜Wc1(t)−(μa12−μ2c12)ˆWTa1(t)Γ1(z1)ΓT1(z1)ˆWa1(t)+12˙y2r(t)+μa1+12(W∗T1Γ1(z1))2+μc12ϵ21(t). | (3.39) |
Rewrite (3.39) as follows:
$$\dot{L}_1(t) \le -\xi_1^T(t)A_1(t)\xi_1(t) + C_1(t) + z_2^2(t) - \left(\frac{\mu_{a1}}{2} - \frac{\mu_{c1}^2}{2}\right)\hat{W}_{a1}^T(t)\Gamma_1(z_1)\Gamma_1^T(z_1)\hat{W}_{a1}(t), \tag{3.40}$$
where $\xi_1(t) = [z_1(t), \tilde{W}_{a1}^T(t), \tilde{W}_{c1}^T(t)]^T$ and $C_1(t) = \frac{1}{2}\dot{y}_r^2(t) + \frac{\mu_{a1}+1}{2}\big(W_1^{*T}\Gamma_1(z_1)\big)^2 + \frac{\mu_{c1}}{2}\epsilon_1^2(t)$.
In accordance with the persistence of excitation (PE) assumption, the positive definiteness of the matrix $A_1(t)$ can be ensured by designing the parameters $\beta_1$, $\mu_{c1}$, and $\mu_{a1}$ to satisfy the following set of inequalities
$$\beta_1 > 2, \qquad \mu_{c1} > \frac{1}{16\lambda_1}, \qquad \mu_{a1} > \mu_{c1}^2 + \frac{\eta_1}{16}W_1^{*T}W_1^*, \tag{3.41}$$
where λ1 is the maximal eigenvalue of Λ1=W∗T1Γ1(z1)ΓT1(z1)W∗1. Then, (3.40) becomes
$$\dot{L}_1(t) < z_2^2(t) - a_1\|\xi_1(t)\|^2 + c_1, \tag{3.42}$$
where a1 is the lower bound on the minimum eigenvalue of A1(t) and c1 is the maximum value of C1(t).
Step 2: According to the tracking error $z_2(t) = x_2(t) - \hat{\alpha}_1(z_1)$ in the second step, it follows that
$$\dot{z}_2(t) = f_2(\bar{x}_2) + g_2 x_3(t) - \dot{\hat{\alpha}}_1(z_1). \tag{3.43}$$
The optimal value function V∗2(z2) in second step can be defined with the dynamic error z2(t) and the optimal virtual control α∗2 as
$$V_2^*(z_2) = \min_{\alpha_2\in\Psi(\Omega_{z_2})}\left(\int_t^{\infty} r_2\big(z_2(\tau),\alpha_2(z_2(\tau))\big)\,d\tau\right) = \int_t^{\infty} r_2\big(z_2(\tau),\alpha_2^*(z_2(\tau))\big)\,d\tau, \tag{3.44}$$
where r2=z22(t)+α22(z2) is the cost function, and α2(z2) represents the virtual control. Ψ(Ωz2) is the set of admissible control policies over Ωz2, where Ωz2 denotes a compact set that includes the origin of the system. To minimize the tracking error z2(t), we can rewrite the optimal value function V∗2 as
$$V_2^*(z_2) = \beta_2 z_2^2(t) + V_2^o(z_2), \tag{3.45}$$
where β2 is a positive designable constant and Vo2(z2)=−β2z22(t)+V∗2(z2) is a scalar-valued function. According to both (3.43) and (3.45), the HJB equation of the second step is
$$H_2\!\left(z_2,\alpha_2^*,\frac{\partial V_2^*}{\partial z_2}\right) = z_2^2(t) + \alpha_2^{*2}(z_2) + \left(2\beta_2 z_2(t) + \frac{\partial V_2^o(z_2)}{\partial z_2}\right)\big(f_2(\bar{x}_2) + g_2\alpha_2^*(z_2) - \dot{\hat{\alpha}}_1(z_1)\big) = 0. \tag{3.46}$$
Assuming that there is a solution and that it is unique, then by solving ∂H2/∂α∗2=0, the optimal virtual control α∗2 is
$$\alpha_2^*(z_2) = g_2\left(-\beta_2 z_2(t) - \frac{1}{2}\frac{\partial V_2^o(z_2)}{\partial z_2}\right). \tag{3.47}$$
Utilizing an NN approximator to estimate ∂Vo2(z2)/∂z2 yields that
$$\frac{\partial V_2^o(z_2)}{\partial z_2} = W_2^{*T}\Gamma_2(z_2) + \varepsilon_2(z_2), \tag{3.48}$$
where W∗T2∈Rm2 signifies the ideal weight in the NN, and the item Γ2(z2)∈Rm2 represents the basis function, and ε2(z2) denotes the approximation error that is bounded. The gradient term ∂V∗2(z2)/∂z2 and the optimal virtual control α∗2(z2) become
$$\frac{\partial V_2^*(z_2)}{\partial z_2} = 2\beta_2 z_2(t) + W_2^{*T}\Gamma_2(z_2) + \varepsilon_2(z_2), \tag{3.49}$$
$$\alpha_2^*(z_2) = g_2\left(-\beta_2 z_2(t) - \frac{1}{2}\big(W_2^{*T}\Gamma_2(z_2) + \varepsilon_2(z_2)\big)\right). \tag{3.50}$$
The optimal virtual control (3.50) cannot be used directly because the ideal weight vector W∗T2 is unknown. To achieve an effective and optimized control strategy, we implement an RL based on actor-critic NNs for deriving practical optimization
$$\frac{\partial\hat{V}_2^*}{\partial z_2} = 2\beta_2 z_2(t) + \hat{W}_{c2}^T(t)\Gamma_2(z_2), \tag{3.51}$$
$$\hat{\alpha}_2(z_2) = g_2\left(-\beta_2 z_2(t) - \frac{1}{2}\hat{W}_{a2}^T(t)\Gamma_2(z_2)\right), \tag{3.52}$$
where ˆV∗2 is the estimation of V∗2, ˆWc2∈Rm2 represents the weight of critic NN, and ˆWa2∈Rm2 denotes the actor NN weight. Upon inserting Eqs (3.51) and (3.52) into (3.46), we obtain the HJB equation
$$H_2(z_2,\hat{\alpha}_2,\hat{W}_{c2}) = z_2^2(t) + \left(-\beta_2 g_2 z_2(t) - \frac{g_2}{2}\hat{W}_{a2}^T(t)\Gamma_2(z_2)\right)^2 + \left(2\beta_2 z_2(t) + \hat{W}_{c2}^T(t)\Gamma_2(z_2)\right)\left(f_2(\bar{x}_2) - \beta_2 g_2^2 z_2(t) - \frac{g_2^2}{2}\hat{W}_{a2}^T(t)\Gamma_2(z_2) - \dot{\hat{\alpha}}_1(z_1)\right). \tag{3.53}$$
Remark 3.4. To ensure the boundedness of the HJB function $H_2$, we show here that $\dot{\hat{\alpha}}_1(z_1)$ is bounded. From (3.11),
$$\dot{\hat{\alpha}}_1(z_1) = -\beta_1\big(\dot{x}_1(t) - \dot{y}_r(t)\big) - \frac{1}{2}\left(\dot{\hat{W}}_{a1}^T\Gamma_1(z_1) + \hat{W}_{a1}^T\dot{\Gamma}_1(z_1)\right).$$
Because $\dot{x}_1$ is Lipschitz continuous, it is bounded, and $\dot{y}_r(t)$ and $\dot{\hat{W}}_{a1}^T\Gamma_1(z_1) + \hat{W}_{a1}^T\dot{\Gamma}_1(z_1)$ are also bounded. Consequently, $\dot{\hat{\alpha}}_1(z_1)$, which consists of these bounded terms, is bounded. The same argument shows that $\dot{\hat{\alpha}}_i(z_i)$, $i=1,\ldots,3$, is bounded at each step, although this will not be repeated hereafter.
To optimize the function E2(t)=e22(t)/2, we employ the gradient descent methodology. Then we can derive the subsequent update law for the critic NN weight ˆWc2(t),
$$\dot{\hat{W}}_{c2}(t) = -\frac{\mu_{c2}}{\|\omega_2\|^2+1}\omega_2(t)\left(\omega_2^T(t)\hat{W}_{c2}(t) - (\beta_2^2 g_2^2 - 1)z_2^2(t) + 2\beta_2 z_2\big(f_2(\bar{x}_2) - \dot{\hat{\alpha}}_1(z_1)\big) + \frac{g_2^2}{4}\hat{W}_{a2}^T\Gamma_2(z_2)\Gamma_2^T(z_2)\hat{W}_{a2}\right), \tag{3.54}$$
where $\mu_{c2}>0$ is the learning rate and $\omega_2 = \Gamma_2(z_2)\big(f_2(\bar{x}_2) - \beta_2 g_2^2 z_2(t) - \frac{g_2^2}{2}\hat{W}_{a2}^T\Gamma_2(z_2) - \dot{\hat{\alpha}}_1(z_1)\big)\in\mathbb{R}^{m_2}$. The update law of the actor NN weight $\hat{W}_{a2}(t)$ is designed as
$$\dot{\hat{W}}_{a2}(t) = \frac{g_2^2}{2}\Gamma_2(z_2)z_2(t) - \mu_{a2}\Gamma_2(z_2)\Gamma_2^T(z_2)\hat{W}_{a2}(t) + \frac{\mu_{c2}g_2^2}{4(\|\omega_2\|^2+1)}\Gamma_2(z_2)\Gamma_2^T(z_2)\hat{W}_{a2}(t)\omega_2^T(t)\hat{W}_{c2}(t), \tag{3.55}$$
where μa2>0 is the learning rate of the actor NN.
By introducing the error variable for the third step, $z_3(t) = x_3(t) - \hat{\alpha}_2(z_2)$, we can rewrite (3.43) as
$$\dot{z}_2(t) = f_2(\bar{x}_2) + g_2\big(z_3(t) + \hat{\alpha}_2(z_2)\big) - \dot{\hat{\alpha}}_1(z_1). \tag{3.56}$$
Design the Lyapunov function as
$$L_2(t) = L_1(t) + \frac{1}{2}z_2^2(t) + \frac{1}{2}\tilde{W}_{c2}^T(t)\tilde{W}_{c2}(t) + \frac{1}{2}\tilde{W}_{a2}^T(t)\tilde{W}_{a2}(t), \tag{3.57}$$
where ˜Wc2(t)=ˆWc2(t)−W∗2 and ˜Wa2(t)=ˆWa2(t)−W∗2. Its derivative is as follows:
$$\dot{L}_2(t) = \dot{L}_1(t) + z_2(t)\dot{z}_2(t) + \tilde{W}_{c2}^T(t)\dot{\hat{W}}_{c2}(t) + \tilde{W}_{a2}^T(t)\dot{\hat{W}}_{a2}(t). \tag{3.58}$$
Inserting (3.52), (3.54), (3.55), and (3.56) into (3.58), we have
˙L2(t)=˙L1(t)+g2z2(t)z3(t)+f2(¯x2)z2(t)−β2g22z22(t)−z2(t)˙ˆα1(z1)+μc24(‖ω2‖2+1)˜WTa2(t)Γ2(z2)ΓT2(z2)ˆWa2(t)ωT2ˆWc2(t)−g222z2(t)ˆWTa2(t)Γ2(z2)+g222˜WTa2(t)Γ2(z2)z2(t)−μa2˜WTa2(t)Γ2(z2)ΓT2(z2)ˆWa2−μc2‖ω2‖2+1˜WTc2(t)ω2(ωT2ˆWc2(t)−(β22g22−1)z22(t)+2β2z2(t)(f2(ˉx2)−˙ˆα1)+g224ˆWTa2(t)Γ2(z2)ΓT2(z2)ˆWa2(t)). | (3.59) |
Analogous to the first step, we can obtain the inequality shown as follows:
˙L2(t)≤˙L1(t)+z23(t)−(β2g22−g22−1)z22(t)−(μa22−μ2c2g422−132W∗T2ω2ωT2W∗2)˜WTa2(t)Γ2(z2)ΓT2(z2)˜Wa2(t)−1‖ω2‖2+1(μc22−132W∗T2Γ2(z2)ΓT2(z2)W∗2)˜WTc2(t)ω2ωT2˜Wc2(t)−(μa22−μ2c2g422)ˆWTa2(t)Γ2(z2)ΓT2(z2)ˆWa2(t)+12f22(ˉx2)+12˙ˆα21+μa2+g222(W∗T2Γ2(z2))2+μc22ϵ22(t). | (3.60) |
Rewrite (3.60) as follows:
$$\dot{L}_2(t) \le \big(-a_1\|\xi_1(t)\|^2 + c_1\big) - \xi_2^T(t)A_2(t)\xi_2(t) + C_2(t) + z_3^2(t) - \left(\frac{\mu_{a2}}{2} - \frac{\mu_{c2}^2 g_2^4}{2}\right)\hat{W}_{a2}^T(t)\Gamma_2(z_2)\Gamma_2^T(z_2)\hat{W}_{a2}(t), \tag{3.61}$$
with $\xi_2(t) = [z_2(t), \tilde{W}_{a2}^T(t), \tilde{W}_{c2}^T(t)]^T$ and $C_2(t) = \frac{1}{2}f_2^2(\bar{x}_2) + \frac{1}{2}\dot{\hat{\alpha}}_1^2 + \frac{\mu_{c2}}{2}\epsilon_2^2(t) + \frac{\mu_{a2}+g_2^2}{2}\big(W_2^{*T}\Gamma_2(z_2)\big)^2$.
To ensure that the matrix $A_2(t)$ is positive definite, the parameters are designed as follows:
$$\beta_2 > \frac{1}{g_2^2} + 1, \qquad \mu_{c2} > \frac{1}{16\lambda_2}, \qquad \mu_{a2} > \mu_{c2}^2 g_2^4 + \frac{\zeta_2}{16}W_2^{*T}W_2^*, \tag{3.62}$$
where λ2 is the maximal eigenvalue of matrix Λ2=W∗T2Γ2(z2)ΓT2(z2)W∗2. Consequently, we have
$$\dot{L}_2(t) < z_3^2(t) - a_1\|\xi_1(t)\|^2 + c_1 - a_2\|\xi_2(t)\|^2 + c_2, \tag{3.63}$$
where a2 is the minimum eigenvalue of A2(t) and c2 is the maximum value of C2(t).
Step 3: Define the tracking error between $x_3(t)$ and $\hat{\alpha}_2(z_2)$ for the third step as $z_3(t) = x_3(t) - \hat{\alpha}_2(z_2)$. Its time derivative along system (2.3) is
$$\dot{z}_3(t) = x_4(t) - \dot{\hat{\alpha}}_2(z_2). \tag{3.64}$$
In the process, we first define the virtual control term α3(z3) and further introduce its optimal counterpart, denoted as α∗3(z3). Describe the performance index function V∗3(z3) as
$$V_3^*(z_3) = \min_{\alpha_3\in\Psi(\Omega_{z_3})}\left(\int_t^{\infty} r_3\big(z_3(\tau),\alpha_3(z_3(\tau))\big)\,d\tau\right) = \int_t^{\infty} r_3\big(z_3(\tau),\alpha_3^*(z_3(\tau))\big)\,d\tau, \tag{3.65}$$
where r3=z23(t)+α23(z3) is the cost function, and the set Ωz3 represents a compact domain that encompasses the origin of the system. Rewrite the optimal value function V∗3 as
$$V_3^*(z_3) = \beta_3 z_3^2(t) + V_3^o(z_3), \tag{3.66}$$
where β3 is a positive designable constant and Vo3(z3)=−β3z23(t)+V∗3(z3) is a scalar-valued function. Then, we can derive the HJB equation as follows:
$$H_3\!\left(z_3,\alpha_3^*,\frac{\partial V_3^*}{\partial z_3}\right) = z_3^2(t) + \alpha_3^{*2}(z_3) + \left(2\beta_3 z_3(t) + \frac{\partial V_3^o(z_3)}{\partial z_3}\right)\big(\alpha_3^*(z_3) - \dot{\hat{\alpha}}_2(z_2)\big) = 0. \tag{3.67}$$
By solving ∂H3/∂α∗3=0, the optimal virtual control α∗3 is
$$\alpha_3^*(z_3) = -\beta_3 z_3(t) - \frac{1}{2}\frac{\partial V_3^o(z_3)}{\partial z_3}. \tag{3.68}$$
By applying NN, the part ∂Vo3(z3)/∂z3 can be approximated as
$$\frac{\partial V_3^o(z_3)}{\partial z_3} = W_3^{*T}\Gamma_3(z_3) + \varepsilon_3(z_3), \tag{3.69}$$
where W∗T3∈Rm3 represents the ideal weight, Γ3(z3)∈Rm3 denotes the basis function in the NN, and ε3(z3) signifies the bounded approximation error. With (3.69), the gradient term ∂V∗3(z3)/∂z3 and the optimal virtual control α∗3(z3) are obtained:
$$\frac{\partial V_3^*(z_3)}{\partial z_3} = 2\beta_3 z_3(t) + W_3^{*T}\Gamma_3(z_3) + \varepsilon_3(z_3), \tag{3.70}$$
$$\alpha_3^*(z_3) = -\beta_3 z_3(t) - \frac{1}{2}\big(W_3^{*T}\Gamma_3(z_3) + \varepsilon_3(z_3)\big). \tag{3.71}$$
Since W∗3 is not directly available, an RL based on the actor-critic architecture is employed as
$$\frac{\partial\hat{V}_3^*}{\partial z_3} = 2\beta_3 z_3(t) + \hat{W}_{c3}^T(t)\Gamma_3(z_3), \tag{3.72}$$
$$\hat{\alpha}_3(z_3) = -\beta_3 z_3(t) - \frac{1}{2}\hat{W}_{a3}^T(t)\Gamma_3(z_3), \tag{3.73}$$
where ˆV∗3 is the estimation of V∗3, ˆWc3 is the weight of critic NN, and ˆWa3 is the weight of actor NN. Substituting (3.72) and (3.73) into (3.67), we can rewrite the HJB equation as
$$H_3(z_3,\hat{\alpha}_3,\hat{W}_{c3}) = z_3^2(t) + \left(-\beta_3 z_3(t) - \frac{1}{2}\hat{W}_{a3}^T(t)\Gamma_3(z_3)\right)^2 + \left(2\beta_3 z_3(t) + \hat{W}_{c3}^T(t)\Gamma_3(z_3)\right)\left(-\beta_3 z_3(t) - \frac{1}{2}\hat{W}_{a3}^T(t)\Gamma_3(z_3) - \dot{\hat{\alpha}}_2(z_2)\right). \tag{3.74}$$
To minimize E3(t)=e23(t)/2, design the following updating laws for the weights in the critic and actor NNs
$$\dot{\hat{W}}_{c3}(t) = -\frac{\mu_{c3}}{\|\omega_3\|^2+1}\omega_3(t)\left(\omega_3^T(t)\hat{W}_{c3}(t) - (\beta_3^2-1)z_3^2(t) - 2\beta_3 z_3\dot{\hat{\alpha}}_2(z_2) + \frac{1}{4}\hat{W}_{a3}^T\Gamma_3(z_3)\Gamma_3^T(z_3)\hat{W}_{a3}\right), \tag{3.75}$$
$$\dot{\hat{W}}_{a3}(t) = \frac{1}{2}\Gamma_3(z_3)z_3(t) - \mu_{a3}\Gamma_3(z_3)\Gamma_3^T(z_3)\hat{W}_{a3}(t) + \frac{\mu_{c3}}{4(\|\omega_3\|^2+1)}\Gamma_3(z_3)\Gamma_3^T(z_3)\hat{W}_{a3}(t)\omega_3^T(t)\hat{W}_{c3}(t), \tag{3.76}$$
where $\mu_{a3}>0$ and $\mu_{c3}>0$ are the design learning rates of the actor NN and critic NN, respectively, and $\omega_3 = \Gamma_3(z_3)\big(-\beta_3 z_3(t) - \frac{1}{2}\hat{W}_{a3}^T\Gamma_3(z_3) - \dot{\hat{\alpha}}_2(z_2)\big)\in\mathbb{R}^{m_3}$.
The tracking error in Step 4 is written as $z_4(t) = x_4(t) - \hat{\alpha}_3(z_3)$, so (3.64) becomes
$$\dot{z}_3(t) = z_4(t) + \hat{\alpha}_3(z_3) - \dot{\hat{\alpha}}_2(z_2). \tag{3.77}$$
The Lyapunov function can be formulated as described below:
$$L_3(t) = \sum_{k=1}^{2}L_k(t) + \frac{1}{2}z_3^2(t) + \frac{1}{2}\tilde{W}_{c3}^T(t)\tilde{W}_{c3}(t) + \frac{1}{2}\tilde{W}_{a3}^T(t)\tilde{W}_{a3}(t), \tag{3.78}$$
where ˜Wc3(t)=ˆWc3(t)−W∗3 represents the estimation error of the critic NN, while ˜Wa3(t)=ˆWa3(t)−W∗3 is the actor NN estimation error. The derivative of the Lyapunov quadratic scalar function (3.78) is
$$\dot{L}_3(t) = \sum_{k=1}^{2}\dot{L}_k(t) + z_3(t)\dot{z}_3(t) + \tilde{W}_{c3}^T(t)\dot{\hat{W}}_{c3}(t) + \tilde{W}_{a3}^T(t)\dot{\hat{W}}_{a3}(t). \tag{3.79}$$
Substituting (3.73), (3.75), (3.76), and (3.77) into the above equation gives
˙L3(t)=2∑k=1˙Lk(t)+z3(t)z4(t)−β3z23(t)−z3(t)˙ˆα2(z2)−12z3(t)ˆWa3(t)Γ3(z3)+12˜WTa3(t)Γ3(z3)z3(t)−μa3˜WTa3(t)Γ3(z3)ΓT3(z3)ˆWa3(t)+μc34(‖ω3‖2+1)˜WTa3(t)Γ3(z3)ΓT3(z3)ˆWa3(t)ωT3(t)ˆWc3(t)−μc3‖ω3‖2+1˜WTc3(t)ω3(ωT3ˆWc3(t)−(β23−1)z23(t)−2β3z3(t)˙ˆα2+14ˆWTa3(t)Γ3(z3)ΓT3(z3)ˆWa3(t)). | (3.80) |
Applying the control (3.73) and the update laws (3.75) and (3.76), and proceeding as in Step 1, we have
˙L3(t)≤2∑k=1˙Lk(t)+z24(t)−(β3−2)z23(t)−(μa32−μ2c32−132W∗T3ω3ωT3W∗3)˜WTa3(t)Γ3(z3)ΓT3(z3)˜Wa3(t)−1‖ω3‖2+1(μc32−132W∗T3(z3)ΓT3(z3)W∗3)˜WTc3(t)ω3ωT3˜Wc3(t)−(μa32−μ2c32)ˆWTa3(t)Γ3(z3)ΓT3(z3)ˆWa3(t)+12˙ˆα22+μa3+12(W∗T3Γ3(z3))2+μc32ϵ23(t). | (3.81) |
Rewrite (3.81) as follows:
$$\dot{L}_3(t) \le \sum_{k=1}^{2}\big(-a_k\|\xi_k(t)\|^2 + c_k\big) - \xi_3^T(t)A_3(t)\xi_3(t) + C_3(t) + z_4^2(t) - \left(\frac{\mu_{a3}}{2} - \frac{\mu_{c3}^2}{2}\right)\hat{W}_{a3}^T(t)\Gamma_3(z_3)\Gamma_3^T(z_3)\hat{W}_{a3}(t), \tag{3.82}$$
where $\xi_3(t) = [z_3(t), \tilde{W}_{a3}^T(t), \tilde{W}_{c3}^T(t)]^T$ and $C_3(t) = \frac{1}{2}\dot{\hat{\alpha}}_2^2 + \frac{\mu_{a3}+1}{2}\big(W_3^{*T}\Gamma_3(z_3)\big)^2 + \frac{\mu_{c3}}{2}\epsilon_3^2(t)$.
Select parameters within the following intervals:
$$\beta_3 > 2, \qquad \mu_{c3} > \frac{1}{16\lambda_3}, \qquad \mu_{a3} > \mu_{c3}^2 + \frac{\zeta_3}{16}W_3^{*T}W_3^*, \tag{3.83}$$
where λ3 is the maximal eigenvalue of matrix Λ3=W∗T3Γ3(z3)ΓT3(z3)W∗3. We have
$$\dot{L}_3(t) < z_4^2(t) + \sum_{k=1}^{3}\big(-a_k\|\xi_k(t)\|^2 + c_k\big), \tag{3.84}$$
where a3 is the lower bound on the minimum eigenvalue of A3(t) and c3 is the maximum value of C3(t).
Step 4: The actual input $u$ is obtained in the final step. The tracking error is $z_4(t) = x_4(t) - \hat{\alpha}_3(z_3)$, so
$$\dot{z}_4(t) = f_4(\bar{x}_4) + g_4 u - \dot{\hat{\alpha}}_3(z_3). \tag{3.85}$$
The performance index function in the final step is described as
$$V_4^*(z_4) = \min_{u\in\Psi(\Omega_{z_4})}\left(\int_t^{\infty} r_4\big(z_4(\tau),u(z_4(\tau))\big)\,d\tau\right) = \int_t^{\infty} r_4\big(z_4(\tau),u^*(z_4(\tau))\big)\,d\tau, \tag{3.86}$$
where u∗ is the optimal actual input and r4=z24(t)+u2(z4) represents the cost function.
Proceeding as in the previous steps, the actual controller $u(z_4)$ is obtained as
$$u(z_4) = g_4\left(-\beta_4 z_4(t) - \frac{1}{2}\hat{W}_{a4}^T(t)\Gamma_4(z_4)\right), \tag{3.87}$$
where $\hat{W}_{a4}$ is the actor NN weight, with the critic and actor update laws
$$\dot{\hat{W}}_{c4}(t) = -\frac{\mu_{c4}}{\|\omega_4\|^2+1}\omega_4(t)\left(\omega_4^T(t)\hat{W}_{c4}(t) - (\beta_4^2 g_4^2 - 1)z_4^2(t) + 2\beta_4 z_4\big(f_4(\bar{x}_4) - \dot{\hat{\alpha}}_3(z_3)\big) + \frac{g_4^2}{4}\hat{W}_{a4}^T\Gamma_4(z_4)\Gamma_4^T(z_4)\hat{W}_{a4}\right), \tag{3.88}$$
$$\dot{\hat{W}}_{a4}(t) = \frac{g_4^2}{2}\Gamma_4(z_4)z_4(t) - \mu_{a4}\Gamma_4(z_4)\Gamma_4^T(z_4)\hat{W}_{a4}(t) + \frac{\mu_{c4}g_4^2}{4(\|\omega_4\|^2+1)}\Gamma_4(z_4)\Gamma_4^T(z_4)\hat{W}_{a4}(t)\omega_4^T(t)\hat{W}_{c4}(t), \tag{3.89}$$
where $\mu_{c4}>0$ and $\mu_{a4}>0$ are the critic and actor learning rates, respectively, and $\omega_4 = \Gamma_4(z_4)\big(f_4(\bar{x}_4) - \beta_4 g_4^2 z_4(t) - \frac{g_4^2}{2}\hat{W}_{a4}^T\Gamma_4(z_4) - \dot{\hat{\alpha}}_3(z_3)\big)\in\mathbb{R}^{m_4}$.
In the final step, the Lyapunov quadratic scalar function is chosen as
$$L_4(t) = \sum_{k=1}^{3}L_k(t) + \frac{1}{2}z_4^2(t) + \frac{1}{2}\tilde{W}_{c4}^T(t)\tilde{W}_{c4}(t) + \frac{1}{2}\tilde{W}_{a4}^T(t)\tilde{W}_{a4}(t), \tag{3.90}$$
where ˜Wc4(t)=ˆWc4(t)−W∗4 is the critic NN estimation error, and ˜Wa4(t)=ˆWa4(t)−W∗4 is estimation error of the actor NN. The derivative of (3.90) is
$$\dot{L}_4(t) = \sum_{k=1}^{3}\dot{L}_k(t) + z_4(t)\dot{z}_4(t) + \tilde{W}_{a4}^T(t)\dot{\hat{W}}_{a4}(t) + \tilde{W}_{c4}^T(t)\dot{\hat{W}}_{c4}(t). \tag{3.91}$$
According to (3.87), (3.88), and (3.89), we have
˙L4(t)=3∑k=1˙Lk(t)+f4(¯x4)z4(t)−β4g24z24(t)−z4(t)˙ˆα3−g242z4(t)ˆWa4(t)Γ4(z4)+g242˜WTa4(t)Γ4(z4)z4(t)−μa4˜WTa4(t)Γ4(z4)ΓT4(z4)ˆWa4(t)+μc44(‖ω4‖2+1)˜WTa4(t)Γ4(z4)ΓT4(z4)ˆWa4(t)ωT4(t)ˆWc4(t)−μc4‖ω4‖2+1˜WTc4(t)ω4(ωT4ˆWc4(t)−(β24g24−1)z24(t)+2β4z4(t)(f4(ˉx4)−˙ˆα3)+g244ˆWTa4(t)Γ4(z4)ΓT4(z4)ˆWa4(t)). | (3.92) |
Similar to the first step, we can also deduce the following result
˙L4(t)≤3∑k=1˙Lk(t)−(β4g24−g24−1)z24−(μa42−μ2c4g442−132W∗T4ω4ωT4W∗4)˜WTa4(t)Γ4(z4)ΓT4(z4)˜Wa4(t)−1‖ω4‖2+1(μc42−132W∗T4Γ4(z4)ΓT4(z4)W∗4)˜WTc4(t)ω4ωT4˜Wc4(t)−(μa42−μ2c4g442)ˆWTa4(t)Γ4(z4)ΓT4(z4)ˆWa4(t)+12f24(ˉx4)+14˙ˆα23(t)+μa4+g242(W∗T4Γ4(z4))2+μc42ϵ24(t). | (3.93) |
Rewrite (3.93) as follows:
$$\dot{L}_4(t) \le \sum_{k=1}^{3}\big(-a_k\|\xi_k(t)\|^2 + c_k\big) - \xi_4^T(t)A_4(t)\xi_4(t) + C_4(t) - \left(\frac{\mu_{a4}}{2} - \frac{\mu_{c4}^2 g_4^4}{2}\right)\hat{W}_{a4}^T(t)\Gamma_4(z_4)\Gamma_4^T(z_4)\hat{W}_{a4}(t), \tag{3.94}$$
with $\xi_4(t) = [z_4(t), \tilde{W}_{a4}^T(t), \tilde{W}_{c4}^T(t)]^T$ and $C_4(t) = \frac{1}{2}f_4^2(\bar{x}_4) + \frac{1}{2}\dot{\hat{\alpha}}_3^2 + \frac{\mu_{a4}+g_4^2}{2}\big(W_4^{*T}\Gamma_4(z_4)\big)^2 + \frac{\mu_{c4}}{2}\epsilon_4^2(t)$.
To ensure system stability, the design parameters β4, μc4, and μa4 must satisfy
$$\beta_4 > \frac{1}{g_4^2} + 1, \qquad \mu_{c4} > \frac{1}{16\lambda_4}, \qquad \mu_{a4} > \mu_{c4}^2 g_4^4 + \frac{\zeta_4}{16}W_4^{*T}W_4^*, \tag{3.95}$$
where $\lambda_4$ is the maximal eigenvalue of $\Lambda_4 = W_4^{*T}\Gamma_4(z_4)\Gamma_4^T(z_4)W_4^*$.
Selecting $a_4$ as the infimum over $t\ge 0$ of the minimum eigenvalue of $A_4(t)$ and $c_4$ as the supremum over $t\ge 0$ of $C_4(t)$, Eq (3.94) can be reformulated as
$$\dot{L}(t) < \sum_{k=1}^{4}\big(-a_k\|\xi_k(t)\|^2 + c_k\big). \tag{3.96}$$
Based on the above derivation, we can achieve the objectives:
1) Within the closed-loop control framework, all error signals $z_i(t)$, $i=1,\ldots,4$, and the weight estimation errors $\tilde{W}_{ci}(t)$ and $\tilde{W}_{ai}(t)$, $i=1,\ldots,4$, are guaranteed to be SGUUB;
2) The single-link manipulator joint angular position $q_1(t)$ tracks the desired trajectory $y_r$ with a small, adjustable residual error.
The proof proceeds as follows:
1) The inequality (3.96) can be written as
$$\dot{L}(t) < -aL(t) + c,$$
where $a$ is the minimum of $a_k$, $k=1,\ldots,4$, and $c$ is the sum of $c_k$, $k=1,\ldots,4$. According to Lemma 2.1, we obtain
$$L(t) < e^{-at}L(0) + \frac{c}{a}\big(1 - e^{-at}\big),$$
which proves that control objective 1 is valid.
2) Define $L_z(t) = \frac{1}{2}\sum_{k=1}^{4}z_k^2(t)$. According to Eqs (3.18), (3.56), (3.77), and (3.85), we have
$$\begin{aligned}\dot{L}_z(t) ={}& z_1(t)\big(\hat{\alpha}_1(z_1) + z_2(t) - \dot{y}_r(t)\big) + z_2(t)\big(f_2(\bar{x}_2) + g_2(\hat{\alpha}_2(z_2) + z_3(t)) - \dot{\hat{\alpha}}_1(z_1)\big) \\ &+ z_3(t)\big(z_4(t) + \hat{\alpha}_3(z_3) - \dot{\hat{\alpha}}_2(z_2)\big) + z_4(t)\big(f_4(\bar{x}_4) + g_4 u(t) - \dot{\hat{\alpha}}_3(z_3)\big). \end{aligned}\tag{3.97}$$
Substituting (3.11), (3.52), (3.73), and (3.87) into (3.97), we have the following result:
$$\begin{aligned}\dot{L}_z(t) ={}& -\beta_1 z_1^2(t) + z_1(t)z_2(t) - z_1(t)\dot{y}_r - \frac{1}{2}z_1(t)\hat{W}_{a1}^T\Gamma_1 - g_2^2\beta_2 z_2^2(t) + g_2 z_2(t)z_3(t) - z_2(t)\dot{\hat{\alpha}}_1 - \frac{g_2^2}{2}z_2(t)\hat{W}_{a2}^T\Gamma_2 + z_2(t)f_2(\bar{x}_2) \\ &- \beta_3 z_3^2(t) + z_3(t)z_4(t) - z_3(t)\dot{\hat{\alpha}}_2 - \frac{1}{2}z_3(t)\hat{W}_{a3}^T\Gamma_3 - g_4^2\beta_4 z_4^2(t) - z_4(t)\dot{\hat{\alpha}}_3 - \frac{g_4^2}{2}z_4(t)\hat{W}_{a4}^T\Gamma_4 + z_4(t)f_4(\bar{x}_4). \end{aligned}\tag{3.98}$$
Using Young's inequality, it is clear that we can get the following result:
$$\dot{L}_z(t) \le -(\beta_1-2)z_1^2(t) - (\beta_2 g_2^2 - g_2^2 - 1)z_2^2(t) - (\beta_3-2)z_3^2(t) - (\beta_4 g_4^2 - g_4^2 - 1)z_4^2(t) + D(t), \tag{3.99}$$
where $D(t) = \frac{1}{2}f_2^2(\bar{x}_2) + \frac{1}{2}f_4^2(\bar{x}_4) + \frac{1}{2}\sum_{k=1}^{3}\dot{\hat{\alpha}}_k^2 + \frac{1}{2}\dot{y}_r^2(t) + \frac{1}{2}\big(\hat{W}_{a1}^T(t)\Gamma_1(z_1)\big)^2 + \frac{1}{2}\big(\hat{W}_{a3}^T(t)\Gamma_3(z_3)\big)^2 + \frac{g_2^2}{2}\big(\hat{W}_{a2}^T(t)\Gamma_2(z_2)\big)^2 + \frac{g_4^2}{2}\big(\hat{W}_{a4}^T(t)\Gamma_4(z_4)\big)^2$ is bounded, so there exists a constant $\rho$ such that $|D(t)|\le\rho$. Hence, the above result can be written as
$$\dot{L}_z(t) < -\beta L_z(t) + \rho,$$
where $\beta$ is the minimum of $\{\beta_1-2,\ \beta_2 g_2^2-g_2^2-1,\ \beta_3-2,\ \beta_4 g_4^2-g_4^2-1\}$. Applying Lemma 2.1 again, we obtain
$$L_z(t) < e^{-\beta t}L_z(0) + \frac{\rho}{\beta}\big(1 - e^{-\beta t}\big).$$
This implies that increasing $\beta$ sufficiently ensures the desired tracking accuracy and control performance.
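To make this claim concrete (a one-line consequence we add here), note that $z_1^2(t)\le 2L_z(t)$, so
$$|z_1(t)| \le \sqrt{2L_z(t)} < \sqrt{2e^{-\beta t}L_z(0) + \frac{2\rho}{\beta}\big(1-e^{-\beta t}\big)} \;\longrightarrow\; \sqrt{\frac{2\rho}{\beta}} \quad (t\to\infty),$$
i.e., the ultimate bound on the output tracking error shrinks as the design gains, and hence $\beta$, increase.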
Ultimately, according to (3.11), (3.52), (3.73), and (3.87), we design an adaptive tracking control strategy for the flexible-joint manipulator. The details of this control method are illustrated in Figure 2.
To further validate the effectiveness of the method in controlling a flexible-joint robotic manipulator, numerical simulations were conducted. Table 1 lists the key parameters of the single-link manipulator. The initial conditions are set to $q_1(0)=8\,\mathrm{deg}$, $\dot{q}_1(0)=0\,\mathrm{deg/s}$, $q_2(0)=10\,\mathrm{deg}$, and $\dot{q}_2(0)=0\,\mathrm{deg/s}$, and the desired trajectory is chosen as $y_r(t)=28\sin(3t/4)$, shown in Figure 3.
| Parameters | Description | Values | Unit |
| --- | --- | --- | --- |
| $I$ | link inertia | 20 | kg·m² |
| $J$ | actuator inertia | 0.1 | kg·m² |
| $M$ | link mass | 0.1 | kg |
| $g$ | gravitational acceleration | 9.8 | m/s² |
| $l$ | position of the link's center of gravity | 0.1 | m |
| $k$ | joint stiffness | 100 | N·m/rad |
To achieve the tracking objectives, the virtual controllers for the first three steps and the input signal for the final step are given by (3.11), (3.52), (3.73), and (3.87), respectively, with the design parameters set as $[\beta_1,\beta_2,\beta_3,\beta_4]=[6.00, 2.04, 11.00, 2.01]$. The NN at each step has 36 neurons with centers uniformly distributed in the range $[-6,6]$, and the widths $\varphi_i$, $i=1,\ldots,4$, of the Gaussian basis functions $\Gamma_i$ are all chosen to be 2. The critic weights at each step are updated according to (3.15), (3.54), (3.75), and (3.88), respectively, with learning rates and initial weights $[\mu_{c1},\mu_{c2},\mu_{c3},\mu_{c4}]=[0.4,0.4,0.4,0.4]$ and $W_{ci}(0)=[0.5]_{36\times 1}$, $i=1,\ldots,4$. The actor weights at each step are updated according to (3.17), (3.55), (3.76), and (3.89), respectively, with learning rates and initial weights $[\mu_{a1},\mu_{a2},\mu_{a3},\mu_{a4}]=[300,300,300,300]$ and $W_{ai}(0)=[0.4]_{36\times 1}$, $i=1,\ldots,4$.
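For completeness, the following Python sketch shows one way to wire the scheme together in a forward-Euler simulation. It is a simplified reconstruction under our own assumptions, not the authors' simulation code: the four backstepping steps share one generic routine, the derivatives $\dot{\hat{\alpha}}_i$ are approximated by lagged finite differences, each scalar error $z_i$ is fed to a one-dimensional Gaussian basis, and angles are handled in radians.

```python
import numpy as np

# Plant parameters (Table 1) and model terms from (2.3)
I, J, M, g, l, k = 20.0, 0.1, 0.1, 9.8, 0.1, 100.0
g2, g4 = k / I, 1.0 / J
f2 = lambda x: -(M * g * l / I) * np.sin(x[0]) - (k / I) * x[0]
f4 = lambda x: (k / J) * (x[2] - x[0])

# Design parameters reported above
beta = [6.00, 2.04, 11.00, 2.01]
mu_c, mu_a = [0.4] * 4, [300.0] * 4
p = 36
centers = np.linspace(-6.0, 6.0, p)                   # neuron centers in [-6, 6]
width = 2.0
Gamma = lambda z: np.exp(-((z - centers) ** 2) / width ** 2)
W_c = [0.5 * np.ones(p) for _ in range(4)]            # critic weights, W_ci(0) = 0.5
W_a = [0.4 * np.ones(p) for _ in range(4)]            # actor weights,  W_ai(0) = 0.4

yr = lambda t: np.deg2rad(28.0) * np.sin(0.75 * t)    # desired trajectory (radians)
dyr = lambda t: np.deg2rad(28.0) * 0.75 * np.cos(0.75 * t)

def sub_update(z, ff, fi, gi, Wc, Wa, bi, mci, mai):
    """(Virtual) control and weight derivatives of one backstepping step.

    ff: feedforward term (dot{y}_r in step 1, dot{hat alpha}_{i-1} afterwards);
    fi: drift term (0 in steps 1 and 3, f2 / f4 in steps 2 and 4);
    gi: control gain (1 in steps 1 and 3, g2 / g4 in steps 2 and 4).
    """
    G = Gamma(z)
    a_out = Wa @ G
    alpha = gi * (-bi * z - 0.5 * a_out)              # (3.11)/(3.52)/(3.73)/(3.87)
    omega = G * (fi - bi * gi ** 2 * z - 0.5 * gi ** 2 * a_out - ff)
    den = omega @ omega + 1.0
    resid = (omega @ Wc - (bi ** 2 * gi ** 2 - 1.0) * z ** 2
             + 2.0 * bi * z * (fi - ff) + 0.25 * gi ** 2 * a_out ** 2)
    dWc = -(mci / den) * omega * resid                # critic law, cf. (3.15)/(3.54)/(3.75)/(3.88)
    dWa = (0.5 * gi ** 2 * G * z - mai * G * a_out
           + (mci * gi ** 2 / (4.0 * den)) * G * a_out * (omega @ Wc))   # actor law, cf. (3.17)
    return alpha, dWc, dWa

dt, T = 1e-3, 20.0
x = np.deg2rad(np.array([8.0, 0.0, 10.0, 0.0]))       # initial condition [q1, dq1, q2, dq2]
alpha_prev, dalpha = np.zeros(3), np.zeros(3)

for n in range(int(T / dt)):
    t = n * dt
    refs, ffs = [yr(t)], [dyr(t)]
    fis, gis = [0.0, f2(x), 0.0, f4(x)], [1.0, g2, 1.0, g4]
    alphas, dWcs, dWas = [], [], []
    for i in range(4):
        z = x[i] - refs[i]                            # z_1 = x_1 - y_r, z_i = x_i - hat{alpha}_{i-1}
        a, dWc, dWa = sub_update(z, ffs[i], fis[i], gis[i],
                                 W_c[i], W_a[i], beta[i], mu_c[i], mu_a[i])
        alphas.append(a); dWcs.append(dWc); dWas.append(dWa)
        if i < 3:
            refs.append(a)
            ffs.append(dalpha[i])                     # finite-difference estimate of dot{hat alpha}_i
    u = alphas[3]                                     # actual torque (3.87)

    # forward-Euler integration of the plant (2.3) and of the NN weights
    x = x + dt * np.array([x[1], f2(x) + g2 * x[2], x[3], f4(x) + g4 * u])
    for i in range(4):
        W_c[i] = W_c[i] + dt * dWcs[i]
        W_a[i] = W_a[i] + dt * dWas[i]
    if n > 0:
        dalpha = (np.array(alphas[:3]) - alpha_prev) / dt
    alpha_prev = np.array(alphas[:3])
```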
Simulation results: The figures depict the outcomes of the simulation. The actual output $y(t)$ and the desired trajectory $y_r(t)$ are shown in Figure 3, from which it is clear that the actual output closely follows the desired output. Figure 4 shows the states $x_i$, $i=1,\ldots,4$. The norms of the critic NN weights $W_{ci}(t)$, $i=1,\ldots,4$, are presented in Figure 5, and the norms of the actor NN weights $W_{ai}(t)$, $i=1,\ldots,4$, are presented in Figure 6; all weights are bounded and converge to constant values. The input $u(t)$ is illustrated in Figure 7, which shows that the input converges to the range $[-5,5]$. In addition, Figures 8 and 9 illustrate the tracking error $z_1(t)$ as $k$ varies within the range $[100,200]$ and $I$ varies within the range $[15,30]$, demonstrating the robustness of the control method. In conclusion, the proposed control strategy enables the actual output $y(t)$ to track the desired trajectory $y_r(t)$ well while optimizing the controller energy consumption. To better demonstrate the energy-consumption optimization of this control scheme for a flexible robotic manipulator, we conducted a comparative experiment with the control scheme of [19]. As illustrated in Figures 10 and 11, under similar tracking performance, the control energy consumption of our scheme is significantly lower than that of the scheme in [19].
In this paper, an optimal backstepping control scheme is proposed for trajectory tracking of a flexible-joint manipulator by integrating optimal control into backstepping control. In this scheme, each virtual controller, as well as the actual controller, is designed as an optimized solution at the corresponding backstepping step. This approach achieves performance optimization for the entire flexible robotic manipulator system. The RL is built on a critic-actor architecture, in which the critic assesses performance and provides feedback to the actor; the actor then controls the system, and the two NNs learn collaboratively. Since the RL update laws are derived from the negative gradient of a simple function, the controller design is simpler than that of existing optimal control methods for flexible robotic manipulators. Finally, the effectiveness of the control method for solving the trajectory tracking problem of flexible robotic manipulators is demonstrated through both theoretical analysis and simulation studies.
Huihui Zhong: Methodology, Validation, Writing-original draft; Weijian Wen: Formal analysis, Supervision; Jianjun Fan: Conceptualization, Investigation; Weijun Yang: Writing – Review and Editing, Visualization. All authors have read and approved the final version of the manuscript for publication.
This work was supported by the Special projects in key fields of colleges and universities in Guangdong Province, China (No.2024ZDZX1070, No.2024ZDZX3094), and the Guangdong University research and innovation team project (No.2024KCXTD075).
All authors declare no conflicts of interest in this paper.
[1] Z. Li, S. Li, X. Luo, An overview of calibration technology of industrial robots, IEEE-CAA J. Automatica Sin., 8 (2021), 23–36. https://doi.org/10.1109/JAS.2020.1003381
[2] M. Kyrarini, F. Lygerakis, A. Rajavenkatanarayanan, C. Sevastopoulos, H. R. Nambiappan, K. K. Chaitanya, et al., A survey of robots in healthcare, Technologies, 9 (2021), 8. https://doi.org/10.3390/technologies9010008
[3] M. Payal, P. Dixit, T. V. M. Sairam, N. Goyal, Robotics, AI, and the IoT in defense systems, In: AI and IoT-based intelligent automation in robotics, Wiley, 2021. https://doi.org/10.1002/9781119711230.ch7
[4] Q. Qi, G. Qin, Z. Yang, G. Chen, J. Xu, Z. Lv, et al., Design and motion control of a tendon-driven continuum robot for aerospace applications, P. I. Mech. Eng. G J. Aer., 2024. https://doi.org/10.1177/09544100241263004
[5] M. Sostero, Automation and robots in services: Review of data and taxonomy, In: JRC working papers series on labour, education and technology, Joint Research Centre, 2020.
[6] Q. Yang, X. Du, Z. Wang, Z. Meng, Z. Ma, Q. Zhang, A review of core agricultural robot technologies for crop productions, Comput. Electron. Agr., 206 (2023), 107701. https://doi.org/10.1016/j.compag.2023.107701
[7] I. Arocena, A. Huegun-Burgos, I. Rekalde-Rodriguez, Robotics and education: A systematic review, TEM J., 11 (2022), 379–387. https://doi.org/10.18421/TEM111-48
[8] C. E. Boudjedir, M. Bouri, D. Boukhetala, An enhanced adaptive time delay control-based integral sliding mode for trajectory tracking of robot manipulators, IEEE Trans. Control Syst. Technol., 31 (2023), 1042–1050. https://doi.org/10.1109/TCST.2022.3208491
[9] P. Li, D. Liu, S. Baldi, Adaptive integral sliding mode control in the presence of state-dependent uncertainty, IEEE-ASME Trans. Mechatron., 27 (2022), 3885–3895. https://doi.org/10.1109/TMECH.2022.3145910
[10] J. Park, W. Kwon, P. Park, An improved adaptive sliding mode control based on time-delay control for robot manipulators, IEEE Trans. Ind. Electron., 70 (2023), 10363–10373. https://doi.org/10.1109/TIE.2022.3222616
[11] H. Ma, H. Ren, Q. Zhou, H. Li, Z. Wang, Observer-based neural control of N-link flexible-joint robots, IEEE Trans. Neural Netw. Learn. Syst., 35 (2024), 5295–5305. https://doi.org/10.1109/TNNLS.2022.3203074
[12] Y. Xie, Q. Ma, J. Gu, G. Zhou, Event-triggered fixed-time practical tracking control for flexible-joint robot, IEEE Trans. Fuzzy Syst., 31 (2023), 67–76. https://doi.org/10.1109/TFUZZ.2022.3181463
[13] M. M. Arefi, N. Vafamand, B. Homayoun, M. Davoodi, Command filtered backstepping control of constrained flexible joint robotic manipulator, IET Control Theory Appl., 17 (2023), 2506–2518. https://doi.org/10.1049/cth2.12528
[14] X. Cheng, Y. J. Zhang, H. S. Liu, D. Wollherr, M. Buss, Adaptive neural backstepping control for flexible-joint robot manipulator with bounded torque inputs, Neurocomputing, 458 (2021), 70–86. https://doi.org/10.1016/j.neucom.2021.06.013
[15] Y. Zhang, M. Zhang, F. Du, Robust finite-time command-filtered backstepping control for flexible-joint robots with only position measurements, IEEE Trans. Syst. Man Cybern. Syst., 54 (2024), 1263–1275. https://doi.org/10.1109/TSMC.2023.3324761
[16] R. Datouo, J. J. B. M. Ahanda, A. Melingui, F. Biya-Motto, B. E. Zobo, Adaptive fuzzy finite-time command-filtered backstepping control of flexible-joint robots, Robotica, 39 (2021), 1081–1100. https://doi.org/10.1017/S0263574720000910
[17] U. K. Sahu, B. Subudhi, D. Patra, Sampled-data extended state observer-based backstepping control of two-link flexible manipulator, Trans. Inst. Meas. Control, 41 (2019), 3581–3599. https://doi.org/10.1177/0142331219832954
[18] J. Li, L. Zhu, Practical tracking control under actuator saturation for a class of flexible-joint robotic manipulators driven by DC motors, Nonlinear Dyn., 109 (2022), 2745–2758. https://doi.org/10.1007/s11071-022-07602-4
[19] G. Lai, S. Zou, H. Xiao, L. Wang, Z. Liu, K. Chen, Fixed-time adaptive fuzzy control with prescribed tracking performances for flexible-joint manipulators, J. Franklin Inst., 361 (2024), 106809. https://doi.org/10.1016/j.jfranklin.2024.106809
[20] R. Bellman, Dynamic programming, Science, 153 (1966), 34–37. https://doi.org/10.1126/science.153.3731.34
[21] L. S. Pontryagin, Mathematical theory of optimal processes, London: Routledge, 2017. https://doi.org/10.1201/9780203749319
[22] Y. Yang, H. Modares, K. G. Vamvoudakis, W. He, C. Z. Xu, D. C. Wunsch, Hamiltonian-driven adaptive dynamic programming with approximation errors, IEEE Trans. Cybern., 52 (2022), 13762–13773. https://doi.org/10.1109/TCYB.2021.3108034
[23] P. J. Werbos, Neural networks for control and system identification, In: Proceedings of the 28th IEEE conference on decision and control, 1 (1989), 260–265. https://doi.org/10.1109/CDC.1989.70114
[24] W. T. Miller, R. S. Sutton, P. J. Werbos, A menu of designs for reinforcement learning over time, In: Neural networks for control, MIT Press, 1995, 67–95.
[25] P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, In: Handbook of intelligent control: Neural fuzzy and adaptive approaches, New York: Van Nostrand Reinhold, 1992.
[26] G. Lai, Y. Zhang, Z. Liu, J. Wang, K. Chen, C. L. P. Chen, Direct adaptive fuzzy control scheme with guaranteed tracking performances for uncertain canonical nonlinear systems, IEEE Trans. Fuzzy Syst., 30 (2022), 818–829. https://doi.org/10.1109/TFUZZ.2021.3049902
[27] Y. Wang, Y. Chang, A. F. Alkhateeb, N. D. Alotaibi, Adaptive fuzzy output-feedback tracking control for switched nonstrict-feedback nonlinear systems with prescribed performance, Circuits Syst. Signal Process., 40 (2021), 88–113. https://doi.org/10.1007/s00034-020-01466-y
[28] D. Wang, M. Ha, M. Zhao, The intelligent critic framework for advanced optimal control, Artif. Intell. Rev., 55 (2022), 1–22. https://doi.org/10.1007/s10462-021-10118-9
[29] D. Li, J. Dong, Fractional-order systems optimal control via actor-critic reinforcement learning and its validation for chaotic MFET, IEEE Trans. Autom. Sci. Eng., 2024, 1–10. https://doi.org/10.1109/TASE.2024.3361213
[30] D. Cui, C. K. Ahn, Y. Sun, Z. Xiang, Mode-dependent state observer-based prescribed performance control of switched systems, IEEE Trans. Circuits Syst. II-Express Briefs, 71 (2024), 3810–3814. https://doi.org/10.1109/TCSII.2024.3370865
[31] H. Jiang, W. Su, B. Niu, H. Wang, J. Zhang, Adaptive neural consensus tracking control of distributed nonlinear multiagent systems with unmodeled dynamics, Int. J. Robust Nonlinear Control, 32 (2022), 8999–9016. https://doi.org/10.1002/rnc.6313
[32] G. Lai, Y. Zhang, Z. Liu, C. L. P. Chen, Indirect adaptive fuzzy control design with guaranteed tracking error performance for uncertain canonical nonlinear systems, IEEE Trans. Fuzzy Syst., 27 (2019), 1139–1150. https://doi.org/10.1109/TFUZZ.2018.2870574