A survey of adaptive optimal control theory

Xiaoxuan Pei; Kewen Li; Yongming Li

doi:10.3934/mbe.2022561

Mathematical Biosciences and Engineering

2022, Volume 19, Issue 12: 12058-12072.


Survey Special Issues


College of Science, Liaoning University of Technology, Jinzhou 121001, China

Academic Editor: Xiaodi Li

Received: 10 July 2022 Revised: 02 August 2022 Accepted: 05 August 2022 Published: 18 August 2022

This paper surveys recent developments in optimal control based on adaptive dynamic programming (ADP). First, starting from the dynamic programming (DP) and reinforcement learning (RL) algorithms, the origin and development of the optimization idea and its application in the control field are introduced. The second part introduces achievements in optimal control, and then classifies and summarizes research results on optimization methods, constraint problems, structure design in control algorithms, and practical engineering processes based on optimal control. Finally, possible future research topics are discussed. Through a comprehensive investigation of applications in many existing fields, this survey shows that optimal control algorithms via ADP with a critic-actor neural network (NN) structure have broad application prospects, and that some of the developed optimal control design algorithms have already been applied in practical engineering fields.
Keywords: optimal control, ADP, backstepping design, neural networks, application
Citation: Xiaoxuan Pei, Kewen Li, Yongming Li. A survey of adaptive optimal control theory[J]. Mathematical Biosciences and Engineering, 2022, 19(12): 12058-12072. doi: 10.3934/mbe.2022561



© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)