Research article Special Issues

Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm

  • Received: 13 December 2022 Revised: 13 February 2023 Accepted: 20 February 2023 Published: 27 February 2023
  • MSC : 93E20, 93C05, 93C41, 93C55

  • In this paper, a reinforcement Q-learning method based on value iteration (Ⅵ) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problem with time delay. Compared with the traditional reinforcement learning method, Q-learning method avoids the need for accurate system model. Firstly, the delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Secondly, the SLQ optimal tracking problem is transformed into a deterministic one by system transformation and the corresponding Q function of SLQ optimal tracking control is derived. Based on this, Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm.

    Citation: Xufeng Tan, Yuan Li, Yang Liu. Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm[J]. AIMS Mathematics, 2023, 8(5): 10249-10265. doi: 10.3934/math.2023519

    Related Papers:

  • In this paper, a reinforcement Q-learning method based on value iteration (Ⅵ) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problem with time delay. Compared with the traditional reinforcement learning method, Q-learning method avoids the need for accurate system model. Firstly, the delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Secondly, the SLQ optimal tracking problem is transformed into a deterministic one by system transformation and the corresponding Q function of SLQ optimal tracking control is derived. Based on this, Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm.



    加载中


    [1] H. Modares, F. L. Lewis, Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica, 50 (2014), 1780–1792. https://doi.org/10.1016/j.automatica.2014.05.011 doi: 10.1016/j.automatica.2014.05.011
    [2] B. Zhao, Y. Li, Model-free adaptive dynamic programming based near-optimal decentralized tracking control of reconfigurable manipulators, Int. J. Control, Autom. Syst., 16 (2018), 478–490. https://doi.org/10.1007/s12555-016-0711-5 doi: 10.1007/s12555-016-0711-5
    [3] T. Huang, D. Liu, A self-learning scheme for residential energy system control and management, Neural Comput. Appl., 22 (2013), 259–269. https://doi.org/10.1007/s00521-011-0711-6 doi: 10.1007/s00521-011-0711-6
    [4] M. Gluzman, J. G. Scott, A. Vladimirsky, Optimizing adaptive cancer therapy: dynamic programming and evolutionary game theory, Proc. Royal Soc. B: Biol. Sci., 287 (2020), 20192454. https://doi.org/10.1098/rspb.2019.2454 doi: 10.1098/rspb.2019.2454
    [5] I. Ha, E. Gilbert, Robust tracking in nonlinear systems, IEEE Trans. Automat. Control, 32 (1987), 763–771. https://doi.org/10.1109/TAC.1987.1104710 doi: 10.1109/TAC.1987.1104710
    [6] M. A. Rami, X. Y. Zhou, Linear matrix inequalities, Riccati equations and indefinite stochastic linear quadratic controls, IEEE Trans. Automat. Control, 45 (2000), 1131–1143. https://doi.org/10.1109/9.863597 doi: 10.1109/9.863597
    [7] R. Byers, Solving the algebraic Riccati equation with the matrix sign function, Linear Algebra Appl., 89 (1987), 267–279. https://doi.org/10.1016/0024-3795(87)90222-9 doi: 10.1016/0024-3795(87)90222-9
    [8] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, F. L. Lewis, Adaptive optimal control for continuous-time linear systems based on policy iteration, Automatica, 45 (2009), 477–484. https://doi.org/10.1016/j.automatica.2008.08.017 doi: 10.1016/j.automatica.2008.08.017
    [9] B. Kiumarsi, F. L. Lewis, M. B. Naghibi-Sistani, A. Karimpour, Optimal tracking control of unknown discrete-time linear systems using input-output measured data, IEEE Trans. Cybern., 45 (2015), 2770–2779. https://doi.org/10.1109/TCYB.2014.2384016 doi: 10.1109/TCYB.2014.2384016
    [10] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, M. B. Naghibi-Sistani, Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics, Automatica, 50 (2014), 1167–1175. https://doi.org/10.1016/j.automatica.2014.02.015 doi: 10.1016/j.automatica.2014.02.015
    [11] G. Wang, H. Zhang, Model-free value iteration algorithm for continuous-time stochastic linear quadratic optimal control problems, arXiv, 2022. https://doi.org/10.48550/arXiv.2203.06547
    [12] H. Zhang, Adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems, arXiv, 2022. https://doi.org/10.48550/arXiv.2210.04486
    [13] R. Liu, Y. Li, X. Liu, Linear-quadratic optimal control for unknown mean-field stochastic discrete-time system via adaptive dynamic programming approach, Neurocomputing, 282 (2018), 16–24. https://doi.org/10.1016/j.neucom.2017.12.007 doi: 10.1016/j.neucom.2017.12.007
    [14] X. Chen, F. Wang, Neural-network-based stochastic linear quadratic optimal tracking control scheme for unknown discrete-time systems using adaptive dynamic programming, Control Theory Technol., 19 (2021), 315–327. https://doi.org/10.1007/s11768-021-00046-y doi: 10.1007/s11768-021-00046-y
    [15] Z. Zhang, X. Zhao, Stochastic linear quadratic optimal tracking control for stochastic discrete time systems based on Q-learning, J. Nanjing Univ. Inf. Sci. Technol. (Nat. Sci.), 13 (2021), 548–555.
    [16] Y. Liu, H. Zhang, Y. Luo, J. Han, ADP based optimal tracking control for a class of linear discrete-time system with multiple delays, J. Franklin Inst., 353 (2016), 2117–2136. https://doi.org/10.1016/j.jfranklin.2016.03.012 doi: 10.1016/j.jfranklin.2016.03.012
    [17] B. L. Zhang, Q. L. Han, X. M. Zhang, X. Yu, Sliding mode control with mixed current and delayed states for offshore steel jacket platforms, IEEE Trans. Control Syst. Technol., 22 (2014), 1769–1783. https://doi.org/10.1109/TCST.2013.2293401 doi: 10.1109/TCST.2013.2293401
    [18] M. J. Park, O. M. Kwon, J. H. Ryu, Advanced stability criteria for linear systems with time-varying delays, J. Franklin Inst., 355 (2018), 520–5433. https://doi.org/10.1016/j.jfranklin.2017.11.029 doi: 10.1016/j.jfranklin.2017.11.029
    [19] H. Zhang, Y. Luo, D. Liu, Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints, IEEE Trans. Neural Networks, 20 (2009), 1490–1503. https://doi.org/10.1109/TNN.2009.2027233 doi: 10.1109/TNN.2009.2027233
    [20] H. Zhang, Z. Wang, D. Liu, Global asymptotic stability of recurrent neural networks with multiple time-varying delays, IEEE Trans. Neural Networks, 19 (2008), 855–873. https://doi.org/10.1109/TNN.2007.912319 doi: 10.1109/TNN.2007.912319
    [21] T. Wang, H. Zhang, Y. Luo, Infinite-time stochastic linear quadratic optimal control for unknown discrete-time systems using adaptive dynamic programming approach, Neurocomputing, 171 (2016), 379–386. https://doi.org/10.1016/j.neucom.2015.06.053 doi: 10.1016/j.neucom.2015.06.053
    [22] A. Garate-Garcia, L. A. Marquez-Martinez, C. H. Moog, Equivalence of linear time-delay systems, IEEE Trans. Automat. Control, 56 (2011), 666–670. https://doi.org/10.1109/TAC.2010.2095550 doi: 10.1109/TAC.2010.2095550
    [23] Y. Liu, R. Yu, Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning, Electron. Lett., 54 (2018), 750–752. https://doi.org/10.1049/el.2017.3238 doi: 10.1049/el.2017.3238
    [24] H. Zhang, Q. Wei, Y. Luo, A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Trans. Syst., Man, Cybern. B, 38 (2008), 937–942. https://doi.org/10.1109/TSMCB.2008.920269 doi: 10.1109/TSMCB.2008.920269
    [25] J. Shi, D. Yue, X. Xie, Adaptive optimal tracking control for nonlinear continuous-time systems with time delay using value iteration algorithm, Neurocomputing, 396 (2020), 172–178. https://doi.org/10.1016/j.neucom.2018.07.098 doi: 10.1016/j.neucom.2018.07.098
    [26] Q. Wei, D. Liu, Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification, IEEE Trans. Automat. Sci. Eng., 11 (2014), 1020–1036. https://doi.org/10.1109/TASE.2013.2284545 doi: 10.1109/TASE.2013.2284545
    [27] T. Wang, H. Zhang, Y. Luo, Stochastic linear quadratic optimal control for model-free discrete-time systems based on Q-learning algorithm, Neurocomputing, 312 (2018), 1–8. https://doi.org/10.1016/j.neucom.2018.04.018 doi: 10.1016/j.neucom.2018.04.018
    [28] F. L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag., 9 (2009), 32–50. https://doi.org/10.1109/MCAS.2009.933854 doi: 10.1109/MCAS.2009.933854
  • Reader Comments
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(1464) PDF downloads(139) Cited by(1)

Article outline

Figures and Tables

Figures(4)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog