Research article

An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning


  • In recent years, dynamic programming and reinforcement learning theory have been widely used to solve optimal control problems of nonlinear control systems (NCS). Many achievements have been made in network model construction and system stability analysis, but there is little research on establishing a control strategy based on the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM), which constructs the reward function from individual detail evaluation functions in order to replace the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then introduced into a wider range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved by a Lyapunov function. With the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient and actor-critic, are designed in the dynamic environment, and the effects of different reward functions on the experimental accuracy are compared. The experimental results show that, in NCS, using the DRM to replace the utility function in the HJB equation is more in line with the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function and selecting an appropriate deep reinforcement learning model, the optimization problem of NCS can be solved.

    Citation: Shixuan Yao, Xiaochen Liu, Yinghui Zhang, Ze Cui. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning[J]. Mathematical Biosciences and Engineering, 2022, 19(9): 9258-9290. doi: 10.3934/mbe.2022430




    Five basic quantities of electrostatics (voltage, charge, current, capacitance, and resistance) are involved in almost all applications. Electrostatics plays an important role at the design stage in improving the performance of microelectromechanical systems (MEMS) and electron devices. Many numerical methods (e.g., the finite difference method, the variational method, the moment method, the finite element method, and the boundary element method) have been widely used for engineering problems. Among these techniques, the finite element method (FEM), the boundary integral equation method (BIEM) and the boundary element method (BEM) have become accepted tools for engineers owing to the growth of digital computing power. Here, we focus on the mathematical study of the BIEM for the electrostatics of two identical cylinders. Researchers have paid attention to the 2003 IEEE paper on the dual BEM [1], which has received nearly 5000 views on ResearchGate.

    For a pair of conducting cylinders, there is a large body of literature on charged cylinders [2,3,4]. Different solutions exist for the electrostatic problem of two identical parallel cylinders held at the same (symmetric) potential [2,3]. A note was given to show their equivalence, and the identities were confirmed [4]. Four distinct solutions for the potential distribution around two equal circular parallel conducting cylinders [2,3,5,6] were shown to be equivalent by Lekner [7] by way of several identities. Here, we try an alternative approach, the BIEM with degenerate kernels, to revisit this problem. A degenerate kernel is based on the method of separation of variables, but it separates the variables in the two-point kernel function. The BIE in conjunction with an available degenerate kernel can only solve simple geometries, and the results may be obtained more directly by applying separation of variables to the solution itself instead of to the fundamental solution. Nevertheless, the tool can explain the rank-deficiency mechanisms in the BIE/BEM, such as the degenerate scale, degenerate boundary, spurious eigenvalues and fictitious frequencies, which is meaningful to the BEM community. Both the symmetric and anti-symmetric cases are considered. Regarding the anti-symmetric electrostatic potential, Lebedev et al. [8] provided a closed-form solution by using the bipolar coordinates. Interestingly, this solution turns out to be the simplest case of the method of fundamental solutions (MFS), with two sources of opposite strengths at the two foci. It is not trivial to check the asymptotic behavior at infinity for the two cases, symmetric and anti-symmetric. Whether the equilibrium of the boundary flux along the two cylinders is satisfied is also one of our main concerns.

    Regarding the potential problem of a two-dimensional plane containing two circular boundaries, Chen and Shen [9] studied the multiply-connected Laplace problem. They found that a degenerate scale depends on the outer boundary. Chen et al. [10] solved the Laplace problem by using the BIEM in conjunction with the degenerate kernel to derive an analytical solution. It was found that a degenerate scale may occur because of the logarithmic kernel used in the two-dimensional case. Efficient techniques for treating the rank deficiency of the BEM in electrostatic problems were proposed by Chyuan et al. [11]. Later, it was found that the special (degenerate) geometry happens to be the shape of unit logarithmic capacity. Kuo et al. [12] studied the degenerate scale for regular N-gon domains by using complex variables, and also implemented the BEM numerically. Kuo et al. [13] revisited the degenerate scale for an infinite plane containing two circular holes by using conformal mapping. Chen et al. [14] linked the logarithmic capacity in potential theory and the degenerate scale in the BEM for two tangent discs. The logarithmic capacity of the line segment, as well as the double degeneracy in the BIEM/BEM, was studied by Chen et al. [15]. Because the two-dimensional fundamental solution is used in the BIEM, the solution space is expanded, and the corresponding matrix in the BEM is sometimes rank deficient. In other words, the integral operator of the logarithmic kernel is range deficient. A chart showing the rank deficiency and the null space of the integral operators of the single- and double-layer potentials and their derivatives was given in [16,17,18], while the original chart appears on the cover of Strang's book [19]. Fikioris et al. [20] solved rectangularly shielded lines by using the Carleman-Vekua method. It is interesting that the formulation in [20] also needs a constraint [21] to ensure a unique solution. This outcome is similar to the paper of Chen et al. [22] using Fichera's approach, where an additional constraint is also required.

    In this paper, we revisit the electrostatics of two cylinders by using the BIE with the degenerate kernel in bipolar coordinates. Both the symmetric and anti-symmetric specified potentials are considered, and the logarithmic capacity is also discussed. The boundary potential and flux are expanded in Fourier series, while the fundamental solution is represented by the degenerate kernel. The equilibrium of the boundary flux and the asymptotic behavior at infinity are also examined. The solution space expanded using the BIEM is compared with the true solution space. After summarizing the single-cylinder (circle and ellipse) and two-cylinder cases, a conclusion on constructing the solution space is drawn.

    First, we consider a conducting cylinder. The governing equation and the Dirichlet boundary condition are shown below:

    $$\nabla^2 u(x)=0,\ x\in D;\qquad u(x)=\bar{u}(x),\ x\in B;\qquad u(x)=\ln|x|+O(1),\ x\to\infty, \tag{1}$$

    where $\nabla^2$, $D$ and $B$ are the Laplace operator, the domain of interest and the boundary, respectively. Furthermore, $x$ is the position vector of a field point and $\bar{u}(x)$ is the specified boundary condition. The integral formulation for the Laplace problem is derived from Green's third identity. The conventional integral representation for a domain point is written as

    $$2\pi u(x)=\int_B T(s,x)\,\bar{u}(s)\,dB(s)-\int_B U(s,x)\,t(s)\,dB(s),\quad x\in D, \tag{2}$$

    where $s$ is the position vector of a source point, $U(s,x)=\ln|x-s|$ is the fundamental solution, $T(s,x)=\partial U(s,x)/\partial n_s$, and $t(s)$ is the unknown boundary flux. By moving the field point to the smooth boundary, Eq (2) becomes:

    $$\pi\bar{u}(x)=\mathrm{C.P.V.}\int_B T(s,x)\,\bar{u}(s)\,dB(s)-\int_B U(s,x)\,t(s)\,dB(s),\quad x\in B, \tag{3}$$

    where C.P.V. denotes the Cauchy principal value, and $T(s,x)=\partial U(s,x)/\partial n_s$ is the closed-form kernel. Once the field point $x$ is located outside the domain, we obtain the null-field integral equation as shown below:

    $$0=\int_B T(s,x)\,\bar{u}(s)\,dB(s)-\int_B U(s,x)\,t(s)\,dB(s),\quad x\in D^c, \tag{4}$$

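    where $D^c$ is the complementary domain. By employing the proper degenerate kernel to represent the closed-form fundamental solution $U(s,x)$, the collocation point can be located exactly on the real boundary without facing a singular integral. Equations (2) and (4) can be rewritten as:

    $$2\pi u(x)=\int_B T^{dk}(s,x)\,\bar{u}(s)\,dB(s)-\int_B U^{dk}(s,x)\,t(s)\,dB(s),\quad x\in D\cup B, \tag{5}$$

    and

    $$0=\int_B T^{dk}(s,x)\,\bar{u}(s)\,dB(s)-\int_B U^{dk}(s,x)\,t(s)\,dB(s),\quad x\in D^c\cup B, \tag{6}$$

    where $T^{dk}(s,x)$ and $U^{dk}(s,x)$ are the degenerate kernels that represent $T(s,x)$ and $U(s,x)$, respectively. By setting the field point $x=(\rho,\phi)$ and the source point $s=(R,\theta)$ in polar coordinates for a circular domain, the closed-form fundamental solution in Eqs (5) and (6) can be expressed in degenerate-kernel form as shown below:

    $$U^{dk}(s,x)=\begin{cases}U^{i}(R,\theta;\rho,\phi)=\ln R-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left(\frac{\rho}{R}\right)^{m}\cos m(\theta-\phi), & R\ge\rho, \quad (a)\\[2mm] U^{e}(R,\theta;\rho,\phi)=\ln\rho-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left(\frac{R}{\rho}\right)^{m}\cos m(\theta-\phi), & \rho>R, \quad (b)\end{cases} \tag{7}$$

    and

    $$T^{dk}(s,x)=\begin{cases}T^{i}(R,\theta;\rho,\phi)=-\left(\dfrac{1}{R}+\displaystyle\sum_{m=1}^{\infty}\frac{\rho^{m}}{R^{m+1}}\cos m(\theta-\phi)\right), & R>\rho, \quad (a)\\[2mm] T^{e}(R,\theta;\rho,\phi)=\displaystyle\sum_{m=1}^{\infty}\frac{R^{m-1}}{\rho^{m}}\cos m(\theta-\phi), & \rho>R. \quad (b)\end{cases} \tag{8}$$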
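    The separable expansion in Eq (7) can be spot-checked numerically. The sketch below is only an illustration, not part of the original derivation: it assumes the form of Eq (7) with the minus sign in front of the series, truncates the series at a hypothetical `terms` parameter, and compares the result with the closed-form kernel ln|x - s| for one interior and one exterior configuration.

```python
import math

def u_closed(R, theta, rho, phi):
    """Closed-form kernel U(s, x) = ln|x - s| with s=(R, theta), x=(rho, phi)."""
    dx = rho * math.cos(phi) - R * math.cos(theta)
    dy = rho * math.sin(phi) - R * math.sin(theta)
    return math.log(math.hypot(dx, dy))

def u_degenerate(R, theta, rho, phi, terms=200):
    """Truncated degenerate kernel, assuming the separable form of Eq (7).

    The larger radius plays the role of R in branch (a) and of rho in
    branch (b); both branches share the same structure.
    """
    big, small = (R, rho) if R >= rho else (rho, R)
    series = sum((small / big) ** m / m * math.cos(m * (theta - phi))
                 for m in range(1, terms + 1))
    return math.log(big) - series

# Branch (a): source radius larger than field radius.
err_interior = abs(u_closed(2.0, 0.3, 0.5, 1.1) - u_degenerate(2.0, 0.3, 0.5, 1.1))
# Branch (b): field radius larger than source radius.
err_exterior = abs(u_closed(1.0, 2.0, 3.0, -0.4) - u_degenerate(1.0, 2.0, 3.0, -0.4))
print(err_interior, err_exterior)
```

    The series converges geometrically in the ratio of the smaller to the larger radius, so the truncation error grows as the field point approaches the source circle.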

    The unknown boundary flux t(s) is expanded in terms of Fourier series as shown below:

    $$t(s)=\frac{1}{J_s}\left(a_0+\sum_{n=1}^{\infty}a_n\cos(n\theta)+\sum_{n=1}^{\infty}b_n\sin(n\theta)\right),\quad 0\le\theta\le 2\pi, \tag{9}$$

    where $J_s=1$ is the Jacobian term, and $a_0$, $a_n$ and $b_n$ are unknown coefficients. The given boundary condition is

    $$\bar{u}(x)=v, \tag{10}$$

    where $v$ is a constant. By considering $R=a$ in Eqs (6)–(8), the coefficient of the constant Fourier base gives

    $$a\ln a\; a_0=-v, \tag{11}$$

    where $a$ is the radius of the circular cylinder. Equation (11) indicates that the occurring mechanism of a degenerate scale is

    $$\ln a=0. \tag{12}$$

    When $a=1$, the coefficient $a_0$ cannot be determined, which results in a non-unique solution. This critical size is called a degenerate scale. In Rumely's book [23], the logarithmic capacity, $c_L$, of a circle is equal to its radius. It is easily found that the special (degenerate) geometry happens to be the shape of unit logarithmic capacity. The discriminant $D_p(a)$ of the degenerate scale in the BEM/BIEM for a circular boundary is written as

    $$D_p(a)=\ln a. \tag{13}$$

    If $D_p(a)\ne 0$, this size is an ordinary scale and there exists a unique solution. Otherwise, according to the Fredholm alternative theorem, there is either no solution or infinitely many solutions. For an ordinary scale, the boundary flux of the electrostatic field along the boundary is obtained as

    $$t(x)=-\frac{v}{a\ln a},\quad x\in B. \tag{14}$$

    The unique solution of the electrostatic potential is obtained as

    $$u(x)=v\,\frac{\ln\rho}{\ln a}, \tag{15}$$

    as shown in Figure 1(a). Even though the electrostatic field along the boundary in Eq (14) is not in equilibrium, i.e., $\int_B t(x)\,dB(x)\ne 0$, the electrostatic field at infinity, $\Gamma$, exists and the two together satisfy the equilibrium condition in total, $\int_{B+\Gamma}t(x)\,dB(x)=0$. If we normalize the potential on the cylinder to unity, and let $\lambda$ be the dimensionless ratio, the potential becomes

    $$u(x)=v+\lambda u_d(x), \tag{16}$$

    where

    $$u_d(x)=\ln\rho-\ln a. \tag{17}$$

    Figure 1.  Analytical solution of the electrostatic potential subject to conducting cylinders derived by using the null-field BIEM.

    The solution by using the direct BIE, Eq (15), is the special case of Eq (16) obtained by setting $\lambda=v/D_p(a)$. When the size of the boundary is a degenerate scale, i.e., $a=1$ and $D_p(a)=\ln a=0$, there is no solution if $v\ne 0$. If $v=0$, then the constant term in Eq (9), $a_0$, is a free constant. The electrostatic potential yields

    $$u(x)=-a_0\ln\rho, \tag{18}$$

    and Eq (16) would reduce to

    $$u(x)=\lambda u_d(x), \tag{19}$$

    and $u_d(x)$ in Eq (19) reduces to $\ln\rho$ since $\ln a=0$. It is easy to find that $a_0$ and $\lambda$ are equivalent.

    For an elliptical case, we naturally utilize the elliptic coordinates to solve the problem in the BIE. The relation between the Cartesian coordinates and the elliptic coordinates is given below:

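    $$x=c\cosh\xi\cos\eta,\qquad y=c\sinh\xi\sin\eta, \tag{20}$$

    where $c$ is the focal length. By separating the source point and the field point in the elliptic coordinates [24] to represent the closed-form fundamental solution, we have

    $$U^{dk}(s,x)=\ln|x-s|=\begin{cases}U^{i}(\xi_s,\eta_s;\xi_x,\eta_x)=\xi_s+\ln\dfrac{c}{2}-\displaystyle\sum_{m=1}^{\infty}\frac{2}{m}e^{-m\xi_s}\cosh m\xi_x\cos m\eta_x\cos m\eta_s\\ \qquad\qquad -\displaystyle\sum_{m=1}^{\infty}\frac{2}{m}e^{-m\xi_s}\sinh m\xi_x\sin m\eta_x\sin m\eta_s, & \xi_s\ge\xi_x, \quad (a)\\[2mm] U^{e}(\xi_s,\eta_s;\xi_x,\eta_x)=\xi_x+\ln\dfrac{c}{2}-\displaystyle\sum_{m=1}^{\infty}\frac{2}{m}e^{-m\xi_x}\cosh m\xi_s\cos m\eta_x\cos m\eta_s\\ \qquad\qquad -\displaystyle\sum_{m=1}^{\infty}\frac{2}{m}e^{-m\xi_x}\sinh m\xi_s\sin m\eta_x\sin m\eta_s, & \xi_s<\xi_x, \quad (b)\end{cases} \tag{21}$$

    $$T^{dk}(s,x)=\frac{\partial U(s,x)}{\partial n_s}=\begin{cases}T^{i}(\xi_s,\eta_s;\xi_x,\eta_x)=-\dfrac{1}{J_s}\Big(1+2\displaystyle\sum_{m=1}^{\infty}e^{-m\xi_s}\cosh m\xi_x\cos m\eta_x\cos m\eta_s\\ \qquad\qquad +2\displaystyle\sum_{m=1}^{\infty}e^{-m\xi_s}\sinh m\xi_x\sin m\eta_x\sin m\eta_s\Big), & \xi_s>\xi_x, \quad (a)\\[2mm] T^{e}(\xi_s,\eta_s;\xi_x,\eta_x)=\dfrac{1}{J_s}\Big(2\displaystyle\sum_{m=1}^{\infty}e^{-m\xi_x}\sinh m\xi_s\cos m\eta_x\cos m\eta_s\\ \qquad\qquad +2\displaystyle\sum_{m=1}^{\infty}e^{-m\xi_x}\cosh m\xi_s\sin m\eta_x\sin m\eta_s\Big), & \xi_s<\xi_x, \quad (b)\end{cases} \tag{22}$$

    where $J_s=c\sqrt{\cosh^2\xi_s\sin^2\eta_s+\sinh^2\xi_s\cos^2\eta_s}$. The unknown boundary flux $t(s)$ is expanded in terms of the generalized Fourier series. We have

    $$t(s)=\frac{1}{J_s}\left(a_0+\sum_{n=1}^{\infty}a_n\cos(n\eta_s)+\sum_{n=1}^{\infty}b_n\sin(n\eta_s)\right),\quad 0\le\eta_s\le 2\pi, \tag{23}$$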

    where a0, an and bn are unknown coefficients. The given boundary condition is

    ˉu(x)=v. (24)

    By substituting Eqs (21a), (22a), (23) and (24) into Eq (6), the coefficient of the constant Fourier base gives

    $$\left(\xi_0+\ln\frac{c}{2}\right)a_0=-v. \tag{25}$$

    Equation (25) indicates that the occurring mechanism of a degenerate scale is

    $$\xi_0+\ln\frac{c}{2}=0. \tag{26}$$

    Equation (26) yields the degenerate scale $\frac{a+b}{2}=1$, where $a$ and $b$ are the semi-major and semi-minor axes of the ellipse, respectively. According to Eq (25), the discriminant of a degenerate scale in the BEM/BIEM is obtained as

    $$D_e(c,\xi_0)=\xi_0+\ln\frac{c}{2}=\ln\left(\frac{a+b}{2}\right). \tag{27}$$

    In Rumely's book [23], the logarithmic capacity of an ellipse is equal to $\frac{a+b}{2}$. According to Eqs (13) and (27), the logarithmic capacity, $c_L$, and the discriminant, $D_e(\cdot)$, satisfy the relation

    $$c_L=e^{D_e(\cdot)}. \tag{28}$$
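    The chain from Eq (26) to Eq (28) can be illustrated numerically. The following sketch is only a check under stated assumptions: it takes $a=c\cosh\xi_0$ and $b=c\sinh\xi_0$ from Eq (20), evaluates the discriminant as $\xi_0+\ln(c/2)$, and confirms that its exponential equals $(a+b)/2$. The function names are hypothetical.

```python
import math

def ellipse_discriminant(a, b):
    """D_e = xi_0 + ln(c/2) for an ellipse with semi-axes a > b > 0."""
    c = math.sqrt(a * a - b * b)   # from a = c cosh(xi_0), b = c sinh(xi_0)
    xi0 = math.atanh(b / a)        # tanh(xi_0) = b / a
    return xi0 + math.log(c / 2.0)

def log_capacity(a, b):
    """Logarithmic capacity via the exponential relation of Eq (28)."""
    return math.exp(ellipse_discriminant(a, b))

# Ordinary scale: (a + b) / 2 = 2, so the discriminant is ln 2.
print(log_capacity(3.0, 1.0))
# Degenerate scale: (a + b) / 2 = 1, so the discriminant vanishes.
print(ellipse_discriminant(1.5, 0.5))
```

    An ellipse with $a=1.5$ and $b=0.5$ is therefore a degenerate scale in the sense of Eq (26), while rescaling the same shape by any factor other than one restores a unique solution.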

    The relationships among the discriminant, the logarithmic capacity, and the degenerate scale are summarized in Table 1. If $D_e(c,\xi_0)$ is not equal to zero, this size is an ordinary scale with a unique solution. Otherwise, according to the Fredholm alternative theorem, there is either no solution or infinitely many solutions. For an ordinary scale, the boundary flux is obtained as

    $$t(x)=-\frac{v}{D_e(c,\xi_0)},\quad x\in B. \tag{29}$$
    Table 1.  Relationship of the discriminant, logarithmic capacity and degenerate scale.


    The unique solution of electrostatic potential is

    $$u(x)=\left(\xi_x+\ln\frac{c}{2}\right)\frac{v}{D_e(c,\xi_0)}, \tag{30}$$

    as shown in Figure 1(b). Even though the boundary flux in Eq (29) is not in equilibrium, i.e., $\int_B t(x)\,dB(x)\ne 0$, the electrostatic field at infinity, $\Gamma$, exists and the two together satisfy the equilibrium condition in total, i.e., $\int_{B+\Gamma}t(x)\,dB(x)=0$. If we normalize the potential on the cylinder to unity, and let $\lambda$ be the dimensionless ratio, the potential becomes

    $$u(x)=v+\lambda u_d(x), \tag{31}$$

    where

    $$u_d(x)=\left(\xi_x+\ln\frac{c}{2}\right)-\left(\xi_0+\ln\frac{c}{2}\right)=\xi_x-\xi_0. \tag{32}$$

    A neat formula for $u_d(x)$ can be defined as $u_d(x)=D_e(\xi_x)-D_e(\xi_0)$. The solution by using the direct BIE, Eq (30), is the special case of Eq (31) with $\lambda=v/D_e(c,\xi_0)$.

    When the size of the boundary is a degenerate scale, $D_e(c,\xi_0)=\xi_0+\ln\frac{c}{2}=0$, there is no solution if $v\ne 0$. If $v=0$, the constant term in Eq (23), $a_0$, is a free constant. The electrostatic potential yields

    $$u(x)=-\left(\xi_x+\ln\frac{c}{2}\right)a_0, \tag{33}$$

    and Eq (31) reduces to

    $$u(x)=\lambda u_d(x), \tag{34}$$

    where $u_d(x)$ in Eq (32) reduces to $\xi_x+\ln\frac{c}{2}$ since $D_e(\xi_0)$ is equal to zero. It is easy to find that $a_0$ and $\lambda$ are equivalent. In addition, the degenerate scale in the BEM/BIEM is due to the logarithmic kernel.
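    The degenerate kernel in elliptic coordinates, Eq (21), can be spot-checked in the same way as the circular one. The sketch below is an illustration only: it assumes the separable form of Eq (21) with minus signs in front of both series, uses the coordinate map of Eq (20), and truncates the series at a hypothetical `terms` parameter.

```python
import math

def to_cartesian(c, xi, eta):
    """Elliptic coordinates of Eq (20): x = c cosh(xi) cos(eta), y = c sinh(xi) sin(eta)."""
    return c * math.cosh(xi) * math.cos(eta), c * math.sinh(xi) * math.sin(eta)

def u_closed(c, xi_s, eta_s, xi_x, eta_x):
    """Closed-form kernel ln|x - s| evaluated in Cartesian coordinates."""
    xs, ys = to_cartesian(c, xi_s, eta_s)
    xx, yx = to_cartesian(c, xi_x, eta_x)
    return math.log(math.hypot(xx - xs, yx - ys))

def u_degenerate(c, xi_s, eta_s, xi_x, eta_x, terms=80):
    """Truncated degenerate kernel, assuming the form of Eq (21).

    The larger radial coordinate carries the decaying exponential, which
    covers branch (a) (xi_s >= xi_x) and branch (b) (xi_s < xi_x) at once.
    """
    xi_big, xi_small = (xi_s, xi_x) if xi_s >= xi_x else (xi_x, xi_s)
    series = 0.0
    for m in range(1, terms + 1):
        decay = 2.0 / m * math.exp(-m * xi_big)
        series += decay * (
            math.cosh(m * xi_small) * math.cos(m * eta_x) * math.cos(m * eta_s)
            + math.sinh(m * xi_small) * math.sin(m * eta_x) * math.sin(m * eta_s)
        )
    return xi_big + math.log(c / 2.0) - series

err_a = abs(u_closed(1.5, 1.2, 0.7, 0.4, 2.1) - u_degenerate(1.5, 1.2, 0.7, 0.4, 2.1))
err_b = abs(u_closed(1.5, 0.4, 2.1, 1.2, 0.7) - u_degenerate(1.5, 0.4, 2.1, 1.2, 0.7))
print(err_a, err_b)
```

    The constant part of the kernel, $\xi+\ln(c/2)$, is exactly the quantity that becomes the discriminant $D_e$ once the source is placed on the boundary $\xi_s=\xi_0$.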

    For the single elliptical cylinder, the degenerate kernel is expanded in terms of the generalized form as

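    $$U^{dk}(s,x)=\ln|x-s|=\begin{cases}U^{i}(\xi_s,\eta_s;\xi_x,\eta_x)=D_e(\xi_s)-\displaystyle\sum_{m=1}^{\infty}\alpha_m(\xi_s,\eta_s;\xi_x,\eta_x), & \xi_s\ge\xi_x, \quad (a)\\[2mm] U^{e}(\xi_s,\eta_s;\xi_x,\eta_x)=D_e(\xi_x)-\displaystyle\sum_{m=1}^{\infty}\alpha_m(\xi_x,\eta_x;\xi_s,\eta_s), & \xi_s<\xi_x, \quad (b)\end{cases} \tag{35}$$

    $$T^{dk}(s,x)=\frac{\partial U(s,x)}{\partial n_s}=\begin{cases}T^{i}(\xi_s,\eta_s;\xi_x,\eta_x)=-\dfrac{1}{J_s}\left(D'_e(\xi_s)-\displaystyle\sum_{m=1}^{\infty}\beta_m(\xi_s,\eta_s;\xi_x,\eta_x)\right), & \xi_s>\xi_x, \quad (a)\\[2mm] T^{e}(\xi_s,\eta_s;\xi_x,\eta_x)=\dfrac{1}{J_s}\displaystyle\sum_{m=1}^{\infty}\beta_m(\xi_s,\eta_s;\xi_x,\eta_x), & \xi_s<\xi_x, \quad (b)\end{cases} \tag{36}$$

    where $\xi$ and $\eta$ are the radial and angular coordinates, respectively, $D_e(\cdot)$ is the part that is constant with respect to $\eta_s$, and the remaining terms are collected in $\alpha_m(\cdot)$. The unknown boundary flux $t(s)$ is expanded in terms of the generalized Fourier series as shown below:

    $$t(s)=\frac{1}{J_s}\left(a_0+\sum_{n=1}^{\infty}a_n\cos(n\eta_s)+\sum_{n=1}^{\infty}b_n\sin(n\eta_s)\right),\quad 0\le\eta_s\le 2\pi, \tag{37}$$

    where $a_0$, $a_n$ and $b_n$ are unknown coefficients. By substituting Eqs (35a), (36a), (37) and the boundary condition (Eq (10)) into Eq (6), the coefficient of the constant Fourier base gives

    $$D_e(\xi_0)\,a_0=-v\,D'_e(\xi_0). \tag{38}$$

    If $D_e(\xi_0)\ne 0$, then the unique solution of the electrostatic potential is

    $$u(x)=\frac{v}{D_e(\xi_0)}\,D_e(\xi_x), \tag{39}$$

    and

    $$u_d(x)=D_e(\xi_x)-D_e(\xi_0). \tag{40}$$

    When the size of the boundary is a degenerate scale, $D_e(\xi_0)=0$, there is no solution if $v\ne 0$. If $v=0$, the constant term in Eq (37), $a_0$, is a free constant. The electrostatic potential yields

    $$u(x)=-\left(D_e(\xi_x)-D_e(\xi_0)\right)a_0, \tag{41}$$

    and the general form of Eq (31) reduces to

    $$u(x)=\lambda u_d(x), \tag{42}$$

    where $u_d(x)$ in Eq (40) reduces to $D_e(\xi_x)$, since $D_e(\xi_0)$ is equal to zero. By using the generalized form of Eqs (35) and (36), the analytical and neat form of $u_d(x)$ for the single elliptical cylinder is derived. It is easy to find that $a_0$ and $\lambda$ are equivalent. The generalized potential and the solution by using the BIEM are compared in Table 2.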

    Table 2.  Comparison of the general solution and the BIE solution for single or double cylinders.


    In this section, we consider two circular cylinders of electrostatics. The Dirichlet boundary conditions of two circular cylinders are given by

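    $$u_l(x)=v_1\ \text{and}\ u_r(x)=v_2,\quad x\in B, \tag{43}$$

    where $u_l(x)$ and $u_r(x)$ are the potentials of the left and right circular boundaries, respectively, $x$ is the position vector of the field point, $B$ is the boundary, and $v_1$ and $v_2$ are specified constant potentials. The original problem can be decomposed into a symmetric problem and an anti-symmetric problem as shown below:

    $$u_l(x)=u_r(x)=v,\quad x\in B,\quad \text{symmetry BC}, \tag{44}$$

    and

    $$-u_l(x)=u_r(x)=v,\quad x\in B,\quad \text{anti-symmetry BC}. \tag{45}$$

    Since the problem contains two circular boundaries, we naturally employ the bipolar coordinates to express the closed-form fundamental solution. The relation between the Cartesian coordinates and the bipolar coordinates is shown below:

    $$x=c\,\frac{\sinh\eta}{\cosh\eta-\cos\xi},\qquad y=c\,\frac{\sin\xi}{\cosh\eta-\cos\xi}, \tag{46}$$

    where $\eta$ and $\xi$ are the radial and angular coordinates, respectively, and $c$ is half the distance between the two foci of the bipolar coordinates. By separating the source point and the field point in the bipolar coordinates [9] for the closed-form fundamental solution, we have

    $$U^{dk}(s,x)=\ln|x-s|=\begin{cases}\ln(2c)+\eta_s-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]-e^{m\eta_x}\cos(m\xi_x)-e^{m\eta_s}\cos(m\xi_s)\right\}, & 0>\eta_s\ge\eta_x, \quad (a)\\[2mm] \ln(2c)+\eta_x-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]-e^{m\eta_x}\cos(m\xi_x)-e^{m\eta_s}\cos(m\xi_s)\right\}, & 0>\eta_x>\eta_s, \quad (b)\\[2mm] \ln(2c)-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]-e^{-m\eta_x}\cos(m\xi_x)-e^{m\eta_s}\cos(m\xi_s)\right\}, & \eta_x>0>\eta_s, \quad (c)\\[2mm] \ln(2c)-\eta_s-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]-e^{-m\eta_x}\cos(m\xi_x)-e^{-m\eta_s}\cos(m\xi_s)\right\}, & \eta_x\ge\eta_s>0, \quad (d)\\[2mm] \ln(2c)-\eta_x-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]-e^{-m\eta_x}\cos(m\xi_x)-e^{-m\eta_s}\cos(m\xi_s)\right\}, & \eta_s>\eta_x>0, \quad (e)\\[2mm] \ln(2c)-\displaystyle\sum_{m=1}^{\infty}\frac{1}{m}\left\{e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]-e^{m\eta_x}\cos(m\xi_x)-e^{-m\eta_s}\cos(m\xi_s)\right\}, & \eta_s>0>\eta_x, \quad (f)\end{cases} \tag{47}$$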
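    $$T^{dk}(s,x)=\frac{\partial U(s,x)}{\partial n_s}=\begin{cases}-\dfrac{1}{J_s}\left\{1+\displaystyle\sum_{m=1}^{\infty}\left[e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]+e^{m\eta_s}\cos(m\xi_s)\right]\right\}, & 0>\eta_s\ge\eta_x, \quad (a)\\[2mm] \dfrac{1}{J_s}\displaystyle\sum_{m=1}^{\infty}\left\{e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]-e^{m\eta_s}\cos(m\xi_s)\right\}, & 0>\eta_x>\eta_s, \quad (b)\\[2mm] \dfrac{1}{J_s}\displaystyle\sum_{m=1}^{\infty}\left\{e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]-e^{m\eta_s}\cos(m\xi_s)\right\}, & \eta_x>0>\eta_s, \quad (c)\\[2mm] -\dfrac{1}{J_s}\left\{1+\displaystyle\sum_{m=1}^{\infty}\left[e^{-m(\eta_x-\eta_s)}\cos[m(\xi_x-\xi_s)]+e^{-m\eta_s}\cos(m\xi_s)\right]\right\}, & \eta_x\ge\eta_s>0, \quad (d)\\[2mm] \dfrac{1}{J_s}\displaystyle\sum_{m=1}^{\infty}\left\{e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]-e^{-m\eta_s}\cos(m\xi_s)\right\}, & \eta_s>\eta_x>0, \quad (e)\\[2mm] \dfrac{1}{J_s}\displaystyle\sum_{m=1}^{\infty}\left\{e^{-m(\eta_s-\eta_x)}\cos[m(\xi_x-\xi_s)]-e^{-m\eta_s}\cos(m\xi_s)\right\}, & \eta_s>0>\eta_x, \quad (f)\end{cases} \tag{48}$$

    where $x=(\eta_x,\xi_x)$, $s=(\eta_s,\xi_s)$ and $J_s=c/[\cosh(\eta_s)-\cos(\xi_s)]$.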
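    The six branches of the bipolar degenerate kernel in Eq (47) can be checked against the closed-form kernel. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the signs of Eq (47) in which every exponential decays, so that all six branches collapse into one expression written with absolute values, and it truncates the series at a hypothetical `terms` parameter.

```python
import math

def to_cartesian(c, eta, xi):
    """Bipolar coordinates of Eq (46)."""
    denom = math.cosh(eta) - math.cos(xi)
    return c * math.sinh(eta) / denom, c * math.sin(xi) / denom

def u_closed(c, eta_s, xi_s, eta_x, xi_x):
    """Closed-form kernel ln|x - s| evaluated in Cartesian coordinates."""
    xs, ys = to_cartesian(c, eta_s, xi_s)
    xx, yx = to_cartesian(c, eta_x, xi_x)
    return math.log(math.hypot(xx - xs, yx - ys))

def u_degenerate(c, eta_s, xi_s, eta_x, xi_x, terms=400):
    """Truncated degenerate kernel, assuming the branch structure of Eq (47).

    Same-side points (eta_s * eta_x > 0) subtract the smaller |eta| from
    ln(2c); opposite-side points have the constant ln(2c) only.
    """
    const = math.log(2.0 * c)
    if eta_s * eta_x > 0.0:
        const -= min(abs(eta_s), abs(eta_x))
    series = 0.0
    for m in range(1, terms + 1):
        series += (
            math.exp(-m * abs(eta_x - eta_s)) * math.cos(m * (xi_x - xi_s))
            - math.exp(-m * abs(eta_x)) * math.cos(m * xi_x)
            - math.exp(-m * abs(eta_s)) * math.cos(m * xi_s)
        ) / m
    return const - series

# (eta_s, xi_s, eta_x, xi_x) samples for branches (a), (c) and (e).
samples = [(-0.3, 1.0, -0.9, 4.0), (-0.8, 0.5, 0.4, 2.0), (0.8, 1.0, 0.3, 2.5)]
errors = [abs(u_closed(1.0, *p) - u_degenerate(1.0, *p)) for p in samples]
print(errors)
```

    The check requires both points to stay away from $\eta=0$ and from each other in $\eta$, because the series only converges geometrically when every exponent is strictly negative.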

    The boundary condition of the symmetry problem is shown in Eq (44). The unknown boundary densities on the two circular cylinders can be expanded by using the generalized Fourier series as shown below:

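    $$t^{M}(s)=\begin{cases}\dfrac{1}{J_s}\left(a_0^{l}+\displaystyle\sum_{n=1}^{\infty}a_n^{l}\cos n\xi_s+\sum_{n=1}^{\infty}b_n^{l}\sin n\xi_s\right), & \eta_s<0,\ s\in B_l,\\[2mm] \dfrac{1}{J_s}\left(a_0^{r}+\displaystyle\sum_{n=1}^{\infty}a_n^{r}\cos n\xi_s+\sum_{n=1}^{\infty}b_n^{r}\sin n\xi_s\right), & \eta_s\ge 0,\ s\in B_r,\end{cases} \tag{49}$$

    where $a_0^{l}$, $a_n^{l}$, $b_n^{l}$, $a_0^{r}$, $a_n^{r}$ and $b_n^{r}$ are unknown coefficients of the generalized Fourier series. By substituting Eqs (47a), (47f), (48a), (48f), (44) and (49) into Eq (6), and collocating the null-field point on the left boundary, $B_l$, we have

    $$-2\pi v-\pi\left\{2(\ln(2c)-\eta_0)a_0^{l}+\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}a_n^{l}-\sum_{n=1}^{\infty}\frac{1}{n}\left(-2e^{-n\eta_0}a_0^{l}+a_n^{l}\right)\cos n\xi_x-\sum_{n=1}^{\infty}\frac{1}{n}b_n^{l}\sin n\xi_x\right\}-\pi\left\{2\ln(2c)a_0^{r}+\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}a_n^{r}+\sum_{n=1}^{\infty}\frac{1}{n}\left[\left(2e^{-n\eta_0}a_0^{r}-e^{-2n\eta_0}a_n^{r}\right)\cos n\xi_x-e^{-2n\eta_0}b_n^{r}\sin n\xi_x\right]\right\}=0. \tag{50}$$

    Similarly, substituting Eqs (47c), (47d), (48c), (48d), (44) and (49) into Eq (6), and collocating the null-field point on the right boundary, $B_r$, we have

    $$-\pi\left\{2\ln(2c)a_0^{l}+\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}a_n^{l}+\sum_{n=1}^{\infty}\frac{1}{n}\left[\left(-e^{-2n\eta_0}a_n^{l}+2e^{-n\eta_0}a_0^{l}\right)\cos n\xi_x-e^{-2n\eta_0}b_n^{l}\sin n\xi_x\right]\right\}-2\pi v-\pi\left\{2(\ln(2c)-\eta_0)a_0^{r}+\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}a_n^{r}+\sum_{n=1}^{\infty}\frac{1}{n}\left[\left(2e^{-n\eta_0}a_0^{r}-a_n^{r}\right)\cos n\xi_x-b_n^{r}\sin n\xi_x\right]\right\}=0. \tag{51}$$

    By adding Eqs (50) and (51) together, we obtain

    $$(4\ln(2c)-2\eta_0)(a_0^{l}+a_0^{r})+2\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}(a_n^{l}+a_n^{r})+\sum_{n=1}^{\infty}\frac{1}{n}\left[-(1+e^{-2n\eta_0})(a_n^{l}+a_n^{r})+4e^{-n\eta_0}(a_0^{l}+a_0^{r})\right]\cos n\xi_x-\sum_{n=1}^{\infty}\frac{1}{n}\left[(1+e^{-2n\eta_0})(b_n^{l}+b_n^{r})\right]\sin n\xi_x=-4v. \tag{52}$$

    After comparing the coefficients of the generalized Fourier bases, we have

    $$\begin{cases}(4\ln(2c)-2\eta_0)(a_0^{l}+a_0^{r})+2\displaystyle\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}(a_n^{l}+a_n^{r})=-4v,\\[2mm] \dfrac{1}{n}\left[-(1+e^{-2n\eta_0})(a_n^{l}+a_n^{r})+4e^{-n\eta_0}(a_0^{l}+a_0^{r})\right]=0, & n=1,2,3,\ldots\\[2mm] -\dfrac{1}{n}(1+e^{-2n\eta_0})(b_n^{l}+b_n^{r})=0, & n=1,2,3,\ldots\end{cases} \tag{53}$$

    Similarly, by subtracting Eq (50) from Eq (51), we have

    $$\pi\left\{-2\eta_0a_0^{l}+\sum_{n=1}^{\infty}\frac{1}{n}\left[(-1+e^{-2n\eta_0})a_n^{l}\cos n\xi_x+(-1+e^{-2n\eta_0})b_n^{l}\sin n\xi_x\right]\right\}+\pi\left\{2\eta_0a_0^{r}+\sum_{n=1}^{\infty}\frac{1}{n}\left[(1-e^{-2n\eta_0})a_n^{r}\cos n\xi_x+(1-e^{-2n\eta_0})b_n^{r}\sin n\xi_x\right]\right\}=0. \tag{54}$$

    After comparing the coefficients of the generalized Fourier bases, we have

    $$\begin{cases}-2\eta_0(a_0^{l}-a_0^{r})=0,\\[1mm] \dfrac{1}{n}(-1+e^{-2n\eta_0})(a_n^{l}-a_n^{r})=0, & n=1,2,3,\ldots\\[2mm] \dfrac{1}{n}(-1+e^{-2n\eta_0})(b_n^{l}-b_n^{r})=0, & n=1,2,3,\ldots\end{cases} \tag{55}$$

    In order to solve for the coefficients $a_0^{l}$ and $a_0^{r}$, we need to define a discriminant as shown below:

    $$D_b(c,\eta_0)=2\ln(2c)-\eta_0+\sum_{n=1}^{\infty}\frac{1}{n}\frac{4e^{-2n\eta_0}}{1+e^{-2n\eta_0}}. \tag{56}$$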
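    The discriminant of Eq (56) is straightforward to evaluate, and the degenerate scale can be located directly because $c$ enters only through $2\ln(2c)$. The sketch below is an illustration under stated assumptions: it assumes the signs $D_b=2\ln(2c)-\eta_0+\sum_n 4e^{-2n\eta_0}/[n(1+e^{-2n\eta_0})]$, truncates the series at a hypothetical `terms` parameter, and uses the standard bipolar geometry in which the circle $\eta=\eta_0$ has radius $c/\sinh\eta_0$ and center $x=c\coth\eta_0$. The function names are hypothetical.

```python
import math

def series_sum(eta0, terms=400):
    """Sum over n of 4 e^{-2 n eta0} / (n (1 + e^{-2 n eta0}))."""
    total = 0.0
    for n in range(1, terms + 1):
        decay = math.exp(-2.0 * n * eta0)
        total += 4.0 * decay / (n * (1.0 + decay))
    return total

def discriminant_b(c, eta0, terms=400):
    """Discriminant D_b(c, eta0), assuming the signs of Eq (56)."""
    return 2.0 * math.log(2.0 * c) - eta0 + series_sum(eta0, terms)

def degenerate_half_focal_distance(eta0, terms=400):
    """Value of c for which D_b(c, eta0) = 0 at a fixed eta0."""
    return 0.5 * math.exp(0.5 * (eta0 - series_sum(eta0, terms)))

eta0 = 1.0
c_star = degenerate_half_focal_distance(eta0)
radius = c_star / math.sinh(eta0)   # radius of each cylinder at the degenerate scale
center = c_star / math.tanh(eta0)   # distance of each center from the origin
print(c_star, radius, center, discriminant_b(c_star, eta0))
```

    Scaling the whole geometry by a factor $k$ changes $c$ to $kc$ and leaves $\eta_0$ unchanged, so $D_b$ shifts by $2\ln k$; this is why exactly one size of a given two-cylinder shape is degenerate.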

    For the case of two cylinders, Rumely [23] employed complex variables to derive the logarithmic capacity, as shown in Table 1. Since the logarithmic capacity is not available in closed form, the postulate in Eq (28) for the case of two cylinders cannot be analytically verified at present. If $D_b\ne 0$, the geometry of the problem is an ordinary scale, and Eqs (53) and (55) yield the coefficients as shown below:

    $$a_0^{l}=a_0^{r}=-\frac{v}{D_b(c,\eta_0)},\qquad a_n^{l}=a_n^{r}=\frac{4e^{-n\eta_0}}{1+e^{-2n\eta_0}}\,a_0^{l},\quad n=1,2,3,\ldots,\qquad b_n^{l}=b_n^{r}=0,\quad n=1,2,3,\ldots \tag{57}$$

    Substituting Eqs (47b), (47f), (48b), (48f), (44) and the obtained unknown boundary densities into Eq (5) for the field solution of $\eta_x<0$, we have the unique solution

    $$u(x)=\frac{v}{D_b(c,\eta_0)}\left(\left(2\ln(2c)+\eta_x+\sum_{n=1}^{\infty}\frac{4}{n}\frac{e^{-2n\eta_0}}{1+e^{-2n\eta_0}}\right)-\sum_{n=1}^{\infty}\frac{2}{n}\left(e^{-n\eta_0}\frac{e^{n\eta_x}+e^{-n\eta_x}}{e^{n\eta_0}+e^{-n\eta_0}}-e^{n\eta_x}\right)\cos(n\xi_x)\right),\quad -\eta_0\le\eta_x<0. \tag{58}$$

    Similarly, substituting Eqs (47c), (47e), (48c), (48e), (44) and the obtained unknown densities into Eq (5), the field solution for $\eta_x\ge 0$ yields

    $$u(x)=\frac{v}{D_b(c,\eta_0)}\left(\left(2\ln(2c)-\eta_x+\sum_{n=1}^{\infty}\frac{4}{n}\frac{e^{-2n\eta_0}}{1+e^{-2n\eta_0}}\right)-\sum_{n=1}^{\infty}\frac{2}{n}\left(e^{-n\eta_0}\frac{e^{n\eta_x}+e^{-n\eta_x}}{e^{n\eta_0}+e^{-n\eta_0}}-e^{-n\eta_x}\right)\cos(n\xi_x)\right),\quad \eta_0\ge\eta_x\ge 0. \tag{59}$$

    Equations (58) and (59) together form a symmetric solution. All potentials are shown in Figure 1(c). This solution will be compared with that of Darevski [2] later.

    If $D_b(c,\eta_0)=0$, a degenerate scale occurs. When the constant potential $v\ne 0$, there is no solution. When the constant potential $v=0$, there are infinitely many solutions. Equations (53) and (55) yield the coefficients as shown below:

    $$a_0^{r}=a_0^{l}=k,\qquad a_n^{l}=a_n^{r}=\frac{4e^{-n\eta_0}}{1+e^{-2n\eta_0}}\,k,\quad n=1,2,3,\ldots,\qquad b_n^{l}=b_n^{r}=0,\quad n=1,2,3,\ldots \tag{60}$$

    where $k$ is an arbitrary constant. In the case of a degenerate scale, $\eta_0$ satisfies

    $$\eta_0=2\ln(2c)+\sum_{n=1}^{\infty}\frac{1}{n}\frac{4e^{-2n\eta_0}}{1+e^{-2n\eta_0}}. \tag{61}$$

    Substituting Eqs (47b), (47f), (48b), (48f), (44) and the obtained unknown boundary densities into Eq (5) for the field solution of $\eta_x<0$, we have the infinitely many solutions

    $$u(x)=-\left((\eta_0+\eta_x)-\sum_{n=1}^{\infty}\frac{2}{n}\left(e^{-n\eta_0}\frac{e^{n\eta_x}+e^{-n\eta_x}}{e^{n\eta_0}+e^{-n\eta_0}}-e^{n\eta_x}\right)\cos(n\xi_x)\right)k,\quad -\eta_0\le\eta_x<0. \tag{62}$$

    Similarly, substituting Eqs (47c), (47e), (48c), (48e), (44) and the obtained unknown boundary densities into Eq (5), the field solution for $\eta_x\ge 0$ yields the infinitely many solutions

    $$u(x)=-\left((\eta_0-\eta_x)-\sum_{n=1}^{\infty}\frac{2}{n}\left(e^{-n\eta_0}\frac{e^{n\eta_x}+e^{-n\eta_x}}{e^{n\eta_0}+e^{-n\eta_0}}-e^{-n\eta_x}\right)\cos(n\xi_x)\right)k,\quad \eta_0\ge\eta_x\ge 0. \tag{63}$$

    Equations (62) and (63) also indicate symmetry.

    Similarly, we consider the anti-symmetry problem. The coefficients of the generalized Fourier bases in Eqs (45) and (49) satisfy

    $$\begin{cases}\left(2\ln(2c)-\eta_0+4\displaystyle\sum_{n=1}^{\infty}\frac{e^{-2n\eta_0}}{n(1+e^{-2n\eta_0})}\right)(a_0^{l}+a_0^{r})=0,\\[2mm] \dfrac{1}{n}\left[-(1+e^{-2n\eta_0})(a_n^{l}+a_n^{r})+4e^{-n\eta_0}(a_0^{l}+a_0^{r})\right]=0, & n=1,2,3,\ldots\\[2mm] -\dfrac{1}{n}(1+e^{-2n\eta_0})(b_n^{l}+b_n^{r})=0, & n=1,2,3,\ldots\end{cases} \tag{64}$$

    and

    $$\begin{cases}-2\eta_0(a_0^{l}-a_0^{r})=4v,\\[1mm] \dfrac{1}{n}(-1+e^{-2n\eta_0})(a_n^{l}-a_n^{r})=0, & n=1,2,3,\ldots\\[2mm] \dfrac{1}{n}(-1+e^{-2n\eta_0})(b_n^{l}-b_n^{r})=0, & n=1,2,3,\ldots\end{cases} \tag{65}$$

    The discriminant $D_b(c,\eta_0)$ also appears in Eq (64). If $D_b(c,\eta_0)\ne 0$, the geometry of the problem is an ordinary scale. Equations (64) and (65) yield the coefficients as shown below:

    $$-a_0^{l}=a_0^{r}=\frac{v}{\eta_0},\qquad a_n^{l}=a_n^{r}=0,\quad n=1,2,3,\ldots,\qquad b_n^{l}=b_n^{r}=0,\quad n=1,2,3,\ldots \tag{66}$$

    Substituting Eqs (47b), (47f), (48b), (48f), (45) and the obtained unknown boundary densities into Eq (5) for the field solution of $\eta_x<0$, we have

    $$u(x)=v\,\frac{\eta_x}{\eta_0},\quad \eta_x<0. \tag{67}$$

    Substituting Eqs (47c), (47e), (48c), (48e), (45) and the obtained unknown densities into Eq (5) for the field solution of $\eta_x\ge 0$, we also have

    $$u(x)=v\,\frac{\eta_x}{\eta_0},\quad \eta_x\ge 0. \tag{68}$$

    All potentials are shown in Figure 1(d). The solution in Eq (68) matches well with that of Lebedev et al. [8]. From the viewpoint of the MFS, this solution is the simplest one, since only two sources with opposite strengths, located at the two foci, are required.
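    The MFS interpretation of the anti-symmetric solution can be made concrete. In bipolar coordinates, $\eta$ equals the logarithm of the ratio of the distances to the two foci, so $v\eta_x/\eta_0$ is the field of two logarithmic sources of opposite strengths $\pm v/\eta_0$ placed at $(\mp c,0)$. The sketch below is an illustration with hypothetical function names; it assumes the coordinate map of Eq (46) and compares the two expressions at a few field points.

```python
import math

def bipolar_to_cartesian(c, eta, xi):
    """Bipolar coordinates of Eq (46)."""
    denom = math.cosh(eta) - math.cos(xi)
    return c * math.sinh(eta) / denom, c * math.sin(xi) / denom

def u_bipolar(v, eta0, eta):
    """Anti-symmetric potential v * eta / eta0, as in Eqs (67) and (68)."""
    return v * eta / eta0

def u_two_sources(v, c, eta0, x, y):
    """Two logarithmic sources of strengths +v/eta0 and -v/eta0 at (-c, 0) and (c, 0)."""
    d_left = math.hypot(x + c, y)
    d_right = math.hypot(x - c, y)
    return (v / eta0) * (math.log(d_left) - math.log(d_right))

v, c, eta0 = 1.0, 1.0, 1.2
points = [(0.9, 0.4), (-0.6, 2.0), (0.2, 5.0), (1.2, 3.0)]  # (eta, xi) pairs
diffs = []
for eta, xi in points:
    x, y = bipolar_to_cartesian(c, eta, xi)
    diffs.append(abs(u_bipolar(v, eta0, eta) - u_two_sources(v, c, eta0, x, y)))
print(diffs)
```

    Because the two source strengths cancel, this potential stays bounded at infinity, which is consistent with the far-field condition discussed for the anti-symmetric case.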

    If $D_b(c,\eta_0)=0$, a degenerate scale occurs. Fortunately, a solution exists whether or not $v$ is equal to zero in the boundary condition of Eq (45). Equations (64) and (65) yield the coefficients as shown below:

    $$a_0^{l}+a_0^{r}=2k,\qquad a_n^{l}=a_n^{r}=\frac{4e^{-n\eta_0}}{1+e^{-2n\eta_0}}\,k,\quad n=1,2,3,\ldots,\qquad b_n^{l}=b_n^{r}=0,\quad n=1,2,3,\ldots \tag{69}$$

    where $k$ is an arbitrary constant. For a degenerate scale, $\eta_0$ satisfies $D_b(c,\eta_0)=0$, i.e.,

    $$\eta_0=2\ln(2c)+\sum_{n=1}^{\infty}\frac{1}{n}\frac{4e^{-2n\eta_0}}{1+e^{-2n\eta_0}}. \tag{70}$$

    Substituting Eqs (47b), (47f), (48b), (48f), (45) and the obtained unknown boundary densities into Eq (5) for the field solution of $\eta_x<0$, we have

    $$u(x)=\frac{v}{\eta_0}\eta_x+k\left(\ln(2\cosh\eta_x-2\cos\xi_x)-\eta_0+2\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}\frac{\cosh(n\eta_x)}{\cosh(n\eta_0)}\cos(n\xi_x)\right),\quad -\eta_0\le\eta_x\le 0. \tag{71}$$

    Similarly, substituting Eqs (47c), (47e), (48c), (48e), (45) and the obtained unknown densities into Eq (5), we obtain the field solution

    $$u(x)=\frac{v}{\eta_0}\eta_x+k\left(\ln(2\cosh\eta_x-2\cos\xi_x)-\eta_0+2\sum_{n=1}^{\infty}\frac{1}{n}e^{-n\eta_0}\frac{\cosh(n\eta_x)}{\cosh(n\eta_0)}\cos(n\xi_x)\right),\quad \eta_0\ge\eta_x>0. \tag{72}$$

    Equations (71) and (72) destroy the anti-symmetry because of the second part, which is multiplied by $k$ and is even in $\eta_x$. To obey the anti-symmetry, $k$ should be zero. Moreover, this $k$ part of the solution in Eqs (71) and (72) also violates the bounded potential at infinity. This solution with a free constant, $k$, will be compared with that of Lekner [4] later.

    According to the solution of Lekner [4], the general solution space of the symmetry problem in Eq (44) is expressed as follows:

    u(x)=v+λud(x) (73)

    where

    ud(x)= ln (2 cosh ηx2 cos ξx)η0+n=12nenη0 cosh (nηx) cosh (nη0) cos (nξx). (74)

    By using the identity equation,

     ln ( cosh ηx cos ξx)=ηxm=12memηx cos m ξx ln 2, (75)

    the solution by using the direct BIE of Eq (59) is rewritten as

    u(x)=vDb(c,η0)( ln (2 cosh ηx2 cos ξx)+n=12nenη0 cosh (nηx) cosh (nη0) cos (nξx)(2 ln (2c)+n=11n4e2nη01+e2nη0)),η0ηx > 0. (76)

    Equation (59) is the special case of Eq (73), if λ=vDb(c,η0). When the size of the boundary is a degenerate scale, i.e., Db(c,η0)=0, the BIE solution does not exist if v0. If v=0, the constant term in Eq (57), ar0, is a free constant. The electrostatic potential is obtained by

    u(x) =( ln (2 cosh ηx2 cos ξx)η0+n=12nenη0 cosh (nηx) cosh (nη0) cos (nξx))ar0, η0ηx > 0, (77)

    and Eq (73) can be reduced to

    $$u(x)=\lambda u_d(x), \tag{78}$$

    since $v$ is zero. Comparing Eqs (77) and (78) shows that $a^{r}_{0}$ and $\lambda$ are equivalent.

    Similarly, the general solution space of the anti-symmetry problem in Eq (45) is expressed as follows:

    $$u(x)=v\frac{\eta_x}{\eta_0}+\lambda u_d(x) \tag{79}$$

    where

    $$u_d(x)=\ln(2\cosh\eta_x-2\cos\xi_x)-\eta_0+\sum_{n=1}^{\infty}\frac{2}{n}e^{-n\eta_0}\frac{\cosh(n\eta_x)}{\cosh(n\eta_0)}\cos(n\xi_x). \tag{80}$$

    Lebedev et al. [8] imposed the condition at infinity, $u(x)=0$ as $x\to\infty$, under which the solution of Eq (79) reduces to

    $$u(x)=v\frac{\eta_x}{\eta_0}. \tag{81}$$

    This is why the solution of Eq (68) obtained by the BIEM is a special case of Eq (79) with $\lambda=0$. When the size of the boundary is a degenerate scale, i.e., $D_b(c,\eta_0)=0$, the BIE yields infinitely many solutions. Since the sum of the constant terms, $a^{r}_{0}$ and $a^{l}_{0}$, in Eq (69) involves a free constant, $k$, the electrostatic potential becomes

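    $$u(x)=v\frac{\eta_x}{\eta_0}+\left(\ln(2\cosh\eta_x-2\cos\xi_x)-\eta_0+2\sum_{n=1}^{\infty}\frac{1}{n}\frac{e^{-n\eta_0}}{\cosh(n\eta_0)}\cosh(n\eta_x)\cos(n\xi_x)\right)k,\quad \eta_0\ge\eta_x>0. \tag{82}$$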

    Comparing Eqs (79) and (82) shows that $k$ and $\lambda$ are equivalent. To sum up, the free constant $\lambda$ and the function $u_d(x)$ in the general solution of Lekner [4] correspond, respectively, to the constant term in the boundary potential and to the BIE solution obtained at the degenerate scale by Chen et al. [9]. The BIE solution for the degenerate case yields a nontrivial boundary flux even though the boundary potential is trivial. The general solution and the available BIEM solutions for the problem containing two cylinders are compared in Table 3.

    Table 3.  Comparison of the general solution and the BIE solution for the double cylinder subject to the symmetrical or anti-symmetrical condition.


    This paper investigates the solution space for the electrostatics of two cylinders using the BIEM. Both the symmetric and anti-symmetric cases are considered. The flux equilibrium on the cylindrical boundaries and the asymptotic behavior at infinity are also examined. Moreover, on the basis of the Fredholm alternative theorem, the uniqueness of the solution is linked to the degenerate scale in the BIEM. The logarithmic capacity and the discriminant are linked through an exponential relation, and both are related to the degenerate scale. Not only two cylinders but also a single cylinder (circle or ellipse) is considered. Finally, the results are compared with those derived by other researchers, and their linkage and agreement are shown.

    The authors gratefully acknowledge the financial support of the National Science and Technology Council, Taiwan, under Grant No. MOST 111-2221-E-019-009-MY3 for National Taiwan Ocean University.

    The authors declare there is no conflict of interest.



    [1] J. Wu, W. Sun, S. F. Su, Y. Q. Wu, Adaptive quantized control for uncertain nonlinear systems with unknown control directions, Int. J. Robust Nonlinear Control, 31 (2021), 8658–8671. https://doi.org/10.1002/rnc.5748 doi: 10.1002/rnc.5748
    [2] A. Shatyrko, J. Diblík, D. Khusainov, M. Růžičková, Stabilization of Lur'e-type nonlinear control systems by Lyapunov-Krasovskii functionals, Adv. Diff. Equations, 2012 (2012), 1–9. https://doi.org/10.1186/1687-1847-2012-229 doi: 10.1186/1687-1847-2012-229
    [3] K. Tatsuya, Limit-cycle-like control for 2-dimensional discrete-time nonlinear control systems and its application to the Hénon map, Commun. Nonlinear Sci. Numer. Simul., 18 (2013), 171–183. https://doi.org/10.1016/j.cnsns.2012.06.012 doi: 10.1016/j.cnsns.2012.06.012
    [4] Y. H. Wei, Lyapunov stability theory for nonlinear nabla fractional order systems, IEEE Trans. Circuits Sys., 68 (2021), 3246–3250. https://doi.org/10.1109/TCSII.2021.3063914 doi: 10.1109/TCSII.2021.3063914
    [5] G. Pola, A. Girard, P. Tabuada, Approximately bisimilar symbolic models for nonlinear control systems, Automatica, 44 (2008), 2508–2516. https://doi.org/10.1016/j.automatica.2008.02.021 doi: 10.1016/j.automatica.2008.02.021
    [6] H. G. Zhang, X. Zhang, Y. H. Luo, J. Yang, An overview of research on adaptive dynamic programming, Acta Autom. Sin., 39 (2013), 303–311. https://doi.org/10.1016/S1874-1029(13)60031-2 doi: 10.1016/S1874-1029(13)60031-2
    [7] M. Volckaert, M. Diehl, J. Swevers, Generalization of norm optimal ILC for nonlinear systems with constraints, Mech. Syst. Signal Proc., 39 (2013), 280–296. https://doi.org/10.1016/j.ymssp.2013.03.009 doi: 10.1016/j.ymssp.2013.03.009
    [8] W. N. Gao, Z. P. Jiang, Nonlinear and adaptive suboptimal control of connected vehicles: A global adaptive dynamic programming approach, J. Intell. Rob. Syst., 85 (2017), 597–611. http://doi.org/10.1007/s10846-016-0395-3 doi: 10.1007/s10846-016-0395-3
    [9] E. Trélat, Optimal control and applications to aerospace: Some results and challenges, J. Optim. Theory Appl., 154 (2012), 713–758. https://doi.org/10.1007/s10957-012-0050-5 doi: 10.1007/s10957-012-0050-5
    [10] M. Margaliot, Stability analysis of switched systems using variational principles: An introduction, Automatica, 42 (2006), 2059–2077. https://doi.org/10.1016/j.automatica.2006.06.020 doi: 10.1016/j.automatica.2006.06.020
    [11] A. Maidi, J. P. Corriou, Open-loop optimal controller design using variational iteration method, Appl. Math. Comput., 219 (2013), 8632–8645. https://doi.org/10.1016/j.amc.2013.02.075 doi: 10.1016/j.amc.2013.02.075
    [12] F. H. Clarke, R. B. Vinter, The relationship between the maximum principle and dynamic programming, SIAM J. Control Optim., 25 (1987), 1291–1311. http://doi.org/10.1137/0325071 doi: 10.1137/0325071
    [13] R. W. Beard, G. N. Saridis, J. T. Wen, Approximate solutions to the time-invariant Hamilton–Jacobi–Bellman equation, J. Optim. Theory Appl., 96 (1998), 589–626. http://doi.org/10.1023/A:1022664528457 doi: 10.1023/A:1022664528457
    [14] J. A. Roubos, S. Mollov, R. Babuška, H. B. Verbruggen, Fuzzy model-based predictive control using Takagi–Sugeno models, Int. J. Approximate Reasoning, 22 (1999), 3–30. http://doi.org/10.1016/S0888-613X(99)00020-1 doi: 10.1016/S0888-613X(99)00020-1
    [15] D. A. Bristow, M. Tharayil, A. G. Alleyne, A survey of iterative learning control, IEEE Control Syst. Mag., 26 (2006), 96–114. https://doi.org/10.1109/MCS.2006.1636313 doi: 10.1109/MCS.2006.1636313
    [16] P. J. Werbos, W. T. Miller, R. S. Sutton, A menu of designs for reinforcement learning over time, in Neural Networks for Control, MIT Press, Cambridge, (1990), 67–95.
    [17] J. Wang, R. Y. K. Fung, Adaptive dynamic programming algorithms for sequential appointment scheduling with patient preferences, Artif. Intell. Med., 63 (2015), 33–40. https://doi.org/10.1016/j.artmed.2014.12.002 doi: 10.1016/j.artmed.2014.12.002
    [18] D. V. Prokhorov, D. C. Wunsch, Adaptive critic designs, IEEE Trans. Neural Networks, 8 (1997), 997–1007. http://doi.org/10.1109/72.623201 doi: 10.1109/72.623201
    [19] J. J. Murray, C. J. Cox, G. G. Lendaris, R. Saeks, Adaptive dynamic programming, IEEE Trans. Syst. Man Cybern., 32 (2002), 140–153. http://doi.org/10.1109/TSMCC.2002.801727 doi: 10.1109/TSMCC.2002.801727
    [20] H. G. Zhang, Q. L. Wei, D. R. Liu, An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica, 47 (2011), 207–214. http://doi.org/10.1016/j.automatica.2010.10.033 doi: 10.1016/j.automatica.2010.10.033
    [21] Q. L. Wei, H. G. Zhang, D. R. Liu, Y. Zhao, An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming, Acta Autom. Sin., 36 (2010), 121–129. http://doi.org/10.1016/S1874-1029(09)60008-2 doi: 10.1016/S1874-1029(09)60008-2
    [22] J. Ding, S. N. Balakrishnan, Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems, J. Control Theory Appl., 9 (2011), 370–380. http://doi.org/10.1007/s11768-011-0191-3 doi: 10.1007/s11768-011-0191-3
    [23] D. R. Liu, D. Wang, D. B. Zhao, Adaptive dynamic programming for optimal control of unknown nonlinear discrete-time systems, in 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) IEEE, (2011), 242–249. https://doi.org/10.1109/ADPRL.2011.5967357
    [24] J. Modayil, A. White, R. S. Sutton, Multi-timescale nexting in a reinforcement learning robot, Adapt. Behav., 22 (2014), 146–160. http://doi.org/10.1177/1059712313511648 doi: 10.1177/1059712313511648
    [25] C. X. Mu, Y. Zhang, Z. K. Gao, C. Y. Sun, ADP-based robust tracking control for a class of nonlinear systems with unmatched uncertainties, IEEE Trans. Syst. Man Cybern. Syst., 50 (2019), 4056–4067. http://doi.org/10.1109/TSMC.2019.2895692 doi: 10.1109/TSMC.2019.2895692
    [26] H. Y. Dong, X. W. Zhao, B. Luo, Optimal tracking control for uncertain nonlinear systems with prescribed performance via critic-only ADP, IEEE Trans. Syst. Man Cybern. Syst., 52 (2020), 561–573. https://doi.org/10.1109/TSMC.2020.3003797 doi: 10.1109/TSMC.2020.3003797
    [27] R. Z. Song, L. Zhu, Optimal fixed-point tracking control for discrete-time nonlinear systems via ADP, IEEE/CAA J. Autom. Sin., 6 (2019), 657–666. https://doi.org/10.1109/JAS.2019.1911453 doi: 10.1109/JAS.2019.1911453
    [28] M. M. Liang, Q. L. Wei, A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward, Neurocomputing, 424 (2021), 23–34. https://doi.org/10.1016/j.neucom.2020.11.014 doi: 10.1016/j.neucom.2020.11.014
    [29] B. Fan, Q. M. Yang, X. Y. Tang, Y. X. Sun, Robust ADP design for continuous-time nonlinear systems with output constraints, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2127–2138. https://doi.org/10.1109/TNNLS.2018.2806347 doi: 10.1109/TNNLS.2018.2806347
    [30] X. Yang, H. B. He, Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances, Neural Networks, 99 (2018), 19–30. https://doi.org/10.1016/j.neunet.2017.11.022 doi: 10.1016/j.neunet.2017.11.022
    [31] D. R. Liu, X. Yang, D. Wang, Q. L. Wei, Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints, IEEE Trans. Cybern., 45 (2015), 1372–1385. http://doi.org/10.1109/TCYB.2015.2417170 doi: 10.1109/TCYB.2015.2417170
    [32] X. Yang, D. R. Liu, D. Wang, Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints, Int. J. Control, 87 (2014), 553–566. https://doi.org/10.1080/00207179.2013.848292 doi: 10.1080/00207179.2013.848292
    [33] J. G. Zhao, M. G. Gan, Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning, Int. J. Syst. Sci., 51 (2020), 2429–2440. https://doi.org/10.1080/00207721.2020.1797223 doi: 10.1080/00207721.2020.1797223
    [34] B. Zhao, D. R. Liu, C. M. Luo, Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints, IEEE Trans. Neural Networks Learn. Syst., 31 (2019), 4330–4340. https://doi.org/10.1109/TNNLS.2019.2954983 doi: 10.1109/TNNLS.2019.2954983
    [35] D. Wang, J. F. Qiao, Approximate neural optimal control with reinforcement learning for a torsional pendulum device, Neural Networks, 117 (2019), 1–7. https://doi.org/10.1016/j.neunet.2019.04.026 doi: 10.1016/j.neunet.2019.04.026
    [36] J. W. Kim, B. J. Park, H. Yoo, T. H. Oh, J. H. Lee, J. M. Lee, A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system, J. Proc. Control, 87 (2020), 166–178. https://doi.org/10.1016/j.jprocont.2020.02.003 doi: 10.1016/j.jprocont.2020.02.003
    [37] F. Y. Wang, N. Jin, D. R. Liu, Q. L. Wei, Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with epsilon-error bound, IEEE Trans. Neural Networks, 22 (2010), 24–36. https://doi.org/10.1109/TNN.2010.2076370 doi: 10.1109/TNN.2010.2076370
    [38] K. G. Vamvoudakis, F. L. Lewis, Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations, Automatica, 47 (2011), 1556–1569. https://doi.org/10.1016/j.automatica.2011.03.005 doi: 10.1016/j.automatica.2011.03.005
    [39] Q. L. Wei, D. R. Liu, An iterative epsilon-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state, Neural Networks, 32 (2012), 236–244. https://doi.org/10.1007/978-981-10-4080-1_2 doi: 10.1007/978-981-10-4080-1_2
    [40] D. R. Liu, Q. L. Wei, P. F. Yan, Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems, IEEE Trans. Syst. Man Cybern. Syst., 45 (2015), 1577–1591. https://doi.org/10.1109/TSMC.2015.2417510 doi: 10.1109/TSMC.2015.2417510
    [41] S. H. Li, H. B. Du, X. H. Yu, Discrete-time terminal sliding mode control systems based on Euler's discretization, IEEE Trans. Autom. Control, 59 (2013), 546–552. https://doi.org/10.1109/TAC.2013.2273267 doi: 10.1109/TAC.2013.2273267
    [42] D. Bertsekas, Dynamic Programming and Optimal Control: Volume I, Athena scientific, 2012.
    [43] C. J. C. H. Watkins, P. Dayan, Q-learning, Mach. Learn., 8 (1992), 279–292. https://doi.org/10.1007/BF00992698 doi: 10.1007/BF00992698
    [44] A. Y. Ng, D. Harada, S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, ICML, 99 (1999), 278–287.
    [45] L. Buşoniu, B. De Schutter, R. Babuška, Approximate dynamic programming and reinforcement learning, in Interactive Collaborative Information Systems, (2010), 3–44. https://doi.org/10.1007/978-3-642-11688-9_1
    [46] T. Aotani, T. Kobayashi, K. Sugimoto, Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks, Appl. Intell., 51 (2021), 4434–4452. https://doi.org/10.1007/s10489-020-02034-2 doi: 10.1007/s10489-020-02034-2
    [47] C. HolmesParker, A. K. Agogino, K. Tumer, Combining reward shaping and hierarchies for scaling to large multiagent systems, Knowl. Eng. Rev., 31 (2016), 3–18. https://doi.org/10.1017/S0269888915000156 doi: 10.1017/S0269888915000156
    [48] P. Mannion, S. Devlin, K. Mason, J. Duggan, E. Howley, Policy invariance under reward transformations for multi-objective reinforcement learning, Neurocomputing, 263 (2017), 60–73. https://doi.org/10.1016/j.neucom.2017.05.090 doi: 10.1016/j.neucom.2017.05.090
    [49] P. Mannion, S. Devlin, J. Duggan, E. Howley, Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning, Knowl. Eng. Rev., 33 (2018). https://doi.org/10.1017/S0269888918000292 doi: 10.1017/S0269888918000292
    [50] C. Y. Hu, A confrontation decision-making method with deep reinforcement learning and knowledge transfer for multi-agent system, Symmetry, 12 (2020), 631. https://doi.org/10.3390/sym12040631 doi: 10.3390/sym12040631
    [51] A. G. Barto, R. S. Sutton, C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern. Syst., 5 (1983), 834–846. https://doi.org/10.1109/TSMC.1983.6313077 doi: 10.1109/TSMC.1983.6313077
    [52] L. B. Prasad, B. Tyagi, H. O. Gupta, Optimal control of nonlinear inverted pendulum system using PID controller and LQR: Performance analysis without and with disturbance input, Int. J. Autom. Comput., 11 (2014), 661–670. https://doi.org/10.1007/s11633-014-0818-1 doi: 10.1007/s11633-014-0818-1
    [53] V. Mnih, K. Kavukcuoglu, D. Silver, J. Veness, A. Graves, M. Riedmiller, et al, Human-level control through deep reinforcement learning, Nature, 518 (2015), 529–533. https://doi.org/10.1038/nature14236 doi: 10.1038/nature14236
    [54] T. de Bruin, J. Kober, K. Tuyls, R. Babuška, Experience selection in deep reinforcement learning for control, J. Mach. Learn. Res., 19 (2018).
    [55] B. C. Stadie, S. Levine, P. Abbeel, Incentivizing exploration in reinforcement learning with deep predictive models, preprint, arXiv: 1507.00814.
    [56] Z. L. Ning, P. R. Dong, X. J. Wang, J. J. P. C. Rodrigues, F. Xia, Deep reinforcement learning for vehicular edge computing: An intelligent offloading system, ACM Trans. Intell. Syst. Technol., 10 (2019), 1–24. https://doi.org/10.1145/3317572
    [57] H. Yoo, B. Kim, J. W. Kim, J. H. Lee, Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation, Comput. Chem. Eng., 144 (2021), 107133. https://doi.org/10.1016/j.compchemeng.2020.107133 doi: 10.1016/j.compchemeng.2020.107133
    [58] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al, Continuous control with deep reinforcement learning, preprint, arXiv: 1509.02971.
    [59] S. Satheeshbabu, N. K. Uppalapati, T. Fu, G. Krishnan, Continuous control of a soft continuum arm using deep reinforcement learning, in 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft), IEEE, (2020), 497–503. https://doi.org/10.1109/RoboSoft48309.2020.9116003
    [60] Y. Ma, W. B. Zhu, M. G. Benton, J. Romagnoli, Continuous control of a polymerization system with deep reinforcement learning, J. Proc. Control, 75 (2019), 40–47. https://doi.org/10.1016/j.jprocont.2018.11.004 doi: 10.1016/j.jprocont.2018.11.004
    [61] R. B. Zmood, The Euclidean space controllability of control systems with delay, SIAM J. Control, 12 (1974), 609–623. https://doi.org/10.1137/0312045 doi: 10.1137/0312045
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)