Research article

A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

  • Received: 04 August 2024 Revised: 11 November 2024 Accepted: 19 November 2024 Published: 28 November 2024
  • MSC : 60J05, 60J10

  • In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms in Markov decision processes (MDPs) situated within Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, as the convergence criteria of these algorithms are closely tied to the characteristics of the probability function governing state transitions. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant k, ensuring uniformity across iterations. In contrast, PI achieves convergence when the value function maintains consistent values over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of the algorithm is achieved, underscoring the practicality of these methods in deterministic settings.

    Citation: Haifeng Zheng, Dan Wang. A study of value iteration and policy iteration for Markov decision processes in Deterministic systems[J]. AIMS Mathematics, 2024, 9(12): 33818-33842. doi: 10.3934/math.20241613




    In this paper, we consider the two-dimensional viscous, compressible and heat conducting magnetohydrodynamic equations in the Eulerian coordinates (see [1])

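    $$
    \begin{cases}
    \rho_t+\mathrm{div}(\rho u)=0,\\
    (\rho u)_t+\mathrm{div}(\rho u\otimes u)+\nabla P=\mu\Delta u+(\mu+\lambda)\nabla(\mathrm{div}\,u)+H\cdot\nabla H-\frac12\nabla|H|^2,\\
    c_v\big((\rho\theta)_t+\mathrm{div}(\rho u\theta)\big)+P\,\mathrm{div}\,u=\kappa\Delta\theta+\lambda(\mathrm{div}\,u)^2+\nu|\mathrm{curl}\,H|^2+2\mu|D(u)|^2,\\
    H_t+u\cdot\nabla H-H\cdot\nabla u+H\,\mathrm{div}\,u=\nu\Delta H,\quad \mathrm{div}\,H=0.
    \end{cases}
    \tag{1.1}
    $$

    Here $x=(x_1,x_2)\in\Omega$ is the spatial coordinate, $\Omega$ is a bounded smooth domain in $\mathbb{R}^2$, $t\ge 0$ is the time, and the unknown functions $\rho=\rho(x,t)$, $\theta=\theta(x,t)$, $u=(u_1,u_2)(x,t)$ and $H=(H_1,H_2)(x,t)$ denote, respectively, the fluid density, absolute temperature, velocity and magnetic field. In addition, the pressure $P$ is given by

    $$P(\rho)=R\theta\rho,\quad (R>0),$$

    where $R$ is a generic gas constant. The deformation tensor $D(u)$ is defined by

    $$D(u)=\frac12\big(\nabla u+(\nabla u)^{\mathrm{tr}}\big).$$

    The shear viscosity $\mu$ and the bulk viscosity $\lambda$ satisfy the following hypotheses:

    $$\mu>0,\quad \mu+\lambda\ge 0.$$

    The positive constants $c_v$, $\kappa$ and $\nu$ represent, respectively, the heat capacity, heat conductivity and magnetic diffusivity coefficient.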

    The initial and boundary conditions for Eq (1.1) are given as follows:

    $$(\rho,\theta,u,H)(x,t=0)=(\rho_0,\theta_0,u_0,H_0), \tag{1.2}$$
    $$\nabla\theta\cdot n=0,\quad u=0,\quad H\cdot n=0,\quad \mathrm{curl}\,H=0,\quad \text{on } \partial\Omega, \tag{1.3}$$

    where $n$ denotes the unit outward normal vector of $\partial\Omega$.

    Remark 1.1. The boundary condition imposed on $H$ in (1.3) is physical and means that the container is perfectly conducting, see [1,2,3,4].

    In the absence of electromagnetic effects, namely, in the case $H\equiv 0$, the MHD system reduces to the Navier-Stokes equations. Due to the strong coupling and interplay between the fluid motion and the magnetic field, it is rather complicated to investigate the well-posedness and dynamical behavior of the MHD system. There is a large literature on the existence and large-time behavior of solutions to the Navier-Stokes and MHD systems, owing to their physical importance, complexity, rich phenomena and mathematical challenges; see [1,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] and the references therein. However, many physically important and mathematically fundamental problems are still open due to the lack of a smoothing mechanism and the strong nonlinearity. When the initial density contains vacuum states, local strong solutions with large data to the Cauchy problem of the 3D full MHD equations and the 2D isentropic MHD system have been obtained, respectively, by Fan-Yu [5] and Lü-Huang [6]. For the global well-posedness of strong solutions, Li-Xu-Zhang [7] and Lü-Shi-Xu [8] established the global existence and uniqueness of strong solutions to the 3D and 2D MHD equations, respectively, provided the smooth initial data have small total energy. In particular, the initial density can have compact support in [7,8]. Furthermore, Hu-Wang [9,10] and Fan-Yu [11] proved the global existence of renormalized solutions to the compressible MHD equations for general large initial data. However, establishing the global well-posedness of strong solutions with vacuum for general large data remains an outstanding open problem.

    Therefore, it is important to study the mechanism of blow-up and the structure of possible singularities of strong (or smooth) solutions to the compressible MHD system (1.1). The pioneering work can be traced to Serrin's criterion [12] on the Leray-Hopf weak solutions to the 3D incompressible Navier-Stokes equations, that is,

    $$\lim_{t\to T}\|u\|_{L^s(0,t;L^r)}=\infty,\quad \text{for } \frac3r+\frac2s=1,\ 3<r\le\infty, \tag{1.4}$$

    where $T$ is the finite blow-up time. Later, He-Xin [13] established the same Serrin criterion (1.4) for strong solutions to the incompressible MHD equations.

    First of all, we recall several known blow-up criteria for the compressible Navier-Stokes equations. In the isentropic case, Huang-Li-Xin [14] established the following Serrin-type criterion:

    $$\lim_{t\to T}\big(\|u\|_{L^s(0,t;L^r)}+\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}\big)=\infty,\quad \text{for } \frac3r+\frac2s=1,\ 3<r\le\infty. \tag{1.5}$$

    For the full compressible Navier-Stokes equations, Fan-Jiang-Ou [15] obtained that

    $$\lim_{t\to T}\big(\|\theta\|_{L^\infty(0,t;L^\infty)}+\|\nabla u\|_{L^1(0,t;L^\infty)}\big)=\infty, \tag{1.6}$$

    under the condition

    $$7\mu>\lambda. \tag{1.7}$$

    Later, the restriction (1.7) was removed in Huang-Li-Xin [16]. Recently, Wang [17] established a blow-up criterion for the initial boundary value problem (IBVP) on a smooth bounded domain in $\mathbb{R}^2$, namely,

    $$\lim_{t\to T}\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}=\infty. \tag{1.8}$$

    We now return to the compressible MHD system (1.1). In the three-dimensional isentropic case, Xu-Zhang [18] established the same criterion (1.5) as in [14]. For the three-dimensional full compressible MHD system, the criterion (1.6) was also established by Lu-Du-Yao [19] under the condition

    $$\mu>4\lambda. \tag{1.9}$$

    Soon afterwards, the restriction (1.9) was removed by Chen-Liu [20]. Later, for the Cauchy problem and the IBVP of the three-dimensional full compressible MHD system, Huang-Li [21] proved that

    $$\lim_{t\to T}\big(\|u\|_{L^s(0,t;L^r)}+\|\rho\|_{L^\infty(0,t;L^\infty)}\big)=\infty,\quad \text{for } \frac3r+\frac2s\le 1,\ 3<r\le\infty. \tag{1.10}$$

    Recently, Fan-Li-Nakamura [22] extended the results of [17] to the MHD system and established a blow-up criterion which depends only on $H$ and $\mathrm{div}\,u$, as follows:

    $$\lim_{t\to T}\big(\|H\|_{L^\infty(0,t;L^\infty)}+\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}\big)=\infty. \tag{1.11}$$

    In fact, if $H\equiv 0$ in (1.11), the criterion (1.11) becomes (1.8).

    The purpose of this paper is to weaken the regularity of $H$ required in the blow-up criterion (1.11) for strong solutions of the IBVP (1.1)–(1.3).

    In this paper, we denote

    $$\int\cdot\,dx\triangleq\int_\Omega\cdot\,dx.$$

    Furthermore, for $s\ge 0$ and $1\le r\le\infty$, we define the standard Lebesgue and Sobolev spaces as follows:

    $$
    \begin{cases}
    L^r=L^r(\Omega),\quad W^{s,r}=W^{s,r}(\Omega),\quad H^s=W^{s,2},\\
    W_0^{s,r}=\{f\in W^{s,r}\,|\,f=0 \text{ on } \partial\Omega\},\quad H_0^s=W_0^{s,2}.
    \end{cases}
    $$

    To present our results, we first recall the local existence theorem for strong solutions. Fan-Yu [5] obtained the local existence and uniqueness of strong solutions to the full compressible MHD system in $\mathbb{R}^3$. In fact, when $\Omega$ is a bounded domain in $\mathbb{R}^2$, the method applied in [5,23] can also be used here. The corresponding result can be stated as follows.

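    Theorem 1.1. (Local existence theorem) For $q>2$, assume that the initial data $(\rho_0,\theta_0,u_0,H_0)$ satisfies

    $$
    \begin{cases}
    0\le\rho_0\in W^{1,q},\quad 0\le\theta_0\in H^2,\quad u_0\in H_0^1\cap H^2,\quad H_0\in H^2,\quad \mathrm{div}\,H_0=0,\\
    \nabla\theta_0\cdot n|_{\partial\Omega}=0,\quad u_0|_{\partial\Omega}=0,\quad H_0\cdot n|_{\partial\Omega}=\mathrm{curl}\,H_0|_{\partial\Omega}=0,
    \end{cases}
    \tag{1.12}
    $$

    and the following compatibility conditions:

    $$-\mu\Delta u_0-(\mu+\lambda)\nabla\mathrm{div}\,u_0+R\nabla(\rho_0\theta_0)-H_0\cdot\nabla H_0+\frac12\nabla|H_0|^2=\rho_0^{1/2}g_1, \tag{1.13}$$
    $$-\kappa\Delta\theta_0-2\mu|D(u_0)|^2-\lambda(\mathrm{div}\,u_0)^2-\nu(\mathrm{curl}\,H_0)^2=\rho_0^{1/2}g_2, \tag{1.14}$$

    for some $g_1,g_2\in L^2$. Then there exists a time $T_0>0$ such that the IBVP (1.1)–(1.3) has a unique strong solution $(\rho,\theta,u,H)$ on $\Omega\times(0,T_0]$ satisfying

    $$
    \begin{cases}
    0\le\rho\in C([0,T_0];W^{1,q}),\quad \rho_t\in C([0,T_0];L^q),\\
    (u,\theta,H)\in C([0,T_0];H^2)\cap L^2(0,T_0;W^{2,q}),\quad \theta\ge 0,\\
    (u_t,\theta_t,H_t)\in L^2(0,T_0;H^1),\quad (\sqrt{\rho}\,u_t,\sqrt{\rho}\,\theta_t,H_t)\in L^\infty(0,T_0;L^2).
    \end{cases}
    \tag{1.15}
    $$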

    Then, our main result is stated as follows.

    Theorem 1.2. Under the assumptions of Theorem 1.1, suppose $(\rho,\theta,u,H)$ is the strong solution of the IBVP (1.1)–(1.3) obtained in Theorem 1.1. If $T<\infty$ is the maximal existence time of the strong solution, then

    $$\lim_{t\to T}\big(\|H\|_{L^\infty(0,t;L^b)}+\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}\big)=\infty, \tag{1.16}$$

    for any $b>2$.

    Remark 1.2. Compared with the blow-up criterion (1.11) obtained in [22], Theorem 1.2 provides new information about the blow-up mechanism of the MHD system (1.1)–(1.3). In particular, while keeping the same condition on $\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}$ as in (1.11), our result (1.16) relaxes the condition on $\|H\|_{L^\infty(0,t;L^\infty)}$ to one on $\|H\|_{L^\infty(0,t;L^b)}$ for any $b>2$.
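
    To see in what sense (1.16) relaxes (1.11), it may help to record the elementary embedding on a bounded domain. The following is only a sketch, assuming $|\Omega|<\infty$ (which holds since $\Omega$ is bounded) and using nothing beyond Hölder's inequality.

    ```latex
    % Assumption: \Omega is bounded, so |\Omega| < \infty; b > 2 as in Theorem 1.2.
    \[
      \|H(\cdot,s)\|_{L^b(\Omega)}
      = \Big(\int_\Omega |H(x,s)|^b\,dx\Big)^{1/b}
      \le |\Omega|^{1/b}\,\|H(\cdot,s)\|_{L^\infty(\Omega)},
    \]
    \[
      \text{hence}\qquad
      \|H\|_{L^\infty(0,t;L^b)} \le |\Omega|^{1/b}\,\|H\|_{L^\infty(0,t;L^\infty)} .
    \]
    ```

    Consequently, whenever the quantity in (1.16) becomes unbounded as $t\to T$, so does the quantity in (1.11), so (1.16) implies (1.11).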

    The rest of the paper is arranged as follows. In Section 2, we state several basic facts and key inequalities which are helpful for the later analysis. Section 3 is devoted to the a priori estimates required to prove Theorem 1.2, and the proof itself is given in Section 4.

    In this section, we recall several important inequalities and well-known facts. First of all, the Gagliardo-Nirenberg inequality (see [27]) is stated as follows.

    Lemma 2.1. (Gagliardo-Nirenberg) For $q\in(1,\infty)$, $r\in(2,\infty)$ and $s\in[2,\infty)$, there exists some generic constant $C>0$ which may depend only on $q$, $r$ and $s$ such that for $f\in C_0^\infty(\Omega)$, we have

    $$\|f\|_{L^s(\Omega)}^s\le C\|f\|_{L^2(\Omega)}^2\|\nabla f\|_{L^2(\Omega)}^{s-2}, \tag{2.1}$$
    $$\|g\|_{L^\infty(\Omega)}\le C\|g\|_{L^q(\Omega)}^{q(r-2)/(2r+q(r-2))}\|\nabla g\|_{L^r(\Omega)}^{2r/(2r+q(r-2))}. \tag{2.2}$$
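
    As a quick sanity check on the exponents in Lemma 2.1, the following worked instance specializes (2.1) to $s=4$ and (2.2) to $q=2$, $r=4$. These particular values are chosen here purely for illustration and are not singled out in the text.

    ```latex
    % Illustration only: s = 4 in (2.1); q = 2, r = 4 in (2.2).
    \[
      s=4:\qquad \|f\|_{L^4}^4 \le C\,\|f\|_{L^2}^2\,\|\nabla f\|_{L^2}^{2}.
    \]
    \[
      q=2,\ r=4:\qquad
      \frac{q(r-2)}{2r+q(r-2)}=\frac{4}{12}=\frac13,\qquad
      \frac{2r}{2r+q(r-2)}=\frac{8}{12}=\frac23,
    \]
    \[
      \text{so}\qquad \|g\|_{L^\infty}\le C\,\|g\|_{L^2}^{1/3}\,\|\nabla g\|_{L^4}^{2/3}.
    \]
    % The two exponents in (2.2) always sum to 1:
    \[
      \frac{q(r-2)}{2r+q(r-2)}+\frac{2r}{2r+q(r-2)}=1 .
    \]
    ```

    The case $s=4$ of (2.1) is the two-dimensional Ladyzhenskaya inequality.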

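    Then, we give several regularity results for the following Lamé system with Dirichlet boundary condition (see [24]):

    $$
    \begin{cases}
    LU\triangleq\mu\Delta U+(\mu+\lambda)\nabla\mathrm{div}\,U=F, & x\in\Omega,\\
    U=0, & x\in\partial\Omega.
    \end{cases}
    \tag{2.3}
    $$

    We assume that $U\in H_0^1$ is a weak solution of the Lamé system. Due to the uniqueness of the weak solution, it can be denoted by $U=L^{-1}F$.

    Lemma 2.2. Let $r\in(1,\infty)$. Then there exists some generic constant $C>0$ depending only on $\mu$, $\lambda$, $r$ and $\Omega$ such that the following hold.

    If $F\in L^r$, then

    $$\|U\|_{W^{2,r}(\Omega)}\le C\|F\|_{L^r(\Omega)}. \tag{2.4}$$

    If $F\in W^{-1,r}$ (i.e., $F=\mathrm{div}\,f$ with $f=(f_{ij})_{2\times2}$, $f_{ij}\in L^r$), then

    $$\|U\|_{W^{1,r}(\Omega)}\le C\|f\|_{L^r(\Omega)}. \tag{2.5}$$

    Furthermore, for the endpoint case, if $f_{ij}\in L^2\cap L^\infty$, then $\nabla U\in BMO(\Omega)$ and

    $$\|\nabla U\|_{BMO(\Omega)}\le C\|f\|_{L^\infty(\Omega)}+C\|f\|_{L^2(\Omega)}. \tag{2.6}$$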

    The following $L^p$-bound for elliptic systems, whose proof is similar to that of [28, Lemma 12], is a direct consequence of the well-known elliptic theory due to Agmon-Douglis-Nirenberg [29,30] combined with a standard scaling procedure.

    Lemma 2.3. For $k\ge 0$ and $p>1$, there exists a constant $C>0$ depending only on $k$ and $p$ such that

    $$\|\nabla^{k+2}v\|_{L^p(\Omega)}\le C\|\Delta v\|_{W^{k,p}(\Omega)}, \tag{2.7}$$

    for every $v\in W^{k+2,p}(\Omega)$ satisfying either

    $$v\cdot n=0,\quad \mathrm{rot}\,v=0,\quad \text{on } \partial\Omega,$$

    or

    $$v=0,\quad \text{on } \partial\Omega.$$

    Finally, we give two critical Sobolev inequalities of logarithmic type, which are originally due to Brezis-Gallouet [31] and Brezis-Wainger [32].

    Lemma 2.4. Let $\Omega\subset\mathbb{R}^2$ be a bounded Lipschitz domain and $f\in W^{1,q}$ with $q>2$. Then it holds that

    $$\|f\|_{L^\infty(\Omega)}\le C\|f\|_{BMO(\Omega)}\ln\big(e+\|f\|_{W^{1,q}(\Omega)}\big)+C, \tag{2.8}$$

    with a constant $C$ depending only on $q$.

    Lemma 2.5. Let $\Omega\subset\mathbb{R}^2$ be a smooth domain and $f\in L^2(s,t;H_0^1\cap W^{1,q})$ with $q>2$. Then it holds that

    $$\|f\|_{L^2(s,t;L^\infty)}^2\le C\|f\|_{L^2(s,t;H^1)}^2\ln\big(e+\|f\|_{L^2(s,t;W^{1,q})}\big)+C, \tag{2.9}$$

    with a constant $C$ depending only on $q$.

    Let $(\rho,\theta,u,H)$ be the strong solution of the IBVP (1.1)–(1.3) obtained in Theorem 1.1. Assume that (1.16) is false, namely, there exists a constant $M>0$ such that

    $$\lim_{t\to T}\big(\|H\|_{L^\infty(0,t;L^b)}+\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}\big)\le M<\infty,\quad \text{for any } b>2. \tag{3.1}$$

    First of all, the upper bound of the density can be deduced from (1.1)$_1$ and (3.1), see [14, Lemma 3.4].

    Lemma 3.1. Under the assumptions of Theorem 1.2 and (3.1), it holds that for any $t\in[0,T)$,

    $$\sup_{0\le s\le t}\|\rho\|_{L^1\cap L^\infty}\le C, \tag{3.2}$$

    where (and in what follows) $C$ represents a generic positive constant depending only on $\mu$, $\lambda$, $c_v$, $\kappa$, $\nu$, $q$, $b$, $M$, $T$ and the initial data.

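    Then, we give the following estimates, which are similar to the energy estimates.

    Lemma 3.2. Under the assumptions of Theorem 1.2 and (3.1), it holds that for any $t\in[0,T)$,

    $$\sup_{0\le s\le t}\big(\|\rho\theta\|_{L^1}+\|\rho^{1/2}u\|_{L^2}^2+\|H\|_{L^2}^2\big)+\int_0^t\big(\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2\big)ds\le C. \tag{3.3}$$

    Proof. First, applying the standard maximum principle to (1.1)$_3$ together with $\theta_0\ge 0$ (see [15,25]) gives

    $$\inf_{\Omega\times[0,t]}\theta(x,t)\ge 0. \tag{3.4}$$

    Then, applying the standard energy estimates to (1.1) shows

    $$\sup_{0\le s\le t}\big(\|\rho\theta\|_{L^1}+\|\rho^{1/2}u\|_{L^2}^2+\|H\|_{L^2}^2\big)\le C. \tag{3.5}$$

    Next, adding (1.1)$_2$ multiplied by $u$ to (1.1)$_4$ multiplied by $H$, and integrating the sum by parts, we have

    $$\frac12\frac{d}{dt}\big(\|\rho^{1/2}u\|_{L^2}^2+\|H\|_{L^2}^2\big)+\mu\|\nabla u\|_{L^2}^2+\nu\|\nabla H\|_{L^2}^2+(\mu+\lambda)\|\mathrm{div}\,u\|_{L^2}^2\le C\|\rho\theta\|_{L^1}\|\mathrm{div}\,u\|_{L^\infty}, \tag{3.6}$$

    where one has used the following well-known fact

    $$\|\nabla H\|_{L^2}\le C\|\mathrm{curl}\,H\|_{L^2}, \tag{3.7}$$

    due to $\mathrm{div}\,H=0$ and $H\cdot n|_{\partial\Omega}=0$.

    Hence, the combination of (3.6) with (3.1), (3.4) and (3.5) yields (3.3). This completes the proof of Lemma 3.2.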
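
    The last step of this proof can be spelled out as follows. This is a sketch of the time integration of (3.6), assuming only the bounds (3.1) and (3.5) and the finiteness of the initial energy; the constant $C$ changes from line to line as elsewhere in the paper.

    ```latex
    % Integrate (3.6) over (0,t) and use (3.5) for \rho\theta, (3.1) for div u.
    \begin{align*}
      \int_0^t \|\rho\theta\|_{L^1}\,\|\mathrm{div}\,u\|_{L^\infty}\,ds
      &\le \sup_{0\le s\le t}\|\rho\theta\|_{L^1}\,
           \|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}
       \le C\,M, \\
      \mu\int_0^t \|\nabla u\|_{L^2}^2\,ds
      + \nu\int_0^t \|\nabla H\|_{L^2}^2\,ds
      &\le \frac12\big(\|\rho_0^{1/2}u_0\|_{L^2}^2+\|H_0\|_{L^2}^2\big) + C\,M
       \le C .
    \end{align*}
    ```

    The nonnegative term $(\mu+\lambda)\|\mathrm{div}\,u\|_{L^2}^2$ on the left of (3.6) is simply dropped, which is allowed because $\mu+\lambda\ge 0$.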

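    The following lemma gives estimates on the spatial gradients of both the velocity and the magnetic field, which are crucial for obtaining the higher order estimates of the solution.

    Lemma 3.3. Under the assumptions of Theorem 1.2 and (3.1), it holds that for any $t\in[0,T)$,

    $$\sup_{0\le s\le t}\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\mathrm{curl}\,H\|_{L^2}^2\big)+\int_0^t\big(\|\rho^{1/2}\dot u\|_{L^2}^2+\|\nabla\theta\|_{L^2}^2+\|H_t\|_{L^2}^2+\|\Delta H\|_{L^2}^2\big)ds\le C, \tag{3.8}$$

    where $\dot f\triangleq u\cdot\nabla f+f_t$ represents the material derivative of $f$.

    Proof. First, multiplying equation (1.1)$_3$ by $\theta$ and integrating by parts yields

    $$\frac{c_v}{2}\frac{d}{dt}\|\rho^{1/2}\theta\|_{L^2}^2+\kappa\|\nabla\theta\|_{L^2}^2\le\nu\int\theta|\mathrm{curl}\,H|^2dx+C\int\theta|\nabla u|^2dx+C\|\rho^{1/2}\theta\|_{L^2}^2\|\mathrm{div}\,u\|_{L^\infty}. \tag{3.9}$$

    To begin with, integration by parts together with (3.1) and the Gagliardo-Nirenberg inequality implies that

    $$
    \begin{aligned}
    \nu\int\theta|\mathrm{curl}\,H|^2dx
    &\le C\|\nabla\theta\|_{L^2}\|H\|_{L^b}\|\nabla H\|_{L^{\tilde b}}+C\|\theta\|_{L^{\tilde b}}\|H\|_{L^b}\|\nabla^2H\|_{L^2}\\
    &\le C\|\nabla\theta\|_{L^2}\|\nabla H\|_{L^{\tilde b}}+C\|\nabla^2H\|_{L^2}\big(\|\nabla\theta\|_{L^2}+1\big)\\
    &\le\varepsilon\|\nabla\theta\|_{L^2}^2+C\|\nabla^2H\|_{L^2}^2+C\big(\|\nabla H\|_{L^2}^2+1\big),
    \end{aligned}
    \tag{3.10}
    $$

    where $\tilde b\triangleq\frac{2b}{b-2}>2$ satisfies $1/b+1/\tilde b=1/2$, and in the second inequality one has applied the following estimate:

    $$\|\theta\|_{L^r}\le C\big(\|\nabla\theta\|_{L^2}+1\big),\quad \text{for any } r\ge 1. \tag{3.11}$$

    Indeed, denoting the average of $\theta$ by $\bar\theta=\frac{1}{|\Omega|}\int\theta\,dx$, it follows from (3.2) and (3.3) that

    $$\bar\theta\int\rho\,dx\le\int\rho\theta\,dx+\int\rho|\theta-\bar\theta|\,dx\le C+C\|\nabla\theta\|_{L^2}, \tag{3.12}$$

    which together with the Poincaré inequality yields

    $$\|\theta\|_{L^2}\le C\big(1+\|\nabla\theta\|_{L^2}\big). \tag{3.13}$$

    Consequently, (3.11) holds.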
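
    The role of the conjugate-type exponent $\tilde b$ in (3.10) can be checked directly. The short computation below verifies the stated relation and the Hölder exponent count; the sample values $b=3$ and $b=4$ are illustrative choices, not values used in the paper.

    ```latex
    % \tilde b = 2b/(b-2) with b > 2, as defined after (3.10).
    \[
      \frac{1}{\tilde b}=\frac{b-2}{2b}=\frac12-\frac1b
      \quad\Longrightarrow\quad
      \frac1b+\frac1{\tilde b}=\frac12 ,
      \qquad
      \tilde b>2 \iff 2b>2(b-2) .
    \]
    % Hoelder exponents in the first term of (3.10):
    \[
      \underbrace{\tfrac12}_{\nabla\theta\in L^2}
      +\underbrace{\tfrac1b}_{H\in L^b}
      +\underbrace{\tfrac1{\tilde b}}_{\nabla H\in L^{\tilde b}}
      =1 .
    \]
    % Illustrative values: b = 3 gives \tilde b = 6; b = 4 gives \tilde b = 4.
    ```

    As $b\downarrow 2$ the exponent $\tilde b$ tends to infinity, which is consistent with the restriction $b>2$ in Theorem 1.2.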

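    Secondly, following [17,21,33], multiplying equation (1.1)$_2$ by $u\theta$ and integrating by parts yields

    $$
    \begin{aligned}
    &\mu\int\theta|\nabla u|^2dx+(\mu+\lambda)\int\theta|\mathrm{div}\,u|^2dx\\
    &\quad=-\int\rho\dot u\cdot\theta u\,dx-\mu\int u\cdot\nabla\theta\cdot\nabla u\,dx-(\mu+\lambda)\int\mathrm{div}\,u\,u\cdot\nabla\theta\,dx\\
    &\qquad-\int\nabla P\cdot\theta u\,dx+\int H\cdot\nabla H\cdot\theta u\,dx-\frac12\int\nabla|H|^2\cdot\theta u\,dx\triangleq\sum_{i=1}^6I_i.
    \end{aligned}
    \tag{3.14}
    $$

    Using the same arguments as in [17,33], we have

    $$\sum_{i=1}^4I_i\le\eta\|\rho^{1/2}\dot u\|_{L^2}^2+\varepsilon\|\nabla\theta\|_{L^2}^2+C\|\rho^{1/2}\theta\|_{L^2}^2\|\mathrm{div}\,u\|_{L^\infty}+C\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2\big)\|u\|_{L^\infty}^2. \tag{3.15}$$

    Besides, (3.1) and (3.11) yield

    $$\sum_{i=5}^6I_i\le C\|\theta\|_{L^{\tilde b}}\|H\|_{L^b}\|\nabla H\|_{L^2}\|u\|_{L^\infty}\le\varepsilon\|\nabla\theta\|_{L^2}^2+C\|\nabla H\|_{L^2}^2\|u\|_{L^\infty}^2+C. \tag{3.16}$$

    Substituting (3.10), (3.15) and (3.16) into (3.9), and choosing $\varepsilon$ suitably small, we have

    $$c_v\frac{d}{dt}\|\rho^{1/2}\theta\|_{L^2}^2+\kappa\|\nabla\theta\|_{L^2}^2\le2\eta\|\rho^{1/2}\dot u\|_{L^2}^2+C_1\|\Delta H\|_{L^2}^2+C\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+1\big)\big(\|\mathrm{div}\,u\|_{L^\infty}+\|u\|_{L^\infty}^2+1\big), \tag{3.17}$$

    where one has applied the following key fact:

    $$\|\nabla^2H\|_{L^2}\le C\|\Delta H\|_{L^2}. \tag{3.18}$$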

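    Furthermore, it follows from (3.1) and (1.1)$_4$ that

    $$
    \begin{aligned}
    &\|H_t\|_{L^2}^2+\nu^2\|\Delta H\|_{L^2}^2+\nu\frac{d}{dt}\|\mathrm{curl}\,H\|_{L^2}^2\\
    &\quad\le C\|\nabla u\|_{L^2}\|\nabla u\|_{L^{2\tilde b}}\|H\|_{L^b}\|\nabla H\|_{L^{2\tilde b}}+C\|\nabla H\|_{L^2}^2\|u\|_{L^\infty}^2\\
    &\quad\le C\|\nabla u\|_{L^2}\|\nabla u\|_{L^{2\tilde b}}\big(\|\nabla H\|_{L^2}+1\big)+C\|\nabla H\|_{L^2}^2\|u\|_{L^\infty}^2.
    \end{aligned}
    \tag{3.19}
    $$

    In order to estimate $\|\nabla u\|_{L^{2\tilde b}}$, following [24,26], we split $u$ into $v$ and $w$. More precisely, let

    $$u=v+w,\quad \text{and}\quad v=L^{-1}\nabla P, \tag{3.20}$$

    then we get

    $$Lw=\rho\dot u-H\cdot\nabla H+\frac12\nabla|H|^2. \tag{3.21}$$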
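
    For completeness, here is a short derivation of (3.21) from the momentum equation. It is a sketch that assumes enough regularity to use the mass equation (1.1)$_1$ pointwise, and it uses the operator $L=\mu\Delta+(\mu+\lambda)\nabla\mathrm{div}$ from (2.3).

    ```latex
    % Step 1: rewrite the convective terms using the mass equation (1.1)_1.
    \[
      (\rho u)_t+\mathrm{div}(\rho u\otimes u)
      = \big(\rho_t+\mathrm{div}(\rho u)\big)u+\rho\,(u_t+u\cdot\nabla u)
      = \rho\,\dot u .
    \]
    % Step 2: the momentum equation (1.1)_2 then reads
    \[
      \rho\,\dot u+\nabla P = Lu + H\cdot\nabla H-\tfrac12\nabla|H|^2 .
    \]
    % Step 3: subtract Lv = \nabla P, which is the definition of v in (3.20).
    \[
      Lw = Lu-Lv
         = \rho\,\dot u - H\cdot\nabla H+\tfrac12\nabla|H|^2 .
    \]
    ```

    The point of the splitting is that $v$ carries the pressure, which is controlled through $\rho\theta$, while $w$ carries the material derivative and the magnetic terms and gains two derivatives by Lemma 2.2.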

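    Hence, Lemma 2.2 implies that for any $r>1$,

    $$\|\nabla v\|_{L^r}\le C\|\theta\rho\|_{L^r}, \tag{3.22}$$

    and

    $$\|\nabla^2w\|_{L^r}\le C\|\rho\dot u\|_{L^r}+C\||H||\nabla H|\|_{L^r}. \tag{3.23}$$

    Consequently, it follows from the Gagliardo-Nirenberg inequality, (3.2), (3.11), (3.20), (3.22) and (3.23) that for any $s\ge 2$,

    $$
    \begin{aligned}
    \|\nabla u\|_{L^s}
    &\le C\|\nabla v\|_{L^s}+C\|\nabla w\|_{L^s}\\
    &\le C\|\rho\theta\|_{L^s}+C\|\nabla w\|_{L^2}+C\|\nabla w\|_{L^2}^{2/s}\|\nabla^2w\|_{L^2}^{1-2/s}\\
    &\le C\|\rho\theta\|_{L^s}+C\|\nabla w\|_{L^2}+C\|\nabla w\|_{L^2}^{2/s}\big(\|\rho\dot u\|_{L^2}+\||H||\nabla H|\|_{L^2}\big)^{1-2/s}\\
    &\le\eta\|\rho^{1/2}\dot u\|_{L^2}+C\|\rho\theta\|_{L^s}+C\|\nabla w\|_{L^2}+C\||H||\nabla H|\|_{L^2}\\
    &\le\eta\|\rho^{1/2}\dot u\|_{L^2}+C\|\nabla u\|_{L^2}+C\|\nabla\theta\|_{L^2}+C\|\nabla H\|_{L^2}+C\|\Delta H\|_{L^2}+C.
    \end{aligned}
    \tag{3.24}
    $$

    Putting (3.24) into (3.19) and using the Young inequality leads to

    $$\|H_t\|_{L^2}^2+\frac{\nu^2}{2}\|\Delta H\|_{L^2}^2+\nu\frac{d}{dt}\|\mathrm{curl}\,H\|_{L^2}^2\le\varepsilon\|\nabla\theta\|_{L^2}^2+\eta\|\rho^{1/2}\dot u\|_{L^2}^2+C\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+1\big)\big(\|\nabla H\|_{L^2}^2+1\big). \tag{3.25}$$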

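    Adding (3.25) multiplied by $2\nu^{-2}(C_1+1)$ to (3.17) and choosing $\varepsilon$ suitably small, we have

    $$
    \begin{aligned}
    &\frac{\kappa}{2}\|\nabla\theta\|_{L^2}^2+2\nu^{-2}(C_1+1)\|H_t\|_{L^2}^2+\|\Delta H\|_{L^2}^2+\frac{d}{dt}\Big(c_v\|\rho^{1/2}\theta\|_{L^2}^2+2\nu^{-1}(C_1+1)\|\mathrm{curl}\,H\|_{L^2}^2\Big)\\
    &\quad\le C\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+1\big)\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+\|\mathrm{div}\,u\|_{L^\infty}+1\big)+C\eta\|\rho^{1/2}\dot u\|_{L^2}^2.
    \end{aligned}
    \tag{3.26}
    $$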
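
    The multiplier $2\nu^{-2}(C_1+1)$ is chosen so that the $\|\Delta H\|_{L^2}^2$ term from (3.17) is absorbed. The bookkeeping below is a sketch of that arithmetic; the abbreviation $a$ is introduced here only for readability and does not appear in the paper.

    ```latex
    % Hypothetical shorthand: a := 2\nu^{-2}(C_1+1), the multiplier applied to (3.25).
    \begin{align*}
      a\cdot\frac{\nu^2}{2} &= C_1+1
        &&\text{(coefficient of } \|\Delta H\|_{L^2}^2 \text{ from (3.25))},\\
      (C_1+1)-C_1 &= 1
        &&\text{(after absorbing } C_1\|\Delta H\|_{L^2}^2 \text{ from (3.17))},\\
      a\cdot\nu &= 2\nu^{-1}(C_1+1)
        &&\text{(coefficient of } \tfrac{d}{dt}\|\mathrm{curl}\,H\|_{L^2}^2),\\
      \kappa-a\,\varepsilon &\ge \frac{\kappa}{2}
        &&\text{(provided } \varepsilon\le \kappa/(2a)).
    \end{align*}
    ```

    These four lines account for the coefficients $1$, $2\nu^{-1}(C_1+1)$ and $\kappa/2$ appearing on the left-hand side of (3.26).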

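    Then, multiplying (1.1)$_2$ by $u_t$ and integrating by parts, we get

    $$
    \begin{aligned}
    &\frac12\frac{d}{dt}\big(\mu\|\nabla u\|_{L^2}^2+(\mu+\lambda)\|\mathrm{div}\,u\|_{L^2}^2\big)+\|\rho^{1/2}\dot u\|_{L^2}^2\\
    &\quad\le\eta\|\rho^{1/2}\dot u\|_{L^2}^2+C\|\nabla u\|_{L^2}^2\|u\|_{L^\infty}^2+\frac{d}{dt}\Big(\int P\,\mathrm{div}\,u\,dx+\frac12\int|H|^2\mathrm{div}\,u\,dx-\int H\cdot\nabla u\cdot H\,dx\Big)\\
    &\qquad-\int P_t\,\mathrm{div}\,u\,dx-\int H\cdot H_t\,\mathrm{div}\,u\,dx+\int H_t\cdot\nabla u\cdot H\,dx+\int H\cdot\nabla u\cdot H_t\,dx.
    \end{aligned}
    \tag{3.27}
    $$

    Notice that

    $$\int P_t\,\mathrm{div}\,u\,dx=\int P_t\,\mathrm{div}\,v\,dx+\int P_t\,\mathrm{div}\,w\,dx, \tag{3.28}$$

    and that integration by parts together with (3.20) leads to

    $$\int P_t\,\mathrm{div}\,v\,dx=\frac12\frac{d}{dt}\big((\mu+\lambda)\|\mathrm{div}\,v\|_{L^2}^2+\mu\|\nabla v\|_{L^2}^2\big). \tag{3.29}$$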
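
    The identity (3.29) can be verified in a few lines. The computation below is a sketch that assumes $v$ and $v_t$ are regular enough and vanish on $\partial\Omega$ (as they do for $v=L^{-1}\nabla P$ with the Dirichlet condition in (2.3)), so that no boundary terms appear.

    ```latex
    % Uses Lv = \mu\Delta v + (\mu+\lambda)\nabla\mathrm{div}\,v = \nabla P, hence Lv_t = \nabla P_t.
    \begin{align*}
      \int P_t\,\mathrm{div}\,v\,dx
      &= -\int \nabla P_t\cdot v\,dx
       = -\int \big(\mu\Delta v_t+(\mu+\lambda)\nabla\mathrm{div}\,v_t\big)\cdot v\,dx \\
      &= \mu\int \nabla v_t:\nabla v\,dx
         +(\mu+\lambda)\int \mathrm{div}\,v_t\,\mathrm{div}\,v\,dx \\
      &= \frac12\frac{d}{dt}\Big(\mu\|\nabla v\|_{L^2}^2
         +(\mu+\lambda)\|\mathrm{div}\,v\|_{L^2}^2\Big).
    \end{align*}
    ```

    Because this term is an exact time derivative, it can be moved to the left-hand side of (3.27), leaving only the $w$ part of the pressure term to be estimated.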

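    Moreover, define

    $$E\triangleq c_v\theta+\frac12|u|^2.$$

    According to (1.1), $E$ satisfies

    $$(\rho E)_t+\mathrm{div}(\rho uE+Pu)=\Delta\Big(\kappa\theta+\frac12\mu|u|^2\Big)+\mu\,\mathrm{div}(u\cdot\nabla u)+\lambda\,\mathrm{div}(u\,\mathrm{div}\,u)+H\cdot\nabla H\cdot u-\frac12u\cdot\nabla|H|^2+\nu|\mathrm{curl}\,H|^2. \tag{3.30}$$

    Motivated by [17,21], it can be deduced from (3.30) that

    $$
    \begin{aligned}
    -\int P_t\,\mathrm{div}\,w\,dx
    &=-\frac{R}{c_v}\Big(\int(\rho E)_t\,\mathrm{div}\,w\,dx-\frac12\int(\rho|u|^2)_t\,\mathrm{div}\,w\,dx\Big)\\
    &=-\frac{R}{c_v}\Big\{\int\Big((c_v+R)\rho\theta u+\frac12\rho|u|^2u-\kappa\nabla\theta-\mu\nabla u\cdot u-\mu u\cdot\nabla u-\lambda u\,\mathrm{div}\,u\Big)\cdot\nabla\mathrm{div}\,w\,dx\\
    &\qquad-\frac12\int\rho|u|^2u\cdot\nabla\mathrm{div}\,w\,dx-\int\rho\dot u\cdot u\,\mathrm{div}\,w\,dx-\int\mathrm{div}\,H\,H\cdot u\,\mathrm{div}\,w\,dx\\
    &\qquad-\int H\cdot\nabla u\cdot H\,\mathrm{div}\,w\,dx-\int(H\cdot u)H\cdot\nabla\mathrm{div}\,w\,dx+\frac12\int\mathrm{div}\,u\,|H|^2\mathrm{div}\,w\,dx\\
    &\qquad+\frac12\int|H|^2u\cdot\nabla\mathrm{div}\,w\,dx-\nu\int\nabla\mathrm{div}\,w\times\mathrm{curl}\,H\cdot H\,dx-\nu\int\mathrm{curl}(\mathrm{curl}\,H)\cdot H\,\mathrm{div}\,w\,dx\Big\}\\
    &\le C\eta\|\rho^{1/2}\dot u\|_{L^2}^2+C\|\nabla\theta\|_{L^2}^2+C\|\Delta H\|_{L^2}^2+C\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+1\big)\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+1\big).
    \end{aligned}
    \tag{3.31}
    $$

    Additionally, combining (3.1) and (3.24) yields

    $$
    \begin{aligned}
    &\int H_t\cdot\nabla u\cdot H\,dx+\int H\cdot\nabla u\cdot H_t\,dx-\int H\cdot H_t\,\mathrm{div}\,u\,dx\\
    &\quad\le C\|H_t\|_{L^2}^2+C\|\nabla u\|_{L^{\tilde b}}^2\|H\|_{L^b}^2\\
    &\quad\le C\eta\|\rho^{1/2}\dot u\|_{L^2}^2+C\big(\|H_t\|_{L^2}^2+\|\nabla\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+\|\Delta H\|_{L^2}^2+1\big).
    \end{aligned}
    \tag{3.32}
    $$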

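    Substituting (3.28), (3.29), (3.31) and (3.32) into (3.27) yields

    $$
    \begin{aligned}
    &\|\rho^{1/2}\dot u\|_{L^2}^2+\frac{d}{dt}\Big(\frac{\mu}{2}\big(\|\nabla u\|_{L^2}^2+\|\nabla v\|_{L^2}^2\big)+\frac{\mu+\lambda}{2}\big(\|\mathrm{div}\,u\|_{L^2}^2+\|\mathrm{div}\,v\|_{L^2}^2\big)-A(t)\Big)\\
    &\quad\le C_2\big(\|\nabla\theta\|_{L^2}^2+\|H_t\|_{L^2}^2+\|\Delta H\|_{L^2}^2\big)+C\eta\|\rho^{1/2}\dot u\|_{L^2}^2\\
    &\qquad+C\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+1\big)\big(\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+\|\rho^{1/2}\theta\|_{L^2}^2+1\big),
    \end{aligned}
    \tag{3.33}
    $$

    where

    $$A(t)\triangleq\frac12\int|H|^2\mathrm{div}\,u\,dx+\int P\,\mathrm{div}\,u\,dx-\int H\cdot\nabla u\cdot H\,dx \tag{3.34}$$

    satisfies

    $$A(t)\le\frac{\mu}{4}\|\nabla u\|_{L^2}^2+C_3\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\mathrm{curl}\,H\|_{L^2}^2+1\big). \tag{3.35}$$

    Recalling the inequality (3.26), let

    $$C_4=\min\Big\{2\nu^{-2}(C_1+1),\frac{\kappa}{2},1\Big\},\quad C_5=\min\big\{2\nu^{-1}(C_1+1),c_v\big\}. \tag{3.36}$$

    Adding (3.26) multiplied by $C_6=\max\{C_4^{-1}(C_2+1),C_5^{-1}(C_3+1)\}$ to (3.33) and choosing $\eta$ suitably small, we have

    $$\frac{d}{dt}\tilde A(t)+\frac12\|\rho^{1/2}\dot u\|_{L^2}^2+\|\nabla\theta\|_{L^2}^2+\|H_t\|_{L^2}^2+\|\Delta H\|_{L^2}^2\le C\big(\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\nabla H\|_{L^2}^2+1\big)\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+\|\mathrm{div}\,u\|_{L^\infty}+1\big), \tag{3.37}$$

    where

    $$\tilde A(t)\triangleq C_6\Big(c_v\|\rho^{1/2}\theta\|_{L^2}^2+2\nu^{-1}(C_1+1)\|\mathrm{curl}\,H\|_{L^2}^2\Big)+\frac{\mu}{2}\big(\|\nabla u\|_{L^2}^2+\|\nabla v\|_{L^2}^2\big)+\frac{\mu+\lambda}{2}\big(\|\mathrm{div}\,u\|_{L^2}^2+\|\mathrm{div}\,v\|_{L^2}^2\big)-A(t) \tag{3.38}$$

    satisfies

    $$\|\rho^{1/2}\theta\|_{L^2}^2+\frac{\mu}{4}\|\nabla u\|_{L^2}^2+\|\mathrm{curl}\,H\|_{L^2}^2-C\le\tilde A(t)\le C\|\rho^{1/2}\theta\|_{L^2}^2+C\|\nabla u\|_{L^2}^2+C\|\mathrm{curl}\,H\|_{L^2}^2+C. \tag{3.39}$$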

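    Finally, integrating (3.37) over $(\tau,t)$ and using (3.39) yields

    $$\psi(t)\le C\int_\tau^t\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+\|\mathrm{div}\,u\|_{L^\infty}+1\big)\psi(s)\,ds+C\psi(\tau), \tag{3.40}$$

    where

    $$\psi(t)\triangleq\int_0^t\big(\|\rho^{1/2}\dot u\|_{L^2}^2+\|\nabla\theta\|_{L^2}^2+\|H_t\|_{L^2}^2+\|\Delta H\|_{L^2}^2\big)ds+\|\rho^{1/2}\theta\|_{L^2}^2+\|\nabla u\|_{L^2}^2+\|\mathrm{curl}\,H\|_{L^2}^2+1. \tag{3.41}$$

    This, combined with (3.1), (3.3) and the Gronwall inequality, implies that for any $0<\tau\le t<T$,

    $$\psi(t)\le C\psi(\tau)\exp\Big\{\int_\tau^t\big(\|\nabla u\|_{L^2}^2+\|u\|_{L^\infty}^2+\|\mathrm{div}\,u\|_{L^\infty}+1\big)ds\Big\}\le C\psi(\tau)\exp\Big\{\int_\tau^t\|u\|_{L^\infty}^2ds\Big\}. \tag{3.42}$$

    Using Lemma 2.5, we have

    $$\|u\|_{L^2(\tau,t;L^\infty)}^2\le C\|u\|_{L^2(\tau,t;H^1)}^2\ln\big(e+\|u\|_{L^2(\tau,t;W^{1,b})}\big)+C. \tag{3.43}$$

    Combining (3.1), (3.2), (3.11), (3.22), (3.23) and the Sobolev inequality leads to

    $$
    \begin{aligned}
    \|u\|_{W^{1,b}}
    &\le\|v\|_{W^{1,b}}+C\|w\|_{W^{2,2b/(b+2)}}\\
    &\le C\|\rho\dot u\|_{L^{2b/(b+2)}}+C\|\rho\theta\|_{L^b}+C\|\nabla u\|_{L^2}+C\||H||\nabla H|\|_{L^{2b/(b+2)}}\\
    &\le C\|\rho^{1/2}\|_{L^b}\|\rho^{1/2}\dot u\|_{L^2}+C\|\nabla\theta\|_{L^2}+C\|\nabla u\|_{L^2}+C\|H\|_{L^b}\|\nabla H\|_{L^2}+C\\
    &\le C\|\rho^{1/2}\dot u\|_{L^2}+C\|\nabla\theta\|_{L^2}+C\|\nabla u\|_{L^2}+C\|\nabla H\|_{L^2}+C,
    \end{aligned}
    \tag{3.44}
    $$

    which implies that

    $$\|u\|_{L^2(\tau,t;W^{1,b})}\le C\psi^{1/2}(t). \tag{3.45}$$

    Substituting (3.45) into (3.43) gives

    $$\|u\|_{L^2(\tau,t;L^\infty)}^2\le C+C\|u\|_{L^2(\tau,t;H^1)}^2\ln\big(C\psi(t)\big)\le C+\ln\big(C\psi(t)\big)^{C_7\|u\|_{L^2(\tau,t;H^1)}^2}. \tag{3.46}$$

    Using (3.3), one can choose some $\tau$ close enough to $t$ such that

    $$C_7\|u\|_{L^2(\tau,t;H^1)}^2\le\frac12, \tag{3.47}$$

    which together with (3.42) and (3.46) yields

    $$\psi(t)\le C\psi^2(\tau)\le C. \tag{3.48}$$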
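
    The passage from (3.42), (3.46) and (3.47) to (3.48) is a short algebraic step that is worth making explicit. The following sketch assumes the generic constant satisfies $C\ge 1$, so that $\ln(C\psi(t))\ge 0$ (recall $\psi\ge 1$ by (3.41)).

    ```latex
    % Insert (3.46) with exponent bounded by 1/2, see (3.47), into (3.42).
    \begin{align*}
      \psi(t)
      &\le C\,\psi(\tau)\exp\Big\{C+\tfrac12\ln\big(C\psi(t)\big)\Big\}
       = C\,\psi(\tau)\,e^{C}\,\big(C\psi(t)\big)^{1/2}
       \le C\,\psi(\tau)\,\psi(t)^{1/2}, \\
      \psi(t)^{1/2} &\le C\,\psi(\tau)
      \quad\Longrightarrow\quad
      \psi(t)\le C\,\psi^{2}(\tau).
    \end{align*}
    ```

    Division by $\psi(t)^{1/2}$ is legitimate because $\psi(t)\ge 1$. The logarithmic inequality of Lemma 2.5 is what makes the exponent small enough for this absorption to work.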

    Noticing the definition of ψ in (3.41), we immediately have (3.8). The proof of Lemma 3.3 is completed.

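    Now, we show some higher order estimates of the solution, which are needed to extend the local solution to a global one under the conditions (1.12)–(1.14) and (3.1).

    Lemma 3.4. Under the assumptions of Theorem 1.2 and (3.1), it holds that for any $t\in[0,T)$,

    $$\sup_{0\le s\le t}\big(\|\rho\|_{W^{1,q}}+\|\theta\|_{H^2}+\|u\|_{H^2}+\|H\|_{H^2}\big)\le C. \tag{3.49}$$

    Proof. First, it follows from (3.8) and the Gagliardo-Nirenberg and Poincaré inequalities that for $2\le q<\infty$,

    $$\|u\|_{L^q}+\|H\|_{L^q}\le C. \tag{3.50}$$

    Combining (1.1)$_4$, (3.3), (3.8) and (3.18) yields

    $$\|H\|_{H^2}+\|\nabla H\|_{L^4}^2\le C\|\nabla u\|_{L^4}+C\|H_t\|_{L^2}+C. \tag{3.51}$$

    Furthermore, it can be deduced from (3.8), (3.24), (3.50) and (3.51) that

    $$\|\nabla u\|_{L^4}\le C\|\rho^{1/2}\dot u\|_{L^2}+C\|\nabla\theta\|_{L^2}+C\|H_t\|_{L^2}+C. \tag{3.52}$$

    Then, according to (3.11) and the Sobolev inequality, we get

    $$\|\theta\|_{L^\infty}^2\le\varepsilon\|\nabla^2\theta\|_{L^2}^2+C\|\nabla\theta\|_{L^2}^2+C, \tag{3.53}$$

    which, combined with (1.1)$_3$ and (3.8) and choosing $\varepsilon$ suitably small, yields

    $$\|\theta\|_{H^2}^2\le C\|\rho^{1/2}\dot\theta\|_{L^2}^2+C\|\nabla\theta\|_{L^2}^2+C\|\nabla u\|_{L^4}^4+C\|\nabla H\|_{L^4}^4+C. \tag{3.54}$$

    Therefore, the combination of (3.51) and (3.52) yields

    $$\sup_{0\le s\le t}\big(\|\theta\|_{L^r}+\|\nabla\theta\|_{L^2}+\|\nabla u\|_{L^4}+\|H\|_{H^2}+\|\nabla H\|_{L^4}\big)\le C,\quad \forall\, r\ge 1. \tag{3.55}$$

    This, together with (3.53) and (3.54), gives

    $$\sup_{0\le s\le t}\big(\|\theta\|_{H^2}+\|\theta\|_{L^\infty}\big)\le C. \tag{3.56}$$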

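    Now, we bound $\|\rho\|_{W^{1,q}}$ and $\|u\|_{H^2}$. For $r\in[2,q]$, it holds that

    $$
    \begin{aligned}
    \frac{d}{dt}\|\nabla\rho\|_{L^r}
    &\le C\|\nabla\rho\|_{L^r}\big(\|\nabla u\|_{L^\infty}+1\big)+C\|\nabla^2u\|_{L^r}\\
    &\le C\|\nabla\rho\|_{L^r}\big(\|\nabla v\|_{L^\infty}+\|\nabla w\|_{L^\infty}+1\big)+C\|\nabla^2v\|_{L^r}+C\|\nabla^2w\|_{L^r}\\
    &\le C\|\nabla\rho\|_{L^r}\big(\|\nabla v\|_{L^\infty}+\|\nabla w\|_{L^\infty}+1\big)+C\|\nabla^2w\|_{L^r}+C,
    \end{aligned}
    \tag{3.57}
    $$

    where in the last inequality one has applied the following fact:

    $$\|\nabla^2v\|_{L^r}\le C\|\nabla\rho\|_{L^r}+C. \tag{3.58}$$

    Using (3.2), (3.56), (3.58) and Lemmas 2.2–2.4, we get

    $$\|\nabla v\|_{L^\infty}\le C\ln\big(e+\|\nabla\rho\|_{L^r}\big)+C. \tag{3.59}$$

    Putting (3.59) into (3.57), it can be deduced from the Gronwall inequality that

    $$\|\nabla\rho\|_{L^r}\le C. \tag{3.60}$$

    Finally, taking $r=2$ in (3.60), Lemma 2.2 together with (3.50), (3.55) and (3.58) yields

    $$\|u\|_{H^2}\le C. \tag{3.61}$$

    Therefore, combining (3.55), (3.56), (3.60) and (3.61), we get (3.49). The proof of Lemma 3.4 is completed.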

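    With the a priori estimates in Lemmas 3.1–3.4, we can prove Theorem 1.2.

    Proof of Theorem 1.2. Assume that (1.16) is false, namely, (3.1) holds. Notice that the generic constant $C$ in Lemmas 3.1–3.4 is independent of $t$, that is, all the a priori estimates obtained in Lemmas 3.1–3.4 are uniformly bounded for any $t\le T$. Therefore, the function

    $$(\rho,\theta,u,H)(x,T)\triangleq\lim_{t\to T}(\rho,\theta,u,H)(x,t)$$

    satisfies the initial conditions (1.12) at $t=T$.

    Since

    $$(\rho\dot u,\rho\dot\theta)(x,T)=\lim_{t\to T}(\rho\dot u,\rho\dot\theta)\in L^2,$$

    we have

    $$
    \begin{aligned}
    \Big(-\mu\Delta u-(\mu+\lambda)\nabla\mathrm{div}\,u+R\nabla(\rho\theta)-H\cdot\nabla H+\frac12\nabla|H|^2\Big)\Big|_{t=T}&=\rho^{1/2}(x,T)g_1(x),\\
    \Big(-\kappa\Delta\theta-2\mu|D(u)|^2-\lambda(\mathrm{div}\,u)^2-\nu(\mathrm{curl}\,H)^2\Big)\Big|_{t=T}&=\rho^{1/2}(x,T)g_2(x),
    \end{aligned}
    $$

    with

    $$
    g_1(x)\triangleq
    \begin{cases}
    -\rho^{-1/2}(x,T)(\rho\dot u)(x,T), & \text{for } x\in\{x\,|\,\rho(x,T)>0\},\\
    0, & \text{for } x\in\{x\,|\,\rho(x,T)=0\},
    \end{cases}
    $$

    and

    $$
    g_2(x)\triangleq
    \begin{cases}
    -\rho^{-1/2}(x,T)(c_v\rho\dot\theta+R\theta\rho\,\mathrm{div}\,u)(x,T), & \text{for } x\in\{x\,|\,\rho(x,T)>0\},\\
    0, & \text{for } x\in\{x\,|\,\rho(x,T)=0\},
    \end{cases}
    $$

    satisfying $g_1,g_2\in L^2$. Thus, $(\rho,\theta,u,H)(x,T)$ also satisfies (1.13) and (1.14).

    Hence, Theorem 1.1 shows that we can extend the local strong solution beyond $T$ by taking $(\rho,\theta,u,H)(x,T)$ as the initial data. This contradicts the hypothesis of Theorem 1.2 that $T$ is the maximal existence time of the strong solution. This completes the proof of Theorem 1.2.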

    This paper concerns the blow-up criterion for the initial boundary value problem of the two-dimensional full compressible magnetohydrodynamic equations in Eulerian coordinates. When the initial density is allowed to vanish and the magnetic field $H$ satisfies the perfectly conducting boundary condition $H\cdot n=\mathrm{curl}\,H=0$, we prove the blow-up criterion $\lim_{t\to T}\big(\|H\|_{L^\infty(0,t;L^b)}+\|\mathrm{div}\,u\|_{L^1(0,t;L^\infty)}\big)=\infty$ for any $b>2$, which depends on both $H$ and $\mathrm{div}\,u$.

    The author sincerely thanks the editors and anonymous reviewers for their insightful comments and constructive suggestions, which greatly improved the quality of the paper. The research was partially supported by the National Natural Science Foundation of China (No.11971217).

    The author declares no conflict of interest in this paper.



    [1] R. Bellman, The theory of dynamic programming, Bull. Amer. Math. Soc., 1954, 503–515.
    [2] O. Hernández-Lerma, J. B. Lasserre, Discrete-time Markov control processes: basic optimality criteria, Vol. 30, New York: Springer Science & Business Media, 2012.
    [3] O. Hernández-Lerma, J. B. Lasserre, Further topics on discrete-time Markov control processes, Vol. 42, New York: Springer Science & Business Media, 2012.
    [4] O. Hernández-Lerma, L. R. Laura-Guarachi, S. Mendoza-Palacios, A survey of AC problems in deterministic discrete-time control systems, J. Math. Anal. Appl., 522 (2023), 126906. https://doi.org/10.1016/j.jmaa.2022.126906
    [5] S. P. Meyn, R. L. Tweedie, Markov chains and stochastic stability, 1 Ed., New York: Springer London, 1993. https://doi.org/10.1007/978-1-4471-3267-7
    [6] W. A. Brock, L. J. Mirman, Optimal economic growth and uncertainty: the discounted case, J. Econ. Theory, 4 (1979), 479–513. https://doi.org/10.4337/9781782543046.00008
    [7] W. A. Brock, L. J. Mirman, Optimal economic growth and uncertainty: the no discounting case, Int. Econ. Rev., 14 (1973), 560–573. https://doi.org/10.2307/2525969
    [8] V. Gaitsgory, A. Parkinson, I. Shvartsman, Linear programming based optimality conditions and approximate solution of a deterministic infinite horizon discounted optimal control problem in discrete time, arXiv Preprint, 2017. https://doi.org/10.48550/arXiv.1711.00801
    [9] O. L. V. Costa, F. Dufour, Average control of Markov decision processes with Feller transition probabilities and general action spaces, J. Math. Anal. Appl., 396 (2012), 58–69. https://doi.org/10.1016/j.jmaa.2012.05.073
    [10] S. P. Meyn, The policy iteration algorithm for average reward Markov decision processes with general state space, IEEE Trans. Automat. Control, 42 (1997), 1663–1680. https://doi.org/10.1109/9.650016
    [11] R. Bellman, Dynamic programming, Science, 153 (1966), 34–37. https://doi.org/10.1126/science.153.3731.34
    [12] R. A. Howard, Dynamic programming and Markov processes, MIT Press, 1960, 46–69. https://doi.org/10.1086/258477
    [13] S. Dai, O. Menoukeu-Pamen, An algorithm based on an iterative optimal stopping method for Feller processes with applications to impulse control, perturbation, and possibly zero random discount problems, J. Comput. Appl. Math., 421 (2023), 114864. https://doi.org/10.1016/j.cam.2022.114864
    [14] E. A. Feinberg, Y. Liang, On the optimality equation for average cost Markov decision processes and its validity for inventory control, Ann. Oper. Res., 317 (2022), 569–586. https://doi.org/10.1007/s10479-017-2561-9
    [15] Z. Yu, X. Guo, L. Xia, Zero-sum semi-Markov games with state-action-dependent discount factors, Discrete Event Dyn. Syst., 32 (2022), 545–571. https://doi.org/10.1007/s10626-022-00366-4
    [16] S. He, H. Fang, M. Zhang, F. Liu, Z. Ding, Adaptive optimal control for a class of nonlinear systems: the online policy iteration approach, IEEE Trans. Neural Netw. Learn. Syst., 31 (2019), 549–558. https://doi.org/10.1109/TNNLS.2019.2905715
    [17] H. Fang, M. Zhang, S. He, X. Luan, F. Liu, Z. Ding, Solving the zero-sum control problem for tidal turbine system: an online reinforcement learning approach, IEEE Trans. Cybern., 53 (2023), 7635–7647. https://doi.org/10.1109/TCYB.2022.3186886
    [18] H. Fang, Y. Tu, H. Wang, S. He, F. Liu, Z. Ding, et al., Fuzzy-based adaptive optimization of unknown discrete-time nonlinear Markov jump systems with off-policy reinforcement learning, IEEE Trans. Fuzzy Syst., 30 (2022), 5276–5290. https://doi.org/10.1109/TFUZZ.2022.3171844
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)