
A Pontryagin maximum principle for terminal state-constrained optimal control problems of Volterra integral equations with singular kernels

  • We consider the terminal state-constrained optimal control problem for Volterra integral equations with singular kernels. A singular kernel introduces abnormal behavior of the state trajectory with respect to the parameter $\alpha\in(0,1)$. Our state equation covers various state dynamics such as classical Volterra integral equations with nonsingular kernels, (Caputo) fractional differential equations, and ordinary differential state equations. We prove the maximum principle for the corresponding state-constrained optimal control problem. In the proof of the maximum principle, due to the presence of the (terminal) state constraint and the control space being only a separable metric space, we have to employ the Ekeland variational principle and the spike variation technique, together with the intrinsic properties of the distance function and the generalized Gronwall's inequality, to obtain the desired necessary conditions for optimality. The maximum principle of this paper is new in the optimal control problem context and its proof requires a different technique, compared with that for classical Volterra integral equations studied in the existing literature.

    Citation: Jun Moon. A Pontryagin maximum principle for terminal state-constrained optimal control problems of Volterra integral equations with singular kernels[J]. AIMS Mathematics, 2023, 8(10): 22924-22943. doi: 10.3934/math.20231166




    In this paper, we consider the following bistable three-species competition system:

    $$u_t=d_1u_{xx}+a_1u(1-u-b_2v),\qquad v_t=d_2v_{xx}+a_2v(1-v-b_1u-b_3w),\qquad w_t=d_3w_{xx}+a_3w(1-w-b_2v), \tag{1.1}$$

    with initial condition $(u,v,w)(0,x)=(u_0(x),v_0(x),w_0(x))$ and asymptotic boundary condition

    $$(u,v,w)(t,x)\to\begin{cases}(u_-,v_-,w_-) & \text{as } x\to-\infty,\\ (u_+,v_+,w_+) & \text{as } x\to+\infty.\end{cases} \tag{1.2}$$

    The above system describes the population distributions of the predators over time when three predators compete for two prey. Here $u$, $v$, and $w$ represent the nonnegative population distributions of the three predators; these distributions typically take values below 1. $\{a_i\}_{i=1}^{3}$ and $\{d_i\}_{i=1}^{3}$ denote each predator's net growth rate and diffusion coefficient, respectively. Finally, $\{b_i\}_{i=1}^{3}$ denotes the proportion of population decline caused by one predator competing with another. In this system, we assume that the predators corresponding to $u$ and $w$ do not compete with each other.

    We aim to approximate the monotone traveling wave solution of the above system through the definite integral of positive functions. A traveling wave solution, a particular type of solution of partial differential equations (PDEs), may exist in several forms for a single equation because there is no constraint on the wave speed. The most common case is when traveling wave solutions exist for all speeds higher than a specific value, called the minimum wave speed. The exact minimum wave speed was revealed for the Keller-Segel model with logistic growth dynamics [1]. The authors in [2] investigated the presence of a minimum wave speed and a sharp bound for a non-KPP type reaction-diffusion system. The authors in [3] identified the necessary and sufficient conditions on the maximum delay time that assure the existence of the traveling wave solution to the Lotka-Volterra competition system. It turns out that only monotone solutions are possible in this system.

    In contrast to the previous examples, there are several equations for which the solution and the wave speed are uniquely determined. A monotone traveling wave solution exists uniquely for the Keller-Segel model with a chemotactic sensitivity term in the form of a logarithmic function [4]. The same results hold for the Allen-Cahn model with relaxation terms and the Lotka-Volterra competition-diffusion system [5,6]. When several functions can satisfy the constraints, it is challenging to predict to which solution a trained neural network will converge. Since the three-species competition system covered in this paper ensures the uniqueness of the solution under the assumption of monotonicity, we solve a well-posed forward-inverse problem that simultaneously approximates the solution and the speed.

    For all positive integers $n$, shallow neural networks without bias are dense in $C^n(\mathbb{R})$, the set of functions whose derivatives up to order $n$ are all continuous [7]. This suggests that neural networks can approximate a PDE or ODE solution with arbitrarily small error. Modifying a fully connected neural network can also impose boundedness or monotonicity on the network. Therefore, the deep neural network (DNN) approach is a flexible method for solving differential equations with additional constraints.

    The authors in [8] presented physics-informed neural networks (PINNs), in which neural networks simultaneously learn the solutions and coefficients of partial differential equations. They employed automatic differentiation [9] to compute the derivatives of the networks and optimized the mean squared loss corresponding to the initial and boundary conditions. Since high-order derivatives of the networks can be evaluated at arbitrary points, the method is not constrained to training on a fixed grid.

    Several recent studies have shown that minimizing the loss function designed above is equivalent to finding an approximator near a solution in terms of pointwise error. The authors in [10] first proved the existence of a neural network with a sufficiently small loss for quasilinear parabolic partial differential equations. Assuming regularity of the solution, they concluded that the neural network eventually converges to the solution when the residual loss decreases to zero. Subsequently, studies showing that the sequence of neural network approximations converges to a solution of the PDE via the Grönwall inequality have emerged. The authors in [11] discussed the 1-dimensional kinetic Fokker-Planck equation with inflow and specular boundary conditions. The second-order parabolic equation with zero Dirichlet boundary conditions was analyzed in [12]. The authors in [13] employed monotonicity to fix the point through which the solution necessarily passes and performed estimates of traveling wave solutions for various equations. Furthermore, estimates for the unique continuation problem, including additional observation of the ill-posed forward problem, have also been studied recently. The authors in [14] identified the number of training points required to reduce the residual loss, considering the errors that arise from the quadrature rule.

    In Section 2, we introduce the structure of the fully connected neural network model. We construct a residual loss to approximate the solution of the system and describe the monotonic neural network. In Section 3, existing theoretical results are first provided, such as the uniqueness of the traveling wave solution and the sign of the wave speed. Then we prove that our model can minimize the loss function and that the sequence of DNN solutions approaches the actual solution. In Section 4, we present experiments with several methods that impose monotonicity. We discuss the shortcomings of each method and present experimental results showing that our model overcomes them. Finally, we conclude the article by summarizing the results and discussing how to supplement the paper and develop future research topics.

    In this section, we describe our neural network model to approximate the solutions of the system and the wave speed. The construction of the loss function is fundamentally based on PINN[8] and follows the detailed settings in [13].

    Consider the system (1.1) with the traveling wave ansatz

    $$(u(t,x),v(t,x),w(t,x))=(U(z),V(z),W(z)),\qquad z=x-st,$$

    where $s$ denotes the wave speed. Then a traveling wave solution $(U,V,W)(z)$ of the system (1.1) with the boundary condition (1.2) is a solution of the following system:

    $$-sU_z=d_1U_{zz}+a_1U(1-U-b_2V),\qquad -sV_z=d_2V_{zz}+a_2V(1-V-b_1U-b_3W),\qquad -sW_z=d_3W_{zz}+a_3W(1-W-b_2V), \tag{2.1}$$

    $$\text{where}\quad (U,V,W)(z)\to(u_-,v_-,w_-)\ \text{as } z\to-\infty,\qquad (U,V,W)(z)\to(u_+,v_+,w_+)\ \text{as } z\to+\infty. \tag{2.2}$$

    In this paper, we set the two constant states $(u_-,v_-,w_-)$ and $(u_+,v_+,w_+)$ in the asymptotic boundary condition (2.2) to $(1,0,1)$ and $(0,1,0)$, respectively. If the sign of the wave speed $s$ is positive, the predator corresponding to $V$ disappears, and the predators corresponding to $U$ and $W$, which do not compete with each other, survive. In the opposite case, only the predator corresponding to $V$ survives. It should be noted that in this research we are only concerned with monotone traveling wave solutions, that is, $U_z<0$, $V_z>0$, $W_z<0$.

    The fully connected neural network $U_{nn}$, an approximator of the solution $u(t,x)=U(z)$, is formulated as a recurrence of linear affine transformations and a nonlinear activation function $\sigma$:

    $$U_{nn}(t,x)=W_L\,\sigma\big(W_{L-1}\,\sigma(\cdots\sigma(W_1(x-s_{nn}t)+b_1)\cdots)+b_{L-1}\big)+b_L.$$

    $\{W_i:\mathbb{R}^{h_i}\to\mathbb{R}^{h_{i+1}}\}_{i=1}^{L}$ denotes the sequence of weight matrices, where $h_i$ denotes the number of neurons in the $i$-th layer. Since $h_1$ and $h_{L+1}$ denote the input and output dimensions, respectively, they are set to 1. $s_{nn}$ is an approximator of the wave speed $s$ in our method, which varies by optimizing the loss functions provided below. The other two networks, $V_{nn}$ and $W_{nn}$, are defined similarly.
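    To make this construction concrete, the following PyTorch sketch shows one way the network and the trainable speed could be implemented. The class name, the Tanh activation, and the treatment of $s_{nn}$ as an `nn.Parameter` are our illustrative choices rather than the authors' released code; the default width and depth follow the 5-layer, 48-neuron setting reported in the experiments later in the paper.

```python
import torch
import torch.nn as nn

class TravelingWaveNet(nn.Module):
    """Fully connected approximator U_nn(t, x) = MLP(x - s_nn * t),
    with the wave speed s_nn kept as a trainable scalar parameter."""

    def __init__(self, width=48, depth=5, activation=nn.Tanh):
        super().__init__()
        layers, h_in = [], 1
        for _ in range(depth - 1):
            layers += [nn.Linear(h_in, width), activation()]
            h_in = width
        layers.append(nn.Linear(h_in, 1))          # scalar output U_nn
        self.mlp = nn.Sequential(*layers)
        self.s_nn = nn.Parameter(torch.zeros(1))   # trainable wave speed

    def forward(self, t, x):
        z = x - self.s_nn * t                      # traveling wave coordinate
        return self.mlp(z)
```

    Since every loss below is evaluated in the traveling wave coordinate, in practice one can equivalently feed $z$ to the network directly and let $s_{nn}$ enter only through the residual operators defined next.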

    Now we define the following three operators P(U,V,W;s),Q(U,V,W;s) and R(U,V,W;s).

    $$\begin{aligned}
    P(U,V,W;s)&:=sU_z+d_1U_{zz}+a_1U(1-U-b_2V),\\
    Q(U,V,W;s)&:=sV_z+d_2V_{zz}+a_2V(1-V-b_1U-b_3W),\\
    R(U,V,W;s)&:=sW_z+d_3W_{zz}+a_3W(1-W-b_2V).
    \end{aligned}$$

    We use a truncated interval as the domain instead because we cannot consider all points on the infinite real line. For a sufficiently large interval $[-L,L]$, the residual loss function for each governing equation in (2.1) is defined by

    $$\begin{aligned}
    \text{Loss}^{(1)}_{GE}&=\int_{-L}^{L}\big(P(U_{nn},V_{nn},W_{nn};s_{nn})\big)^2\,dz\approx\sum_i\big(P(U_{nn}(z_i),V_{nn}(z_i),W_{nn}(z_i);s_{nn})\big)^2,\\
    \text{Loss}^{(2)}_{GE}&=\int_{-L}^{L}\big(Q(U_{nn},V_{nn},W_{nn};s_{nn})\big)^2\,dz\approx\sum_i\big(Q(U_{nn}(z_i),V_{nn}(z_i),W_{nn}(z_i);s_{nn})\big)^2,\\
    \text{Loss}^{(3)}_{GE}&=\int_{-L}^{L}\big(R(U_{nn},V_{nn},W_{nn};s_{nn})\big)^2\,dz\approx\sum_i\big(R(U_{nn}(z_i),V_{nn}(z_i),W_{nn}(z_i);s_{nn})\big)^2,\\
    \text{Loss}_{GE}&=\text{Loss}^{(1)}_{GE}+\text{Loss}^{(2)}_{GE}+\text{Loss}^{(3)}_{GE},
    \end{aligned}$$

    where {zi}i represents the set of sample points over the spatial domain used for computing the integral expression of the mean squared loss. At each iteration of the training process, new points are randomly selected for use in the Monte Carlo integration method.
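    As an illustration of how such a residual loss could be assembled with automatic differentiation and freshly sampled collocation points, consider the following sketch. The helper names, the use of the sample mean (a Monte Carlo estimate of the integral up to the constant factor of the interval length), and the default parameter values are assumptions for illustration only.

```python
import torch

def derivatives(f, z):
    """First and second derivatives of the network output f with respect to z."""
    fz = torch.autograd.grad(f, z, torch.ones_like(f), create_graph=True)[0]
    fzz = torch.autograd.grad(fz, z, torch.ones_like(fz), create_graph=True)[0]
    return fz, fzz

def residual_loss(U, V, W, s, d=(1.0, 1.0, 1.0), a=(1.0, 1.0, 1.0),
                  b=(0.6, 1.2, 0.6), L=50.0, n_points=1000):
    """Monte Carlo estimate of Loss_GE on [-L, L]; parameter defaults are placeholders."""
    z = (2 * torch.rand(n_points, 1) - 1) * L    # fresh collocation points each call
    z.requires_grad_(True)
    u, v, w = U(z), V(z), W(z)
    uz, uzz = derivatives(u, z)
    vz, vzz = derivatives(v, z)
    wz, wzz = derivatives(w, z)
    P = s * uz + d[0] * uzz + a[0] * u * (1 - u - b[1] * v)
    Q = s * vz + d[1] * vzz + a[1] * v * (1 - v - b[0] * u - b[2] * w)
    R = s * wz + d[2] * wzz + a[2] * w * (1 - w - b[1] * v)
    return (P ** 2).mean() + (Q ** 2).mean() + (R ** 2).mean()
```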

    It is also challenging to consider asymptotic boundary conditions on the real line. Therefore, we assign boundary conditions to both endpoints of the truncated domain instead. The residual boundary loss function is formulated as follows.

    $$\begin{aligned}
    \text{Loss}^{(1)}_{limit}&=\big(U_{nn}(-L)-u_-\big)^2+\big(U_{nn}(L)-u_+\big)^2,\\
    \text{Loss}^{(2)}_{limit}&=\big(V_{nn}(-L)-v_-\big)^2+\big(V_{nn}(L)-v_+\big)^2,\\
    \text{Loss}^{(3)}_{limit}&=\big(W_{nn}(-L)-w_-\big)^2+\big(W_{nn}(L)-w_+\big)^2,\\
    \text{Loss}_{limit}&=\text{Loss}^{(1)}_{limit}+\text{Loss}^{(2)}_{limit}+\text{Loss}^{(3)}_{limit}.
    \end{aligned}$$

    In this paper, we also propose a Neumann boundary condition that the solution inherently satisfies. Multiplying both sides of the first equation in (2.1) by $U_z$ and integrating over a truncated interval $[c,b]$, we can derive the convergence of $U_z$ through the following computation. First, the last term on the right-hand side of the equation below must be finite as $b$ goes to infinity:

    $$\int_c^b UU_z-U^2U_z-b_2UU_zV\,dz=\frac{1}{2}\big(U^2(b)-U^2(c)\big)-\frac{1}{3}\big(U^3(b)-U^3(c)\big)-\int_c^b b_2UU_zV\,dz,$$

    where the last integral must be finite since $V(z)$ is less than 1 and $UU_z$ is a negative integrable function. Therefore, the definite integral $\int_c^b sU_z^2+d_1U_zU_{zz}\,dz=\int_c^b sU_z^2\,dz+d_1\big(U_z^2(b)-U_z^2(c)\big)/2$ must converge.

    When we consider the asymptotic boundary condition, the only possibility is that $U_z(b)$ converges to zero regardless of the sign of $s$. For $s=0$, the convergence of $U_z(b)$ permits only the case $U_z(+\infty)=0$. When $s$ is positive, the increasing function $\int_c^b sU_z^2\,dz$ should converge, so that $\frac{1}{2}U_z(b)^2$ also converges. For negative $s$, we focus on the first equation of (2.1). Since $sU'(z)+d_1U''(z)$ goes to zero, for an arbitrarily small $\epsilon>0$ we have, for all sufficiently large $z$,

    $$sU'(z)+d_1U''(z)<\frac{d_1}{2}\epsilon.$$

    Suppose that there exists a large $z_1$ such that $U'(z_1)<(d_1/s)\epsilon$. Then the inequality $d_1\epsilon+d_1U''(z_1)<sU'(z_1)+d_1U''(z_1)<\frac{d_1}{2}\epsilon$ results in the negativity of $U''(z_1)$, so $U_z$ keeps decreasing and stays below the negative constant $(d_1/s)\epsilon$ for all larger $z$. This contradicts the asymptotic boundary condition (2.2), since $U_z$ cannot maintain values indefinitely below a certain negative constant. Therefore, we put forward the following Neumann boundary condition (2.3).

    $$\lim_{\xi\to-\infty}U_z(\xi)=\lim_{\xi\to-\infty}V_z(\xi)=\lim_{\xi\to-\infty}W_z(\xi)=\lim_{\xi\to\infty}U_z(\xi)=\lim_{\xi\to\infty}V_z(\xi)=\lim_{\xi\to\infty}W_z(\xi)=0. \tag{2.3}$$

    The loss function for the boundary condition of the derivative is defined as follows.

    $$\begin{aligned}
    \text{Loss}^{(1)}_{BC}&=\big((U_{nn})_z(-L)\big)^2+\big((U_{nn})_z(L)\big)^2,\\
    \text{Loss}^{(2)}_{BC}&=\big((V_{nn})_z(-L)\big)^2+\big((V_{nn})_z(L)\big)^2,\\
    \text{Loss}^{(3)}_{BC}&=\big((W_{nn})_z(-L)\big)^2+\big((W_{nn})_z(L)\big)^2,\\
    \text{Loss}_{BC}&=\text{Loss}^{(1)}_{BC}+\text{Loss}^{(2)}_{BC}+\text{Loss}^{(3)}_{BC}.
    \end{aligned}$$

    We note that any translation of a traveling wave solution is still a traveling wave solution. To consider a wide interval of the real line in both directions, it is necessary to fix a point through which the solution must pass. If a monotone function's left and right limits on the real line are 0 and 1, respectively, its range must contain $1/2$. For the symmetric interval $[-L,L]$ about the origin, it is natural to enforce the value $1/2$ at $z=0$:

    $$\text{Loss}_{trans}=\Big(U_{nn}(0)-\frac{u_-+u_+}{2}\Big)^2=\Big(U_{nn}(0)-\frac{1}{2}\Big)^2.$$

    Finally, we optimize the total loss function aggregating the above losses. The network weights are updated with the gradient-descent-based ADAM optimizer [15].

    $$\text{Loss}_{total}=\text{Loss}_{GE}+\text{Loss}_{limit}+\text{Loss}_{BC}+\text{Loss}_{trans}.$$
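    A sketch of how the remaining loss terms and one ADAM update could be combined is given below, using the constant states $(1,0,1)$ and $(0,1,0)$ from Section 2.1; the function names and the commented training step are illustrative rather than the authors' implementation, and they reuse the hypothetical residual_loss from the sketch above.

```python
import torch

def boundary_and_anchor_losses(U, V, W, L=50.0):
    """Loss_limit, Loss_BC and Loss_trans on the truncated domain [-L, L],
    using the constant states (u-,v-,w-) = (1,0,1) and (u+,v+,w+) = (0,1,0)."""
    zL = torch.tensor([[-L]], requires_grad=True)
    zR = torch.tensor([[L]], requires_grad=True)
    loss_limit = ((U(zL) - 1) ** 2 + U(zR) ** 2
                  + V(zL) ** 2 + (V(zR) - 1) ** 2
                  + (W(zL) - 1) ** 2 + W(zR) ** 2).sum()
    loss_bc = 0.0
    for net in (U, V, W):                          # derivative penalty at both endpoints
        for z in (zL, zR):
            f = net(z)
            fz = torch.autograd.grad(f, z, torch.ones_like(f), create_graph=True)[0]
            loss_bc = loss_bc + (fz ** 2).sum()
    loss_trans = ((U(torch.zeros(1, 1)) - 0.5) ** 2).sum()   # anchor U(0) = 1/2
    return loss_limit + loss_bc + loss_trans

# One training step (sketch): all network weights and the speed parameter share one optimizer.
# optimizer = torch.optim.Adam(list_of_parameters, lr=1e-2)
# loss = residual_loss(U, V, W, s_nn) + boundary_and_anchor_losses(U, V, W)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```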

    The original monotonic network in [16] was constructed by combining positively constrained weights and min-max operations. If one simply forces the weights to be positive, a neural network with a popular activation function such as ReLU [17] or ELU [18] can only represent convex functions [19]. In contrast, the min-max monotonic neural network above has universal approximation properties for monotone functions in $C^1([0,1]^D;\mathbb{R})$. Some studies propose considering a regularizer for monotonicity [20]. The authors in [21] showed that in multi-dimensional spaces the upper envelope of the function must be a monotone function; they employed a monotone function in prediction by directly constructing the upper envelope of the counterexamples.

    In this paper, we adapt the unconstrained monotonic neural network [22], which represents a monotone function as the integral of a positive function. By replacing numerical integration with DeepONet-based operator learning, without using a quadrature rule, the neural network becomes a smoother function better suited to representing solutions of differential equations.

    We generate $m$ variables $\{p_i\}_{i=1}^{m}$ to represent the values of a positive function at $m$ positions $\{x_i\}_{i=1}^{m}$. To find a fully connected neural network representing the primitive function, we perform operator learning for integration based on DeepONet. The overall structure of our neural network model is formulated as follows.

    $$U^{mon}(z):=\big\langle\text{Branch Net}(p_1,\dots,p_m),\ \text{Trunk Net}(z)\big\rangle.$$

    It is worth noting that the structure of this method is the same as that of DeepONet, which is capable of learning general operators; in this research it is specifically designed to learn the integral operator. We construct the monotonic neural network model $U^{mon}$ as an inner product of Branch Net and Trunk Net, which are fully connected neural networks as in Section 2.1. Branch Net receives a discretization of a function as input, determining which positive function to integrate. Trunk Net takes a point of the domain as input, querying the location of interest. Due to the high expressivity for compact operators, combining the two networks improves the ability to approximate monotone functions. We remark that Branch Net and Trunk Net are called shallow networks when they are two-layered fully connected neural networks (i.e., $L=2$).
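    The sketch below illustrates one possible realization of this Branch Net/Trunk Net inner product in PyTorch. The widths, depth, and latent dimension mirror the settings reported later in the experiments, but the ReLU activation, the softplus map used to keep the samples $p_i$ positive, and the way $\{p_i\}$ is stored as a trainable parameter are our assumptions; in the actual method the Branch Net and Trunk Net weights are pre-trained on the integration task and then kept fixed, with only $\{p_i\}$ optimized per solution (and a sign flip or shift would be needed for decreasing components such as $U$ and $W$).

```python
import torch
import torch.nn as nn

class MonotonicDeepONet(nn.Module):
    """U_mon(z) = <Branch Net(p_1..p_m), Trunk Net(z)>: the branch encodes sampled
    values of a positive integrand, the trunk encodes the query point z, and their
    inner product plays the role of the primitive (hence monotone) function."""

    def __init__(self, m=300, width=256, depth=5, latent=64):
        super().__init__()

        def mlp(d_in):
            layers, h = [], d_in
            for _ in range(depth - 1):
                layers += [nn.Linear(h, width), nn.ReLU()]
                h = width
            layers.append(nn.Linear(h, latent))
            return nn.Sequential(*layers)

        self.branch = mlp(m)                       # Branch Net: R^m -> R^latent
        self.trunk = mlp(1)                        # Trunk Net:  R   -> R^latent
        self.p_raw = nn.Parameter(torch.zeros(m))  # per-solution trainable samples

    def forward(self, z):
        p = torch.nn.functional.softplus(self.p_raw)   # keep the samples p_i positive
        b = self.branch(p)                              # shape (latent,)
        t = self.trunk(z)                               # shape (N, latent) for z of shape (N, 1)
        return (t @ b).unsqueeze(-1)                    # inner product per query point
```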

    In this section, we introduce existing theories about the system and provide results on the convergence of our model. The first part discusses the assumption that guarantees uniqueness other than the initial and boundary conditions. The following part addresses the theory obtained based on DeepONet's universality [23] and Grönwall inequality.

    In the case of the original bistable three-species competition system (1.1), we cannot guarantee the monotonicity and uniqueness of traveling wave solutions. The authors in [24] additionally assume one of the following two conditions.

    $$(1)\ U'<0,\ V'>0,\ W'<0,\qquad\text{or}\qquad (2)\ 0<U,V,W<1.$$

    It may be straightforward to show that the former condition implies the latter when U, V, W are traveling wave solutions of the system (1.1). The authors in [24] remarked that the above two inequalities are indeed equivalent conditions. Under the above assumption, the following theorem states the uniqueness of a traveling wave solution.

    Theorem 1. (Thm 3.1 in [24]) The three-species competition system (1.1) has a traveling wave solution that is unique up to translation with the unique wave speed.

    In several cases, the sign of the wave speed has been determined without knowing its exact value. The theorem below states that the predators corresponding to $U$ and $W$ become extinct when they are relatively less competitive than the predator corresponding to $V$.

    Theorem 2. (Thm 3.2 in [24]) Suppose that $a_1=d_1$, $a_3=d_3$, $b_3=b_1$. If $0<b_1<1<2b_2$, then the sign of the wave speed $s$ for the system (1.1) is negative.

    Similar to the above, there is a theorem describing the sign of the wave speed for various parameter settings. In particular, the theorem establishes the existence of a solution with zero wave speed. The case $b_2=2b_1$ will be used as a validation example to verify the accuracy of the proposed method.

    Theorem 3. (Thm 3.4 in [24]) Suppose that $a_1=d_1=a_3=d_3$ and $0<b_1<1<b_2<2$. Then the sign of the wave speed $s$ for the system (1.1) is the same as the sign of $(2b_1-b_2)$. In particular, the wave speed $s$ is precisely zero when $b_2$ is identical to $2b_1$.

    We now introduce theoretical results on the performance of our model when we optimize the loss introduced in Section 2.1. Our monotonic neural network of Section 2.2 replaces the original fully connected neural network. When approximating each traveling wave solution, we keep the weights of Branch Net and Trunk Net fixed, and only the sequence $\{p_i\}_{i=1}^{m}$ depends on the solution. In this situation, our model enjoys the following uniform convergence property.

    Corollary 1. Assume that the activation function $\sigma$ is continuous and non-polynomial. Let $S(U)\subset C^1([-L,L])$ denote the set containing the traveling wave solutions $U$ of the first predator in the system (2.1). Suppose the set $D(U)$ of derivatives of elements of $S(U)$ is compact in $C([-L,L])$. Then for any arbitrarily small $\epsilon>0$, there exist a large positive integer $m$ and shallow Branch Net and Trunk Net such that for any $U\in S(U)$, our monotonic neural network $U^{mon}(z)\ \big(=\langle\text{Branch Net}(p_1,\dots,p_m),\ \text{Trunk Net}(z)\rangle\big)$ with an appropriate sequence $\{p_i\}_{i=1}^{m}$ satisfies the following.

    $$\big\|U^{mon}(z)-U(z)\big\|_{L^\infty([-L,L])}<\epsilon.$$

    Proof. The compactness of the set of derivatives implies the compactness of $S(U)$, since the integral operator is continuous when the domain of the functions is a finite closed interval. Define an operator $G:D(U)\to S(U)$ by $G(f)(z)=\int_{-L}^{z}f\,dy$ for $z\in[-L,L]$. As in Theorem 5 in [23], we can choose $m$ grid points $\{x_i\}_{i=1}^{m}$ for $G$ which guarantee

    $$\big\|\big\langle\text{Branch Net}(f(x_1),\dots,f(x_m)),\ \text{Trunk Net}(z)\big\rangle-G(f)(z)\big\|_{L^\infty([-L,L])}<\epsilon,\qquad\forall f\in D(U).$$

    We then achieve the desired result by setting $f$ and $\{f(x_i)\}_{i=1}^{m}$ to $U'(z)$ and $\{p_i\}_{i=1}^{m}$, respectively.

    Remark 1. As a representative example of $D(U)$ satisfying the requirement, we may consider the case where $S(U)$ is a bounded subset of a finite-dimensional space, whose compactness follows from the sequential compactness of bounded closed sets. However, it should be noted that, because monotone traveling wave solutions may have arbitrarily large derivative values, it is not possible to cover the entire family of traveling wave solutions in the solution space with a single DeepONet. The size of $D(U)$ is determined by the user of the theorem. The assumption is necessary because a single DeepONet structure, with one set of appropriately chosen parameters, is employed to approximate the integrals of multiple functions.

    Corollary 1 suggests that if DeepONet approximates the integral operator with adequate accuracy, our monotonic neural network does not deviate significantly from the possible solutions. In other words, we can assume boundedness of the approximator once the loss for operator learning becomes sufficiently small. Given the optimized residual loss together with this assumption, we estimate the pointwise error between the approximator and the solution. Note that the same results hold for fully connected neural networks, where $s^{mon}$ denotes our method's approximation of the speed $s$.

    Theorem 4. Suppose that our monotonic networks $U^{mon}$, $V^{mon}$, $W^{mon}$ are bounded by a constant $C$. Denote $\big|(U(z),V(z),W(z))-(U^{mon}(z),V^{mon}(z),W^{mon}(z))\big|$ by $E(z)$. Then the following holds:

    $$E(z)\le K_1\exp\big(K_2(z+L)\big),\qquad z\in[-L,L],$$

    where $K_1,K_2$ are constants depending on $E(-L)$, $\text{Loss}_{GE}$, $\text{Loss}_{BC}$, $\text{Loss}_{limit}$, $L$, $s$, $s^{mon}$, and the parameters of the system (2.1).

    Proof. Integrating each equation of the system (2.1) from $-L$ to $z$, we derive the following.

    $$\begin{aligned}
    U'(z)&=-\frac{s}{d_1}U(z)+\frac{s}{d_1}U(-L)-\frac{a_1}{d_1}\int_{-L}^{z}U(1-U-b_2V)\,dy=:F(z,U),\\
    V'(z)&=-\frac{s}{d_2}V(z)+\frac{s}{d_2}V(-L)-\frac{a_2}{d_2}\int_{-L}^{z}V(1-V-b_1U-b_3W)\,dy=:G(z,V),\\
    W'(z)&=-\frac{s}{d_3}W(z)+\frac{s}{d_3}W(-L)-\frac{a_3}{d_3}\int_{-L}^{z}W(1-W-b_2V)\,dy=:H(z,W).
    \end{aligned}$$

    With the residual loss for the governing equation, our monotonic neural network satisfies a similar system, whose detailed form can be written as follows.

    $$\begin{aligned}
    (U^{mon})'(z)&=-\frac{s^{mon}}{d_1}U^{mon}(z)+\frac{s^{mon}}{d_1}U^{mon}(-L)-\frac{a_1}{d_1}\int_{-L}^{z}U^{mon}(1-U^{mon}-b_2V^{mon})\,dy+\frac{1}{d_1}\int_{-L}^{z}f(y)\,dy=:F^{mon}(z,U^{mon}),\\
    (V^{mon})'(z)&=-\frac{s^{mon}}{d_2}V^{mon}(z)+\frac{s^{mon}}{d_2}V^{mon}(-L)-\frac{a_2}{d_2}\int_{-L}^{z}V^{mon}(1-V^{mon}-b_1U^{mon}-b_3W^{mon})\,dy+\frac{1}{d_2}\int_{-L}^{z}g(y)\,dy=:G^{mon}(z,V^{mon}),\\
    (W^{mon})'(z)&=-\frac{s^{mon}}{d_3}W^{mon}(z)+\frac{s^{mon}}{d_3}W^{mon}(-L)-\frac{a_3}{d_3}\int_{-L}^{z}W^{mon}(1-W^{mon}-b_2V^{mon})\,dy+\frac{1}{d_3}\int_{-L}^{z}h(y)\,dy=:H^{mon}(z,W^{mon}),
    \end{aligned}$$

    where $\int_{-L}^{L}\{f(y)\}^2+\{g(y)\}^2+\{h(y)\}^2\,dy=\text{Loss}_{GE}$ as in Section 2.1. We now leverage the boundedness of the solutions and the triangle inequality to estimate how closely $U$ and $U^{mon}$ follow the same system of ordinary differential equations. First, the difference in the first equation of the system can be decomposed into three terms as follows.

    $$F(z,U)-F(z,U^{mon})=-\frac{s}{d_1}(U-U^{mon})(z)+\frac{s}{d_1}\big(U(-L)-U^{mon}(-L)\big)-\frac{a_1}{d_1}\int_{-L}^{z}\Big((U-U^{mon})-\big(U^2-(U^{mon})^2\big)-b_2UV+b_2U^{mon}V^{mon}\Big)\,dy=:f_1+f_2+f_3.$$

    The last term f3 can be bounded by a constant via the boundedness of solutions.

    $$\begin{aligned}
    |f_3|&\le\frac{a_1}{d_1}\int_{-L}^{z}|U-U^{mon}|+\big|U^2-(U^{mon})^2\big|+b_2\big|UV-U^{mon}V^{mon}\big|\,dy\\
    &\le\frac{a_1}{d_1}\int_{-L}^{z}|U-U^{mon}|+|U-U^{mon}||U+U^{mon}|+b_2|U-U^{mon}||V|+b_2|U^{mon}||V-V^{mon}|\,dy\\
    &\le 2L\frac{a_1}{d_1}\Big((C+1)+(C+1)^2+b_2(C+1)^2\Big).
    \end{aligned}$$

    Therefore, we complete the estimate of $|F(z,U)-F(z,U^{mon})|$ by the triangle inequality.

    $$|F(z,U)-F(z,U^{mon})|\le\frac{|s|}{d_1}|U-U^{mon}|(z)+\frac{|s|}{d_1}|U-U^{mon}|(-L)+2L\frac{a_1}{d_1}\Big((C+1)+(C+1)^2+b_2(C+1)^2\Big).$$

    Estimates for the second and third equations can be obtained in a similar manner. We combine all the results to obtain the following inequality,

    $$\big|(F(z,U),G(z,V),H(z,W))-(F(z,U^{mon}),G(z,V^{mon}),H(z,W^{mon}))\big|\le C_1\big|(U(z),V(z),W(z))-(U^{mon}(z),V^{mon}(z),W^{mon}(z))\big|+C_2,$$

    where the constant $C_1$ depends on $s$ and the diffusion coefficients of the system, and the constant $C_2$ depends on the parameters of the system, the boundary difference $|U-U^{mon}|(-L)$ (and its analogues for $V$ and $W$), $L$, and $C$.

    Finally, we would like to complete the proof by presenting the upper bound for the growth of the error function E(z).

    $$\begin{aligned}
    E(z)-E(-L)&\le\int_{-L}^{z}\big|(F(y,U),G(y,V),H(y,W))-(F^{mon}(y,U^{mon}),G^{mon}(y,V^{mon}),H^{mon}(y,W^{mon}))\big|\,dy\\
    &\le\int_{-L}^{z}\big|(F(y,U),G(y,V),H(y,W))-(F(y,U^{mon}),G(y,V^{mon}),H(y,W^{mon}))\big|\,dy\\
    &\quad+\int_{-L}^{z}\big|(F(y,U^{mon}),G(y,V^{mon}),H(y,W^{mon}))-(F^{mon}(y,U^{mon}),G^{mon}(y,V^{mon}),H^{mon}(y,W^{mon}))\big|\,dy\\
    &\le\int_{-L}^{z}C_1E(y)+C_2\,dy+\int_{-L}^{z}\frac{(f(y))^2}{d_1}+\frac{(g(y))^2}{d_2}+\frac{(h(y))^2}{d_3}+C_3|s-s^{mon}|\,dy,
    \end{aligned}$$

    where $C_3$ is a constant depending on $C$, $U^{mon}(-L)$, and the diffusion coefficients of the system. Therefore, we obtain the following inequalities.

    $$\begin{aligned}
    E(z)&\le E(-L)+\int_{-L}^{z}C_1E(y)+C_2\,dy+\int_{-L}^{z}\frac{(f(y))^2}{d_1}+\frac{(g(y))^2}{d_2}+\frac{(h(y))^2}{d_3}+C_3|s-s^{mon}|\,dy\\
    &\le E(-L)+2LC_2+\int_{-L}^{L}\frac{(f(y))^2}{d_1}+\frac{(g(y))^2}{d_2}+\frac{(h(y))^2}{d_3}+C_3|s-s^{mon}|\,dy+\int_{-L}^{z}C_1E(y)\,dy.
    \end{aligned}$$

    By Grönwall's inequality in Section 17.3 of [25], applied with the two constants $E(-L)+2LC_2+\int_{-L}^{L}\frac{(f(y))^2}{d_1}+\frac{(g(y))^2}{d_2}+\frac{(h(y))^2}{d_3}+C_3|s-s^{mon}|\,dy$ and $C_1$,

    $$E(z)\le\Big(E(-L)+2LC_2+\int_{-L}^{L}\frac{(f(y))^2}{d_1}+\frac{(g(y))^2}{d_2}+\frac{(h(y))^2}{d_3}+C_3|s-s^{mon}|\,dy\Big)\exp\big(C_1(z+L)\big).$$
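    For reference, the integral form of Grönwall's inequality invoked here (a standard statement, written in the notation of this proof) reads: if $E(z)\le A+\int_{-L}^{z}C_1E(y)\,dy$ for all $z\in[-L,L]$, with constants $A\ge 0$ and $C_1\ge 0$, then $E(z)\le A\exp\big(C_1(z+L)\big)$ on $[-L,L]$.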

    We remark that Grönwall's inequality can be modified by considering the opposite direction (starting from the right endpoint $L$). The estimate above states that the trained neural network effectively represents the solution near the boundary, provided that $s^{mon}$ approximates $s$ within a small error. We conclude this section by establishing the existence of a monotonic network model that reduces $\text{Loss}_{total}$ for solutions lying in a finite-dimensional space.

    Theorem 5. Assume that the activation function $\sigma$ is continuous and non-polynomial. Consider a bounded finite-dimensional space $O\subset C^2([-L,L])$. Then for any positive $\epsilon$, there exist a large positive integer $m$ and shallow Branch Net and Trunk Net such that for any traveling wave solution $(U,V,W)$ of (2.1) with $U,V,W\in O$, our monotonic neural network $U^{mon}(z)\ \big(=\langle\text{Branch Net}(p_1,\dots,p_m),\ \text{Trunk Net}(z)\rangle\big)$ with an appropriate sequence $\{p_i\}_{i=1}^{m}$ satisfies the following.

    Losstotal<ϵ.

    Proof. For any $\epsilon>0$, we first note that Trunk Net learns a basis of the function space, and Branch Net learns the coefficients for each basis element. Denote an orthonormal basis of $O$ by $\{o_i\}_{i=1}^{n}$. By the universal approximation property of neural networks in [7], there exists a sequence of neural networks $\{NN_i\}_{i=1}^{n}$ such that

    $$\sum_{i=1}^{n}\sum_{k=0}^{2}\Big\|\frac{d^k}{dz^k}o_i-\frac{d^k}{dz^k}NN_i\Big\|_{L^\infty([-L,L])}\le\epsilon.$$

    Suppose that Trunk Net represents the $NN_i$. Then, by Theorem 5 in [23], we can choose $m$ values $\{p_i\}_{i=1}^{m}$ as in the proof of Corollary 1 which guarantee

    $$\big\|\big\langle\text{Branch Net}(p_1,\dots,p_m),\ \text{Trunk Net}(z)\big\rangle-U(z)\big\|_{L^\infty([-L,L])}<\epsilon,\qquad\forall U\in O.$$

    Since $\{NN_i\}_{i=1}^{n}$ approximates $\{o_i\}_{i=1}^{n}$ in $C^2([-L,L])$, the above inequality implies that Branch Net estimates the appropriate coefficients, resulting in uniform convergence in $C^2([-L,L])$ for all $U\in O$.

    If $(U^{mon},V^{mon},W^{mon})$ are efficient approximations of $(U,V,W)$ in $C^2([-L,L])$, then $\text{Loss}_{limit}$, $\text{Loss}_{BC}$ and $\text{Loss}_{trans}$ must be small (i.e., less than a constant multiple of $\epsilon$). For the term $\text{Loss}_{GE}$, we only consider the loss function $\text{Loss}^{(1)}_{GE}$ for the first equation of the governing system. For the traveling wave solution $(U,V,W)$ of the system (2.1), using that $P(U,V,W;s)=0$,

    $$\begin{aligned}
    &sU^{mon}_z+d_1U^{mon}_{zz}+a_1U^{mon}(1-U^{mon}-b_2V^{mon})\\
    &\quad=sU^{mon}_z+d_1U^{mon}_{zz}+a_1U^{mon}(1-U^{mon}-b_2V^{mon})-\big(sU_z+d_1U_{zz}+a_1U(1-U-b_2V)\big)\\
    &\quad=s(U^{mon}_z-U_z)+d_1(U^{mon}_{zz}-U_{zz})+a_1(U^{mon}-U)-a_1\big((U^{mon})^2-U^2\big)-a_1b_2(U^{mon}V^{mon}-UV).
    \end{aligned}$$

    Since we may assume the boundedness of $(U^{mon},V^{mon},W^{mon})$ by Corollary 1, we can decompose the nonlinear product term in the system (2.1) as $|UV-U^{mon}V^{mon}|\le|U-U^{mon}||V|+|V-V^{mon}||U^{mon}|$. By this type of triangle inequality, we conclude that convergence in $C^2([-L,L])$ indeed guarantees an arbitrarily small $\text{Loss}_{GE}$.

    In this section, we present experimental results showing that our model can be trained more accurately than previous min-max monotonic networks [16] or fully connected neural networks. Before interpreting the tables and figures, we describe the experimental settings in detail. In all cases, the ADAM optimization algorithm of Kingma and Ba [15] was used with a learning rate of 1e-2 to minimize the loss function. With the StepLR scheduler, the learning rate decayed by a factor of 0.9 every 300 epochs. All experimental results were obtained using the best model, selected by the value of the loss function, during training up to 10000 epochs. The 5-layer neural networks have 48 hidden neurons in each hidden layer. The initial weights were set according to the default initialization provided by PyTorch.

    Training of the DeepONet in Section 2.2 utilized 20th-order Chebyshev polynomials. The model learns the integral of Chebyshev polynomials on the closed interval [0, 1] and obtains the primitive function on [-50, 50] via a linear transformation. For the discretization of the function at 300 points, a dataset was generated with odeint from the Scipy module in Python. Trunk Net and Branch Net are 5-layer neural networks with a width of 256 and a latent dimension of 64. For the integration task, we used the Adam optimizer and a StepLR scheduler with a learning rate of 1e-3 and a decay rate of 0.9.
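    A minimal sketch of how such a pre-training set might be generated is shown below: random degree-20 Chebyshev integrands sampled at 300 points, with their primitives computed by Scipy's odeint. The random coefficients, the scaling to [-50, 50], and the handling of positivity of the integrand are not specified above, so the recipe below is only a placeholder.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from scipy.integrate import odeint

rng = np.random.default_rng(0)
m, n_samples, degree = 300, 1000, 20
xs = np.linspace(0.0, 1.0, m)                       # 300 discretization points on [0, 1]

inputs, targets = [], []
for _ in range(n_samples):
    c = rng.normal(size=degree + 1)                 # random degree-20 Chebyshev coefficients
    f = lambda x: cheb.chebval(x, c)                # integrand sampled by the Branch Net
    F = odeint(lambda y, x: f(x), 0.0, xs).ravel()  # primitive: F' = f, F(0) = 0
    inputs.append(f(xs))                            # Branch Net input: sampled integrand
    targets.append(F)                               # supervision at the Trunk Net query points
inputs, targets = np.asarray(inputs), np.asarray(targets)
```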

    If we optimize only the residual loss of Section 2.1 without any restrictions, the fully connected neural network trains unstably. Figure 1 shows the convergence results of the fully connected neural network for three different seeds provided by PyTorch and NumPy. Theorem 3 implies that the wave speed is precisely zero for $(a_1,d_1,a_3,d_3,b_1,b_2,b_3)=(1,1,1,1,0.6,1.2,0.6)$. Although the fully connected neural network seems to extract the corresponding wave speed in the second experiment, the approximations are not monotone. In the first and third experiments, as seen in Table 1, the model achieves approximations with a small loss. This result raises the possibility that the system without monotonicity may admit other solutions.

    Figure 1.  Approximations of the solutions to the system (2.1) via three neural networks.
    Table 1.  Experimental results of three neural network models for $(a_1,d_1,a_3,d_3,b_1,b_2,b_3)=(1,1,1,1,0.6,1.2,0.6)$.

    |                           | Fully Connected NN (Exp. 1) | Fully Connected NN (Exp. 2) | Fully Connected NN (Exp. 3) | min-max Monotonic Network | Our model |
    |---------------------------|-----------------------------|-----------------------------|-----------------------------|---------------------------|-----------|
    | Estimated speed (exact 0) | 4.36e-01                    | -5.84e-02                   | -1.80e+00                   | -9.80e-02                 | 2.61e-03  |
    | Loss                      | 1.66e-07                    | 1.77e-03                    | 3.66e-07                    | 2.38e-01                  | 5.60e-06  |


    In contrast to the above, our method showed outstanding performance. Figure 2(B) shows that our model indeed approximates a smooth function. Observing that our model predicts an accurate speed with a small loss in Table 1, the result of Theorem 4 suggests that our model lies near the solution. The min-max monotonic network failed to predict the exact speed and to reduce the loss. Our analysis is that this network cannot satisfy the governing equation because the max and min operations hinder differentiability.

    Figure 2.  Approximations of the solutions to the system (2.1) via two types of monotonic networks.

    When only the sign of the wave speed is known rather than its exact value, our model produced feasible predictions. Figures 3 and 4 show the experimental results of our model when $(a_1,d_1,a_3,d_3,b_1,b_2,b_3)$ is set to (1, 1, 1, 1, 0.75, 0.75, 1.25) and (1, 1, 1, 1, 0.75, 0.75, 1.75), respectively. Our model predicted the correct sign, and the loss function converged to zero in these cases. However, stabilizing the training requires more hyperparameter search. The learning and decay rates used in this paper were found by a grid search on $\text{Loss}_{total}$. Refining and extending the set of hyperparameters may further contribute to the convergence of the model.

    Figure 3.  Training trajectory and our approximations of the solutions to the system (2.1) via three neural networks for positive wave speed.
    Figure 4.  Training trajectory and our approximations of the solutions to the system (2.1) via three neural networks for negative wave speed.

    Finally, for the analysis of negative wave speeds related to Theorem 2, we set $(a_1,d_1,a_3,d_3,b_1,b_2,b_3)$ to (1, 1, 1, 1, 0.99, 0.99, 2.01). In this case, Figure 5 shows that the solution is challenging to approximate with our method. Although the estimated speed seems to converge to -0.2, we observe a phenomenon in which the value of the loss function increases rapidly before reaching an acceptable level of convergence. We observed that, in the majority of experiments, the loss function values had to be less than 1e-5 to ensure convergence to accurate solutions. The experimental results presented in Table 1 demonstrate that training neural networks with loss values around 1e-3 can fail to predict an adequate wave speed. This raises the question of whether our method indeed approximates real solutions in this negative-speed case.

    Figure 5.  Training trajectory and our approximations of the solutions to the system (2.1) via three neural networks for negative wave speed in Theorem 3.

    In this paper, we constructed a monotonic neural network to impose the monotonicity constraint. In all experiments, our method performed better than the other methods, suggesting that it effectively represents the true solutions, in line with the theoretical results above.

    To verify that this approach is universal, we require supplementary experiments on various equations in which a unique monotone solution exists. We anticipate that the proof techniques or algorithm can be applied to several equations without significant modifications, provided that the uniqueness of the traveling wave solution is guaranteed. As a future research direction, it would be valuable to develop a neural network structure that can approximate the traveling wave solution with the minimum wave speed when multiple solutions exist.

    Recently, an increasing number of studies have attempted to modify the residual loss of PINNs. The optimization problem for PDEs has been interpreted as a boundary-constrained problem, with empirical evidence that minimizing the loss function for the boundary conditions is quite challenging. Reflecting this trend, we will conduct a convergence analysis of algorithms that use regularization rather than hard constraints. There also exist traveling wave solutions that possess monotonicity as a property rather than as an imposed constraint. In future work, we will investigate the variations that result from training physics-informed neural networks (PINNs) on this regularized problem.

    This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2022-00165268) and by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government(MSIP) (No.2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)).

    The authors declare there is no conflict of interest.



    [1] O. P. Agrawal, A general formulation and solution scheme for fractional optimal control problems, Nonlinear Dyn., 38 (2004), 323–337. https://doi.org/10.1007/s11071-004-3764-6 doi: 10.1007/s11071-004-3764-6
    [2] G. V. Alekseev, R. V. Brizitskii, Analysis of the boundary value and control problems for nonlinear reaction-diffusion-convection equation, J. Sib. Fed. Univ. Math. Phys., 14 (2021), 452–462. https://doi.org/10.17516/1997-1397-2021-14-4-452-462 doi: 10.17516/1997-1397-2021-14-4-452-462
    [3] T. S. Angell, On the optimal control of systems governed by nonlinear Volterra equations, J. Optim. Theory Appl., 19 (1976), 29–45. https://doi.org/10.1007/BF00934050 doi: 10.1007/BF00934050
    [4] A. Arutyunov, D. Karamzin, A survey on regularity conditions for state-constrained optimal control problems and the non-degenerate maximum principle, J. Optim. Theory Appl., 184 (2020), 697–723. https://doi.org/10.1007/s10957-019-01623-7 doi: 10.1007/s10957-019-01623-7
    [5] E. S. Baranovskii, Optimal boundary control of nonlinear-viscous fluid flows, Sb. Math., 211 (2020), 505–520. https://doi.org/10.1070/SM9246 doi: 10.1070/SM9246
    [6] E. S. Baranovskii, The optimal start control problem for 2D Boussinesq equations, Izv. Math., 86 (2022), 221–242. https://doi.org/10.1070/IM9099 doi: 10.1070/IM9099
    [7] S. A. Belbas, A new method for optimal control of Volterra integral equations, Appl. Math. Comput., 189 (2007), 1902–1915. https://doi.org/10.1016/j.amc.2006.12.077 doi: 10.1016/j.amc.2006.12.077
    [8] M. Bergounioux, L. Bourdin, Pontryagin maximum principle for general Caputo fractional optimal control problems with Bolza cost and terminal constraints, ESAIM: COCV, 26 (2020), 35. https://doi.org/10.1051/cocv/2019021 doi: 10.1051/cocv/2019021
    [9] P. Bettiol, L. Bourdin, Pontryagin maximum principle for state constrained optimal sampled-data control problems on time scales, ESAIM: COCV, 27 (2020), 51. https://doi.org/10.1051/cocv/2021046 doi: 10.1051/cocv/2021046
    [10] V. I. Bogachev, Measure theory, Springer, 2000.
    [11] J. F. Bonnans, The shooting approach to optimal control problems, IFAC Proc. Vol., 46 (2013), 281–292. https://doi.org/10.3182/20130703-3-FR-4038.00158 doi: 10.3182/20130703-3-FR-4038.00158
    [12] J. F. Bonnans, C. de la Vega, Optimal control of state constrained integral equations, Set-Valued Anal., 18 (2010), 307–326. https://doi.org/10.1007/s11228-010-0154-8 doi: 10.1007/s11228-010-0154-8
    [13] J. F. Bonnans, C. de la Vega, X. Dupuis, First- and second-order optimality conditions for optimal control problems of state constrained integral equations, J. Optim. Theory Appl., 159 (2013), 1–40. https://doi.org/10.1007/s10957-013-0299-3 doi: 10.1007/s10957-013-0299-3
    [14] L. Bourdin, A class of fractional optimal control problems and fractional Pontryagin's systems. Existence of a fractional Noether's theorem, arXiv, 2012. https://doi.org/10.48550/arXiv.1203.1422
    [15] L. Bourdin, Note on Pontryagin maximum principle with running state constraints and smooth dynamics–Proof based on the Ekeland variational principle, arXiv, 2016. https://doi.org/10.48550/arXiv.1604.04051
    [16] L. Bourdin, G. Dhar, Optimal sampled-data controls with running inequality state constraints: Pontryagin maximum principle and bouncing trajectory phenomenon, Math. Program., 191 (2022), 907–951. https://doi.org/10.1007/s10107-020-01574-2 doi: 10.1007/s10107-020-01574-2
    [17] B. Brunner, Volterra integral equations: an introduction to theory and applications, Cambridge University Press, 2017. https://doi.org/10.1017/9781316162491
    [18] C. Burnap, M. A. Kazemi, Optimal control of a system governed by nonlinear Volterra integral equations with delay, IMA J. Math. Control Inf., 16 (1999), 73–89. https://doi.org/10.1093/imamci/16.1.73 doi: 10.1093/imamci/16.1.73
    [19] T. A. Burton, Volterra integral and differential equations, 2 Eds., Elsevier Science Inc., 2005.
    [20] D. A. Carlson, An elementary proof of the maximum principle for optimal control problems governed by a Volterra integral equation, J. Optim. Theory Appl., 54 (1987), 43–61. https://doi.org/10.1007/BF00940404 doi: 10.1007/BF00940404
    [21] F. H. Clarke, Optimization and nonsmooth analysis, SIAM, 1990.
    [22] A. V. Dmitruk, N. P. Osmolovskii, Necessary conditions for a weak minimum in optimal control problems with integral equations subject to state and mixed constraints, SIAM J. Control Optim., 52 (2014), 3437–3462. https://doi.org/10.1137/130921465 doi: 10.1137/130921465
    [23] A. V. Dmitruk, N. P. Osmolovskii, Necessary conditions for a weak minimum in a general optimal control problem with integral equations on a variable time interval, Math. Control Relat. F., 7 (2017), 507–535. https://doi.org/10.3934/mcrf.2017019 doi: 10.3934/mcrf.2017019
    [24] T. M. Flett, Differential analysis, Cambridge University Press, 1980. https://doi.org/10.1017/CBO9780511897191
    [25] M. I. Gomoyunov, Dynamic programming principle and Hamilton-Jacobi-Bellman equations for fractional-order systems, SIAM J. Control Optim., 58 (2020), 3185–3211. https://doi.org/10.1137/19M1279368 doi: 10.1137/19M1279368
    [26] Y. Hamaguchi, Infinite horizon backward stochastic Volterra integral equations and discounted control problems, ESAIM: COCV, 101 (2021), 1–47. https://doi.org/10.1051/cocv/2021098 doi: 10.1051/cocv/2021098
    [27] Y. Hamaguchi, On the maximum principle for optimal control problems of stochastic Volterra integral equations with delay, Appl. Math. Optim., 87 (2023), 42. https://doi.org/10.1007/s00245-022-09958-w doi: 10.1007/s00245-022-09958-w
    [28] S. Han, P. Lin, J. Yong, Causal state feedback representation for linear quadratic optimal control problems of singular Volterra integral equations, Math. Control Relat. F., 2022. https://doi.org/10.3934/mcrf.2022038
    [29] R. F. Hartl, S. P. Sethi, R. G. Vickson, A survey of the maximum principle for optimal control problems with state constraints, SIAM J. Control Optim., 37 (1995), 181–218. https://doi.org/10.1137/1037043 doi: 10.1137/1037043
    [30] M. I. Kamien, E. Muller, Optimal control with integral state equations, Rev. Econ. Stud., 43 (1976), 469–473. https://doi.org/10.2307/2297225 doi: 10.2307/2297225
    [31] R. Kamocki, On the existence of optimal solutions to fractional optimal control problems, Appl. Math. Comput., 235 (2014), 94–104. https://doi.org/10.1016/j.amc.2014.02.086 doi: 10.1016/j.amc.2014.02.086
    [32] A. A. Kilbas, H. M. Srivastava, J. J. Trujillo, Theory and applications of fractional differential equations, Elsevier, 2006.
    [33] X. Li, J. Yong, Optimal control theory for infinite dimensional systems, 1 Ed., Boston: Birkhäuser Boston, 1995. https://doi.org/10.1007/978-1-4612-4260-4
    [34] P. Lin, J. Yong, Controlled singular Volterra integral equations and Pontryagin maximum principle, SIAM J. Control Optim., 58 (2020), 136–164. https://doi.org/10.1137/19M124602X doi: 10.1137/19M124602X
    [35] N. G. Medhin, Optimal processes governed by integral equations with unilateral constraints, J. Math. Anal. Appl., 129 (1988), 269–283. https://doi.org/10.1016/0022-247X(88)90248-X doi: 10.1016/0022-247X(88)90248-X
    [36] H. K. Moffatt, Helicity and singular structures in fluid dynamics, Proc. Natl. Acad. Sci., 111 (2014), 3663–3670. https://doi.org/10.1073/pnas.1400277111 doi: 10.1073/pnas.1400277111
    [37] J. Moon, The risk-sensitive maximum principle for controlled forward-backward stochastic differential equations, Automatica, 120 (2020), 109069. https://doi.org/10.1016/j.automatica.2020.109069 doi: 10.1016/j.automatica.2020.109069
    [38] A. Ruszczynski, Nonlinear optimization, Princeton University Press, 2006.
    [39] C. de la Vega, Necessary conditions for optimal terminal time control problems governed by a Volterra integral equation, J. Optim. Theory Appl., 130 (2006), 79–93. https://doi.org/10.1007/s10957-006-9087-7 doi: 10.1007/s10957-006-9087-7
    [40] V. R. Vinokurov, Optimal control of processes described by integral equations III, SIAM J. Control, 7 (1969), 324–355. https://doi.org/10.1137/0307024 doi: 10.1137/0307024
    [41] R. Vinter, Optimal control, Birkhäuser, 2000.
    [42] T. Wang, Linear quadratic control problems of stochastic integral equations, ESAIM: COCV, 24 (2018), 1849–1879. https://doi.org/10.1051/cocv/2017002 doi: 10.1051/cocv/2017002
    [43] J. Yong, X. Y. Zhou, Stochastic controls: Hamiltonian systems and HJB equations, New York: Springer Science+Business Media, 1999. https://doi.org/10.1007/978-1-4612-1466-3
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)