1. Introduction
Many of the problems we encounter in the world that can be posed as optimal control problems contain constraints on both the state and control variables. Imagine a scheduling problem in which the aim is to maximize profits, as in [26], where the control variable is the production rate and the state variable is the inventory level. Naively, one may want to greatly increase both the inventory level and the production rate to maximize profits but, in practice, there is a limit to the amount of inventory a factory can hold, as well as a limit on how quickly humans or machines can work. To accurately model this problem, and many others from a wide range of application areas, such as manufacturing, engineering, science, and economics, we must use an optimal control problem with both state and control constraints. Although these problems are commonly found in applications, they are much more difficult to solve than optimal control problems with control constraints alone.
We focus our attention on linear-quadratic (LQ) state- and control-constrained optimal control problems. These are infinite-dimensional optimization problems that involve the minimization of a quadratic objective function that is subject to linear differential equation (DE) constraints, affine constraints on the state variables, and affine constraints on the control variables. There is extensive literature on LQ control problems, as they can model many problems from a wide variety of disciplines (see [1,14,15,16,24,27,28]). Though many of the LQ control problems posed in the literature contain control constraints along with the linear DE constraints, it is rarer to see state constraints included since, as mentioned above, they are much more difficult to manage. Here, we propose a unique approach for solving state-constrained problems by applying the Douglas–Rachford (DR) algorithm.
The DR algorithm is a projection algorithm that is used to minimize the sum of two convex functions. To apply the algorithm, one needs the proximal operators of the two convex functions. Splitting and projection methods such as the DR algorithm constitute a popular area of research in optimization, with a variety of applications, such as sphere packing problems and protein reconstruction (see [2,3,6,9,20]). The use of these methods to solve discrete-time optimal control problems is not new, but there are very few applications of these methods to continuous-time optimal control problems. In [7], projection methods were used to solve the energy-minimizing double integrator problem with control constraints. Other papers [11,12,13] address more general energy-minimizing LQ control problems with control constraints. A collection of general projection methods is used in [11], and the DR algorithm is used in [12,13].
Following the promising numerical results in [12,13], here we use the DR algorithm to address the more challenging control- and state-constrained LQ control problems.
The current technique for solving these control- and state-constrained LQ problems is a direct discretization approach whereby the infinite-dimensional problem is reduced to a large-scale finite-dimensional problem by using a discretization scheme, e.g., a Runge–Kutta method [5,22]. This discretized problem is then solved through the use of an optimization modelling language such as AMPL [19], paired with (finite-dimensional) large-scale optimization software such as Ipopt [33].
In the present paper (extending what was done in [7,11] for control constraints only), we present the following contributions to research on the numerical solution of LQ control problems with state and control constraints.
(ⅰ) We rewrite the original LQ control problem as the minimization of the sum of two convex functions. This reformulation addresses much more general problems than those considered in [7,11]. Our setting encompasses important practical problems, such as those referenced above (i.e., problems that have constraints on the state variables).
(ⅱ) We find the proximal mappings of these functions by solving the resulting calculus of variations subproblems (Theorems 1 and 2). Although the proof of Theorem 1 is straightforward thanks to the separability of the variables x and u, the proof of Theorem 2 is more involved due to a new LQ control subproblem that needs to be solved.
(ⅲ) We define and apply the DR algorithm to the constrained LQ control problem. We define and calculate proximal mappings at (x,u), rather than just u, as in [7,12]. Thus, we describe the DR algorithm for the iterate (xk,uk) rather than for uk alone.
(ⅳ) We propose an algorithm for computing the projection of an iterate (x−,u−) onto the affine constraint set defined by the ordinary differential equations (ODEs). The procedure is an extension of [12, Algorithm 2], which projects only u−; the current algorithm requires solving a more complicated two-point boundary-value problem.
(ⅴ) We perform numerical experiments to illustrate the higher efficiency of the DR algorithm compared with the traditional approach of direct discretization. Solving optimal control problems with state constraints is well known to be more challenging than solving those without them. We demonstrate the value of the new approach on a class of such problems.
In addition to these, we also present the following contribution, which is new even for the purely control-constrained LQ control problems.
(ⅵ) We present a conjecture for finding the costates. Moreover, we present a technique for finding the multiplier for the state constraints in the case that only one state constraint is active at a given time. We have effectively employed the costates and the state constraint multiplier for numerical verification of the optimality conditions.
The two convex functions mentioned above are defined as follows: the information from the ODE constraints appears in one function, while the information from the control and state constraints, along with the objective functional, appears in the other. We derive the proximal mappings of the two convex functions without reducing the original infinite-dimensional optimization problem to a finite-dimensional one, though we do discretize the state and control variables over a partition of time when implementing the DR algorithm, since a digital computer cannot iterate with functions.
The paper is structured as follows. In Section 2, we give the problem formulation and optimality conditions for the optimal control problem. In Section 3, we derive the proximal mappings used in the implementation of the DR algorithm. Section 4 introduces the DR algorithm. Then, Section 5 begins with an algorithm for computing one of the proximal mappings (namely, the projection onto an affine set) and a conjecture that is used to obtain the costate variable of the original LQ control problem. Next, this section introduces two example problems, a harmonic oscillator and a mass-spring system. At the end of this section, numerical experiments for the DR algorithm and the AMPL–Ipopt suite and their comparisons are given for these two problems. Finally, concluding remarks and comments for future work are provided in Section 6.
2. Optimal control problem
In this section, we formulate the general optimal control problem that will be the focus of this paper. We give some necessary definitions and provide conditions for optimality from optimal control theory.
Before introducing the optimal control problem, we will give some standard definitions. Unless otherwise stated, all vectors are column vectors. Let L2([t0,tf];Rq) be the Banach space of Lebesgue measurable functions z:[t0,tf]→Rq with a finite L2 norm, namely,

‖z‖_{L2} := ( ∫_{t0}^{tf} ‖z(t)‖^2 dt )^{1/2} < ∞,

where ‖⋅‖ is the ℓ2 norm in Rq. Furthermore, W1,2([t0,tf];Rq) is the Sobolev space of absolutely continuous functions, namely,

W1,2([t0,tf];Rq) := { z ∈ L2([t0,tf];Rq) : ż ∈ L2([t0,tf];Rq) },

endowed with the norm

‖z‖_{W1,2} := ( ‖z‖²_{L2} + ‖ż‖²_{L2} )^{1/2}.
With these definitions, we can now state the general LQ control problem as follows:

(P)  minimize  (1/2) ∫_{t0}^{tf} [ x(t)TQ(t)x(t) + u(t)TR(t)u(t) ] dt
     subject to  ẋ(t) = A(t)x(t) + B(t)u(t),  for a.e. t∈[t0,tf],
                 x(t0) = x0,  x(tf) = xf,
                 x̲ ≤ x(t) ≤ x̄,  u̲ ≤ u(t) ≤ ū,  for a.e. t∈[t0,tf],

where the inequalities are componentwise, with the bound vectors x̲,x̄∈Rn and u̲,ū∈Rm (entries possibly infinite).
The state variable x∈W1,2([t0,tf];Rn), with x(t):=(x1(t),…,xn(t))∈Rn, and the control variable u∈L2([t0,tf];Rm), with u(t):=(u1(t),…,um(t))∈Rm. The time-varying matrices A:[t0,tf]→Rn×n, B:[t0,tf]→Rn×m, Q:[t0,tf]→Rn×n, and R:[t0,tf]→Rm×m are continuous. For every t∈[t0,tf], the matrices Q(t) and R(t) are symmetric and respectively positive semi-definite and positive definite. For clarity of argument, these matrices are assumed to be diagonal, namely, Q(t):=diag(q1(t),…,qn(t)) and R(t):=diag(r1(t),…,rm(t)). These diagonality assumptions particularly simplify proximal mapping expressions that appear later. The initial and terminal states are given as x0 and xf, respectively.
2.1. Optimality conditions
In this section, we state the maximum principle by using the direct adjoining approach from [21, Theorem 4.1]. We note that the authors of [21] designate Theorem 4.1 in their paper as an "informal theorem" since it has not been proved fully (for the case of pure state and mixed control and state constraints). However, they also point to a paper by Maurer as an exception, where Theorem 4.1 has been proved for the case of pure state and pure control constraints, which is exactly our setting. We start by defining the extended Hamiltonian function H:Rn×Rm×Rn×Rn×Rn×[t0,tf]→R for Problem (P) as follows:

H(x,u,λ,μ1,μ2,t) := (λ0/2) ( xTQ(t)x + uTR(t)u ) + λT( A(t)x + B(t)u ) + μ1T( x − x̄ ) + μ2T( x̲ − x ),
where the multiplier λ0≥0, the adjoint variable vector λ:[t0,tf]→Rn with λ(t):=(λ1(t),…,λn(t))∈Rn, and the state constraint multiplier vectors μ1,μ2:[t0,tf]→Rn with μi(t):=(μi1(t),…,μin(t))∈Rn, i=1,2. For brevity, we use the notation H[t] := H(x(t),u(t),λ(t),μ1(t),μ2(t),t), and similarly for the partial derivatives of H.
The adjoint variable vector is assumed to satisfy

λ̇(t) = −Hx[t],  (2.1)

for almost every t∈[t0,tf], where Hx:=∂H/∂x.
We assume that the problem we are solving is normal, so we set λ0=1.
Maximum Principle. Suppose that the pair (x,u)∈W1,2([t0,tf];Rn)×L2([t0,tf];Rm) is optimal for Problem (P). Then, there exists a piecewise absolutely continuous adjoint variable vector λ, as defined in (2.1), and piecewise continuous multipliers μ1,μ2∈L2([t0,tf];Rn), such that, for almost every t∈[t0,tf],

ui(t) = argmin_{w∈[u̲i,ūi]} { (1/2) ri(t) w^2 + bi(t)Tλ(t) w },  (2.2)
for i=1,…,m, where bi(t) is the ith column of the matrix B(t) and ri(t) is the ith diagonal element of R(t). Moreover, the multipliers μ1(t),μ2(t) must satisfy the complementarity conditions given by

μ1i(t) ≥ 0  and  μ1i(t) ( xi(t) − x̄i ) = 0,  (2.3)
μ2i(t) ≥ 0  and  μ2i(t) ( x̲i − xi(t) ) = 0,  (2.4)

for all i=1,…,n.
Suppose that u̲i=−∞ and ūi=∞, i=1,…,m, i.e., the control vector is unconstrained. Then, (2.2) becomes the stationarity condition Hui[t]=0, i.e.,

ri(t)ui(t) + bi(t)Tλ(t) = 0,  (2.5)

i=1,…,m. Then, (2.5) can be solved for ui(t) as follows:

ui(t) = −bi(t)Tλ(t)/ri(t),  (2.6)

for i=1,…,m; or, by using the matrices B(t) and R(t),

u(t) = −R(t)^{-1}B(t)Tλ(t).  (2.7)

When we consider the constraints on u, we get the following from (2.2):

uj(t) = u̲j, if −bj(t)Tλ(t)/rj(t) < u̲j;  uj(t) = −bj(t)Tλ(t)/rj(t), if u̲j ≤ −bj(t)Tλ(t)/rj(t) ≤ ūj;  uj(t) = ūj, if −bj(t)Tλ(t)/rj(t) > ūj,  (2.8)

for almost every t∈[t0,tf], j=1,…,m.
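For illustration, a minimal numerical sketch of evaluating (2.8) at a single time instant is given below; the function name and array shapes are ours, and the formula is the componentwise clipping derived above.

```python
import numpy as np

def control_from_costate(B_t, r_t, lam_t, u_lo, u_hi):
    """Evaluate (2.8) at one time instant: clip the unconstrained minimizer
    -B(t)^T lam(t) / r(t) componentwise to the control box [u_lo, u_hi].

    B_t: (n, m) matrix B(t); r_t: (m,) diagonal entries of R(t);
    lam_t: (n,) costate vector; u_lo, u_hi: (m,) control bounds.
    """
    return np.clip(-(B_t.T @ lam_t) / r_t, u_lo, u_hi)
```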
3. Splitting and proximal mappings
In this section, we rewrite Problem (P) as the minimization of the sum of two convex functions f and g, and we give the proximal mappings for these functions in Theorems 1 and 2.
We split the constraints from (P) into two sets A and B, given as

A := { (x,u) ∈ L2([t0,tf];Rn)×L2([t0,tf];Rm) : ẋ(t) = A(t)x(t) + B(t)u(t) for a.e. t∈[t0,tf], x(t0) = x0, x(tf) = xf },  (3.1)
B := { (x,u) ∈ L2([t0,tf];Rn)×L2([t0,tf];Rm) : x̲ ≤ x(t) ≤ x̄ and u̲ ≤ u(t) ≤ ū for a.e. t∈[t0,tf] }.  (3.2)
Although we previously defined x∈W1,2([t0,tf];Rn), in the sets A and B we let x∈L2([t0,tf];Rn) to simplify the calculation of the proximal mappings. By the definition of W1,2([t0,tf];Rn), if (x,u)∈A, then x∈W1,2([t0,tf];Rn).
We assume that the control system ẋ(t)=A(t)x(t)+B(t)u(t) is controllable; in other words, the system can be driven from any x0 to any other xf (for a precise definition of controllability and tests for controllability, see [29]). Then, there exists a (possibly not unique) u(⋅) such that, when this u(⋅) is substituted, the boundary-value problem in the definition of A has a solution x(⋅); in other words, A≠∅. Also, clearly, B≠∅. We note that the constraint set A is an affine subspace and the set B is a box, so the two constraint sets are convex subsets of the Hilbert space L2×L2. Since every sequence converging in L2 has a subsequence converging pointwise almost everywhere, it is straightforward to see that the set B is closed in L2. The closedness of A will be established later as a consequence of Theorem 2 (see Remark 2).
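For the time-invariant systems considered in our examples, controllability can be checked with the classical Kalman rank test; a small sketch (our own illustration, not reproduced from [29]) follows.

```python
import numpy as np

def is_controllable(A, B, tol=1e-9):
    # Kalman rank test for a time-invariant pair (A, B): the system is
    # controllable iff [B, AB, ..., A^{n-1}B] has full row rank n.
    n = A.shape[0]
    K = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    return np.linalg.matrix_rank(K, tol=tol) == n
```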
Fix β>0 and let

f(x,u) := (β/2) ∫_{t0}^{tf} [ x(t)TQ(t)x(t) + u(t)TR(t)u(t) ] dt + ιB(x,u)  and  g(x,u) := ιA(x,u),  (3.3)
where ιC is the indicator function of the set C, namely, ιC(x,u) := 0 if (x,u)∈C, and ιC(x,u) := +∞ otherwise.
We note that the DE constraint in Problem (P) is represented by the indicator function ιA(x,u), and the box constraints on (x,u) in Problem (P) are represented by the indicator function ιB(x,u). Problem (P) is then equivalent to the following problem:

minimize  f(x,u) + g(x,u)  over  (x,u) ∈ L2([t0,tf];Rn)×L2([t0,tf];Rm).  (3.4)
The equivalence between Problem (P) and (3.4) follows from the fact that the objective function in (3.4) is the sum of the β-weighted objective function of Problem (P) and the indicator functions of the constraint sets. Although the parameter β does not change the solution of the problem, it plays a role in the performance of the algorithm we study.
In our setting, we assume that we are able to compute the projection operators PA and PB, which project a given point onto the constraint sets A and B, respectively. Recall that the proximal mapping of a functional h is defined by [8, Definition 24.1]; for our setting,

Proxh(x,u) := argmin_{(y,v)} { h(y,v) + (1/2)‖y−x‖²_{L2} + (1/2)‖v−u‖²_{L2} },  (3.5)
for any (x,u)∈L2([t0,tf];Rn)×L2([t0,tf];Rm). Recall that the projection PC(x,u) of a point (x,u) onto C is characterized by PC(x,u)∈C and that ∀(y,v)∈C, ⟨(y,v)−PC(x,u)|(x,u)−PC(x,u)⟩≤0 [8, Theorem 3.16].
Note that ProxιC=PC.
In order to implement the DR algorithm, we must derive the proximal mappings of f and g. The proofs of Theorems 1 and 2 below follow the broad lines of the proof of [13, Lemma 2]. In both theorems, the major difference from [13] is that the proximal operators in the current paper have the two variables x− and u−. Thanks to separability, the proof of Theorem 1 is a straightforward modification of the corresponding part of the proof of [13, Lemma 2]; we include a full proof of Theorem 1 for the convenience of the reader. On the other hand, the proof of Theorem 2 deals with the solution of a more involved optimal control subproblem, namely, Problem (Pg).
Theorem 1. The proximal mapping of f is given by Proxf(x−,u−)=(y,v), where the components of y and v are respectively expressed as

yi(t) = min{ x̄i, max{ x̲i, xi−(t)/(1+βqi(t)) } },  (3.6)
vj(t) = min{ ūj, max{ u̲j, uj−(t)/(1+βrj(t)) } },  (3.7)

for all t∈[t0,tf], i=1,…,n, j=1,…,m.
Proof. From (3.5) and the definition of f in (3.3), we have that

Proxf(x−,u−) = argmin_{(y,v)∈B} { (β/2) ∫_{t0}^{tf} [ y(t)TQ(t)y(t) + v(t)TR(t)v(t) ] dt + (1/2)‖y−x−‖²_{L2} + (1/2)‖v−u−‖²_{L2} }.

In other words, to find Proxf(x−,u−), we need to find (y,v) that solves the following problem:

(Pf)  min  (β/2) ∫_{t0}^{tf} [ y(t)TQ(t)y(t) + v(t)TR(t)v(t) ] dt + (1/2)‖y−x−‖²_{L2} + (1/2)‖v−u−‖²_{L2}
      s.t.  x̲ ≤ y(t) ≤ x̄,  u̲ ≤ v(t) ≤ ū,  for a.e. t∈[t0,tf].

Problem (Pf) is separable in the variables y and v, so we can consider the problems of minimizing with respect to y and v individually, and thus solve the following two subproblems:

(Pf1)  min  (β/2) ∫_{t0}^{tf} y(t)TQ(t)y(t) dt + (1/2)‖y−x−‖²_{L2}  s.t.  x̲ ≤ y(t) ≤ x̄ for a.e. t∈[t0,tf],

and

(Pf2)  min  (β/2) ∫_{t0}^{tf} v(t)TR(t)v(t) dt + (1/2)‖v−u−‖²_{L2}  s.t.  u̲ ≤ v(t) ≤ ū for a.e. t∈[t0,tf].

Since Q(t) is diagonal, Problem (Pf1) decouples further into componentwise pointwise minimizations, and its solution is given by

yi(t) = argmin_{x̲i ≤ w ≤ x̄i} { (β/2) qi(t) w^2 + (1/2) ( w − xi−(t) )^2 },

i=1,…,n, which, after straightforward manipulations, yields (3.6). The solution to Problem (Pf2) is obtained similarly as

vj(t) = argmin_{u̲j ≤ w ≤ ūj} { (β/2) rj(t) w^2 + (1/2) ( w − uj−(t) )^2 },

j=1,…,m, which yields (3.7) after straightforward manipulations. □
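In a time-discretized implementation, the formulas (3.6) and (3.7) amount to a scale-then-clip operation applied sample by sample. A vectorized sketch (array shapes and names are illustrative) is as follows.

```python
import numpy as np

def prox_f(x_minus, u_minus, q, r, beta, x_lo, x_hi, u_lo, u_hi):
    """Componentwise prox of f: divide by 1 + beta * weight, then clip to the box.

    x_minus: (n, N) sampled states; u_minus: (m, N) sampled controls;
    q: (n, N) and r: (m, N) sampled diagonals of Q(t) and R(t);
    x_lo, x_hi: (n,) state bounds; u_lo, u_hi: (m,) control bounds.
    """
    y = np.clip(x_minus / (1.0 + beta * q), x_lo[:, None], x_hi[:, None])
    v = np.clip(u_minus / (1.0 + beta * r), u_lo[:, None], u_hi[:, None])
    return y, v
```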
Theorem 2. The proximal mapping of g is given by Proxg(x−,u−)=PA(x−,u−)=(y,v) such that

y(t) = x(t),  (3.8)
v(t) = u−(t) − B(t)Tλ(t),  (3.9)

where x(t),λ(t) are obtained by solving the following two-point boundary-value problem (TPBVP):

ẋ(t) = A(t)x(t) − B(t)B(t)Tλ(t) + B(t)u−(t),
λ̇(t) = −x(t) − A(t)Tλ(t) + x−(t),
x(t0) = x0,  x(tf) = xf.  (3.10)
Proof. Using [8, Example 12.25] and the definition of g in (3.3),

Proxg(x−,u−) = ProxιA(x−,u−) = PA(x−,u−),

which verifies the very first assertion. From (3.5) and the definition of g in (3.3), we have that

Proxg(x−,u−) = argmin_{(y,v)∈A} { (1/2)‖y−x−‖²_{L2} + (1/2)‖v−u−‖²_{L2} }.

In other words, to find Proxg(x−,u−), we need to find (y,v) that solves the following problem:

(Pg)  min  (1/2) ∫_{t0}^{tf} [ ‖y(t)−x−(t)‖^2 + ‖v(t)−u−(t)‖^2 ] dt
      s.t.  ẏ(t) = A(t)y(t) + B(t)v(t),  for a.e. t∈[t0,tf],  y(t0) = x0,  y(tf) = xf.

Problem (Pg) is an optimal control problem wherein y(t) is the state variable and v(t) is the control variable. The Hamiltonian for Problem (Pg) is given by

H(y,v,λ,t) := (1/2)‖y−x−(t)‖^2 + (1/2)‖v−u−(t)‖^2 + λT( A(t)y + B(t)v ),

and the associated costate equation is given by

λ̇(t) = −Hy[t] = −( y(t) − x−(t) ) − A(t)Tλ(t).  (3.11)

If v is the optimal control for Problem (Pg), then, by the maximum principle, Hv[t]=0 for all t∈[t0,tf]. In other words,

v(t) − u−(t) + B(t)Tλ(t) = 0,

for all t∈[t0,tf], a re-arrangement of which yields (3.9). Collecting together the ODE in Problem (Pg) and the ODE in (3.11), substituting v(t) from (3.9), and setting y(t)=x(t), we obtain the TPBVP in (3.10). □
Remark 1. We note from Theorem 2 that Proxg is the projection onto the affine set A. Unlike the case of Proxf, in general we cannot find an analytical solution to (3.10) to obtain Proxg (or the projection onto A); for this purpose, we propose Algorithm 2 in Section 4.1 to obtain a numerical solution.
Remark 2. From Theorem 2, we see that every pair (x,u)∈L2×L2 has a projection onto A. In other words, A is a Chebyshev set. It is well known that every Chebyshev set is closed (see [8, Remark 3.11(ⅰ)]). Hence, the set A is closed in the topology of L2×L2.
4. Douglas–Rachford algorithm
The application of the DR algorithm to our problem is slightly different from that in [7,12]. Since we are solving control- and state-constrained optimal control problems, we must define the proximal mappings at the pair (x,u), rather than just at u, as in [7,12]. Thus, in the implementation of the DR algorithm, we give the iterations for the pair (xk,uk), rather than for uk alone.
Given β>0, we specialize the DR algorithm (see [17,18,25]) to the case of minimizing the sum of the two functions f and g as in (3.3) and (3.4). The DR operator associated with the ordered pair (f,g) is defined by

T := Id − Proxf + Proxg ∘ ( 2 Proxf − Id ).

Application of the operator to our case is given by

T(x,u) = (x,u) − Proxf(x,u) + Proxg( 2 Proxf(x,u) − (x,u) ),

where the proximal mappings of f and g are provided as in Theorems 1 and 2. Let X be an arbitrary Hilbert space and fix x0∈X. Given xk∈X, k≥0, the DR iterations are set as follows:

xk+1 := T xk,  k = 0,1,2,….
The DR algorithm is implemented as Algorithm 1. We define a new parameter γ:=1/(1+β), where β is the parameter that multiplies the objective as in (3.3) and Theorem 1. The choice of γ∈]0,1[ can be made because changing β does not affect the solution of Problem (P).
Remark 3. We point out that, in general, only weak convergence is guaranteed for the DR algorithm (see [32, Theorem 1] or [10, Theorem 4.4]). It should be noted that the final iterate of the state-control pair in the box B is returned by the algorithm.
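In pseudocode form, the iteration underlying Algorithm 1 can be sketched as follows; the stopping rule shown is a simple stand-in (an assumption on our part), and Algorithm 1 itself is stated in terms of γ=1/(1+β).

```python
import numpy as np

def douglas_rachford(z0, prox_f, prox_g, eps=1e-8, max_iter=200):
    """Plain DR iteration for minimizing f + g (a sketch of Algorithm 1).

    z0: flat array stacking the discretized pair (x, u);
    prox_f, prox_g: callables implementing Theorems 1 and 2.
    """
    z = z0
    for _ in range(max_iter):
        p = prox_f(z)                     # "shadow" iterate; it lies in the box B
        q = prox_g(2.0 * p - z)           # reflect through p, then project onto A
        if np.linalg.norm(p - q) <= eps:  # simple stopping rule (assumed here)
            break
        z = z + q - p                     # z_{k+1} = T z_k
    return prox_f(z)                      # return the pair in B (cf. Remark 3)
```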
4.1. Algorithm for the projection onto A
We introduce a procedure for numerically projecting onto A, which is an extension of Algorithm 2 from [12] to the case of LQ control problems with state and control constraints. The procedure below (Algorithm 2) can be employed in Step 3 of the DR algorithm. In the procedure, we effectively solve the TPBVP in (3.10) by implementing the standard shooting method [4,23,31]. Throughout the steps of Algorithm 2 below, we will solve the ODEs in (3.10), rewritten here in matrix form as

d/dt [ x(t); λ(t) ] = M(t) [ x(t); λ(t) ] + [ B(t)u−(t); x−(t) ],  where  M(t) := [ A(t), −B(t)B(t)T; −In, −A(t)T ],  (4.2)
with various initial conditions (ICs):
In the above equations, we use λDR, instead of just λ, to emphasize the fact that λDR is the costate variable that results from solving Problem (Pg) to compute the projection onto A within the DR algorithm. We reiterate that Problem (Pg) is more involved than its counterpart in [13], which leads to the ODE in (4.2), which, in turn, is more complicated than its counterpart in [12].
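The linearity of (4.2) makes the shooting method particularly simple: the terminal state x(tf) is an affine function of the initial costate guess λDR(t0), so n+1 forward integrations determine the correct initial costate. A sketch with the explicit Euler integrator (our notation; Algorithm 2 itself contains the precise steps) is given below.

```python
import numpy as np

def terminal_state(A, B, u_minus, x_minus, x0, lam0, t):
    """Forward Euler on the TPBVP dynamics (4.2):
       x' = A x + B (u^- - B^T lam),   lam' = -A^T lam - x + x^-."""
    x, lam = x0.astype(float), lam0.astype(float)
    for k in range(len(t) - 1):
        h = t[k + 1] - t[k]
        Ak, Bk = A(t[k]), B(t[k])
        dx = Ak @ x + Bk @ (u_minus[:, k] - Bk.T @ lam)
        dlam = -Ak.T @ lam - x + x_minus[:, k]
        x, lam = x + h * dx, lam + h * dlam
    return x  # x(tf), an affine function of the guess lam0

def shoot_initial_costate(A, B, u_minus, x_minus, x0, xf, t):
    """Recover lam(t0) from n + 1 forward integrations, exploiting linearity."""
    n = x0.size
    base = terminal_state(A, B, u_minus, x_minus, x0, np.zeros(n), t)
    S = np.column_stack([terminal_state(A, B, u_minus, x_minus, x0, e, t) - base
                         for e in np.eye(n)])
    return np.linalg.solve(S, xf - base)
```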
4.2. A conjecture for the costates for Problem (P)
Recall that the optimal control for Problem (P) is given by the cases in (2.8).
A junction time ¯tj is a time when the control uj(t) falls into two cases of (2.8) simultaneously, i.e., a point in time when a control constraint transitions from "active" to "inactive", or vice versa. This definition of a junction time becomes important in the following conjecture, which has been formulated and tested by means of extensive numerical experiments.
Conjecture 1. Let λDR(t) be the costate variable emerging from the projection onto A as computed in Algorithm 2, and let λ(t) be the costate variable that results from Problem (P). Let ¯tj be a junction time for some uj, j=1,…,m, such that bj(¯tj)TλDR(¯tj)≠0. Define
Then,
Remark 4. The ability to obtain the costate variable by Conjecture 1 is desirable as a tool for checking that the necessary condition of optimality in (2.8) is satisfied. Without this conjecture, we are unable to verify whether the optimality condition is satisfied when using the DR algorithm, except when a dual version of the DR algorithm is employed, as in [13].
Once we have calculated λ in this way, we can also find a multiplier μk, k=1,2, numerically, for the case in which only one state constraint is active at a given time. Suppose that only the ith state box constraint becomes active. By rearranging (2.1), using numerical differentiation to find ˙λ, and assuming μ2i(t)=0, we have that
If μ1i(t)=0, then we compute the following:
With (4.5), or with (4.6), the complementarity conditions in (2.3) or (2.4) can now be checked numerically.
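Numerically, such a check reduces to verifying nonnegativity of the multiplier and the vanishing of its product with the constraint slack on the time grid; a generic sketch (the exact slacks follow the sign conventions of (2.3) and (2.4)) is shown below.

```python
import numpy as np

def complementarity_residuals(mu, gap):
    """Residuals for mu(t) >= 0 and mu(t) * gap(t) = 0 on the time grid, where
    gap(t) is the slack of the corresponding state box constraint
    (e.g., x_i(t) - lower bound for a lower-bound constraint).
    """
    return np.max(np.abs(mu * gap)), np.max(np.maximum(-mu, 0.0))
```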
5. Numerical experiments
We will now introduce two example problems. Along with posing the optimal control problems, we also give plots of their optimal controls, states, costates, and multipliers, with vertical lines signifying the regions in which the state constraints become active.
5.1. Harmonic oscillator
Problem (PHO) below contains the dynamics of a harmonic oscillator, which is typically used to model a point mass attached to a spring. The dynamics are given as ÿ(t)+ω20y(t)=f(t), where ω0>0 is known as the natural frequency and f(t) is some forcing. In a physical system, y represents the position of a unit mass, ẏ is the velocity of said mass, and the natural frequency is defined as ω0=√k, where k is the stiffness of the spring producing the harmonic motion. In addition to the restoring force exerted by the spring, we introduce another force u1 that affects the velocity ẏ directly. We let x1:=y, x2:=ẏ, and u2:=f to arrive at ẋ1(t)=x2(t)+u1(t), ẋ2(t)=−ω20x1(t)+u2(t).
In this example problem, the objective contains the squared sum of all four variables in the system. It is common to consider this problem with the objective of minimizing the energy of the control variable but, in this case, we have also included the state variables to test the algorithm with a slightly more involved objective. Since the focus of this research is control- and state-constrained problems, constraints have been added as in Problem (P).
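In the notation of Problem (P), the data of (PHO) can be sketched as follows; the value of ω0 and the weights below are placeholders, with the actual problem specifications given in Table 1.

```python
import numpy as np

omega0 = 2.0  # hypothetical natural frequency; the actual value is in Table 1

A = lambda t: np.array([[0.0, 1.0],
                        [-omega0**2, 0.0]])  # x1' = x2 + u1, x2' = -w0^2 x1 + u2
B = lambda t: np.eye(2)   # u1 enters the x1 equation; u2 (the forcing) enters x2
Q = lambda t: np.eye(2)   # equal weights: the objective squares all four variables
R = lambda t: np.eye(2)
```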
5.2. Simple spring-mass system
The simple spring-mass system is another physical system that can be easily visualized (see [30]). This problem contains two masses and two springs connected in sequence, with the dynamics given by m1ÿ1(t)+(k1+k2)y1(t)−k2y2(t)=f1(t), m2ÿ2(t)−k2y1(t)+k2y2(t)=f2(t), where m1,m2 are the two masses, k1,k2 are the spring coefficients (stiffnesses), and f1(t),f2(t) are the forces applied to m1,m2, respectively. Let x1:=y1, x2:=ẏ1, x3:=y2, x4:=ẏ2, u1:=f1, u2:=f2, m1=m2=1, k1=1, and k2=2; then, we retrieve the system in Problem (PSM). This dynamical system furnishes an optimal control problem with four state variables and two control variables. As in (PHO), we have added state and control constraints and set the objective as the integral of the squared sum of the state and control variables.
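With m1=m2=1, k1=1, and k2=2, the state-space data of (PSM) in the notation of Problem (P) can be sketched as follows (bounds and horizon as specified in Table 1).

```python
import numpy as np

# x2' = -(k1 + k2) x1 + k2 x3 + u1 = -3 x1 + 2 x3 + u1
# x4' =  k2 x1 - k2 x3 + u2       =  2 x1 - 2 x3 + u2
A = lambda t: np.array([[ 0.0, 1.0,  0.0, 0.0],
                        [-3.0, 0.0,  2.0, 0.0],
                        [ 0.0, 0.0,  0.0, 1.0],
                        [ 2.0, 0.0, -2.0, 0.0]])
B = lambda t: np.array([[0.0, 0.0],
                        [1.0, 0.0],
                        [0.0, 0.0],
                        [0.0, 1.0]])
```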
5.3. Numerical discussion and comparisons
Technical specifications. In the numerical experiments, we used Matlab version 2023b to implement the DR algorithm and compared its performance with that of the AMPL–Ipopt optimization computational suite [19,33], with Ipopt version 3.12.13. We chose to make comparisons with Ipopt since it is a free and easily obtainable solver that is used for problems such as those presented in this paper (also see [5]). All computations used ϵ=10−8 in Algorithm 1 or, in the case of Ipopt, tol=10−8, and were run on a PC with an i5-10500T 2.30 GHz processor and 8 GB of RAM. For the two examples (PHO) and (PSM), we experimented with a case with only control constraints (Case 1) and a case with an added state constraint (Case 2). The problem specifications can be found in Table 1, along with the choices of γ used in the implementation of the DR algorithm.
Generation of "true" solutions. Since we did not have analytical solutions to these problems, we generated higher-accuracy numerical solutions to our problems that we will use as "true" solutions in our error calculations. In the Case 1 examples (only control constraints present) the DR algorithm was able to successfully converge to an acceptable solution by using N=107 and ϵ=10−12, without reaching 200 iterations (the maximum number of iterations we allowed). The 8 GB of RAM provided insufficient memory for Ipopt to generate a solution with such a high number of discretization points, so the DR algorithm was used to generate the "true" solutions in this case. In Case 2 (state-constrained case), the DR algorithm was unable to converge to an accepted solution in less than 200 iterations, so we instead relied on Ipopt. Due to the memory limitations, the "true" solution for (PHO) Case 2 was generated with N=106, ϵ=10−12, and (PSM) Case 2 used N=7×105, ϵ=10−12.
Choice of an algorithmic parameter. The values of γ in Table 1 were decided by generating plots that display the number of iterations required for the DR algorithm to find an acceptable solution for 500 values of γ in the interval (0,1). For both (PHO) and (PSM) in Case 2, the DR algorithm required more than the maximum 200 iterations to converge for all values of γ tested. For these cases, we instead generated plots comparing the errors in the states and controls for the different values of γ. A specific value of γ providing the smallest errors was not obvious, since many values gave similar performance, but values closer to 1 appeared optimal; thus, we chose γ=0.95 for these experiments.
Graphical results. In Figures 1 and 2, we present plots for (PHO) and (PSM) obtained by using the DR algorithm. We have generated similar figures based on the results of Ipopt as well but, since they mostly overlap Figures 1 and 2, we do not show them all here; instead, we point out the differences. In Figure 3, we give the components of the multiplier vector μ2 for Case 2 of (PHO), as obtained by using the DR algorithm and Ipopt. As indicated by the black vertical lines in the bottom left plot of Figure 1, when using the DR algorithm, the constraint on x1 is active over a time interval that aligns with the interval in which μ21 is positive, as expected from (2.3) and (2.4).
In the results from Ipopt, we observe that the state constraint only becomes active at a single point in time, but Figure 3 shows that μ21 from Ipopt (yellow dotted line) is positive over a larger interval of time; thus, (2.4) is violated. This appears to be a numerical error that is not present in the DR results, since there is an interval of points around the spike where the state constraint −0.025≤x1(t) becomes active. When N=104,105 for (PHO), we observe that, when using Ipopt, the lower bound on x1 is never reached, though, again, we see an interval of points that are almost equal to the lower bound.
In Figure 3, we also see a discrepancy in μ22 since we have not imposed a constraint on the variable x2. Equation (2.4) implies that μ22(t)=0 for all t∈[t0,tf]. We can see in Figure 3 that μ22 from Ipopt (purple dotted line) fails to satisfy this requirement, while μ22 from the DR algorithm is, at least to the eye, equal to zero.
Another difference between the multipliers μ from the DR algorithm and Ipopt can be seen in the maximum values reached by the functions. In Figure 3, we see that the plots of μ21 obtained via the DR algorithm and Ipopt have similar shapes, but the maximum value reached when using the DR algorithm is approximately 0.7, while, from Ipopt, the maximum value is approximately 33. For (PSM) Case 2 with N=103, the maximum value of μ21 obtained by using the DR algorithm was approximately 16, while Ipopt yielded a maximum value of approximately 310. Along with the functions having very different maximum values, we noted that, when generating these plots for N=103,104,105, the results from the DR algorithm were clearly converging to a single function μ21, while this was not obvious from the Ipopt results. For (PSM), the approximate maximum values of μ21 obtained when using Ipopt were 310 for N=103, 2500 for N=104, and 3000 for N=105; in the same example, using the DR algorithm, the maximum values were 16 for N=103, 14 for N=104, and 14 for N=105.
We also observed some slight variation between the plots obtained by using the DR algorithm and Ipopt near the junction times in the control variables (not shown in this paper in order to avoid an excessive amount of visual material). The intervals in which the control variables reached their bounds via the DR algorithm always appeared slightly larger than those obtained from Ipopt. In the regions in which the control variables transition between active and inactive constraints, the corners appeared more rounded when using Ipopt and exhibited a sharper transition when using the DR algorithm.
Errors and CPU times. Table 2 contains the errors in the controls, states, and costates for the DR algorithm and Ipopt, while Table 3 contains the errors in the multipliers, the objective values, and the CPU times. The CPU times were computed as averages over at least 200 runs (up to 1000 runs for the faster examples). For each comparison, the boxed value is the smaller of the errors or CPU times between the DR algorithm and Ipopt. At a glance, we can see that, more often than not, the DR algorithm produced smaller errors and faster CPU times than Ipopt. Upon closer inspection, we see that, in many of the Case 2 results, the errors from the DR algorithm and Ipopt are comparable. We note that the "true" solutions used to calculate these errors are those explained earlier in this subsection, except for the multipliers μ2 in the Case 1 examples, for which we used the zero vector as the "true" solution.
We observe that most of the results in Table 2 show a smaller error in the control variable for the DR algorithm, especially in the Case 1 examples, where there are no state constraints. Regarding the state variables, we see that Ipopt has smaller errors when N=103,104, though there is little difference from the DR algorithm results, and the DR algorithm improves when N=105. As with the error in the control variables, the error in the costates is smaller for the DR algorithm in almost all examples.
From Table 3, the errors in the multipliers show an improvement in the DR results relative to Ipopt, though the "true" solution in the Case 2 examples was obtained by using results from Ipopt, which, as previously mentioned, did not appear to converge to a specific function as we increased N; thus, the quality of this "true" solution is not guaranteed. The DR algorithm produced slightly smaller objective values, closest to the "true" values, in almost all cases, though the difference compared with Ipopt was marginal. We see that the CPU times were faster for the DR algorithm, especially in the examples where N=105.
6. Conclusions
We have applied the DR algorithm to find a numerical solution of LQ control problems with state and control constraints after re-formulating these problems as the sum of two convex functions and deriving expressions for the proximal mappings of these functions (Theorems 1 and 2). These proximal mappings were used in the DR iterations. Within the DR algorithm (Algorithm 1), we proposed a procedure (Algorithm 2) for finding the projection onto the affine set defined by the ODEs numerically.
We carried out extensive numerical experiments on two nontrivial example problems and illustrated both the performance of the algorithm and its efficiency (in both accuracy and speed) compared with the traditional approach of direct discretization. We observed that, in general, the DR algorithm produced smaller errors and faster run times for these problems, most notably in the examples where we increased the number of discretization points. Based on these numerical results, the DR algorithm can, in general, be recommended over Ipopt when high-quality solutions are desired.
Based on further extensive experiments, we conjectured on how the costate variables can be determined. We successfully used the costate variables constructed as in the conjecture, as well as the state constraint multipliers that can be calculated by using these costate variables, for the numerical verification of the optimality conditions.
We recall that Algorithm 2 involves the repeated numerical solution of the ODEs in (4.2) with various ICs. To solve (4.2), we implemented the (explicit) Euler method, which requires only a continuous right-hand side of the ODEs. Algorithm 2 appears to be successful for the examples partly owing to the fact that, in these examples, the optimal control is continuous. We tried to apply our approach to the machine tool manipulator problem from [16], which has seven state variables, one control variable, and upper and lower bounds imposed on the single control variable and one of the state variables. However, our approach has not (so far) yielded a solution for this problem, conceivably because the optimal control variable, as well as the optimal costate variable vector, is discontinuous: these variables jump a number of times during the process (see [16, Figure 5]). Note that discontinuities in the control make the right-hand side of (4.2) discontinuous, rendering the Euler method ineffective. Therefore, problems of the kind in [16] require further investigation.
We believe that our approach can be extended to more general convex optimal control problems, such as those with a non-quadratic objective function or mixed state and control constraints, as long as the pertaining proximal operators are not too challenging to derive.
It would also be interesting to employ and test, in the future, other projection-type methods, such as the Peaceman–Rachford algorithm [8, Section 26.4 and Proposition 28.8], which, to the best of the authors' knowledge, has not been applied to optimal control problems.
Use of AI tools declaration
The authors declare that they have not used artificial intelligence tools in the creation of this article.
Acknowledgments
The authors thank the two anonymous reviewers whose comments and suggestions improved the paper. BIC was supported by an Australian government research training program scholarship.
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.