1.
Introduction
Unconstrained optimization is one of the fundamental problems in the optimization theory [4,26]. It has broad applications across various fields such as engineering design, financial investments, signal processing, and so on [6,18,21]. In engineering design, unconstrained optimization is used to determine the most efficient design parameters for systems or products, which may either include optimizing the dimensions of mechanical components or minimizing material usage while maintaining the structural integrity. In financial investments, unconstrained optimization techniques enable investors to identify the optimal portfolio allocation, thereby balancing the risk and the return without being confined by predefined constraints. This helps investors maximize the expected returns while minimizing the potential losses. In signal processing, unconstrained optimization is employed in tasks such as signal denoising, image quality enhancement, and performance optimization of communication systems. By either minimizing error functions or maximizing the signal-to-noise ratio, unconstrained optimization algorithms can extract meaningful information from noisy data. Therefore, the discussion of methods for solving unconstrained optimization problems becomes very important.
The trust region method is one of the most commonly used iterative algorithms to solve unconstrained optimization problems [14,27]. The fundamental concept behind the trust region method lies in constructing a surrogate model that approximates the objective function within the neighborhood of the current iteration point. This surrogate model captures the local behavior of the original objective function, thus allowing the algorithm to compute the optimal step size or the trial step within the predefined trust region.
In most practical implementations, a quadratic model is typically chosen as the approximation function due to its mathematical convenience and the availability of efficient solvers. However, the effectiveness of the quadratic model heavily depends on the proper tuning of its parameters. Researchers have proposed various techniques to solve trust region subproblems, including exact solutions, dog-leg methods, and conjugate gradient methods [1,5,33].
However, trust region subproblems are often difficult to solve, particularly for complex objective functions and high-dimensional problems, thus leading to increased computational costs. The challenges typically arise from the non-convexity and multimodal nature of the surrogate function within the trust region, requiring frequent subproblem solutions during the iterative process. To address this, researchers have explored methods to improve the efficiency of solving these subproblems by developing more efficient solvers, employing advanced optimization techniques, and exploring alternative approximation functions [8,22,23].
Overall, trust region methods remain powerful tools for unconstrained optimization, and further advancements in subproblem solvers could enhance their performance and applicability to a wider range of problems. To handle situations where the trial step is not accepted, trust region methods combined with line search techniques have been proposed [11,25]. When the trial step is rejected, these algorithms use the trial step as the search direction and apply a line search technique to determine the corresponding step size. However, these methods require the objective function to monotonically decrease at each iteration, which has some impact on the convergence rate of the algorithm. With the introduction of nonmonotone techniques [12], a new class of nonmonotone trust region algorithms was developed [7], which allowed the sequence of the objective function values to be nonmonotone. Numerical results show that these methods outperform traditional trust region methods. Further research has been conducted in this area [10,28,29].
However, in some cases, the class of aforementioned nonmonotone techniques that used a maximum of recent function values may ignore some better function values. To address this, scholars have proposed improved nonmonotone strategies. Zhang and Hager [34] introduced a new nonmonotone line search algorithm that replaced the maximum function value with the average of recent function values. Gu and Mo [13] developed a simplified nonmonotone strategy that was applied within the trust region framework, which avoided the complex parameterization process of Zhang and Hager [34]. Zhou et al. [35] proposed a nonmonotone trust region method based on a simple model, which performed well on large-scale problems. Improved nonmonotone techniques have also been applied to solve other problems, such as the nonmonotone BFGS (Broyden, Fletcher, Goldfarb and Shanno) algorithm proposed by Wan et al. [31] for smooth nonlinear equations, where the nonmonotone parameters were updated using known information of the nonlinear equation. Some scholars have developed new nonmonotone line search rules that were applied to unconstrained and box-constrained optimization problems [16,19]. In contrast to previous work, we study a stochastic nonmonotone technique that allows the algorithm to explore a larger solution space during the search process.
To reduce the number of subproblem computations and improve the overall efficiency of the algorithm, we propose a modified trust region method that incorporates recent advances in this field, with the following key innovations and advantages.
Construction of a New Function Sequence: To prevent convergence to the local optima and to improve algorithmic performance, we introduce the idea of simulated annealing to construct a new function sequence. Based on this sequence, a new nonmonotone trust region ratio is defined, which leads to corresponding improvements in the iteration criterion of the trust region framework, thus providing a greater flexibility during the iterative process.
Improved Nonmonotone Strategy: When the current trial step is not accepted, we utilize an improved nonmonotone strategy that leverages the new function sequence to determine the step size. This approach effectively reduces the required number of subproblem solutions, thus enhancing the algorithm's overall efficiency.
Global Convergence and Efficiency: Our algorithm guarantees a global convergence under certain assumptions. Furthermore, numerical experiments demonstrate that the algorithm significantly reduces the number of iterations, improves the efficiency, and delivers a strong performance.
The structure of this paper is as follows: in Section 2, we describe the new nonmonotone trust region algorithm; in Section 3, the global convergence of our algorithm is proved under certain assumptions; in Section 4, the related numerical experimental results are given; and finally, the conclusion of this paper is given in Section 5.
2.
New nonmonotone trust region method
Consider the following unconstrained optimization problem:
where f: Rn→R is continuously differentiable. The difficulty in solving the trust region problem often stems from the need to balance multiple objectives: minimizing the approximate objective function, satisfying the trust region constraint, and potentially incorporating additional requirements such as sparsity or regularization. The trust region method computes the trial step dk by solving the following subproblem:
where gk=∇f(xk), Bk is the approximation of the Hessian matrix at xk, Δk is the trust region radius, and ‖⋅‖ denotes the Euclidean norm.
As we all know, in trust region method, the trust region ratio is the key to determining whether the algorithm iterates using the current trial step. The following original trust region ratio is monotone:
where fk=f(xk), Ared is the actual reduction, and Pred is the predicted reduction. The monotone technique may slow down the rate of convergence, particularly in a narrow curved valley. In order to overcome this disadvantage, the nonmonotone trust region ratio [30] has been proposed:
where flk=max0≤j≤q(k)f(xk−j), N is a fixed positive constant, and
However, this method may result in missing some well-performing function values, so some scholars have proposed improved nonmonotone algorithms. Gu and Mo [13] constructed the following trust region ratio in 2008:
where
and 0<η<1 or a variable η.
Based on the inspiration from the above ideas, we propose a new nonmonotone trust region algorithm, which allows the algorithm to use the existing favorable information, and also avoids the algorithm from falling into local optimal solutions. To achieve these, we consider introducing the idea of simulated annealing [15,17,32] into the trust region method. A new trust region framework is constructed, which includes the improvement of the trust region iteration criterion and the trust region ratio, etc.
First, we construct a new trust region ratio:
where
and
Here, we introduce the above parameter pk [3,20] into the Rk, where Tk is the temperature during annealing, τ is a given constant, and rk is the trust region ratio. It is crucial to control the rate of the annealing cooling process [2], which is usually updated according to the following:
From the definition of pk, we know that it is dynamically updated by the value of rk. From this, it can be observed that our trust region ratio rk can be adaptively adjusted according to the existing information.
Furthermore, considering the relation between rk and τ in the parameter pk, we use the parameter pk to determine whether the current trial step dk is accepted. For a random number lk<1, if pk≥lk is satisfied, then the current trial step is accepted. When rk≥τ, pk=1, the current trial step is naturally accepted. Otherwise, the current trial step will be accepted with a certain probability, and the larger pk is more probable when the current trial step is accepted. If pk<lk, then the current trial step is rejected. Using our sequence Rk, we propose a new nonmonotone line search:
where 0<δ<1, 0<λ<1. According to the line search, the corresponding step size is obtained for the iteration. The parameter im is computed by this method, which is iterated through xk+1=xk+λimdk. The nonmonotone line search doesn't require the objective function to be monotonically decreasing, which makes the overall algorithm more flexible and faster.
Combined with the above improvements, we propose the nonmonotone trust region algorithm based on the Metropolis criterion (Algorithm 2.1).
Algorithm 2.1.
Step 0. Given an initial point x0∈Rn, R0=f(x0), Δ0>0, T0>0, ε>0, v>1,
0<λ<1, 0<δ≤0.5, 0<η1<η2<1, 0<γ1<1<γ2, 0<β<1,
0<τ≤η1, B0=ξI,ξ>0, let k:=0.
Step 1. Compute gk. If ‖gk‖≤ε, stop.
Step 2. Solve the subproblem (2.2) to obtain dk.
Step 3. Compute rk and pk by (2.4) and (2.6), respectively. Additionally, compute the following:
Step 4. If pk≥lk, then xk+1=xk+dk. Otherwise, compute im, which is the minimum nonnegative integer i which satisfies the following:
then, xk+1=xk+λimdk.
Step 5. Update the trust region radius:
Step 6. Compute fk+1 and Rk+1=(1−pk)Rk+pkfk+1, Tk+1=βTk, and compute Bk+1 by a quasi-Newton update. Set k:=k+1, then go to Step 1.
Remark 2.1. In Step 3, if rk≥τ, then the function approximation of the previous iteration is better; then, pk=1 and Rk+1=fk+1, which is reduced to the case of the traditional trust region ratio. If rk<τ, then 0<pk<1, and Rk+1 is the convex combination of Rk and fk+1.
3.
Convergence analysis
Nonmonotone trust region methods have good global convergence as described in [13,24]. In this section, we will analyze the convergence of our algorithm, which also inherits the properties of nonmonotone trust region methods. First, the following assumptions are made.
Assumption 3.1. Let L={x∈Rn|f(x)≤f(x0)}⊂Ω be the level set, where Ω∈Rn is a bounded closed set.
Assumption 3.2. There exists a positive constant m such that dTBkd≥m‖d‖2, ∀d∈Rn.
Remark 3.1. Combining with Assumptions 3.1 and 3.2, f(x) is a quadratic continuous differentiable function. It follows that there exists a positive constant M>m, such that ‖Bk‖≤M.
Lemma 3.1. Let the sequence {xn} be generated by the Algorithm 2.1. There are
Lemma 3.2. Let the sequence {xn} be generated by the Algorithm 2.1. Then,
holds for all k.
Proof. (1) From the definition of Rk+1 and pk, if rk≥τ, then we have the following:
(2) From Tk+1=βTk, if rk<τ and pk=exp{−τ−rkTk}≥lk, then we can have limk→∞Tk=0. It is known that lk∈[e−v,e−1/v], and we can obtain −v≤lnlk≤−1v. Therefore, there exists a positive integer N such that 0<−Tklnlk<ε1 holds for any k>N, ε1>0. From the definition of pk, we have the following:
From the definition of rk and (3.1), we have the following:
Let ε1=τ2; then
(3) If rk<τ and pk≤lk, then f(xk+λimdk)≤Rk+δλimgTkdk holds. Combining with (3.2), we obtain the following:
Therefore, Rk+1≥fk+1 holds. The proof is completed. □
Lemma 3.3. Let the sequence {xn} be generated by the Algorithm 2.1. The sequence {Rk} is monotonically decreasing.
Proof. (1) If rk≥τ, namely,
then
According to the definition of Rk+1, we have the following:
(2) By (3.3) and the definition of Rk+1, if rk<τ and pk>lk, then we obtain the following:
(3) If rk<τ and pk≤lk, then f(xk+λimdk)≤Rk+δλimgTkdk holds. Since gTkdk≤0, then fk+1≤Rk.
By the definition of Rk+1, we obtain the following:
Therefore, Rk+1≤Rk holds. The proof is completed. □
Lemma 3.4. Let the sequence {xn} be generated by the Algorithm 2.1. Then, the step length αk satisfies αk>(1−δ)λmM.
Proof. The proof can be proved similarly to the proof of Lemma 3.3 in [13] and is omitted here. □
To facilitate the subsequent discussion, the definition of the sequence Mk is introduced as Mk=1+ max0≤i≤k ‖Bi‖.
Lemma 3.5. Assume that the sequences {xk} and {Rk} are generated by the Algorithm 2.1. If there exists a positive constant ε, such that ‖gk‖≥ε holds, then for any k, there is the following:
where ψ=min{12τ,δ(1−δ)λmM}.
Proof. (1) By (3.4) and (3.5), if pk>lk, then we know that
(2) By combining (3.2), Lemma 3.4, and fk+1≤Rk+δαkgTkdk, if pk≤lk, then the following holds:
According to the definition of Rk+1, we obtain the following:
Additionally, if ψ=min{12τ,δ(1−δ)λmM} and ‖gk‖≥ε, then
The proof is completed. □
Lemma 3.6. Assume that Assumption 3.1 holds. If the sequence {xk} generated by the Algorithm 2.1 does not converge (i.e., there exists a positive constant ε, such that ‖gk‖≥ε), then
Proof. Algorithm 2.1 shows that R0=f0. From Lemmas 3.2 and 3.3, we know that fk+1≤Rk+1≤Rk≤f0. Assumption 3.1 shows that if {f(xk)} is bounded, then {Rk} is bounded. According to Lemma 3.5, we can know that limk→∞min{Δk,εMk}=0. □
Lemma 3.7. If the sequence {xk} generated by the Algorithm 2.1 does not converge (i.e., there exists a positive constant ε such that ‖gk‖≥ε), then there exists a set J={k|pk<lk} such that the following inequality holds for a sufficiently large k∈J:
Proof. Based on α, we consider the following two cases.
Case 1. If im=0 (i.e., αk=1), then
At this point, there is rk<τ. According to fk+1≤Rk+1 and the definition of rk, we have the following:
According to the Taylor's expansion and Remark 3.1, we obtain the following:
Combining (3.8) and (3.9), we have the following:
Combining (3.10) and Assumption 3.2, we obtain the following:
Combining (3.2) and (3.11), we have the following:
From the definition of Mk and ‖gk‖≥ε, and combined with ‖xk+1−xk‖=‖dk‖ and (3.12), we obtain the following:
Therefore, it is proved that ‖xk+1−xk‖≥√(1−τ)εmin{Δk,εMk}M−τm,αk=1.
Assuming Δk≤εMk, there is (3.6) to obtain Δk≥√(1−τ)εΔkM−τm (i.e., Δk≥(1−τ)εM−τm). Combined with Δk≤εMk, there is εMk≥Δk≥(1−τ)εM−τm. This contradicts Lemma 3.6, so we have Δk>εMk.
Case 2. If im>0 (i.e., αk<1), then by combining (3.2), Remark 3.1, Lemma 3.2, and the Step 4 of Algortihm 2.1, according to the Taylor's expansion yield,
Therefore, it is proved that ‖xk+1−xk‖>(1−δ)ελMmin{1,εΔkMk},αk<1.
There is
Suppose there exists Δk≤εMk. Combining (3.7), we have the following:
Combining (3.13), (3.14), and Assumption 3.1, there are εMk≥Δk≥‖xk+1−xk‖>(1−δ)ελM. This contradicts limk→∞min{Δk,εMk}=0. Therefore, Δk>εMk holds.
In summary, the proof is completed. □
Lemma 3.8. If the sequence {xk} generated by the Algorithm 2.1 does not converge (i.e., there exists a positive constant ε such that ‖gk‖≥ε), then Δk>εMk holds for all sufficiently large k.
Proof. (1) If there are only finitely many k such that Δk+1≤Δk (i.e., J is a finite set), then there exists a positive constant Δ∗ such that Δk>Δ∗. According to Lemma 3.6, we know that limk→∞ εMk=0.
(2) We assume that J is an infinite set. It follows from Lemma 3.7 that there exists ˉk∈J, such that Δk>εMk holds when ∀k∈J,k≥ˉk. When k∉J, k≥ˉk, noted as ˜k=max{i|i∈J,i≤k}. We have Δ˜k>εM˜k, and ˜k+s∉J, s=1,2,⋯,k−˜k. We can obtain Δ˜k≤Δ˜k+1≤Δ˜k+2≤⋯≤Δk, so Δk>εM˜k holds. According to the definition of Mk, {Mk} is a monotonically increasing sequence.
The proof is completed. □
Theorem 1. If Assumptions 3.1 and 3.2 hold, and {Bk} satisfies ∞∑k=01Mk=∞, then the sequence {xk} generated by the Algorithm 2.1 satisfies the following:
Proof. Suppose there exists a positive constant ε such that ‖gk‖≥ε. Lemma 3.8 holds, that is Δk>εMk holds for sufficiently large k. By Lemma 3.5, we know that Rk−Rk+1≥12εpkψmin{Δk,εMk}. We have ∞∑k=0(Rk−Rk+1)≥∞∑k=012ε2pkψ1Mk, then ∞∑k=01Mk<∞ contradicts ∞∑k=01Mk=∞. The theorem is proved. □
4.
Numerical experiments
In the following, we will show the effectiveness of our Algorithm 2.1 through some numerical experimental results. The numerical experiments are all implemented by using MATLAB (R2017a) on a PC with a CPU of 2.30 GHz and 8.00 GB RAM. The relevant parameters in Algorithm 2.1 and other comparison algorithms are selected as T0=200, v=100, ε=10−5, Δ0=‖g0‖, λ=0.5, δ=0.5, η1=0.25, η2=0.75, γ1=0.5, γ2=2, β=0.9, and τ=0.25. The matrix Bk is updated by the BFGS formula, i.e.,
In order to explore the influence of the simulated annealing idea, nonmonotone technologies, and different trust region ratios on the algorithm, we perform the following three sets of comparison experiments:
(a) Algorithm 2.1 is compared with the existing trust region algorithm. The NTRM (Nonmonotone Trust Region Method With Line Search) algorithm [13] is a classical nonmonotone trust region algorithm. The SATRBB (Simulated Annealing-based Trust Region Bazilai-Borwein) algorithm [20] is a trust region algorithm combined with simulated annealing.
(b) Algorithm 2.1 is compared with the algorithms that combine different line search techniques (i.e., Algorithm 2.1 without a line search (SATR) and with a monotonic line search (ATR)).
(c) Algorithm 2.1 is compared with algorithms that combine different ratios (i.e., Algorithm 2.1 by using the original trust region ratio (SATR-OR), correcting the numerator portion of the trust region ratio to the form in [13] (SATR-U), and correcting the numerator and denominator portions of the trust region ratio according to Ck in [13] (SATR-UD)).
Remark 4.1. The SATR-UD algorithm is inspired by Remark in [29], and the form of trust region ratio is as follows:
The test functions used and their associated dimensionality are shown in Table 1. We require the maximum number of iterations of the algorithm to be 5000. It is worth noting that the algorithm which uses the idea of simulated annealing is random in practice; therefore the relevant data are the average of the results of 10 tests.
To further investigate the efficiency of the algorithm, we use the performance profiles [9] to evaluate and compare the performance of the solvers. Assume that k denotes the number of iterations required, and t denotes the time required for the algorithm. Moreover, there exist ns solvers on the test set S and np problems on test set P. If the computation running time is used as the performance metric, then we define tp,s as computing time required to solve problem p by the solver s. We define the performance ratio as follows:
There exists a rM such that there is rM≥rs for any p and s. For the overall evaluation of the solver, we define the following:
where ρ is the cumulative distribution function of the performance ratio, which is the probability that the performance ratio rp,s of the solver s is within the best possible ratio factor τ.
The above three sets of experiments were conducted on the efficiency of solving some problems using the number of iterations and the time required for the algorithm as the performance metrics. The results are shown in Figures 1–3. It is not difficult to see from Figures 1 and 2 that Algorithm 2.1 has good properties compared with existing algorithms, and the nonmonotone line search technology proposed in this paper greatly improves the performance of the algorithm. Figure 3 shows that although the SATR-UD algorithm has a slight advantage for some problems. However, compared with Algorithm 2.1, there are still some problems that the SATR-UD algorithm cannot solve. All in all, the algorithm proposed in this paper is the preferred solver for the above problem with a high probability. Algorithm 2.1 solves the problems with a higher accuracy and requires fewer iterations.
5.
Conclusions
In this paper, we proposed a novel stochastic nonmonotone trust region algorithm, which incorporated the simulated annealing for an enhanced performance. This integration of ideas aimed to address some of the challenges faced by traditional optimization methods, especially in terms of avoiding the local minima and achieving a global convergence. At the heart of our algorithm lies the construction of a new sequence, which was governed by the Metropolis criterion. The trust region ratio was modified based on this sequence, and the iterative criterion was adjusted according to the improved ratio. When the current trial step was not accepted, the iteration step size was obtained based on the modified nonmonotone line search. By doing so, we not only reduced the number of solutions required for the trust region subproblem, but also enhanced the algorithm's ability to escape the local minimum and converge to a global optimum. A theoretical analysis showed that our proposed algorithm achieved a global convergence under certain conditions, making it avoid local optimal solutions as much as possible. Furthermore, numerical experiments were conducted to demonstrate the effectiveness of our algorithm in practical applications. These experiments revealed that our algorithm outperformed traditional methods.
Author contributions
Yiting Zhang: Conceptualization, Methodology, Investigation, Software, Validation, Writing-original draft; Chongyang He: Funding acquisition, Investigation, Software; Wanting Yuan: Validation, Writing-review & editing; Mingyuan Cao: Funding acquisition, Methodology, Software, Writing-review & editing. All authors have read and approved the final version of the manuscript for publication.
Acknowledgments
The key project of the natural science foundation joint fund of Jilin province (YDZJ202201ZYTS303, YDZJ202101ZYTS167, 20230508184RC, JJKH20200037KJ); The graduate innovation project of Beihua University (2023003); The science and technology development plan project of Jilin province (20200403193SF).
Conflict of interest
All authors declare no conflicts of interest in this paper.