1. Introduction
This paper considers the unconstrained optimization model

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth, continuously differentiable nonlinear function. This topic has several applications in finance, engineering, security, and scientific computing [1,2,3,4,5,6,7]. As a result, reliable and efficient numerical procedures for solving (1.1), such as Newton-type procedures, spectral gradient methods, and conjugate gradient (CG) algorithms, have been widely investigated in the literature; see [8,9,10,11,12,13]. Of these methods, CG algorithms are the most extensively used because of their nice convergence properties and low memory requirements [14]. Given a starting guess $x_0 \in \mathbb{R}^n$, the CG algorithm produces its iterates recursively via

$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, \ldots, \qquad (1.2)$$
where $\alpha_k$ is the step size, calculated by either an exact or an inexact line search strategy. The vector $d_k$ is the CG search direction, given by

$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0. \qquad (1.3)$$
Here, $\beta_k$ is the CG parameter and $g_k := \nabla f(x_k)$. This parameter largely determines the efficiency and reliability of the various CG methods [8]. Hestenes and Stiefel [9] (HS) proposed one of the essential CG parameters, namely,

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k},$$
in which $y_k = g_{k+1} - g_k$. The direction $d_k$ of HS satisfies the conjugacy condition $d_{k+1}^T y_k = 0$ for all $k \ge 0$, irrespective of the line search procedure employed. Dai and Liao [15] (DL) introduced a new CG parameter,

$$\beta_k^{DL} = \frac{g_{k+1}^T y_k}{d_k^T y_k} - t\, \frac{g_{k+1}^T s_k}{d_k^T y_k}, \qquad s_k = x_{k+1} - x_k, \qquad (1.4)$$
which is widely regarded as an extension of HS, where $t$ is a nonnegative scalar parameter. The parameter (1.4) satisfies an extended conjugacy condition $d_{k+1}^T y_k = -t\, g_{k+1}^T s_k$; indeed, substituting (1.4) into (1.3) gives $d_{k+1}^T y_k = -g_{k+1}^T y_k + \beta_k^{DL} d_k^T y_k = -t\, g_{k+1}^T s_k$. It is also easy to see that (1.4) reduces to $\beta_k^{HS}$ for $t = 0$. Efficient adaptive versions of (1.4) have been presented by Hager and Zhang [16] and by Dai and Kou [17]. Andrei [18] highlighted that the appropriate choice of $t$ in the DL method remains an open issue. This inspired Babaie-Kafaki and Ghanbari [19,20] to exploit eigenvalue and singular value analyses to offer some optimal alternatives for $t$.
The authors also offered additional optimal choices of $t$ by minimizing the distance between the DL search direction and that of the three-term CG algorithm of Zhang et al. [21], and by minimizing the Frobenius condition number of the search direction matrix [22]. Zhang et al. [23] proposed an optimal value of $t$ that clusters the singular values and minimizes an upper bound on the spectral condition number of the search direction matrix.
Furthermore, for other DL methods and their variants, we refer the reader to [24,25,26,27,28,29,30] and the references therein.
In this article, we propose an optimal choice of $t$ for the DL CG parameter based on the well-known Barzilai-Borwein (BB) approach [31]. Section 2 derives our choice of $t$ from the BB technique. Section 3 establishes the global convergence of the DL method under the recommended choice of $t$. Section 4 presents numerical comparisons, and Section 5 applies the new algorithm to portfolio selection and image restoration problems. Finally, Section 6 gives the conclusions.
2. An optimal choice based on BB approach
This section presents an optimal choice for the DL CG method based on the prominent BB approach. Rewriting the search direction of the DL algorithm in matrix form, we have

$$d_{k+1} = -H_{k+1} g_{k+1},$$

where

$$H_{k+1} = I - \frac{s_k y_k^T}{s_k^T y_k} + t\, \frac{s_k s_k^T}{s_k^T y_k}.$$
Among the excellent scaling parameters used in spectral residual methods are those proposed by Barzilai and Borwein [31], given by

$$\theta_k^{BB1} = \frac{s_k^T s_k}{s_k^T y_k}, \qquad \theta_k^{BB2} = \frac{s_k^T y_k}{y_k^T y_k}.$$
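As a quick illustration, these BB scaling parameters can be computed and safeguarded as in the following Python sketch; the clipping interval $[\theta_{\min}, \theta_{\max}]$ and its endpoint values are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def bb_parameters(s, y, theta_min=1e-10, theta_max=1e10):
    """Compute the two Barzilai-Borwein scaling parameters and
    safeguard them into [theta_min, theta_max] (illustrative bounds)."""
    sy = float(s @ y)
    theta_bb1 = float(s @ s) / sy      # BB1: s_k^T s_k / s_k^T y_k
    theta_bb2 = sy / float(y @ y)      # BB2: s_k^T y_k / y_k^T y_k
    clip = lambda t: min(max(t, theta_min), theta_max)
    return clip(theta_bb1), clip(theta_bb2)
```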
Now, our aim is to propose another nonnegative optimal choice of the DL parameter by utilizing the prominent features of the Barzilai-Borwein approach [31], that is, by considering the minimization problem

$$\min_{t} \ \| H_{k+1} - \theta_k I \|_F^2, \qquad (2.1)$$
where $\|\cdot\|_F$ denotes the Frobenius matrix norm and $\theta_k$ is a BB scaling parameter, safeguarded so that $\theta_k \in [\theta_{\min}, \theta_{\max}]$ with $0 < \theta_{\min} < \theta_{\max} < \infty$.
Letting $D_{k+1} = H_{k+1} - \theta_k I$ and utilizing the properties of the Frobenius matrix norm, the minimization problem (2.1) is equivalent to minimizing $\mathrm{trace}(D_{k+1}^T D_{k+1})$. Now, using

$$\mathrm{trace}(s_k y_k^T) = s_k^T y_k, \qquad \mathrm{trace}(s_k s_k^T) = \|s_k\|^2, \qquad \|s_k s_k^T\|_F^2 = \|s_k\|^4,$$

and simple algebra, by minimizing

$$\varphi(t) = \frac{\|s_k\|^4}{(s_k^T y_k)^2}\, t^2 - 2\theta_k\, \frac{\|s_k\|^2}{s_k^T y_k}\, t + \psi,$$

where $\psi$ collects the terms independent of $t$, we derive another optimal choice of the parameter for the DL CG method as

$$t_k^* = \theta_k\, \frac{y_k^T s_k}{s_k^T s_k}. \qquad (2.2)$$
Notice that when $\theta_k = 1$, we recover the MDL3 method of Babaie-Kafaki and Ghanbari [22]. Moreover, since the study employs the strong Wolfe line search, we have $d_k^T y_k \ge -(1 - \sigma)\, d_k^T g_k > 0$, and in this situation $t_k^* = \theta_k\, \frac{y_k^T s_k}{s_k^T s_k} > 0$.
The following algorithm demonstrates the computational process of the proposed method under the strong Wolfe conditions, that is, with the step size $\alpha_k$ required to satisfy

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k \qquad (2.3)$$

and

$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|, \qquad (2.4)$$

where $0 < \delta < \sigma < 1$.
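For illustration only, a minimal Python sketch of one realization of this process is given below; it relies on scipy.optimize.line_search (a strong Wolfe implementation) for the step size, uses the safeguarded BB1 scaling for $\theta_k$, and falls back to the steepest descent direction whenever a descent direction is not obtained. Names, safeguards, and tolerances are assumptions rather than the authors' reference implementation.

```python
import numpy as np
from scipy.optimize import line_search

def dlbb(f, grad, x0, tol=1e-6, max_iter=2000,
         theta_min=1e-10, theta_max=1e10, delta=1e-4, sigma=2e-4):
    """Sketch of the DLBB conjugate gradient iteration (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # Strong Wolfe step satisfying (2.3)-(2.4); scipy returns None on failure.
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:
            d = -g                                # restart along steepest descent
            alpha = line_search(f, grad, x, d, gfk=g)[0] or 1e-8
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        # Safeguarded BB1 scaling and the optimal DL parameter (2.2).
        theta = np.clip((s @ s) / (s @ y), theta_min, theta_max)
        t_star = theta * (y @ s) / (s @ s)
        beta = (g_new @ y - t_star * (g_new @ s)) / (d @ y)
        d_new = -g_new + beta * d
        if g_new @ d_new >= 0:                    # enforce a descent direction
            d_new = -g_new
        x, g, d = x_new, g_new, d_new
    return x
```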
3. Convergence analysis
This section discusses the convergence analysis of the DL method with the new choice of parameter $t$ defined by (2.2).
To establish the convergence of the proposed formula with the new choice of $t$, we need the assumptions defined below, in addition to the Zoutendijk condition.
Assumption A.
(1) Given a starting point $x_0 \in \mathbb{R}^n$, the level set $\Omega = \{ x \in \mathbb{R}^n \,|\, f(x) \le f(x_0) \}$ of $f(x)$ is bounded.
(2) $f$ is a smooth function in some neighborhood $N$ of $\Omega$, and the gradient $g(x)$ is Lipschitz continuous on an open convex set $N$ containing $\Omega$; that is, there exists a constant $L > 0$ such that

$$\| g(x) - g(y) \| \le L \| x - y \|, \qquad \forall x, y \in N.$$
For the function $f$, this assumption implies that there exists a constant $\gamma > 0$ such that

$$\| g(x) \| \le \gamma, \qquad \forall x \in \Omega.$$
It is important to note here that the sufficient descent condition

$$g_k^T d_k \le -c \| g_k \|^2, \qquad c > 0, \qquad (3.1)$$

cannot always be guaranteed for the DL method (1.4) with $t = t_k^*$; whenever it fails, the steepest descent direction $d_k = -g_k$ is used instead. The result that follows is very important in the analysis of CG formulas and was presented by Zoutendijk [32].
Lemma 3.1. Let Assumption A hold. Then, for any iterative scheme of the form (1.2) and (1.3), where $d_k$ is a descent direction and the step size is computed using the Wolfe strategies,

$$\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\| d_k \|^2} < +\infty. \qquad (3.2)$$
Proof. From (2.4), if the curvature condition

$$g_{k+1}^T d_k \ge \sigma g_k^T d_k$$

holds, then, subtracting $g_k^T d_k$ from both sides and using the Lipschitz condition, we obtain

$$(\sigma - 1)\, g_k^T d_k \le y_k^T d_k \le \alpha_k L \| d_k \|^2.$$

This implies that

$$\alpha_k \ge \frac{(\sigma - 1)\, g_k^T d_k}{L \| d_k \|^2} > 0.$$

And from the Armijo condition (2.3), we get

$$f(x_k) - f(x_{k+1}) \ge -\delta \alpha_k g_k^T d_k \ge \frac{\delta (1 - \sigma)}{L} \cdot \frac{(g_k^T d_k)^2}{\| d_k \|^2},$$

which, on summing over $k$ and using the fact that $f$ is bounded from below, gives (3.2).
□
Now, from the strong Wolfe conditions, (3.1) and (3.2), we conclude that there exists a constant $\bar{\alpha} > 0$ such that

$$\alpha_k \ge \bar{\alpha}\, \frac{-g_k^T d_k}{\| d_k \|^2}, \qquad \forall k \ge 0.$$
Under the strong Wolfe line search conditions, we have the following important results for conjugate gradient methods.
Lemma 3.2. [33] Let Assumption A hold. Then, for any iterative scheme of the form (1.2) and (1.3), where $d_k$ satisfies the descent condition (3.1) and the step length is computed using the SWP strategies (2.3) and (2.4), either

$$\sum_{k \ge 0} \frac{\| g_k \|^4}{\| d_k \|^2} < +\infty \qquad (3.3)$$

or

$$\liminf_{k \to \infty} \| g_k \| = 0.$$
Consequently, the following lemma follows.
Lemma 3.3. Let Assumption A hold. Then, for any CG process of the form (1.2) and (1.3), where the step size $\alpha_k$ is computed using the SWP conditions (2.3) and (2.4) and $d_k$ satisfies the descent property (3.1), if

$$\sum_{k \ge 0} \frac{1}{\| d_k \|^2} = +\infty, \qquad (3.4)$$

it follows that

$$\liminf_{k \to \infty} \| g_k \| = 0. \qquad (3.5)$$
Proof. We prove this by contradiction. That is, we assume (3.5) is not true. Then there exists a constant $r > 0$ such that $\| g_k \| \ge r$ for all $k \ge 0$. From Lemma 3.2, we have that (3.3) holds. Hence

$$\sum_{k \ge 0} \frac{1}{\| d_k \|^2} \le \frac{1}{r^4} \sum_{k \ge 0} \frac{\| g_k \|^4}{\| d_k \|^2} < +\infty,$$

which contradicts (3.4). Thus, (3.5) holds true. □
By the Cauchy-Schwarz inequality, we have

$$| \beta_k^{DLBB} | \le \frac{\| g_{k+1} \| \left( \| y_k \| + t_k^* \| s_k \| \right)}{d_k^T y_k},$$

and therefore the norm of $d_k$ generated by (1.4) can be shown to be bounded above for uniformly convex functions. Thus, we have the following theorem.
Theorem 3.1. Suppose Assumption A holds. Let the CG method be defined by (1.2) and (1.3), where $\beta_k$ follows from (1.4) with $t = t_k^*$. If $f$ is uniformly convex on $N$, that is, there exists a constant $\mu > 0$ such that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \| x - y \|^2, \qquad \forall x, y \in N,$$

the descent condition (3.1) holds, and $\alpha_k > 0$ is computed using the SWP strategies (2.3) and (2.4), then the new formula converges in the sense that

$$\lim_{k \to \infty} \| g_k \| = 0. \qquad (3.6)$$
Proof. Suppose that $g_k \ne 0$ for all $k$. By the uniform convexity of $f$, we have

$$y_k^T s_k \ge \mu \| s_k \|^2.$$

Using the triangle and Cauchy-Schwarz inequalities, together with $\| y_k \| \le L \| s_k \|$, $s_k = \alpha_k d_k$, and $t_k^* \le \theta_{\max} L$, it follows that

$$\| d_{k+1} \| \le \| g_{k+1} \| + | \beta_k^{DLBB} | \| d_k \| \le \left( 1 + \frac{L (1 + \theta_{\max})}{\mu} \right) \| g_{k+1} \| \le \left( 1 + \frac{L (1 + \theta_{\max})}{\mu} \right) \gamma.$$

As a result, we obtain that $\| d_k \|$ is bounded above, which implies (3.4) holds and hence (3.5) is true; for uniformly convex functions this is equivalent to (3.6). □
Note that if the parameter $\beta_k^{DLBB}$ is modified in the DL+ fashion as

$$\beta_k^{DLBB+} = \max \left\{ \frac{g_{k+1}^T y_k}{d_k^T y_k},\ 0 \right\} - t_k^*\, \frac{g_{k+1}^T s_k}{d_k^T y_k},$$

with $t_k^*$ defined by (2.2), and $d_k$ satisfies the descent condition (3.1), then Theorem 3.6 of Dai and Liao [15] ensures the global convergence of the algorithm for general nonlinear functions.
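In code, this truncated parameter could be computed as in the following sketch, assuming the DL+-style modification written above; the helper name and argument layout are hypothetical.

```python
def beta_dlbb_plus(g_new, y, s, d, t_star):
    """DL+-style truncation: nonnegative HS part plus the DL correction term.
    Inputs are NumPy vectors g_{k+1}, y_k, s_k, d_k and the scalar t_k^*."""
    dy = d @ y
    return max((g_new @ y) / dy, 0.0) - t_star * (g_new @ s) / dy
```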
4. Numerical results
This section demonstrates the numerical efficiency of the proposed DLBB method by comparing its performance with other existing algorithms of the same class under the strong Wolfe conditions. The forty-nine (49) benchmark problems employed for this analysis (see Table 1) are taken from [34]. For each problem, different initial points are used, and the dimensions considered range from 2 up to 100,000. The efficiency of the methods is measured by the following metrics: number of iterations (NOI), number of function evaluations (NOF), and CPU time. The comparison considers the following methods:
● The classical Dai-Liao method in [15].
● The DLHZ method in [16].
● The MDL3 and MDL4 methods in [22].
● The EJHJ and MEJHJ methods in [6] with δ=0.0001 and σ=0.99.
All experiments were coded in MATLAB R2019b and run on a Core i5 Windows 10 PC with 8 GB of RAM. The stopping criterion is $\| g_k \| \le 10^{-6}$, subject to a maximum of 2,000 iterations. A run that reaches the iteration limit without meeting the gradient tolerance is recorded as a failure and denoted by "*". The detailed numerical results are presented in Tables 2–6.
To interpret the numerical results graphically, we employ the performance profile tool introduced in [35]. In a performance profile, each algorithm is represented by a curve; the best-performing algorithm has its curve lying above the others, meaning that it solved the largest share of the test problems in Table 1. The experimental results for all methods are graphed in Figure 1 (NOI), Figure 2 (NOF), and Figure 3 (CPU time) under the strong Wolfe line search (2.3) and (2.4) with parameter values $\delta = 0.0001$ and $\sigma = 0.0002$.
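For reference, a Dolan-Moré performance profile of this kind can be computed as in the sketch below; the data layout and names are assumptions, with failed runs ("*") encoded as infinite cost.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels, tau_max=10.0):
    """costs: (n_solvers, n_problems) array of NOI/NOF/CPU values,
    with np.inf marking failures ('*')."""
    ratios = costs / costs.min(axis=0)             # r_{p,s} = cost / best cost
    taus = np.linspace(1.0, tau_max, 500)
    for r, label in zip(ratios, labels):
        rho = [(r <= tau).mean() for tau in taus]  # fraction solved within tau
        plt.step(taus, rho, where='post', label=label)
    plt.xlabel('tau'); plt.ylabel('rho(tau)'); plt.legend()
    plt.show()
```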
The performance curves in Figures 1 and 2 show that the DLBB method clearly outperforms the DL, DLHZ, MDL3, MDL4, EJHJ, and MEJHJ algorithms in terms of NOI and NOF. In terms of CPU time (see Figure 3), the DL, MDL3, and MDL4 methods are close competitors, yet the proposed DLBB method still slightly outperforms them. The EJHJ and MEJHJ algorithms perform in a similar pattern, which is consistent with MEJHJ being an improvement of EJHJ. However, EJHJ, MEJHJ, and DLHZ record the highest numbers of failures, and their curves therefore lie below the others. Based on this performance analysis, we conclude that the new DLBB method gives the best performance: its curves dominate for a large number of problems, and it generates more efficient search directions than the DL, DLHZ, MDL3, MDL4, EJHJ, and MEJHJ algorithms.
5. Application
This section investigates the performance of the new formula on portfolio selection and image restoration problems.
5.1. Portfolio selection
The analytical process of choosing and distributing a collection of investment assets is called portfolio selection. One of the well-known models for portfolio selection is the Markowitz model: Markowitz [36] proposed the mean-variance model in 1952, which calculates the expected return and risk of a portfolio from historical asset prices. In this work, however, we consider only the minimum-variance version of the model,

$$\min_{w} \ \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j C_{ij}, \qquad \text{subject to } \sum_{i=1}^{N} w_i = 1,$$
where $w_i$ is the weight of asset $i$, $C_{ij}$ is the covariance of returns between assets $i$ and $j$, and $N$ is the total number of assets.
Stock investment is one of the investment products available to support the development of financial strength [3,7]. A stock represents a person's or party's capital participation in a limited liability company; by contributing this capital, the holder acquires a claim on the company's income and assets. We therefore consider investment stocks in our portfolio selection [37].
In this application, we use five stocks in the LQ45 index, namely Barito Pacific Tbk. (BRPT), Semen Indonesia (Persero) Tbk. (SMGR), Charoen Pokphand Indonesia Tbk. (CPIN), Waskita Karya (Persero) Tbk. (WSKT), and Unilever Indonesia Tbk. (UNVR). The reference data are the closing prices from June 1, 2020 to May 31, 2022, taken from https://finance.yahoo.com. We assume the data follow a normal distribution, so the mean, variance, and covariance are calculated using the corresponding formulas for the normal distribution.
Tables 7 and 8 present summary statistics of the closing prices of the five stocks: Table 7 provides the expected return and variance of each stock, while Table 8 provides the covariances among the stocks. By letting $w_5 = 1 - w_1 - w_2 - w_3 - w_4$, where $w_1, w_2, w_3, w_4$, and $w_5$ are the proportions of the UNVR, SMGR, BRPT, WSKT, and CPIN stocks, respectively, and using the data from Table 8, we obtain an unconstrained minimization portfolio selection model in the four variables $w_1, \ldots, w_4$.
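To illustrate how the reduced unconstrained model can be built and solved, the sketch below substitutes $w_5 = 1 - w_1 - w_2 - w_3 - w_4$ into the portfolio variance $w^T C w$; the covariance matrix here is a random placeholder, since the actual entries come from Table 8.

```python
import numpy as np
from scipy.optimize import minimize

def reduced_variance(w4, C):
    """Portfolio variance after eliminating w5 = 1 - (w1 + w2 + w3 + w4)."""
    w = np.append(w4, 1.0 - w4.sum())   # full weight vector (w1, ..., w5)
    return w @ C @ w

# Placeholder symmetric positive definite covariance (NOT the Table 8 data).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
C = (A @ A.T) / 100.0

res = minimize(reduced_variance, x0=np.full(4, 0.2), args=(C,),
               method='CG', options={'gtol': 1e-6})
weights = np.append(res.x, 1.0 - res.x.sum())
print('weights:', weights, 'variance:', res.fun)
```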
Now, we test the performance of all the methods on the unconstrained optimization problem defined above. Taking the initial points P1: (0.1, 0.2, 0.3, 0.4), P2: (0.4, 0.3, 0.2, 0.1), P3: (0.1, 0.1, 0.1, 0.1), P4: (0.5, 0.1, 0.2, 0.2), P5: (0.5, 0.5, 0.5, 0.5), P6: (1, 1, 1, 1), P7: (1.5, 1.5, 1.5, 1.5), P8: (0.1, 0.5, 0.5, 0.1), P9: (0.8, 0.5, 0.3, 0.1), and P10: (0.1, 0.3, 0.5, 0.8), we obtain the numerical results in Table 9.
Based on Table 9, the proposed DLBB method clearly gives the best performance in terms of NOI, NOF, and CPU time when compared with the DL, DLHZ, MDL3, and MDL4 methods on this problem. All methods return the weights $w_1 = 0.4334$, $w_2 = 0.1362$, $w_3 = 0.0856$, $w_4 = 0.0972$, and $w_5 = 0.2476$. Of the total funds allocated in the resulting portfolio, UNVR accounts for about 43.34%, SMGR for 13.62%, BRPT for 8.56%, WSKT for 9.72%, and CPIN for 24.76%. Furthermore, the portfolio risk is 2.2397e-04 and the expected return is 0.000824524.
5.2. Image restoration
The conjugate gradient (CG) method has recently been extended to solve real-life image restoration problems, which involve restoring a degraded image to its original form [1,2,38]. These problems often require solving ill-posed inverse problems in which the image has been corrupted by noise and other factors; the CG algorithm is employed to recover the underlying clean image from the corrupted observations with high precision. In this study, the proposed DLBB CG algorithm is extended to solve the image restoration model

$$\min_{u} H(u) = \sum_{(i,j) \in \mathcal{N}} \Bigg[ \sum_{(m,n) \in T_{ij} \setminus \mathcal{N}} \phi_\alpha (u_{ij} - \xi_{mn}) + \frac{1}{2} \sum_{(m,n) \in T_{ij} \cap \mathcal{N}} \phi_\alpha (u_{ij} - u_{mn}) \Bigg] \qquad (5.1)$$

and

$$\mathcal{N} = \left\{ (i,j) \in G : \bar{\xi}_{ij} \ne \xi_{ij}, \ \xi_{ij} \in \{ s_{\min}, s_{\max} \} \right\}, \qquad (5.2)$$
where $x$ is the original image with $M \times N$ pixels, $u$ is the image to be restored, and the index set $G$ is given as

$$G = \{ (i,j) : 1 \le i \le M, \ 1 \le j \le N \}.$$
From (5.2), $\xi$ denotes the observed noisy image and $\bar{\xi}$ its adaptive median filtered version. The maximum and minimum values of a noisy pixel are denoted by $s_{\max}$ and $s_{\min}$. Also, $(i,j) \in Q = \{1, 2, \ldots, M\} \times \{1, 2, \ldots, N\}$, with the neighborhood of $(i,j)$ given by $T_{ij} = \{ (i, j-1), (i, j+1), (i-1, j), (i+1, j) \}$.
In the above image model (5.1), the potential edge-preserving function $\phi_\alpha$ in $H(u)$ is defined as

$$\phi_\alpha(u) = \sqrt{u^2 + \alpha},$$

where $\alpha$ is a constant whose value is chosen as 1.
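A small Python sketch of this potential and its derivative, as they might enter a gradient-based restoration solver, is given below; the neighborhood handling is illustrative and restricted to interior pixels for brevity.

```python
import numpy as np

ALPHA = 1.0  # the constant alpha chosen in the text

def phi(u):
    """Edge-preserving potential phi_alpha(u) = sqrt(u^2 + alpha)."""
    return np.sqrt(u * u + ALPHA)

def dphi(u):
    """Derivative phi_alpha'(u) = u / sqrt(u^2 + alpha), used in gradients."""
    return u / np.sqrt(u * u + ALPHA)

def neighbor_penalty(u, i, j):
    """Sum of phi over differences with the four neighbors T_ij
    (interior pixels only, for brevity)."""
    offsets = ((0, -1), (0, 1), (-1, 0), (1, 0))
    return sum(phi(u[i + di, j + dj] - u[i, j]) for di, dj in offsets)
```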
To evaluate the relative accuracy of the proposed DLBB algorithm in restoring the corrupted images, we compare its performance with some state-of-the-art algorithms, including ALG 4.1 of Dai and Kou [17], CG-Descent of Hager and Zhang [16], and EJHJ and MEJHJ of [6]. All comparisons are based on three metrics: CPU time (CPUT), relative error (RelErr), and peak signal-to-noise ratio (PSNR). The corrupted images considered for restoration are Forest (512×512) and Building (512×512). The performance of each algorithm in restoring the images is presented in Tables 10–12 and Figure 4.
Based on the results presented in Tables 10–12, only the DLBB and EJHJ algorithms generated descent directions on all of the problems; the other algorithms, including MEJHJ, CG-Descent, and ALG 4.1, were unable to generate descent directions when the noise level of the corrupted Forest image was increased to 80%. The point of failure for each metric (CPUT, RelErr, and PSNR) is denoted by ***. These results show that the proposed DLBB algorithm improves the correlation in the signals and de-correlates the salt-and-pepper grey noise with better accuracy than the other algorithms in the comparison, which further demonstrates the efficiency and robustness of our method.
6. Conclusions
In this paper, we investigated the performance of a novel modification of the DL algorithm for solving unconstrained optimization and portfolio selection problems. The success of the proposed algorithm is attributed to the new optimal choice of the parameter for the modified DL CG method, derived from the promising Barzilai-Borwein approach. Under suitable assumptions, we established the convergence of the proposed method. Computational results highlight the robustness and efficiency of the new algorithm on unconstrained optimization, portfolio selection, and image restoration problems.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This research was supported by the Ministry of Higher Education (MoHE) of Malaysia through the Fundamental Research Grant Scheme (FRGS/1/2021/STG06/UUM/02/4) with S/O code 20123.
Conflict of interest
The authors declare that they have no competing interests with regards to this study.