Processing math: 59%
Research article

The primitive roots and a problem related to the Golomb conjecture

  • Received: 10 March 2020 Accepted: 20 April 2020 Published: 26 April 2020
  • MSC : 11A07, 11D85

  • In this paper, we use elementary methods, properties of Gauss sums and estimates for character sums to study a problem related to primitive roots, and prove the following result. Let p be a large enough odd prime. Then for any two distinct integers a,b{1,2,,p1}, there exist three primitive roots α, β and γ modulo p such that the congruence equations α+γamodp and β+γbmodp hold.

    Citation: Wenpeng Zhang, Tingting Wang. The primitive roots and a problem related to the Golomb conjecture[J]. AIMS Mathematics, 2020, 5(4): 3899-3905. doi: 10.3934/math.2020252

    Related Papers:

    [1] Predrag S. Stanimirović, Branislav Ivanov, Haifeng Ma, Dijana Mosić . A survey of gradient methods for solving nonlinear optimization. Electronic Research Archive, 2020, 28(4): 1573-1624. doi: 10.3934/era.2020115
    [2] Yan Xia, Songhua Wang . Global convergence in a modified RMIL-type conjugate gradient algorithm for nonlinear systems of equations and signal recovery. Electronic Research Archive, 2024, 32(11): 6153-6174. doi: 10.3934/era.2024286
    [3] Gonglin Yuan, Minjie Huang . An efficient modified HS conjugate gradient algorithm in machine learning. Electronic Research Archive, 2024, 32(11): 6175-6199. doi: 10.3934/era.2024287
    [4] Youjun Deng, Hongyu Liu, Xianchao Wang, Dong Wei, Liyan Zhu . Simultaneous recovery of surface heat flux and thickness of a solid structure by ultrasonic measurements. Electronic Research Archive, 2021, 29(5): 3081-3096. doi: 10.3934/era.2021027
    [5] Yue Feng, Yujie Liu, Ruishu Wang, Shangyou Zhang . A conforming discontinuous Galerkin finite element method on rectangular partitions. Electronic Research Archive, 2021, 29(3): 2375-2389. doi: 10.3934/era.2020120
    [6] Jiaxin Zhang, Hoang Tran, Guannan Zhang . Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy. Electronic Research Archive, 2021, 29(6): 4119-4135. doi: 10.3934/era.2021075
    [7] Sida Lin, Lixia Meng, Jinlong Yuan, Changzhi Wu, An Li, Chongyang Liu, Jun Xie . Sequential adaptive switching time optimization technique for maximum hands-off control problems. Electronic Research Archive, 2024, 32(4): 2229-2250. doi: 10.3934/era.2024101
    [8] Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss . Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation. Electronic Research Archive, 2023, 31(5): 2519-2554. doi: 10.3934/era.2023128
    [9] Cheng Wang . Convergence analysis of Fourier pseudo-spectral schemes for three-dimensional incompressible Navier-Stokes equations. Electronic Research Archive, 2021, 29(5): 2915-2944. doi: 10.3934/era.2021019
    [10] Yineng Ouyang, Zhaotao Liang, Zhihui Ma, Lei Wang, Zhaohua Gong, Jun Xie, Kuikui Gao . A class of constrained optimal control problems arising in an immunotherapy cancer remission process. Electronic Research Archive, 2024, 32(10): 5868-5888. doi: 10.3934/era.2024271
  • In this paper, we use elementary methods, properties of Gauss sums and estimates for character sums to study a problem related to primitive roots, and prove the following result. Let p be a large enough odd prime. Then for any two distinct integers a,b{1,2,,p1}, there exist three primitive roots α, β and γ modulo p such that the congruence equations α+γamodp and β+γbmodp hold.


    In this survey, we focus on solving the unconstrained nonlinear optimization problem

    minf(x),  xRn, (1.1)

    where the function f:RnR is continuously differentiable and bounded from below. The general iterative rule for solving (1.1) starts from an initial approximation x0Rn and generates a sequence {xk,k0} using the general iterative scheme

    xk+1=xk+αkdk,  k0, (1.2)

    where the step-size αk is a positive real parameter determined after the exact or inexact line search, xk is the last generated iterative point, xk+1 is the current iterative point, and dk is an appropriate search direction. General class of algorithms of the form (1.2) is known as the line search algorithms. These algorithms require only the search direction dkRn and the step-size αkR.

    The following notations will be used, as usual:

    g(x)=f(x), G(x)=2f(x), gk=f(xk), Gk=2f(xk),

    where f(x) denotes the gradient and 2f(x) denotes the Hessian of f. For the sake of simplicity, the notation fk will point to f(xk). Further, xT denotes the transpose of xRn. Taylor's approximation of the function f at the point xk+1=xk+αkdk is defined by

    f(xk+1)f(xk)+αkgTkdk. (1.3)

    Therefore, an appropriate descent search direction dk must be determined on the basis of the descent condition

    gTkdk<0,for allk. (1.4)

    Primary choice for descent direction is dk=gk, which reduces the general line search iterations (1.2) into the gradient descent (GD) iterative scheme

    xk+1=xkαkgk. (1.5)

    In this paper, we survey gradient methods satisfying the descent condition (1.4). If there exists a constant c>0 such that

    gTkdk<cgk2,for allk, (1.6)

    where c is a positive constant independent of k, then it is said that the vector dk satisfies the sufficient descent condition.

    As for the choice of search direction, one of the possible choices for the search direction in unconditional optimization is to move from the current point along the negative gradient in each iteration, which correspond to dk=gk. This choice of search direction leads us to a class of methods known as gradient descent methods. One negative feature of gradient methods is relatively frequent occurrence of, so called, zig-zagging phenomenon, which initiates very slow convergence of GD algorithms to the optimal point, or even divergence [90].

    Advantages and disadvantages of GD methods can be summarized as follows.

    1. GD methods are globally convergent, i.e., converge to a local minimizer regardless of the starting point.

    2. Many optimization methods switch to GD rule when they do not make sufficient progress to the convergence.

    3. The convergence is linear and usually very slow.

    4. Numerically, GD methods are often not convergent.

    Another important direction of the search is the Newton's direction dk=G1kgk, obtained from the second-order Taylor-development, assuming that Hessian Gk is positive-definite. The pure Newton method (without line search) for minimization of a function f:RnR is defined using a quadratic approximation of f(xk+1):

    Φ(d):=f(xk+d)f(xk)+gTkd+12dTGkd. (1.7)

    The solution dk=mind(Φ(d)) is given by

    dk=G1kgk.

    So, the pure Newton method is defined by

    xk+1=xkG1kgk. (1.8)

    The Newton method with line search uses an appropriate step-size αk in (1.8) with the aim to ensure global stability. The resulting iterations are of the form

    xk+1=xkαkG1kgk, (1.9)

    wherein the step-size αk is computed performing a line search.

    The Newton method exhibits three major drawbacks in practical applications.

    1. The descent (and convergence) may not be achieved if the iterations (1.8) are started far away from the local minimizer.

    2. Another drawback is numerically expensive and tedious necessity to compute the second derivative matrix (Hessian) and its inverse in every iteration. Moreover, the second derivatives may be sometimes unavailable.

    3. The main disadvantages of the Newton method are the possibility that the Hessian Gk is not positive definite.

    Due to that, numerous modifications of it were created, which can be globally divided into two large groups: modified Newton's methods and quasi-Newton (QN) methods. The QN methods are aimed to address all the above difficulties of the Newton method. The first drawback is overcome by taking an appropriately defined positive definite matrix Bk that approximates the Hessian Gk or an appropriately defined positive definite matrix Hk that approximates the true Hessian inverse G1k and then performing a line search at each iteration. For a given initial point x0Rn and a symmetric positive definite matrix H0, the search direction in the kth iteration of the quasi-Newton method is defined by dk=Hkgk, where Hk is a symmetric and positive-definite matrix.

    QN methods and modified Newton methods belong to most powerful methods for solving unconditional optimization and applicable in many nonlinear optimization problems. A survey of QN methods for solving nonlinear least-squares problems was considered in [86]. The optimization methods have found a number of applications in fluid mechanics [44], free surface flow and solid body contact [13], finding the optimal trajectory for an aircraft or a robot arm, designing a portfolio of investments, controlling a chemical process, computing the optimal shape of an automobile. See [90] for more details. A modification of the quasi-Newton method in defining the two-phase approximate greatest descent was used in [71]. Several variants of multi-step spectral gradient methods for solving large scale unconstrained optimization problems were proposed in [111]. Usage of an optimization algorithm in artificial neural networks was considered in [75]. Properties of Hessian matrix which appear in distributed gradient-based multi-agent control systems was considered in [117]. A survey of derivative-free optimization methods was given in [127]. An application of unconstrained optimization in solving the risk probability was presented in [76].

    The study of conjugate gradient (CG) methods was started by Hestenes and Stiefel in 1952 in [60], and the development of CG methods for solving large-scale unconstrained optimization problems is still ongoing. After all these years, there is still a need to find a more efficient CG method that will solve unconstrained optimization problems with thousands of variables in the shortest possible time interval as well as with a minimal number of iterations and function evaluations.

    CG methods construct a sequence of approximations xk by the line search rule (1.2), such that the search directions dk are generated by

    dk:=dk:=d(βk,gk,dk1)={g0,k=0,gk+βkdk1,k1, (1.10)

    where βk is the real value which is known as the conjugate gradient update parameter (CGUP). More precisely, the search direction dk of the CG method is defined as a proper linear combination of the gradient descent direction and a positive multiple of the direction used in the previously finished iteration. From (1.10) and (1.2), it clearly follows that the CG methods are defined simply only by the gradient direction gk and by the CGUP βk. Different CG methods arise from proper choices of the scalar βk. According to the common agreement, βMk denotes the parameter βk of the CG method M. It is important to mention that some researchers propose usage of βM+k=max{βMk,0}. So, it is possible to use βM+k instead of βMk and generate corresponding dual method.

    Popularity of CG methods is confirmed by a number of recent surveys and book chapters [42,57,85,87,88]. In addition to this basic information on the chronological development of the CG methods, it is also important to mention its applications. In general, CG methods are important in solving large-scale optimization problems. CG methods iterates are characterized by low memory allocation and strong local and global convergence properties. Based on this fact, these methods become useful in all areas where optimization problems of any kind are involved. The CG methods have wide use in solving systems of equations and image restoration problem [12,16,70,78,128,135,136,140], the linear response eigenvalue problem [74], in regression analysis [108,143]. On that way, CG methods have the influence on the development of an artificial neural networks learning algorithms [46,75]. A unique approach to the ABS type CG methods was proposed in [1]. Application of CG methods in solving very large symmetric positive semi definite linear systems that appear in optimal surface parameterizations are described in [64]. Also, it is possible to mention application in data analysis [110]. A variant of the projected preconditioned conjugate gradient method and its application in solving the linear response eigenvalue problem was investigated in [74].

    Main goals leading current research paper can be highlighted as follows.

    (1) A survey and specific classifications of CG and QN methods for nonlinear unconstrained optimization is presented.

    (2) Convergence properties of CG methods are investigated.

    (3) Specific numerical testings are performed on both the CG and QN methods. Numerical testing on some classes of CG methods and hybrid CG methods as well as on some QN methods is presented. A numerical experiment about the influence of the scalar t in Dai-Liao CG methods is performed and analysed. Also, gradient descent methods defined by appropriate acceleration parameter are tested and compared.

    The overall structure of the paper based on contents of each section is described as follows. Section 1 describes basic notation, introductory notions, preliminaries and motivation. Global algorithms and various line search variants are presented in Section 2. Overview of QN methods and their classification are considered in Section 3. Section 4 gives a specific overview of CG methods. Convergence properties of considered CG methods are investigated in Section 5. According to the presented taxonomy of basic CG methods, properties of CG methods with yTk1gk in the numerator of βk are considered in Subsection 5.1, properties of CG methods involving gk2 in the numerator of βk are given in Subsection 5.2, while the convergence properties of DL methods are presented in Subsection 5.3. Numerical experiments are performed in Section 6. In details, Subsection 6.1 arranges numerical results on QN methods with constant diagonal Hessian approximation, Subsection 6.2 compares numerically basic CG methods involving yTk1gk in the numerator of βk, Subsection 6.3 compares basic CG methods with gk2 in the numerator of βk, while numerical experiments on the hybrid CG methods are presented in Subsection 6.4. Finally, Subsection 6.5 describes numerical experiments on the modified Dai-Liao methods. Concluding remarks are given in Section 7.

    First, we present an algorithm that describes the general scheme of line search methods

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point x0, real numbers 0<ϵ<1, 0<δ<1, μ>1 and t>0.
    1: Set k=0 and compute d0=g0.
    2: If
              gkϵ   and   |fk+1fk|1+|fk|δ
    STOP;
    else perform Step 3.
    3: Determine αk(0,1) using backtracking in Algorithm 2.
    4: Compute xk+1=xk+αkdk.
    5: Calculate gk+1, yk=gk+1gk, sk=xk+1xk.
    6: Calculate βk by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute dk=gk+βkdk1.
    8: k:=k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    To achieve the global convergence of iterative methods, an appropriate step-size αk is required. The most promising at first glance is the exact line search (ELS), which assumes the unidimensional function

    Φ(α):=f(xk+αdk) (2.1)

    and the step-size is defined after the unidimensional optimization of the form

    f(xk+αkdk)=minα>0Φ(α). (2.2)

    The ELS rule may give the most precise minimum. However, ELS is too expensive in practice or even impossible to implement, especially in situations where the iteration is far from the exact solution.

    Applying the iterative procedure (1.2), it is most logical to choose a new point so that the step length αk reduces the value of the goal function:

    Φ(αk)=f(xk+1)<Φ(0)=f(xk). (2.3)

    Methods that in each iterative step require the fulfillment of conditions (2.3), that is, the reduction of the value of the objective function, define iterations that in each step approach the minimum of the given function. The methods conceived in this way belong to the class of methods of monotone line search. Many variants of inexact line search (ILS) rules are proposed and dominant in the nonlinear optimization. The most commonly used ILS techniques are Armijo, Goldstein, Wolfe, Powel, Fletcher and other [4,8,19,50,55,56,109,126]. In most conjugate gradient methods, one of the next ILS procedures methods is used to calculate the step length αk: Wolfe line search developed in [55,56], strong Wolfe line search, or backtracking line search from [4].

    In contrast to the monotonic line search, the non-monotonic line search is also known in the literature, where it is not necessary to reduce the value of the objective function in each iteration [51,52,53]. Although non-monotone techniques do not provide a minimum approach to the function in each iteration, they are very common in practical applications and have very good convergence properties. A number of nonmonotonic linear search methods have been proposed recently (see, for example, [119,120]).

    A backtracking line search scheme based on the Armijo condition, is aimed to determining the maximal value during the moving along a given search vector. It starts with a relatively large step-size estimate and iteratively reduces the step-size value until a decrease in the value of the objective function is observed, according to the local gradient of the goal function. Let β(0,1), φ(0,1) and α>0 be given. Then there exists a smallest nonnegative integer mk satisfying

    f(xk+βmktdk)f(xk)+φβmktgTkdk, t>0. (2.4)

    The procedure for backtracking line search proposed in [4] starts from the initial value α=1 and its output values are defined such that it decreases the goal function. Consequently, Algorithm 2 from [112] is used in numerical experiments as the implementation of the ILS which defines the principal step-size αk.

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point x0, real numbers 0<ϵ<1, 0<δ<1, μ>1 and t>0.
    1: Set k=0 and compute d0=g0.
    2: If
              gkϵ   and   |fk+1fk|1+|fk|δ
    STOP;
    else perform Step 3.
    3: Determine αk(0,1) using backtracking in Algorithm 2.
    4: Compute xk+1=xk+αkdk.
    5: Calculate gk+1, yk=gk+1gk, sk=xk+1xk.
    6: Calculate βk by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute dk=gk+βkdk1.
    8: k:=k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    In order to ensure a sufficient decrease of the objective function, Goldstein rule for ILS requires the following conditions:

    f(xk+αdk)f(xk)+ρtgTkdk (2.5)

    and

    f(xk+αdk)f(xk)+(1ρ)tgTkdk, (2.6)

    where 0<ρ<12 and t>0. Conditions (2.5) and (2.6) define the Goldstein rule for inexact line search.

    Wolfe line search conditions are well-known and these are given by

    f(xk+αkdk)f(xk)+ηαkgTkdk, (2.7)
    g(xk+αkdk)Tdkσ1gTkdk, (2.8)

    where 0<η<σ1<1. In addition, for conjugate gradient methods, the generalized strong Wolfe conditions, which are a conjunction of (2.7) and

    σ2gTkdkg(xk+αkdk)Tdkσ1gTkdk (2.9)

    are often used, where σ1>0. In the case σ1=σ2, the generalized strong Wolfe conditions reduce to the strong Wolfe conditions, which are a conjunction of (2.7) and

    |g(xk+αkdk)Tdk|σ1gTkdk. (2.10)

    The condition (2.7) of the Wolfe conditions is called the Armijo condition, which is often used apart or in the form of its variants.

    The most general iterative rule of QN type with line search is of the form

    xk+1=xkαkHkgk, (3.1)

    such that Hk is an approximation of the inverse Hesiian G1k. Further, it is assumed that Bk is an appropriately generated symmetric positive definite approximation of Gk [118]. The following notations in defining an appropriate updating formula

    sk=xk+1xk,yk=gk+1gk (3.2)

    are typical. The update Bk+1 of Bk is defined using the rule

    Bk+1=Bk+Ek, (3.3)

    where Ek is defined on the basis of the quasi-Newton property (secant condition)

    Bk+1sk=yk. (3.4)

    The quasi-Newton condition for the matrix Hk is given by

    Hk+1yk=sk. (3.5)

    Methods that require the calculation or approximation of the Hessian matrix and its inverse belong to the class of QN methods as well as its numerous modifications. The pure Newton method requires calculation of second derivatives matrix, which is avoided in QN methods. As a consequence, the Newton method is computationally expensive and exhibits slow computation, while QN methods are computationally cheap and of faster computation. On the other hand, the Newton method requires lesser number of iterative steps and generates more precise convergence path than QN methods.

    The Symmetric Rank-One update (SR1) assumes the matrix Hk+1in the form

    Hk+1=Hk+Ek,

    where Ek is assumed to be a symmetric rank-one matrix. Therefore,

    Hk+1=Hk+ukvTk, (3.6)

    where

    uk,vkRn.

    The quasi-Newton condition (3.5) initiates

    Hk+1yk=(Hk+ukvTk)yk=sk,

    that is

    (vTkyk)uk=skHkyk. (3.7)

    The conclusion is that uk must be in the direction skHkyk. Suppose that

    skHkyk0,

    (otherwise Hk would satisfy the quasi-Newton equation) and the vector vk satisfies vTkyk0. Then, on the basis of (3.6) and (3.7), it follows that

    Hk+1=Hk+1vTkyk(skHkyk)vTk. (3.8)

    The condition that the approximation Hk+1 of the Hessian inverse is symmetric requires us to take vk=skHkyk, so it was obtained

    Hk+1=Hk+(skHkyk)(skHkyk)T(skHkyk)Tyk, (3.9)

    which is the Symmetric Rank One (SR1) update.

    For general functions, Conn, Gould and Toint in [20] proved that the sequence of SR1 Hessian approximations converges to the true Hessian provided that the steps are uniformly linearly independent.

    The Hessian update Bk+1 is defined as the solution to the problem

    minBBBk,  s.t. B=BT, Bsk=yk. (3.10)

    The solution to (3.10) is equal to

    BDFPk+1=(IρkyksTk)BDFPk(IρkskyTk)+ρkykyTk,  ϱk=1yTksk.

    The inverse Hessian update can be generated using the Sherman - Morrison - Woodbury.

    Moreover, the DFP update is known as a method of updating, of rank 2, that is, Hk+1 is formed by adding the matrix Hk with two symmetric matrices, each of which is rank 1:

    Hk+1=Hk+aukuTk+bvkvTk,

    where uk,vkRn, and a, b are scalars. From the quasi-Newton condition (3.5) it follows that

    Hkyk+aukuTkyk+bvkvTkyk=sk. (3.11)

    The vectors uk and vk are not unique, but they can obviously be determined in the following way

    uk=sk,vk=Hkyk.

    Now, (3.11) implies

    a=1uTkyk=1sTkyk,b=1vTkyk=1yTkHkyk.

    So we get

    HDFPk+1=Hk+sksTksTkykHkykyTkHkyTkHkyk. (3.12)

    Formula (3.12) was proposed by Davidon and later was developed by Fletcher and Powell, so that it is called DFP update.

    One famous broadly used updating formula is the Broyden-Fletcher-Goldfarb–Shanno (BFGS) rule. The inverse Hessian update Hk+1 is defined as the solution to

    minHHHk,  s.t. H=HT, Hyk=sk. (3.13)

    Certainly, the BFGS update is overtly known as

    HBFGSk+1=(IρkskyTk)HBFGSk(IρkyksTk)+ρksksTk,  ρk=1yTksk=HBFGSkHkyksTk+skyTkHksTkyk+(1+yTkHkyksTkyk)sksTksTkyk, (3.14)

    where sk and yk are defined in (3.2).

    The update BFGS formula for the Hessian matrix can be generated using the Sherman-Morrison-Woodbury formula. A rank-one-modification (or perturbation) M=A+bc of a matrix ACm×n uses two vectors bCm×1 and cCn×1. The Sherman-Morrison formula establishes a relationship between M1 and A1 as follows [45]:

    M1=A1(1+cA1b)1A1bcA1. (3.15)

    As a result, the following update for Bk is obtained:

    BBFGSk+1=BBFGSkBksksTkBksTkBksk+ykyTksTkyk. (3.16)

    The weighted combinations of DFP and BFGS updates give the whole update class, which is known as the Broyden class. This class of update is defined by

    Hϕk+1=(1ϕ)HDFPk+1+ϕHBFGSk+1, (3.17)

    where ϕ is a real parameter. If ϕ[0,1], then (3.17) is called the Broadden convex update class. It is obvious that Broyden's class (3.17) satisfies the quasi-Newton equation (3.4). Also, the expression (3.17) can be rewritten in the following form

    Hϕk+1=HDFPk+1+ϕυkυTk=HBFGSk+1+(ϕ1)υkυTk=Hk+sksTksTkykHkykyTkHkyTkHkyk+ϕυkυTk, (3.18)

    where

    υk=(yTkHkyk)1/2[sksTkykHkykyTkHkyk]. (3.19)

    If we put in (3.18):

    ϕ=0, then we will obtain DFP update (3.12)

    ϕ=1, then we will obtain BFGS update (3.14)

    ϕ=sTkyk(skHkyk)Tyk, then we will obtain SR1 update (3.9).

    The Broyden class of methods can be derived directly from the quasi-Newton equation. Consider the general formula for updating rank 2, which contains sk and Hkyk:

    Hk+1=Hk+asksTk+b(HkyksTk+skyTkHk)+cHkykyTkHk, (3.20)

    where a,b,c are unknown scalars. We obtain

    1=asTkyk+byTkHkyk,0=1+bsTkyk+cyTkHkyk. (3.21)

    Here we have two equations with three unknowns, so we can introduce the replacement

    b=ϕ/sTkyk,

    where ϕ is a parameter. Solving system (3.21) and substituting the obtained result in (3.20), we obtain

    Hϕk+1=Hk+sksTksTkykHkykyTkHkyTkHkyk+ϕυkυTk=HDFPk+1+ϕυkυTk,

    where υk is defined by means of (3.19). Previous expression is identical to (3.18).

    A great effort has been invested to discover QN methods that do not merely possess convergence but it is also better from the BFGS update in the numerical performance. Table 1 shows some of these modifications of the quasi-Newton equations.

    Table 1.  Some modifications of quasi-Newton equations.
    Quasi-Newton Eqs. ˜yk1 Ref.
    Bksk1=˜yk1 ˜yk1=φk1yk1+(1φk1)Bk1sk1 [104]
    Bksk1=˜yk1 ˜yk1=yk1+tk1sk1,tk1106 [72]
    Bksk1=˜yk1 ˜yk1=yk1+2(fk1fk)+(gk+gk1)Tsk1sk12sk1 [123]
    Bksk1=˜yk1 ˜yk1=yk1+max(0,2(fk1fk)+(gk+gk1)Tsk1)sk12sk1 [133]
    Bksk1=˜yk1 ˜yk1=yk1+max(0,6(fk1fk)+3(gk+gk1)Tsk1)sk12sk1 [134]
    Bksk1=˜yk1 ˜yk1=12yk1+(fk1fk)12gTksk1sTk1yk1yk1 [59]

     | Show Table
    DownLoad: CSV

    Also, it is important to state spectral gradient method. Therein the updating of the formula for Bk+1 is done in the following way [111]

    Bk+1=diag(r(i)k) (3.22)

    with

    r(i)k=11+ni=1(ˆs(i)k)2ni=1(ˆs(i)k)(ˆy(i)k)ni=1(ˆs(i)k)4(ˆs(i)k)2, ˆsk=sk711sk1+211sk2,
    ˆyk=yk711yk1+211yk2.

    The general framework of the QN algorithm is given in Algorithm 3.

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point x0, real numbers 0<ϵ<1, 0<δ<1, μ>1 and t>0.
    1: Set k=0 and compute d0=g0.
    2: If
              gkϵ   and   |fk+1fk|1+|fk|δ
    STOP;
    else perform Step 3.
    3: Determine αk(0,1) using backtracking in Algorithm 2.
    4: Compute xk+1=xk+αkdk.
    5: Calculate gk+1, yk=gk+1gk, sk=xk+1xk.
    6: Calculate βk by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute dk=gk+βkdk1.
    8: k:=k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    The general QN iterative rule with line search

    xk+1=xkαkHkgk (3.24)

    assumes that Bk (resp. Hk=B1k) is a positive definite approximation of Gk (resp. of G1k) [118]. The update Bk+1 of Bk is defined on the basis of the quasi-Newton property (3.4) or (3.5).

    According to Brezinski's classification in [15], the structure of updating Bk can be divided into three categories: scalar matrix Bk=λkI, diagonal matrix Bk=diag(λ1,,λn) and an appropriate full matrix Bk. Optimization methods included in this class of iterations are based on simplest approximation of the Hessian and its inverse as

    Bk=γkIGk, γk>0, (3.25)

    where I is a proper n×n identity matrix and and γk>0 is a parameter. Such choice leads to the iterative rule

    xk+1=xkγ1kαkgk. (3.26)

    Usually, the parameter αk is defined using an available ILS, and γk is defined according to the Taylor's development of f(x). The iterations (3.26) are termed as improved gradient descent (IGD) methods in [63].

    Andrei in [4,6] defined iterations

    xk+1=xkθkαkgk. (3.27)

    Usage of random values of θk was proposed in [6]. Later, Andrei in [4] proposed appropriate algorithm for defining θk in (3.27). The iterative rule (3.27) was called in [4] as Accelerated Gradient Descent (AGD):

    xAGDk+1=xAGDkθAGDkαkgAGDk. (3.28)

    A few modifications of the scheme (3.26) were promoted in [94,96,97,112,114]. The iterations defined in [112] are of the form (3.26), in which γkI is the Hessian approximation, where γk=γ(xk,xk1)>0 is the parameter. The SM method from [112] was defined by the iterations

    xSMk+1=xSMkαk(γSMk)1gSMk, (3.29)

    where γSMk>0 is the acceleration parameter defined using the Taylor's development of the objective f at the point xSMk+1, as follows:

    γSMk+1=2γSMkγSMk[f(xSMk+1)f(xSMk)]+αkgSMk2α2kgSMk2. (3.30)

    The Double direction and double step-size accelerated methods, termed as ADSS and ADD, respectively, were originated in [94,96].

    The next iterations are known as Accelerated double step-size (ADSS) iterations [94]:

    xADSSk+1=xADSSk(αk(γADSSk)1+lk)gADSSk, (3.31)

    where αk and lk are step-sizes, derived by two independent backtracking procedures. The TADSS method from [114] is proposed using the assumption αk+lk=1, which gives

    xTADSSk+1=xTADSSkψkgTADSSk,

    where ψk=αk((γTADSSk)11)+1. An application of the TADSS iterations in aviation industry was investigated in [68].

    The particular choice γk=1 transforms the IGD iterations (3.26) into the GD iterative rule (1.5). Further, the IGD iterations (3.26) in the case αk=1 can be viewed as the GD iterations

    xk+1=xkγ1kgk, (3.32)

    where γk becomes the primary step length which should be appropriately defined. Barzilai and Borwein in [11] originated two well known variants of the GD method, known as BB method, with the step length γBBk:=γ1k in (3.32). The step length γBBk in the first case is defined by the vector minimization minγsk1γyk12, which yields

    γBBk=sTk1yk1yTk1yk1. (3.33)

    The symmetric case assumes the minimization γsk1yk12, which produces

    γBBk=sTk1sk1sTk1yk1. (3.34)

    The BB iterations are defined using γBBk as follows:

    xBBk+1=xBBkγBBkgBBk.

    The BB method was improved in a number of research articles, main of which are [22,24,34,35,36,37,106,107,122,137].

    Another member of the IGD iterates is the Scalar Correction (SC) method [84], defined in (3.32) by the rule

    γSCk+1={sTkrkyTkrk,yTkrk>0,skyk,yTkrk0,  rk=skγkyk. (3.35)

    Accordingly, the SC iterations are defined by the relation

    xSCk+1=xSCkγSCkgSCk.

    Relaxed BB method by an additional step θk(0,2) is proposed in [105].

    A modification of GD method (1.5) was proposed in [63]. It is defined by MGD=M(GD) with

    xk+1=M(GD)(xk)=xk(αk+α2kα3k)gk. (3.36)

    Further, the next scheme was proposed as the modified SM (MSM) method in [63]:

    xMSMk+1=xMSMk(αk+α2kα3k)(γMSMk)1gMSMk. (3.37)

    The leading principle used in defining the iterations (3.37) is the replacement of αk in the GD methods (1.5) by the slightly longer step αMSMk=αk+α2kα3k. The underlying idea in defining αMSMk is the observation αMSMk>αk, which means that MSM method proposes a slightly longer step with the aim to additionally accelerate the method. As before, αk(0,1) is defined by Algorithm 2. The rationale of this modification lies in the inequalities

    αkαk+α2kα3kαk+α2k.

    So, (3.37) is based on a small increase of αk inside the interval [αk,αk+α2k].

    The acceleration parameter γk+1 in ADD, ADSS, TADSS and MSM methods are defined, respectively, as:

    γADDk+1=2f(xADDk+1)f(xADDk)αk(gADDk)T(αkdADDk(γADDk)1gADDk)(αkdADDk(γADDk)1gADDk)T(αkdADDk(γADDk)1gADDk),(ADD method[96])γADSSk+1=2f(xADSSk+1)f(xADSSk)+(αk(γADSSk)1+lk)gADSSk2(αk(γADSSk)1+lk)2gADSSk2,(ADSS method[94])γTADSSk+1=2f(xTADSSk+1)f(xTADSSk)+ψkgTADSSk2ψ2kgTADSSk2,ψk=αk((γTADSSk)11)+1,(TADSS method[114])γMSMk+1=2γMSMkγMSMk[f(xMSMk+1)f(xMSMk)]+(αk+α2kα3k)gMSMk2(αk+α2kα3k)2gMSMk2,(MSM method[63]).

    The efficiency of IGD methods was numerically tested in [98].

    The author of [41] proposed two Relaxed Gradient Descent Quasi Newton (RGDQN and RGDQN1) iteration rules

    xk+1=xkξkαkγ1kgk, (3.38)

    such that ξk is a proper real value. The RGDQN iterations are defined with randomly generated ξk(0,1), while the RGDQN1 algorithm exploits

    ξk=γkαkγk+1.

    The following algorithm is known as the SM method and introduced in [112].

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point x0, real numbers 0<ϵ<1, 0<δ<1, μ>1 and t>0.
    1: Set k=0 and compute d0=g0.
    2: If
              gkϵ   and   |fk+1fk|1+|fk|δ
    STOP;
    else perform Step 3.
    3: Determine αk(0,1) using backtracking in Algorithm 2.
    4: Compute xk+1=xk+αkdk.
    5: Calculate gk+1, yk=gk+1gk, sk=xk+1xk.
    6: Calculate βk by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute dk=gk+βkdk1.
    8: k:=k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    An application of the Picard-Mann hybrid iterative process from [66] is another possibility to accelerate iterations for solving nonlinear optimization problems. The function T:CC in (3.40) is defined on a convex subset C of a normed space E. The hybrid iterations define two sequences xk, yk by the rules:

    {x1=xC,xk+1=Tyk,yk=(1Υk)xk+ΥkTxk, kN. (3.39)

    The real number Υk(0,1) from (3.39) is termed as the correction parameter in [97]. Instead of (3.39), it suffices to use an equivalent iteration given by

    xk+1=H(T)(xk)=T[(1Υk)xk+ΥkTxk], kN. (3.40)

    The iterates (3.40) are denoted by H(T,xk)=H(T)(xk).

    In [66] it was proposed a set of constant values α=Υk(0,1),k in numerical experiments and concluded that the process (3.40) converges faster than the Picard, Mann and Ishikawa iterations from [62,83,101]. Further, (3.40) was applied in [97] for a hybridization of the SM method, known as HSM. Using the mapping T in (3.39) or (3.40) to coincide with the SM rule (3.29):

    T(xk):=xSMk(γSMk)1αkgSMk,

    the iterations (3.40) become the so called HSM iterative rule given as

    xHSMk+1:=H(SM)(xk)=xHSMk(Υk+1)(γHSMk)1αkgHSMk, (3.41)

    where γHSMk>0 is the acceleration defined by

    γHSMk+1=2γHSMkγHSMk[f(xHSMk+1)f(xHSMk)]+(Υk+1)tkgHSMk2(Υk+1)2t2kgHSMk2.

    A modified HSM (MHSM) method is defined in [92] by proposing an appropriate initial value in the backtracking procedure.

    A hybridization of the ADD method was considered in [99] in the form

    xHADDk+1=xHADDk(Υk+1)tk(γHADDk)1gHADDk+(Υk+1)t2kdk,

    wherein

    γHADDk+1=2f(xHADDk+1)f(xHADDk)(Υk+1)(gHADDk)T(t2kdktk(γHADDk)1gHADDk)(Υk+1)2t2k(tkdk(γHADDk)1gHADDk)T(tkdk(γHADDk)1gHADDk).

    Recently, the hybridization HTADSSH(TADSS) was proposed, investigated and tested in [95].

    Nonlinear conjugate gradient (CG) methods form a class of important methods for solving unconstrained nonlinear optimization and solving system of nonlinear equations. Nonlinear CG methods are defined by the line search iterates (1.2) where the search direction dk is defined by (1.10) and the CGUP βk is given using one of many available rules.

    In this article, a review on CG methods for unconstrained optimization is given. Main convergence theorems are provided for the conjugate gradient method assuming the descent property of each search direction. Some research issues on conjugate gradient methods are mentioned.

    In [21], the nonlinear CG methods are divided into three classes: early conjugate gradient methods, descent conjugate gradient methods, and sufficient descent conjugate gradient methods. Andrei classified the CG methods in three classes: classical CG methods, scaled CG methods, and hybrid and parameterized CG methods.

    The classification presented in this paper divides CG methods into the following categories: basic conjugate gradients methods, considered in Subsection 4.1, Dai-Liao class of methods, presented in Subsection 4.2, hybrid conjugate gradient methods, described in Subsection 4.3, and combined BFGS-CG iterations, in Subsection 4.4.

    The CG methods included in Table 2 are known as early or classical conjugate gradient methods.

    Table 2.  Some modifications of quasi-Newton equations.
    βk Title Year Reference
    βHSk=yTk1gkyTk1dk1 HestensesStiefel 1952 [60]
    βFRk=gk2gk12 FletcherReeves 1964 [48]
    βDk=gTkGk1dk1dTk1Gk1dk1 1967 [38]
    βPRPk=yTk1gkgk12 PolakRibierePolyak 1969 [102,103]
    βCDk=gk2gTk1dk1 ConjugateDescent 1987 [47]
    βLSk=yTk1gkgTk1dk1 LiuStorey 1991 [79]
    βDYk=gk2yTk1dk1 DaiYuan 1999 [30]

     | Show Table
    DownLoad: CSV

    where yk1=gkgk1, sk1=xkxk1, Gk1=Δ2f(xk1) and stands for the Euclidean vector norm.

    In the listed CG methods, the numerator of the update parameter βk is either gk2 or yTk1gk and the denominator is either gk12 or yTk1dk1 or gTk1dk1. Two possible choices for the numerator and the 3 possible choices for the denominator lead to 6 different choices for βk.

    Table 3.  Classification of CG methods.
    Denominator
    Numerator gk12 yTk1dk1 gTk1dk1
    gk2 FR DY CD
    yTk1gk PRP HS LS

     | Show Table
    DownLoad: CSV

    Define the following functions

    n1:=gk2,n2=yTk1gk,d1:=gk12,d2=yTk1dk1,d3=gTk1dk1.

    Then

    βFRk=n1d1, βPRPk=n2d1, βDYk=n1d2, βHSk=n2d2, βCDk=n1d3, βLSk=n2d3.

    But, there exist exceptions to these rules. One example is given in [38]

    βDk=gTkGk1dk1dTk1Gk1dk1(1967).

    From the presented chronological development of the CGUP, we can see that the βDk choice of the CG parameter differs structurally from the other choices.

    For a large-scale unconstrained nonlinear optimization problem, in practice, choices for updating a CG parameter that do not require computation or approximation of the Hessian and its inverse are preferred over methods that require Hessian or its approximation in each iteration.

    Wei et al. [124] gave a variant of the PRP method which we call the VPRP method, with the parameter βk

    βVPRPk=gk2gk gk1 gTkgk1gk12.

    The VPRP method was extended to a variant of the HS method by Yao et al. in [132],

    βVHSk=gk2gk gk1 gTkgk1dTk1yk1.

    Zhang [138] took a little modification to the VPRP method and constructed the NPRP method as follows,

    βNPRPk=gk2gk gk1 |gTkgk1|gk12.

    Moreover, Zhang [138] extended this result to the HS method and proposed the NHS method as follows,

    βNHSk=gk2gk gk1 |gTkgk1|dTk1yk1.

    Recently, Wei et al. [125] proposed a variation of the FR method which we call the VFR method. the parameter βk in the VFR method is given by

    βVFRk=μ1gk2μ2|gTkdk1|+μ3gk12,

    where μ1(0,+), μ2(μ1+ϵ1,+), μ3(0,+) and ϵ1 is an any given positive constant. Motivated by these modifications, in [29] the authors defined the modified PRP method as

    βDPRPk=gk2gk gk1 |gTkgk1|μ|gTkdk1|+gk12,  μ>1.

    Recently, Wei et al. [18] gave a variant of the PRP method which we call the WYL method, that is,

    βWYLk=gTk(gkgkgk1gk1)gk12.

    The WYL method was extended to a variant of the LS method by Yao et al. [132], that is,

    βMLSk=gTk(gkgkgk1gk1)gTkdk1.

    Also, the following function will be useful:

    N1=gTk(gkgkgk1gk1)=n1gkgk1gTkgk1=gTk^yk1N2=n1gkgk1|gTkgk1|.

    Then

    βWYLk=N1d1,βVPRPk=N1d1,βVHSk=N1d2βNPRPk=N2d1,βNHSk=N2d2.

    Some particular CG variants are βDHSk [29] and βDLSk [139], defined by

    βDHSk=gk2gk gk1 |gTkgk1|μ|gTkdk1|+dTk1yk1 (2012)βDLSk=gk2gk gk1 |gTkgk1|μ|gTkdk1|dTk1gk1,  μ>1 (2017). (4.1)

    If the functions

    D1(μ)=μ|gTkdk1|+dTk1yk1,D2(μ)=μ|gTkdk1|dTk1gk1

    are defined, then

    βDHSk=N2D1(μ),βDLSk=N2D2(μ).

    An extension of the conjugacy condition

    dTkyk1=0 (4.2)

    was studied by Perry [93]. Perry tried to incorporate the second-order information of the objective function into the CG method to accelerate it. Specifically, by using the secant condition and the search direction of the QN methods, which are respectively defined by

    Bksk1=yk1andBkdk=gk, (4.3)

    the following relation is obtained:

    dTkyk1=dTk(Bksk1)=(Bkdk)Tsk1=gTksk1, (4.4)

    where Bk is a symmetric approximation to the Hessian matrix Gk. Then Perry accordingly replaced the conjugacy condition (4.2) by the following condition

    dTkyk1=gTksk1. (4.5)

    Furthermore, Dai and Liao [26] included a nonnegative parameter t into Perry's condition and gave

    dTkyk1=tgTksk1. (4.6)

    In order to find the search direction dk in (1.10) which satisfies the conjugacy condition (4.6), it suffices to multiply (1.10) by yk1 and use (4.6), yielding

    βDLk=gTkyk1dTk1yk1tgTksk1dTk1yk1=gTk(yk1tsk1)dTk1yk1,  t>0. (4.7)

    Expression (4.7) for defining βk characterizes the Dai and Liao (DL) CG method. Later, motivated by the DL method, researchers in papers [18,80,100,129,131,139,141], suggested modified variants of the DL method. Some well-known formulas for βk were created modifying the CG parameter βDLk [18,26,80,100,129,131,139,141]. Main of them are βDLk [26], βDHSDLk [139], βDLSDLk [139], βMHSDLk [131], defined as

    βDLk=yTk1gkyTk1dk1tgTksk1dTk1yk1=βHSktgTksk1dTk1yk1, (2001) (4.8)
    βDHSDLk=gk2gk gk1 |gTkgk1|μ|gTkdk1|+dTk1yk1tgTksk1dTk1yk1=βDHSktgTksk1dTk1yk1, (2017) (4.9)
    βDLSDLk=gk2gk gk1 |gTkgk1|μ|gTkdk1|dTk1gk1tgTksk1dTk1yk1=βDLSktgTksk1dTk1yk1, (2017) (4.10)
    βMHSDLk=gk2gk gk1 gTkgk1dTk1yk1tgTksk1dTk1yk1=gTk^yk1dTk1yk1tgTksk1dTk1yk1, (2013)  (4.11)

    where t>0 is a scalar and ^yk1=gkgkgk1gk1.

    Clearly βDLk with t>0 defines a class of nonlinear CG methods. Moreover, in the case of the exact line search, i.e., gTksk1=0, then βDLk=βHSk.

    Some additional CG methods from the DL class are βMLSDLk [18] and βZZDLk [141], defined as follows:

    βMLSDLk=gTk^yk1dTk1gk1tgTksk1dTk1yk1 (4.12)
    βZZDLk=gTkzk1zTk1dk1tgTksk1dTk1zk1, (4.13)

    where t>0 is a scalar, zk1=yk1+Cgk1rsk1 and ^yk1=gkgkgk1gk1.

    In order to characterize this family of CG methods, define the mapping

    F(βMk,t)=βMktgTksk1dTk1yk1.

    Then

    βDLk=F(βHSk,t),βDHSDLk=F(βDHSk,t),
    βDLSDLk=F(βDLSk,t),βMHSDLk=F(βDHSk,t).

    The research for the best values of the parameter t was divided into two directions. One direction was to find the best fixed value for t and the other the best approximation for t in each iteration. Analyzing the results from [18,26,131,139], we conclude that the scalar t was defined by a fixed value of 0.1 in numerical experiments. Also, numerical experience related to the fixed valued t=1 was reported in [26]. Common numerical experience is that different choices of t initiate totally different numerical experience. This was the reason for further research to focus on the values of t that change through iterations. The value of t in arbitrary kth iteration will be denoted by tk. Hager and Zhang in [55] defined tk by the rule

    tk=2yk12yTk1sk1.

    Babaie-Kafaki and Ghanbari [9] presented two appropriate choices of the parameter t in (4.7):

    tk=sTk1yk1sk12+yk1sk1

    and

    tk=yk1sk1.

    Andrei in [5] suggested the following value for t in (4.7) which becomes a variant of the DL method, denoted by DLE:

    tk=sTk1yk1sk12. (4.14)

    Lotfi and Hosseini in [81] discovered the most recent approximations of the parameter tk.

    In the subsequent sections, we will survey recent advances in CG methods. Two main research streams can be observed: the algorithms which improve the scalar parameter t and the algorithms which improve the CG parameter βk.

    Hybrid CG methods can be segmented into two classes: mixed methods as well as methods combined together by introducing one or more parameters.

    The following hybrid CG method was suggested in [121]:

    βk={βPRPk,if0βPRPkβFRk,βFRk,otherwise. (4.15)

    When a jam in iterations occurs again, the PRP update parameter is used. Hu and Storey in [61] had a similar motivation and suggested the following rule

    βk=max{0,min{βPRPk,βFRk}}. (4.16)

    In [49] it is pointed out that βPRPk can be negative, even for strongly convex functions. In an effort to extend the allowed choices for the PRP update parameter, while retaining global convergence, Nocedal and Gilbert [49] suggested the choice

    βk=max{βFRk,min{βPRPk,βFRk}}. (4.17)

    Dai and Yuan [31] combined the DY method with other CG methods, which leads to the following CGUP parameters:

    βk=max{0,min{βHSk,βDYk}} (4.18)

    and

    βk=max{cβDYk,min{βHSk,βDYk}},  c=1σ1+σ. (4.19)

    In [28], they tested different CG methods for large-scale unconstrained optimization problems and concluded that the hybrid CG method (4.18) gave the best results.

    Next hybrid CG method, proposed by Dai in [23], employs either the DY scheme or the CD scheme:

    βk=gk2max{dTk1yk1,gTk1dk1}. (4.20)

    A modified CG method defined as the hybridization of known LS and CD conjugate gradient methods is presented and analyzed in [130] by the rule

    βLSCDk=max{0,min{βLSk,βCDk}}. (4.21)

    CG methods can be combined together by introducing one or more parameters. In [32,33], Dai and Yuan proposed a one-parameter family of CG methods with

    βk=gk2θkgk12+(1θk)dTk1yk1, (4.22)

    where θk[0,1] is a parameter. Note that βk=βFRk in the case θk=1, and βk=βDYk if θk=0.

    Another hybrid method, proposed by Delladji, Belloufi and Sellami in [39], exploits either the PRP scheme or the HZ scheme, as

    βhPRPHZk=θkβPRPk+(1θk)βHZk=θkyTk1gkgk12+(1θk)1dTk1yk1(yk12dk1yk12dTk1yk1)Tgk, (4.23)

    in which θk[0,1] is called the hybridization parameter. Note that if θk=1 then βhPRPHZk=βPRPk, and if θk=0 then βhPRPHZk=βHZk.

    Nazareth in [89] proposed a two-parameter family of CGUP parameters using convex combinations of numerators and denominators, as

    βk=νkgk2+(1νk)gTkyk1θkgk12+(1θk)dTk1yk1, (4.24)

    where νk,θk[0,1]. This two-parameter family includes FR, DY, PRP, and HS methods as extreme cases.

    In [113], the authors proposed hybrid CG methods where the search direction dk:=dk, k1, is improved using one of the rules

    dk:=D(βk,gk,dk1)=(1+βkgTkdk1gk2)gk+βkdk1, (4.25)
    dk:=D1(βk,gk,dk1)=Bkgk+D(βk,gk,dk1), (4.26)

    and βk is determined using appropriate combinations of βk used in Table 2 and/or previously defined hybridizations. In [113], the authors defined a modification of LSCD method, defined in [130] by

    βLSCDk=max{0,min{βLSk,βCDk}},dk=d(βLSCDk,gk,dk1). (4.27)

    The resulting method is known as the MLSCD method with the search direction

    dk:=D(βLSCDk,gk,dk1). (4.28)

    In general, the idea is based on the replacement of dk=dk(βLSCDk,gk,dk1) from [130] by dk=D(βLSCDk,gk,dk1).

    Now we give the general framework of the CG class of methods.

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point x0, real numbers 0<ϵ<1, 0<δ<1, μ>1 and t>0.
    1: Set k=0 and compute d0=g0.
    2: If
              gkϵ   and   |fk+1fk|1+|fk|δ
    STOP;
    else perform Step 3.
    3: Determine αk(0,1) using backtracking in Algorithm 2.
    4: Compute xk+1=xk+αkdk.
    5: Calculate gk+1, yk=gk+1gk, sk=xk+1xk.
    6: Calculate βk by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute dk=gk+βkdk1.
    8: k:=k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    The first iteration in CG methods is a gradient step. Also, it is common to restart the algorithm periodically by taking the gradient step.

    Known fact is that CG iterates are better than QN methods in terms of the CPU time. Moreover, BFGS updates require greater memory space usage than CG. On the other hand, the QN methods require lesser number of iterations as well as the number of function evaluations. For this purpose, one of the modern trends in defining new CG methods is usage of the BFGS update in defining new rules for defining βk. A hybrid method which solves the system of nonlinear equations combining the QN method with chaos optimization was discovered in [82]. In [58], the authors defined a combination of a QN and the Cauchy descent method for solving unconstrained optimization problems, which is known as the quasi-Newton SD method. A hybrid direction defined as a combination of the BFGS update of Bk and the CGUP βk was considered in [10,67]. The DFP-CG method was originated in [91]. A three-term hybrid BFGS-CG method (termed as H-BFGS-CG1) was proposed in [113] by the search direction

    dk:={Bkgk,k=0,D1(βLSCDk,gk,dk1),k1.

    In [113], the authors investigated hybrid CG algorithms based on the modified search direction which is defined using one of the following two hybridizations:

    dk:=D(βk,gk,dk1)=(1+βkgTkdk1gk2)gk+βkdk1, (4.29)
    dk:=D1(βk,gk,dk1)=Bkgk+D(βk,gk,dk1), (4.30)

    as well as on the usage of βk defined using convenient combinations of the parameters βk involved in Table 2 and previously defined hybridizations. The matrix Bk in (4.26) is defined as an appropriate Hessian approximation by the BFGS update. A three-term BFGS-CG method, known as H-BFGS-CG1, was defined in [113] using

    dk:={Bkgk,k=0,D1(βLSCDk,gk,dk1),k1.

    Below we give some assumptions related to line search procedures.

    Assumption 5.1. (1) The level set S={xRn| f(x)f(x0)} is bounded, where x0 is an initial point of the iterative method (1.2).

    (2) The objective function f is continuous and differentiable in a neighborhood N of S, and its gradient g is Lipschitz continuous. So, there exists a positive constant L>0, satisfying

    g(u)g(v)Luv,u,vN. (5.1)

    Assumption 5.1 initiates the existence of a positive constants D and γ satisfying

    uvD,u,vN (5.2)

    and

    g(u)γ,uN. (5.3)

    The proof of Lemma 5.1 is given in [142] and known as the Zoutendijk condition.

    Lemma 5.1. [17,142] Let Assumption 5.1 be accomplished and the points {xk} be generated by the method (1.2) and (1.10). Then it holds

    k=0gk4dk2<+. (5.4)

    In this subsection, we recall properties of the HS, PRP and LS methods. If we look at the chronological development presented in Table 2, a clear observation is that HS, PRP and LS methods involve the expression yTk1gk in the numerator of the parameter βk. We first mention the Property (*) for βk given by Gilbert and Nocedal [49]. The Property (*) implies that βk is bounded and small when the step sk1 is small.

    Property (*) [49] Let a method defined by (1.2) and (1.10) satisfies

    0<γgkˉγ, (5.5)

    for all k0. Under this assumption we say that the method possesses the Property (*) if there exist constants b>1 and λ>0 such that for all k:

    |βk|b, (5.6)

    and

    sk1λ|βk|12b. (5.7)

    In order to prove that conjugate gradient methods have Property (*), it suffices to show that there exists a constant c>0 such that

    |βk|csk1for allk, (5.8)

    under the assumption (5.5). Then, by putting λ=12bc, we have |βk|max{1,2bc}b and

    sk1λ|βk|12b, (5.9)

    which confirms the Property (*). It is easily shown that (5.8) holds for the HS, PRP and LS methods, and thus these methods have Property (*).

    Next, we give the global convergence theorem of CG methods satisfying Property (*). The proof of Theorem 5.2 is given in [49].

    Theorem 5.2. Consider any conjugate gradient method (1.2), (1.10) that satisfies the following conditions:

    (a) βk0.

    (b) The search directions satisfy the sufficient descent condition (1.6)

    (c) The Zoutendijk condition holds.

    (d) The Property (*) holds.

    If the Lipschitz and Boundedness Assumptions hold, then the iterates are globally convergent.

    In this section, we recall properties of the FR, CD and DY methods. If we look at the chronological development presented in Table 2, it is observable that FR, CD and DY methods involve the value gk2 in the numerator of the parameter βk. If the step-size αk satisfies the generalized strong Wolfe conditions (2.7) and (2.9), the following properties are obtained.

    Proposition 1. The following statements hold:

    (a) For the FR method, if αk satisfies the generalized strong Wolfe conditions (2.7) and (2.9) with σ1+σ2<1, then

    11σ1gTkdkgk21+σ21σ1. (5.10)

    (b) For the DY method, if αk satisfies the generalized strong Wolfe conditions (2.7) and (2.9), then

    11σ1gTkdkgk211+σ2. (5.11)

    (c) For the CD method, if αk satisfies the generalized strong Wolfe conditions (2.7) and (2.9) with σ2<1, then

    1σ1gTkdkgk21+σ2. (5.12)

    Proposition 1 implies that the FR, CD and DY methods satisfy the sufficient descent condition (1.6), dependent on line searches.

    We now give the global convergence properties of the FR and DY methods, which were proven in [2] and [30], respectively.

    Theorem 5.3. Suppose that Assumption 5.1 holds. Let the sequence {xk} be generated by the conjugate gradient method of the form (1.2)-(1.10).

    (a) If βk=βFRk and αk satisfies the generalized strong Wolfe conditions (2.7) and (2.9) with σ1+σ2<1, then {xk} converges globally with the limit

    lim infkgk=0. (5.13)

    (b) If βk=βDYk and αk satisfies the Wolfe conditions (2.7) and (2.8), then {dk} satisfies the descent condition (1.4) and {xk} converges globally in the sense that (5.13) holds.

    Methods surveyed in Subsection 5.2 exhibit strong convergence properties, but they may not be efficient in practice due to appearance of jamming. On the other hand however, methods given in Subsection 5.1 may not be convergent in general, they often perform better than the methods restated in Subsection 5.2. Details about this fact are given in [65]. This clearly implies that combinations of these methods have been proposed with the aim of exploiting attractive features of each family of methods.

    The following assumptions will be commonly used in the subsequent convergence analysis of DHSDL, DLSDL, MHSDL and MLSDL methods.

    It is supposed that the conditions in Assumption 5.1 hold. Assumption 5.1 initiates the existence of positive constants D and γ satisfying (5.2) and (5.3).

    By the uniform convexity of f, there exists a constant θ>0 such that

    (g(u)g(v))T(uv)θuv2,for allu,vS, (5.14)

    or equivalently,

    f(u)f(v)+g(v)T(uv)+θ2uv2,for allu,vS. (5.15)

    It follows from (5.14) and (5.15) that

    sTk1yk1θsk12 (5.16)

    and

    fk1fkgTksk1+θ2sk12. (5.17)

    By (5.1) and (5.16), we have

    θsk12sTk1yk1Lsk12, (5.18)

    where the inequalities imply θL.

    Further, (5.18) implies

    sTk1yk1=αk1dTk1yk1>0. (5.19)

    From αk1>0 and (5.19), it follows that

    dTk1yk1>0. (5.20)

    In order to improve presentation, an arbitrary method defined by (1.2) and (1.10) will be denoted by M(αk,βk). In our current investigation, it will be assumed βk{βDHSDLk,βDLSDLk,βMHSDLk,βMLSDLk}. It is assumed that αk satisfies the backtracking condition. Further, we will use the notation βk{βDHSk,βDLSk,βMHSk,βMLSk}. Clearly,

    βk=βktgTksk1dTk1yk1. (5.21)

    Lemma 5.4. [17,142] Let Assumption 5.1 be accomplished and the points {xk} be generated by the method M(αk,βk). Then (5.4) holds

    Lemma 5.5. The parameters βk in M(αk,βk) satisfy

    0βkgk2λ|gTkdk1|, λ1 (5.22)

    in each iterative step k.

    Proof. In the case βk{βDHSk,βDLSk} the inequalities

    0βDHSk,  βDLSkgk2μ|gTkdk1|

    are known from [29].

    For βkβMHSk, it is possible to verify

    0βMHSk=gk2gk gk1|gTkgk1|dTk1yk1gk2|gTkdk1|.

    For βkβMLSk, we have

    0βMLSk=gk2gk gk1|gTkgk1|dTk1gk1gk2|gTkdk1|.

    Since μ>1 the proof is completed.

    Lemma 5.6. The iterations M(αk,βk) satisfy

    gTkdkcgk2 (5.23)

    for some 0c1, all k0 and arbitrary βk.

    Proof. The inequality (5.23) will be verified by induction. In the initial situation k=0, one obtains gT0d0=g02. Since c1, obviously (5.23) is satisfied in the basic case. Suppose that (5.23) is valid for some k1. By taking the inner product of the left and right hand side in (1.10) with the vector gTk, it can be obtained

    gTkdk=gk2+βkgTkdk1. (5.24)

    An application of (5.21) and sk1=αk1dk1 leads to further conclusions:

    gTkdk=gk2+(βktgTksk1dTk1yk1)gTkdk1=gk2+βkgTkdk1tαk1gTkdk1dTk1yk1gTkdk1=gk2+βkgTkdk1tαk1(gTkdk1)2dTk1yk1.

    From (5.20), t>0 and αk1>0, one obtains

    tαk1(gTkdk1)2dTk1yk1>0. (5.25)

    Now (5.25) in conjunction with (5.22) implies

    gTkdkgk2+βkgTkdk1gk2+gk2λ|gTkdk1||gTkdk1|gk2+gk2λ=(11λ)gk2.

    In view of λ1, the inequality (5.23) is satisfied for c=(11λ) and arbitrary k0.

    Lemma 5.7. The parameter βk{βDHSDLk,βDLSDLk,βMHSDLk} satisfies

    βkgk2gkgk1|gTkgk1|tgTksk1dTk1yk1. (5.26)

    Proof. For βkβDHSDLk, in view of (5.20), it follows that

    βDHSDLk=gk2gkgk1|gTkgk1|μ|gTkdk1|+dTk1yk1tgTksk1dTk1yk1gk2gkgk1|gTkgk1|dTk1yk1tgTksk1dTk1yk1=gk2gkgk1|gTkgk1|tgTksk1dTk1yk1. (5.27)

    As μ>1 we conclude

    dTk1yk1=dTk1(gkgk1)=dTk1gkdTk1gk1|dTk1gk|dTk1gk1μ|dTk1gk|dTk1gk1. (5.28)

    Using inequalities (5.20) and (5.28) in βkβDLSDLk one obtains

    βDLSDLk=gk2gk gk1|gTkgk1|μ|gTkdk1|dTk1gk1tgTksk1dTk1yk1gk2gk gk1|gTkgk1|dTk1yk1tgTksk1dTk1yk1=gk2gkgk1|gTkgk1|tgTksk1dTk1yk1. (5.29)

    The following conclusion is valid in the case βkβMHSDLk:

    βMHSDLk=gk2gk gk1|gTkgk1|dTk1yk1tgTksk1dTk1yk1=gk2gkgk1|gTkgk1|tgTksk1dTk1yk1. (5.30)

    Therefore, the inequality (5.26) follows from (5.27), (5.29) and (5.30).

    The global convergence of the proposed methods is confirmed by Theorem 5.8.

    Theorem 5.8. Assume that Assumption 5.1 is true and f is a uniformly convex function. Then the sequence {xk} generated by the M(αk,βk) method fulfils

    lim infkgk=0. (5.31)

    Proof. Assume that (5.31) is not true. This implies the existence of a constant c1>0 such that

    gkc1,for allk. (5.32)

    Squaring both sides of (1.10) implies

    dk2=gk22βkgTkdk1+(βk)2dk12. (5.33)

    Taking into account (5.22), one can obtain

    2βkgTkdk1=2(βktgTksk1dTk1yk1)gTkdk1=2(βkgTkdk1tαk(gTkdk1)2dTk1yk1). (5.34)

    Now from (5.25), with respect to tαk1(gTkdk1)2dTk1yk1>0, the following inequalities hold:

    2βkgTkdk12|βk||gTkdk1|2gk2λ|gTkdk1||gTkdk1|2gk2λ. (5.35)

    Case 1. In the cases βk{βDHSDLk,βDLSDLk,βMHSDLk} from Lemma 5.7, we conclude

    βkgk2gkgk1|gTkgk1|tgTksk1dTk1yk1|gTkgkgkgk1|gTkgk1|tgTksk1dTk1yk1||gTk(gkgkgk1gk1tsk1)|θαk1dk12=|gTk(gkgk1+gk1gkgk1gk1tsk1)|θαk1dk12gk(gkgk1+gk1(1gkgk1)+tsk1)θαk1dk12gk(gkgk1+|1gkgk1|gk1+tsk1)θαk1dk12gk(gkgk1+|gk1gk|+tsk1)θαk1dk12gk(gkgk1+gk1gk+tsk1)θαk1dk12 (5.36)

    So,

    \begin{equation} \begin{aligned} \beta_k &\leq \frac{\left\|\mathbf{g}_k\right\|\left(2\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\| + t \left\|\mathbf{s}_{k-1}\right\|\right)}{\theta\alpha_{k-1}{\left\|\mathbf{d}_{k-1}\right\|}^2}\\ &\leq \frac{\left\|\mathbf{g}_k\right\|\left(2L\left\|\mathbf{s}_{k-1}\right\| + t \left\|\mathbf{s}_{k-1}\right\|\right)}{\theta\alpha_{k-1}{\left\|\mathbf{d}_{k-1}\right\|}^2}\\ &\leq \frac{\left(2L + t\right) \left\|\mathbf{g}_k\right\|\left\|\mathbf{s}_{k-1}\right\|}{\theta\alpha_{k-1}{\left\|\mathbf{d}_{k-1}\right\|}^2} \end{aligned} \end{equation} (5.37)
    \begin{equation*} \begin{aligned}& = \frac{\left(2L + t\right) \left\|\mathbf{g}_k\right\| \alpha_{k-1}\left\|\mathbf{d}_{k-1}\right\|}{\theta\alpha_{k-1}{\left\|\mathbf{d}_{k-1}\right\|}^2}\\ & = \frac{\left(2L + t\right) \left\|\mathbf{g}_k\right\|}{\theta\left\|\mathbf{d}_{k-1}\right\|}. \end{aligned} \end{equation*}

    Using (5.35) and (5.37) in (5.33), we obtain

    \begin{equation} \begin{aligned} \|\mathbf{d}_k\|^2 &\!\leq\! \|\mathbf{g}_k\|^2 + 2 \frac{\|\mathbf{g}_k\|^2}{\lambda}+ \frac{\left(2L + t\right)^2 {\left\|\mathbf{g}_k\right\|}^2}{\theta^2{\left\|\mathbf{d}_{k-1}\right\|}^2} \|\mathbf{d}_{k-1}\|^2\\ &\!\leq\! \|\mathbf{g}_k\|^2 + 2 \frac{\|\mathbf{g}_k\|^2}{\lambda}+ \frac{\left(2L + t\right)^2}{\theta^2}{\left\|\mathbf{g}_k\right\|}^2\\ &\!\leq\! \left(1 + \frac{2}{\lambda}+ \frac{\left(2L + t\right)^2}{\theta^2}\right){\left\|\mathbf{g}_k\right\|}^2\\ &\!\leq\! \left(\frac{\lambda+2}{\lambda}+ \frac{\left(2L + t\right)^2}{\theta^2}\right){\left\|\mathbf{g}_k\right\|}^2\\ &\!\leq\! \frac{(\lambda+2)\theta^2 + \lambda\left(2L + t\right)^2}{\lambda\theta^2}{\left\|\mathbf{g}_k\right\|}^2. \end{aligned} \end{equation} (5.38)

    Next, dividing both sides of (5.38) by {\left\|\mathbf{g}_k\right\|}^4 and using (5.32), it can be concluded

    \begin{equation} \begin{aligned} \frac{\|\mathbf{d}_k\|^2}{\|\mathbf{g}_k\|^4} &\!\leq\! \frac{(\lambda+2)\theta^2 + \lambda\left(2L + t\right)^2}{\lambda\theta^2} \cdot \frac{1}{c_1^2}\\ \frac{\|\mathbf{g}_k\|^4}{\|\mathbf{d}_k\|^2} &\!\geq\!\frac{\lambda\theta^2\cdot c_1^2}{(\lambda+2)\theta^2 + \lambda\left(2L + t\right)^2} . \end{aligned} \end{equation} (5.39)

    The inequalities in (5.39) imply

    \begin{equation} \sum\limits_{k = 0}^{\infty}\frac{\|\mathbf{g}_k\|^4}{\|\mathbf{d}_k\|^2}\geq \sum\limits_{k = 0}^{\infty}\frac{\lambda\theta^2\cdot c_1^2}{(\lambda+2)\theta^2 + \lambda\left(2L + t\right)^2}\! = \!\infty. \end{equation} (5.40)

    Therefore, \|\mathbf{g}_k\|\geq c_1 causes a contradiction to (5.4). Consequently, (5.31) is confirmed for Case 1.

    Case 2. In the cases \beta_k\equiv {\beta}^{\mathrm{MLSDL}}_k , applying Lemma 5.6 and Assumption 5.1, we have

    \begin{equation*} \begin{aligned} {\beta}^{\mathrm{MLSDL}}_k &\leq \left|\frac{{\left\|\mathbf{g}_k\right\|}^2-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\left|\mathbf{g}_k^ \mathrm{T} \mathbf{g}_{k-1}\right|}{-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}}-t\frac{\mathbf{g}_k^ \mathrm{T} \mathbf{s}_{k-1}}{\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}}\right|\\ &\leq \left|\frac{\mathbf{g}_k^T \mathbf{g}_k-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\left|\mathbf{g}_k^ \mathrm{T} \mathbf{g}_{k-1}\right|}{-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}}\right|+t \frac{\left|\mathbf{g}_k^ \mathrm{T} \mathbf{s}_{k-1}\right|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{\left|\mathbf{g}_k^T \left(\mathbf{g}_k-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\mathbf{g}_{k-1}\right)\right|}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left|\mathbf{g}_k^ \mathrm{T} \mathbf{s}_{k-1}\right|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ & = \frac{\left|\mathbf{g}_k^T \left(\mathbf{g}_k-\mathbf{g}_{k-1}+\mathbf{g}_{k-1}-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\mathbf{g}_{k-1}\right)\right|}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left|\mathbf{g}_k^ \mathrm{T} \mathbf{s}_{k-1}\right|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|} \end{aligned} \end{equation*}
    \begin{equation} \begin{aligned}&\leq \frac{\left\|\mathbf{g}_k\right\| \left(\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|+ \left\|\mathbf{g}_{k-1}-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\mathbf{g}_{k-1}\right\|\right)}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{\left\|\mathbf{g}_k\right\| \left(\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|+ \left\|\mathbf{g}_{k-1}(1-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|})\right\|\right)}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{\left\|\mathbf{g}_k\right\| \left(\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|+ \left|1-\frac{\left\|\mathbf{g}_k\right\|}{\left\|\mathbf{g}_{k-1}\right\|}\right|\left\|\mathbf{g}_{k-1}\right\|\right)}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{\left\|\mathbf{g}_k\right\| \left(\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|+ \left|\left\|\mathbf{g}_{k-1}\right\|-\left\|\mathbf{g}_k\right\|\right|\right)}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{\left\|\mathbf{g}_k\right\| \left(\left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|+ \left\|\mathbf{g}_k - \mathbf{g}_{k-1}\right\|\right)}{\left|-\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{g}_{k-1}\right|}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{2\cdot\left\|\mathbf{g}_k\right\| \left\|\mathbf{g}_k-\mathbf{g}_{k-1}\right\|}{c{\left\|\mathbf{g}_{k-1}\right\|}^2}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}\\ &\leq \frac{2L\cdot\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{c{\left\|\mathbf{g}_{k-1}\right\|}^2}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\left|\mathbf{d}_{k-1}^ \mathrm{T} \mathbf{y}_{k-1}\right|}. \end{aligned} \end{equation} (5.41)

    From (5.18), (5.19) and (5.41), we conclude

    \begin{equation} \begin{aligned} {\beta}^{\mathrm{MLSDL}}_k &\leq \frac{2L\cdot\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{c{\left\|\mathbf{g}_{k-1}\right\|}^2}+t \frac{\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{\theta{\left\|\mathbf{s}_{k-1}\right\|}^2}\\ & = \frac{2L\cdot\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{c{\left\|\mathbf{g}_{k-1}\right\|}^2}+ \frac{t\left\|\mathbf{g}_k\right\|}{\theta\left\|\mathbf{s}_{k-1}\right\|}\\ &\leq \frac{2L\cdot\left\|\mathbf{g}_k\right\| \left\|\mathbf{s}_{k-1}\right\|}{c\cdot c_1^2}+ \frac{t\left\|\mathbf{g}_k\right\|}{\theta\left\|\mathbf{s}_{k-1}\right\|}\\ &\leq \frac{2L\theta\cdot\left\|\mathbf{g}_k\right\| {\left\|\mathbf{s}_{k-1}\right\|}^2+ c\cdot c_1^2\cdot t\left\|\mathbf{g}_k\right\|}{c\cdot c_1^2 \cdot\theta\left\|\mathbf{s}_{k-1}\right\|}\\ &\leq \frac{\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)\left\|\mathbf{g}_k\right\|}{c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\left\|\mathbf{d}_{k-1}\right\|}. \end{aligned} \end{equation} (5.42)

    Replacement of (5.35) and (5.42) in (5.33) leads to

    \begin{equation} \begin{aligned} \|\mathbf{d}_k\|^2 &\!\leq\! \|\mathbf{g}_k\|^2 + 2 \frac{\|\mathbf{g}_k\|^2}{\lambda}+ \frac{\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2{\left\|\mathbf{g}_k\right\|}^2}{\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2{\left\|\mathbf{d}_{k-1}\right\|}^2} \|\mathbf{d}_{k-1}\|^2\\ &\! = \! \|\mathbf{g}_k\|^2 + 2 \frac{\|\mathbf{g}_k\|^2}{\lambda}+ \frac{\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}{\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2}{\left\|\mathbf{g}_k\right\|}^2\\ &\! = \! \left(1 + \frac{2}{\lambda}+ \frac{\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}{\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2}\right){\left\|\mathbf{g}_k\right\|}^2\\ &\! = \! \left(\frac{\lambda+2}{\lambda}+ \frac{\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}{\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2}\right){\left\|\mathbf{g}_k\right\|}^2\\ &\! = \! \frac{(\lambda+2)\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2 + \lambda\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}{\lambda\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2}{\left\|\mathbf{g}_k\right\|}^2. \end{aligned} \end{equation} (5.43)

    Next, dividing both sides of (5.43) by {\left\|\mathbf{g}_k\right\|}^4 and using (5.32), it can be concluded

    \begin{equation} \begin{aligned} \frac{\|\mathbf{d}_k\|^2}{\|\mathbf{g}_k\|^4} &\!\leq\! \frac{(\lambda+2)\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2 + \lambda\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}{\lambda\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2} \cdot \frac{1}{c_1^2}\\ \frac{\|\mathbf{g}_k\|^4}{\|\mathbf{d}_k\|^2} &\!\geq\!\frac{\lambda\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2\cdot c_1^2}{(\lambda+2)\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2 + \lambda\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2}. \end{aligned} \end{equation} (5.44)

    The inequalities in (5.44) imply

    \begin{equation} \sum\limits_{k = 0}^{\infty}\frac{\|\mathbf{g}_k\|^4}{\|\mathbf{d}_k\|^2}\geq \sum\limits_{k = 0}^{\infty}\frac{\lambda\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2\cdot c_1^2}{(\lambda+2)\left(c\cdot c_1^2 \cdot\theta\cdot\alpha_{k-1}\right)^2 + \lambda\left(2L\theta\cdot {D}^2+ c\cdot c_1^2\cdot t\right)^2} \! = \!\infty. \end{equation} (5.45)

    Therefore, \|\mathbf{g}_k\|\geq c_1 causes a contradiction to (5.4). Consequently, (5.31) is confirmed for Case 2. The proof is complete.

    The code used in the testing experiments is written in the software Matlab R2017a, and executed on the personal computer Workstation Intel Core i3 2.0 GHz, 8GB of RAM memory, and Windows 10 operating system. Three important criteria: the number of iterations (IT), number of function evaluations (FE) and CPU time (CPU) in all tested methods are analyzed.

    The numerical experiments are performed on contains functions presented in [3], where much of the problems are taken over from CUTEr collection [14]. Each test function is tested 10 times with a gradually increasing values of the dimension by the rule n = 10 , 50 , 100 , 200 , 300 , 500 , 700 , 800 , 1000 and 1500 .

    Strong Wolfe line search use the following choice of parameters for all algorithms \sigma_1 \! = \! 0.0001 and \sigma_2 \! = \!0.5 .

    We utilized the performance profile given in [43] to compare numerical results (IT, FE and CPU) for all tested methods. The upper curve of the selected performance profile corresponds to the method that shows the best performance.

    BFGS, DFP, SR1 updates in QN methods with respect to different ILS strategies are compared in [40]. The main conclusion is that the BFGS method is superior to the others. Continuing such research, we compare the numerical performances obtained from AGD, MSM and SM methods, i.e, gradient methods with acceleration parameter. The numerical experiment contains 25 test functions proposed in [3]. For each of tested functions, we performed 12 numerical experiments with 100 , 200 , 300 , 500 , 1000 , 2000 , 3000 , 5000 , 7000 , 8000 , 10000 , and 15000 variables. Tested algorithms are based on the same implementation of the backtracking line search (Algorithm 2), which we set \omega = 0.0001 and \varphi = 0.8

    The uniform stopping criteria in this numerical experiments are

    \|\mathbf{g}_k\| \leq 10^{-6} \ \ \ \mathrm{and}\ \ \ \frac{|f_{k+1}-f_k|}{1+|f_k|} \leq 10^{-16}.

    Summary numerical data generated by AGD, MSM and SM method, tried on 25 test functions, are arranged in Table 4.

    Table 4.  Summary numerical results of the AGD, MSM and SM methods with respect to IT, FE and CPU.
    IT profile FE profile CPU time
    Test function AGD MSM SM AGD MSM SM AGD MSM SM
    Perturbed Quadratic 353897 34828 59908 13916515 200106 337910 6756.047 116.281 185.641
    Raydan 1 22620 26046 14918 431804 311260 81412 158.359 31.906 36.078
    Diagonal 3 120416 7030 12827 4264718 38158 69906 5527.844 52.609 102.875
    Generalized Tridiagonal 1 670 346 325 9334 1191 1094 11.344 1.469 1.203
    Extended Tridiagonal 1 3564 1370 4206 14292 10989 35621 55.891 29.047 90.281
    Extended TET 443 156 156 3794 528 528 3.219 0.516 0.594
    Diagonal 4 120 96 96 1332 636 636 0.781 0.203 0.141
    Extended Himmelblau 396 260 196 6897 976 668 1.953 0.297 0.188
    Perturbed quadratic diagonal 2542050 37454 44903 94921578 341299 460028 44978.750 139.625 185.266
    Quadratic QF1 366183 36169 62927 13310016 208286 352975 12602.563 81.531 138.172
    Extended quadratic penalty QP1 210 369 271 2613 2196 2326 1.266 1.000 0.797
    Extended quadratic penalty QP2 395887 1674 3489 9852040 11491 25905 3558.734 3.516 6.547
    Quadratic QF2 100286 32727 64076 3989239 183142 353935 1582.766 73.438 132.703
    Extended quadratic exponential EP1 48 100 73 990 894 661 0.750 0.688 0.438
    Extended Tridiagonal 2 1657 659 543 8166 2866 2728 3.719 1.047 1.031
    ARWHEAD (CUTE) 5667 430 270 214284 5322 3919 95.641 1.969 1.359
    Almost Perturbed Quadratic 356094 33652 60789 14003318 194876 338797 13337.125 73.047 133.516
    LIARWHD (CUTE) 1054019 3029 18691 47476667 27974 180457 27221.516 9.250 82.016
    ENGVAL1 (CUTE) 743 461 375 6882 2285 2702 3.906 1.047 1.188
    QUARTC (CUTE) 171 217 290 402 494 640 2.469 1.844 2.313
    Generalized Quartic 187 181 189 849 493 507 0.797 0.281 0.188
    Diagonal 7 72 147 108 333 504 335 0.625 0.547 0.375
    Diagonal 8 60 120 118 304 383 711 0.438 0.469 0.797
    Full Hessian FH3 45 63 63 1352 566 631 1.438 0.391 0.391
    Diagonal 9 329768 10540 13619 13144711 68189 89287 6353.172 43.609 38.672

     | Show Table
    DownLoad: CSV

    Table 4 contains numerical results corresponding to IT, FE and CPU criteria for the AGD, MSM and SM methods. Figures 1, 2, 3 illustrate the performance profiles corresponding to the results in Table 4 corresponding to the criterion IT, GE and CPU, respectively.

    Figure 1.  IT performance profile for AGD, MSM and SM methods.
    Figure 2.  FE performance profile for AGD, MSM and SM methods.
    Figure 3.  CPU time performance profile for AGD, MSM and SM methods.

    From Table 4, we conclude that the AGD, MSM and SM methods have successfully solved all test functions.

    Figure 1 presents the performance profiles of the IT of the AGD, MSM and SM methods. In this figure, it is observable that MSM method is best in 52.00 % of the test functions compared with: AGD (24.00\%) and SM (32.00\%) .

    From Figure 1, it is observable that the graph of the MSM method comes first to the top, which means that the MSM is superior compared to other considered methods with respect to the IT profile.

    Figure 2 presents the performance profiles of the FE of the AGD, MSM and SM methods. It is observable that MSM method is best in 64.00 % of tested functions compared with: AGD (12.00\%) and SM (32.00\%) . In view of Figure 2, the MSM graph first comes to the top, which means that the MSM is winer with respect to the FE profile.

    Figure 3 presents the performance profiles of the CPU of the AGD, MSM and SM methods. It is obvious that MSM is winer in 56.00 % of the test functions with respect to: AGD (4.00\%) and SM (44.00\%) . Figure 3 demonstrates that the graph of the MSM method first comes to the highest level, which signifies that the MSM is winer with respect to the CPU.

    From the previous analysis of the results shown in Table 4 and Figures 1-3, we can conclude that the MSM iterates are most efficient in terms of all three basic metrics: IT, FE and CPU. The MSM method has the smallest IT, FE and the CPU time compared to the other two methods on the most test functions.

    The uniform stopping criterion during testing CG methods is

    \|{ \mathbf{g}}_k\| \leq \epsilon,

    where \epsilon = 10^{-6} or when the number of function evaluations becomes greater than 1000000 .

    In this subsection, we compare the numerical results obtained from HS, PRP and LS methods, i.e., methods with \mathbf{y}_{k-1}^ \mathrm{T}\mathbf{g}_k in the numerator of \beta_k . The numerical experiment is based on 26 test functions. Summary numerical results for HS, PRP and LS method, tried on 26 test functions, are presented in Table 5.

    Table 5.  Summary numerical results of the HS, PRP and LS methods with respect to the IT, FE and CPU.
    IT profile FE profile CPU time
    Test function HS PRP LS HS PRP LS HS PRP LS
    Perturbed Quadratic 1157 1157 6662 3481 3481 19996 0.234 0.719 1.438
    Raydan 2 NaN 174 40 NaN 373 120 NaN 0.094 0.078
    Diagonal 2 NaN 1721 5007 NaN 6594 15498 NaN 1.313 2.891
    Extended Tridiagonal 1 NaN 170 17079 NaN 560 54812 NaN 0.422 13.641
    Diagonal 4 NaN 70 1927 NaN 180 5739 NaN 0.078 0.391
    Diagonal 5 NaN 154 30 NaN 338 90 NaN 0.172 0.078
    Extended Himmelblau 160 120 241 820 600 1043 0.172 0.125 0.172
    Full Hessian FH2 5096 5686 348414 15294 17065 1045123 83.891 80.625 5081.875
    Perturbed quadratic diagonal 1472 1120 21667 4419 3363 65057 0.438 0.391 2.547
    Quadratic QF1 1158 1158 5612 3484 3484 16813 0.281 0.313 1.047
    Extended quadratic penalty QP2 NaN 533 NaN NaN 5395 NaN NaN 0.781 NaN
    Quadratic QF2 2056 2311 NaN 9168 9862 NaN 0.969 0.859 NaN
    Extended quadratic exponential EP1 NaN NaN 70 NaN NaN 350 NaN NaN 0.141
    TRIDIA (CUTE) 6835 6744 NaN 20521 20248 NaN 1.438 1.094 NaN
    Almost Perturbed Quadratic 1158 1158 5996 3484 3484 17998 0.281 0.328 1.063
    LIARWHD (CUTE) NaN 408 11498 NaN 4571 50814 NaN 0.438 2.969
    POWER (CUTE) 7781 7789 190882 23353 23377 572656 1.422 1.219 14.609
    NONSCOMP (CUTE) 4545 3647 NaN 15128 12433 NaN 0.875 0.656 NaN
    QUARTC (CUTE) NaN 165 155 NaN 1347 1466 NaN 0.781 0.766
    Diagonal 6 NaN 174 137 NaN 373 442 NaN 0.109 0.125
    DIXON3DQ (CUTE) NaN 12595 12039 NaN 37714 36091 NaN 1.641 2.859
    BIGGSB1 (CUTE) NaN 11454 11517 NaN 34293 34530 NaN 1.969 2.141
    Generalized Quartic NaN 134 139 NaN 458 445 NaN 0.125 0.094
    Diagonal 7 NaN 51 80 NaN 142 240 NaN 0.063 0.109
    Diagonal 8 NaN 70 80 NaN 180 180 NaN 0.063 0.125
    FLETCHCR (CUTE) 18292 19084 20354 178305 170266 171992 8.859 6.203 7.484

     | Show Table
    DownLoad: CSV

    Table 5 shows the numerical results (IT, FE and CPU) for the HS, PRP and LS methods.

    Figures 4, 5 and 6 plot the performance profiles for the results in Table 5 with respect to IT, FE and CPU criterion, respectively.

    Figure 4.  IT performance profile for HS, PRP and LS methods.
    Figure 5.  FE performance profile for HS, PRP and LS method.
    Figure 6.  CPU time performance profile for HS, PRP and LS methods.

    Figure 4 presents the performance profiles of the IT correspondig to the HS, PRP and LS methods. In this figure, it is observable that PRP method is best in 61.54 % of the test functions compared with: HS (26.92\%) and LS (23.08\%) . From Figure 4, it is observable that the graph of the PRP method comes first to the top, which signifies that the PRP outperforms other considered methods with respect to the IT criterion.

    Figure 5 presents the performance profiles of the FE of the HS, PRP and LS methods. It is observable that PRP method is best in 69.23 % of the test functions compared with: HS (23.08\%) and LS (23.08\%) . From Figure 5, it is observed that the PRP graph first comes to the top, which signifies that the PRP is the best with respect to the FE.

    Figure 6 presents the performance profiles of the CPU of the HS, PRP and LS methods. It is obvious that PRP is best in 69.23 % of the test functions compared with: HS (11.54\%) and LS (19.23\%) . Figure 6 demonstrates that the graph of the PRP method first comes to the top, which signifies that the PRP is the best with respect to the CPU.

    From the previous analysis of the results shown in Table 5 and Figures 4-6, we can conclude that the PRP method achieved the best and most efficient results in terms of all three basic metrics: IT, FE and CPU.

    In this subsection, we compare the numerical results obtained from DY, FR and CD methods, i.e, methods with \|\mathbf{g}_{k}\|^2 in the numerator of \beta_k . The numerical experiment contains 25 test functions. Summary numerical results for DY, FR and CD method, tried on 25 test functions, are presented in Table 6.

    Table 6.  Summary numerical results of the DY, FR and CD methods with respect to IT, FE and CPU.
    IT profile FE profile CPU time
    Test function DY FR CD DY FR CD DY FR CD
    Perturbed Quadratic 1157 1157 1157 3481 3481 3481 0.469 0.609 0.531
    Raydan 2 86 40 40 192 100 100 0.063 0.016 0.016
    Diagonal 2 1636 3440 2058 4774 7982 8063 0.922 1.563 1.297
    Extended Tridiagonal 1 2081 690 1140 4639 2022 2984 1.703 1.141 1.578
    Diagonal 4 70 70 70 200 200 200 0.047 0.031 0.016
    Diagonal 5 40 124 155 100 258 320 0.109 0.141 0.125
    Extended Himmelblau 383 339 207 1669 1467 961 0.219 0.172 0.172
    Full Hessian FH2 4682 4868 4794 14054 14610 14390 65.938 66.469 65.922
    Perturbed quadratic diagonal 1036 1084 1276 3114 3258 3834 0.406 0.422 0.422
    Quadratic QF1 1158 1158 1158 3484 3484 3484 0.297 0.297 0.328
    Quadratic QF2 NaN NaN 2349 NaN NaN 10073 NaN NaN 1.531
    Extended quadratic exponential EP1 NaN 60 60 NaN 310 310 NaN 0.109 0.125
    Almost Perturbed Quadratic 1158 1158 1158 3484 3484 3484 0.422 0.453 0.391
    LIARWHD (CUTE) 2812 1202 1255 12366 7834 7379 0.938 1.000 1.109
    POWER (CUTE) 7779 7781 7782 23347 23353 23356 1.078 1.500 1.328
    NONSCOMP (CUTE) 2558 13483 10901 49960 43268 33413 1.203 1.406 1.422
    QUARTC (CUTE) 134 94 95 1132 901 916 0.688 0.672 0.563
    Diagonal 6 86 40 40 192 100 100 0.047 0.063 0.063
    DIXON3DQ (CUTE) 16047 18776 19376 48172 56369 58176 2.266 2.516 2.734
    BIGGSB1 (CUTE) 15274 17835 18374 45853 53546 55170 2.875 2.922 2.484
    Generalized Quartic 142 214 173 497 712 589 0.078 0.172 0.109
    Diagonal 7 50 50 50 160 160 160 0.063 0.047 0.094
    Diagonal 8 50 40 40 160 130 130 0.109 0.125 0.063
    Full Hessian FH3 43 43 43 139 139 139 0.063 0.109 0.109
    FLETCHCR (CUTE) NaN NaN 26793 NaN NaN 240237 NaN NaN 10.203

     | Show Table
    DownLoad: CSV

    Table 6 contains numerical results (IT, FE and CPU) for the DY, FR and CD methods. Figures 7, 8 and 9 plot the performance profiles for the results in Table 6 with respect to profiles IT, FE and CPU, respectively.

    Figure 7.  IT performance profile for DY, FR and CD methods.
    Figure 8.  FE performance profile for DY, FR and CD methods.
    Figure 9.  CPU time performance profile for DY, FR and CD methods.

    From Figure 7, it is observable that the graph of the CD method comes first to the top, which signifies that the CD outperforms other considered methods with respect to the IT.

    Figure 8 presents the performance profiles of the FE of the DY, FR and CD methods. From Figure 8, it is observed that the CD graph first comes to the highest level, which means that the CD possesses best performances with respect to the criterion FE.

    Figure 9 presents the performance profiles of the CPU of the DY, FR and CD methods. Figure 9 demonstrates that the graph of the CD method first achieves the top level, so that the CD is winer with respect to the CPU.

    From the previous analysis of the results shown in Table 6 and Figures 7-9, it is clear that the CD method achieved most efficient results in terms of all three basic metrics: IT, FE and CPU.

    This subsection analyses numerical results obtained by running a MATLAB implementation with predefined conditions given at the beginning section. The following ten hybrid CG methods in the form of (1.2) and (1.10), which differ only in the choice of the CG parameter \beta_k , are tested:

    - HCG1: The CG method with \beta_k defined by (4.15).

    - HCG2: The CG method with \beta_k defined by (4.16).

    - HCG3: The CG method with \beta_k defined by (4.17).

    - HCG4: The CG method with \beta_k defined by (4.18).

    - HCG5: The CG method with \beta_k defined by (4.19) in which c = \frac{1-\sigma}{1+\sigma} .

    - HCG6: The CG method with \beta_k defined by (4.20).

    - HCG7: The CG method with the parameter \beta_k defined by (4.21).

    - HCG8: The CG method with the parameter \beta_k defined by (4.22) in which \theta_k\in[0,1] .

    - HCG9: The CG method with the parameter \beta_k defined by (4.23) in which \theta_k\in[0,1] .

    - HCG10: The CG method with the parameter \beta_k defined by (4.24) in which \nu_k, \theta_k\in[0,1] .

    The numerical experiment contains 25 test functions.

    Summary numerical results for hybrid CG methods, tried on 25 test functions, with respect to IT, FE and CPU profiles are presented in Table 7.

    Table 7.  Summary numerical results of the hybrid CG methods HCG1–HCG10 with respect to IT.
    Test function HCG1 HCG2 HCG3 HCG4 HCG5 HCG6 HCG7 HCG8 HCG9 HCG10
    Perturbed Quadratic 1157 1157 1157 1157 1157 1157 1157 1157 1157 1157
    Raydan 2 40 40 40 57 78 81 40 69 NaN 126
    Diagonal 2 1584 1581 1542 1488 1500 2110 2193 1843 1475 1453
    Extended Tridiagonal 1 805 623 754 2110 2160 10129 1167 966 NaN 270
    Diagonal 4 60 60 70 60 70 70 60 70 NaN 113
    Diagonal 5 124 39 98 39 120 109 39 141 154 130
    Extended Himmelblau 145 139 111 161 181 207 159 381 109 108
    Full Hessian FH2 5036 5036 5036 4820 4820 4800 4994 4789 5163 5705
    Perturbed quadratic diagonal 1228 1214 1266 934 1093 987 996 1016 NaN 2679
    Quadratic QF1 1158 1158 1158 1158 1158 1158 1158 1158 NaN 1158
    Quadratic QF2 2125 2098 2174 1995 1991 2425 2378 NaN 2204 2034
    TRIDIA (CUTE) NaN NaN NaN 6210 6210 5594 NaN NaN 6748 7345
    Almost Perturbed Quadratic 1158 1158 1158 1158 1158 1158 1158 1158 1158 1158
    LIARWHD (CUTE) 1367 817 1592 1024 1831 1774 531 2152 NaN 573
    POWER (CUTE) 7782 7782 7782 7779 7779 7802 7781 7780 NaN 7781
    NONSCOMP (CUTE) 10092 10746 8896 10466 9972 13390 11029 3520 3988 11411
    QUARTC (CUTE) 94 160 145 150 126 95 160 114 165 154
    Diagonal 6 40 40 40 57 78 81 40 69 NaN 126
    DIXON3DQ (CUTE) 12182 5160 11257 5160 11977 14302 5160 17080 NaN 12264
    BIGGSB1 (CUTE) 10664 5160 10479 5160 11082 13600 5160 16192 NaN 11151
    Generalized Quartic 129 107 110 107 142 153 107 123 131 145
    Diagonal 7 50 NaN 40 NaN 40 50 NaN 50 51 40
    Diagonal 8 40 40 40 50 NaN 50 40 NaN NaN 40
    Full Hessian FH3 43 42 42 42 42 43 42 43 NaN NaN
    FLETCHCR (CUTE) 17821 17632 18568 17272 17446 26794 24865 NaN 17315 20813

     | Show Table
    DownLoad: CSV

    Figure 10 plot corresponding performance profiles IT for the results included in Table 7, in three columns denoted by IT.

    Figure 10.  IT performance profile for hybrid CG methods HCG1–HCG10.

    From Figure 10, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms other considered methods with respect to the IT. However, if we look in more detail Figure 10, we can see that the HCG6 method does not have the best results, it has the best results in only (16\%) , while the HCG7 method has the best results in (52\%) on the number of functions tested. The reason for such behavior lies in the fact that the HCG6 method is the only one that has successfully solved all the test problems.

    The numerical results of the hybrid CG methods with respect to the FE are arranged in Table 8.

    Table 8.  Summary numerical results of the hybrid CG methods HCG1–HCG10 with respect to FE.
    Test function HCG1 HCG2 HCG3 HCG4 HCG5 HCG6 HCG7 HCG8 HCG9 HCG10
    Perturbed Quadratic 3481 3481 3481 3481 3481 3481 3481 3481 3481 3481
    Raydan 2 100 100 100 134 176 182 100 158 NaN 282
    Diagonal 2 6136 6217 6006 5923 5944 8281 8594 4822 5711 5636
    Extended Tridiagonal 1 2369 1991 2275 4678 4924 22418 3119 2661 NaN 869
    Diagonal 4 170 170 200 170 200 200 170 200 NaN 339
    Diagonal 5 258 88 206 88 270 228 88 292 338 270
    Extended Himmelblau 855 687 583 763 813 961 757 1613 567 594
    Full Hessian FH2 15115 15115 15115 14467 14467 14407 14989 14374 15495 17122
    Perturbed quadratic diagonal 3686 3647 3805 2805 3282 2967 2993 3053 NaN 8044
    Quadratic QF1 3484 3484 3484 3484 3484 3484 3484 3484 NaN 3484
    Quadratic QF2 9455 9202 9501 9016 9054 10229 10086 NaN 9531 9085
    TRIDIA (CUTE) NaN NaN NaN 18640 18640 16792 NaN NaN 20260 22051
    Almost Perturbed Quadratic 3484 3484 3484 3484 3484 3484 3484 3484 3484 3484
    LIARWHD (CUTE) 7712 5931 8275 6165 8113 9395 5854 10305 NaN 4848
    POWER (CUTE) 23356 23356 23356 23347 23347 23416 23353 23350 NaN 23353
    NONSCOMP (CUTE) 31355 33211 27801 32705 31458 40807 34013 23411 13367 35106
    QUARTC (CUTE) 901 1254 1261 1224 1224 916 1254 1041 1347 1305
    Diagonal 6 100 100 100 134 176 182 100 158 NaN 282
    DIXON3DQ (CUTE) 36508 15534 33759 15534 35926 42952 15534 51284 NaN 36796
    BIGGSB1 (CUTE) 31960 15534 31427 15534 33247 40846 15534 48620 NaN 33469
    Generalized Quartic 457 371 370 371 481 529 371 439 446 467
    Diagonal 7 160 NaN 130 NaN 130 160 NaN 160 142 13
    Diagonal 8 130 130 130 160 NaN 160 130 NaN NaN 130
    Full Hessian FH3 139 136 136 136 136 139 136 139 NaN NaN
    FLETCHCR (CUTE) 166463 165774 168739 175309 175845 240240 184939 NaN 174406 215687

     | Show Table
    DownLoad: CSV

    Figure 11 plots the performance profiles for the results in Table 8, in three columns denoted by FE.

    Figure 11.  FE performance profile for hybrid CG methods HCG1–HCG10).

    From Figure 11, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms other considered methods with respect to the FE. However, if we look in more detail Figure 11, we can see an identical situation as in Figure 10 that the HCG6 method does not have the best results, it has the best results in only (16 \%) , while the HCG2 method has the best results in (48\%) on the number of functions tested.

    Table 9 contains numerical results of the hybrid CG methods with respect to the CPU.

    Table 9.  Summary numerical results of the hybrid CG methods HCG1–HCG10 with respect to the CPU (sec).
    Test function HCG1 HCG2 HCG3 HCG4 HCG5 HCG6 HCG7 HCG8 HCG9 HCG10
    Perturbed Quadratic 0.656 0.516 0.781 0.719 0.594 0.438 0.719 0.688 0.844 0.688
    Raydan 2 0.031 0.063 0.078 0.078 0.078 0.078 0.078 0.078 NaN 0.078
    Diagonal 2 1.453 1.328 1.656 1.172 1.438 1.797 1.813 1.266 1.250 1.141
    Extended Tridiagonal 1 1.016 1.125 1.359 2.250 2.375 7.578 1.672 1.375 NaN 0.922
    Diagonal 4 0.031 0.031 0.031 0.078 0.078 0.047 0.109 0.094 NaN 0.094
    Diagonal 5 0.141 0.063 0.156 0.094 0.094 0.125 0.109 0.078 0.219 0.156
    Extended Himmelblau 0.172 0.172 0.109 0.141 0.172 0.141 0.125 0.141 0.172 0.125
    Full Hessian FH2 83.125 91.938 86.984 85.766 94.484 78.281 77.141 74.500 80.969 82.469
    Perturbed quadratic diagonal 0.406 0.609 0.641 0.375 0.563 0.359 0.328 0.344 NaN 0.734
    Quadratic QF1 0.359 0.438 0.422 0.422 0.406 0.391 0.484 0.422 NaN 0.281
    Quadratic QF2 1.047 1.313 1.203 1.156 1.063 1.156 1.000 NaN 1.094 1.047
    TRIDIA (CUTE) NaN NaN NaN 1.688 1.391 1.859 NaN NaN 1.875 1.391
    Almost Perturbed Quadratic 0.406 0.438 0.516 0.594 0.250 0.359 0.406 0.578 0.641 0.422
    LIARWHD (CUTE) 0.938 0.828 1.203 0.797 1.125 1.172 0.938 1.203 NaN 0.594
    POWER (CUTE) 1.563 1.672 1.750 1.609 1.625 1.578 1.625 1.188 NaN 1.453
    NONSCOMP (CUTE) 1.547 1.484 1.063 1.766 1.422 1.719 1.516 1.063 1.203 1.703
    QUARTC (CUTE) 0.750 1.000 0.969 0.969 0.875 0.797 0.938 0.703 1.266 0.93
    Diagonal 6 0.078 0.078 0.078 0.094 0.063 0.016 0.016 0.125 NaN 0.109
    DIXON3DQ (CUTE) 2.047 1.453 2.016 1.484 2.359 2.234 1.406 2.297 NaN 2.078
    BIGGSB1 (CUTE) 1.875 2.047 2.359 1.750 2.250 2.391 1.422 2.672 NaN 2.422
    Generalized Quartic 0.063 0.125 0.141 0.156 0.125 0.094 0.078 0.109 0.172 0.109
    Diagonal 7 0.063 NaN 0.016 NaN 0.109 0.063 NaN 0.063 0.063 0.063
    Diagonal 8 0.078 0.125 0.078 0.031 NaN 0.063 0.109 NaN NaN 0.078
    Full Hessian FH3 0.063 0.047 0.109 0.047 0.031 0.063 0.047 0.109 NaN NaN
    FLETCHCR (CUTE) 5.656 6.750 7.922 9.484 6.484 8.766 7.281 NaN 6.906 7.547

     | Show Table
    DownLoad: CSV

    Figure 12 plots the performance profiles for the results in Tables 9.

    Figure 12.  CPU time performance profile for hybrid CG methods HCG1–HCG10.

    From Figure 12, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms other considered methods with respect to the CPU. However, if we look in more detail Figure 12, we can see an identical situation as in the figures 10 and 11 that the HCG6 method does not have the best results, it has the best results in only (8\%) , while the HCG7 and HCG10 methods has the best results in (20\%) on the number of functions tested.

    The numerical experiments presented in this subsection investigate the influence of the scalar size in the modified Dai-Liao methods. The previously mentioned variants of the DL method use a fixed value of the parameter t . It can also be seen that the scalar t in all the above papers are greater than 0 and is less than 1. Analyzing the results from [18,26,131,139], we conclude that the scalar t was defined by a fixed value of 0.1 in numerical experiments. Also, numerical experience related the fixed valued t = 1 was reported in [26]. Common numerical experience is that different choice of t initiate totally different numerical experience.

    That is why we come to the next question. What is the 'best' value of t\in(0,1) from the computational point of view? Because of that, our intention is to investigate numerically and theoretically behavior of different variants of the DL conjugate gradient framework with respect to various values t . For this reason, we started this research with the aim to find answer to the aforementioned question. Our strategy is to select several values of the parameter t within the interval (0,1) and to compare the obtained results based on different criteria. In that way, we will get the answer to the question: whether it is better to take values closer to zero or closer to one.

    As we have already indicated in the previous section, the aim of this subsection is to answer to the question: what is the 'best' value of t\in(0,1) in DL CG computational scheme? Our plan is to examine numerically the influence of the scalar t in the DL class of iterations and determine some rules for its appropriate choice and, if possible, find the best value. The detailed research plan is to find the answer to two challenging questions:

    - Does and how much the values of the scalar t affect each of the methods DHSDL, DLSDL, MHSDL and MLSDL, which are observed individually, with respect to IT, FE and the CPU time (CPU)?

    - Does the choice of t favor one (or some) of the considered methods?

    To give an answer to these questions, we would have to test all the methods under the same conditions. During testing, we will compare all the considered methods with the same values of required scalars. The Algorithm 2, i.e. the backtracking line search, determines the step-size \alpha_k in (1.2).

    Algorithm 6 gives the corresponding general framework for DHSDL, DLSDL, MHSDL and MLSDL methods.

    Algorithm 6 Modified Dai-Liao conjugate gradient methods.
    Require: A starting point \mathbf{x}_0 , real numbers 0<\epsilon<1, 0<\delta<1, \mu>1 and t>0 .
    1: Set k\!=\!0 and compute \mathbf{d}_0\!=\!-\mathbf{g}_0 .
    2: If
              \|\mathbf{g}_k\| \leq \epsilon \ \ \ \mathrm{and}\ \ \ \frac{|f_{k+1}-f_k|}{1+|f_k|} \leq \delta
    STOP;
    else perform Step 3.
    3: Determine \alpha_k\in (0,1) using backtracking in Algorithm 2.
    4: Compute \mathbf{x}_{k+1}\!=\! \mathbf{x}_k + \alpha_k \mathbf{d}_k .
    5: Calculate \mathbf{g}_{k+1} , \mathbf{y}_k\!=\!\mathbf{g}_{k+1}-\mathbf{g}_k , \mathbf{s}_k\!=\!\mathbf{x}_{k+1}-\mathbf{x}_k .
    6: Calculate \beta_k by (4.9) or (4.10) or (4.11) or (4.12).
    7: Compute \mathbf{d}_k \!=\!-\mathbf{g}_k+\beta_k\mathbf{d}_{k-1} .
    8: k:\!=\! k+1, and go to Step 2.

     | Show Table
    DownLoad: CSV

    Values of t used during the testing of the observed methods are given in the Table 10. Each particular value of the scalar t is marked with one of the labels T1 to T6 , corresponding to a joined value of t . Five of the six given values are fixed during testing of the observed methods. Only the value of t labeled by T1 is variable during the iterations, where the value in k th iteration is denoted by t_k . In this case, the value t_k is obtained from backtracking line search, i.e., t_k = \alpha_k inherits a new value obtained from Algorithm 2 in each iteration.

    Table 10.  Labels and values of scalar t in the DHSDL, DLSDL, MHSDL and MLSDL methods.
    Label T1 T2 T3 T4 T5 T6
    Value of the scalar t t_k=\alpha_k 0.05 0.1 0.2 0.5 0.9

     | Show Table
    DownLoad: CSV

    The reason why we decided to define values t_k from the backtracking line search algorithm is:

    - t_k\in(0,1) ;

    - t_k also affects the iterative steps when computing the next value for \mathbf{x}_k .

    The convergence of the above methods has already considered in the mentioned references. Our goal is to give a unified analysis of the convergence of proposed DL methods. The aim of our research is to investigate the influence of the scalar t in the DL iterates.

    Each particular value of the scalar t is marked with one of the labels T1 to T6 , corresponding to a joined value of t . Five of the six given values are fixed during testing of the observed methods. Only the value of t labeled by T1 is variable during the iterations, where the value in k th iteration is denoted by t_k . In this case, the value t_k is obtained from backtracking line search, i.e., t_k = \alpha_k inherits a new value obtained from Algorithm 2 in each iteration.

    The convergence of the above methods has already considered in the mentioned references. Our goal is to give a unified analysis of the convergence of the proposed DL methods. The aim of our research is to investigate the influence of the scalar t in the DL iterates.

    Arranged numerical results are generated by testing and comparing DHSDL, DLSDL, MHSDL and MLSDL methods on 6 different values t , denoted by T1 - T6 . Three important criteria (IT, CPU, FE) in all tested methods are analyzed. Numerical report is based on 22 test functions proposed in [3]. For each of tested functions, we performed 10 numerical experiments with 100 , 500 , 1000 , 3000 , 5000 , 7000 , 8000 , 10000 , 15000 and 20000 variables. Summary results for DHSDL, DLSDL, MHSDL and MLSDL methods, tried on 22 tests, are presented.

    The uniform stopping criteria are (refer to previous)

    \|\mathbf{g}_k\| \leq 10^{-6} \ \ \ \mathrm{and}\ \ \ \frac{|f_{k+1}-f_k|}{1+|f_k|} \leq 10^{-16}.

    The backtracking parameters for all algorithms are \omega \! = \! 0.0001 and \varphi \! = \!0.8 .

    During the testing of the DHSDL and DLSDL methods, the constant parameter \mu = 1.2 was used in each iteration.

    Figure 13 indicates the performance profiles of the IT criterion with respect to the DHSDL method depending on the scalar value t . This figure exhibits that the DHSDL method successfully solved all the problems for all values of the scalar t . Also, the DHSDL-T2 method is finest in 45.5 % of testings with respect to DHSDL-T1 (18.2%), DHSDL-T3 (9.1%), DHSDL-T4 (4.5%), DHSDL-T5 (13.6%) and DHSDL-T6 (13.6%). It can be noticed that the graph of DHSDL-T2 reaches the top first, which signifies that DHSDL-T2 is the best with respect to IT.

    Figure 13.  Performance profiles of DHSDL (T1, T2, T3, T4, T5, T6) method based on IT.

    Figure 14 shows the performance profile given by FE of the DHSDL solver with respect to t . Evidently, DHSDL solves all test problems for all values of the scalar t , and the DHSDL-T2 method is winer in 50.0 % of the test problems compared to the DHSDL-T1 (18.2%), DHSDL-T3 (9.1%), DHSDL-T4 (0%), DHSDL-T5 (13.6%) and DHSDL-T6(9.1%). From Figure 14, it is notifiable that the DHSDL-T2 is the best with respect to FE.

    Figure 14.  Performance profiles of DHSDL (T1, T2, T3, T4, T5, T6) method based on FE.

    Figure 15 illustrates the CPU criterion spanned by the DHSDL method depending on the metrics t . Again, the DHSDL method is able to solve all the tested problems for all values of the scalar t . Further, the DHSDL-T6 method is superior in 40.9 % of tests with respect to the DHSDL-T1 (4.5%), DHSDL-T2 (27.3%), DHSDL-T3 (9.1%), DHSDL-T4 (4.5%) and DHSDL-T5 (13.6%). We also observed that at the beginning, DHSDL-T2 does not perform well but as the number of problems increases, its graph crosses other graphs and achieves the top fist, which means that the DHSDL-T2 is dominant with respect to CPU.

    Figure 15.  Performance profile of DHSDL (T1, T2, T3, T4, T5, T6) method based on CPU time.

    The graphs displayed in figures 1315 show that the DHSDL method has achieved superior results for t\equiv T2 = 0.05 .

    Figure 16 illustrates the IT criterion in the DLSDL method depending on the scalar value t . It is observable that DLSDL successfully solves all the problems for all values of the scalar t . Moreover, the DLSDL-T2 method is winner in 36.4 % of the tests compared to DLSDL-T1 (22.7%), DLSDL-T3 (13.6%), DLSDL-T4 (9.1%), DLSDL-T5 (9.1%) and DLSDL-T6 (13.6%). Figure 16 (left) exhibits that the graph DLSDL-T2 achieves the top first, which means that DLSDL-T2 outperforms all the other methods with respect to IT.

    Figure 16.  Performance profiles of DLSDL (T1, T2, T3, T4, T5, T6) method based on IT.

    Figure 17 shows the performance profiles of the criterion FE corresponding to the DLSDL method and the scalar value t . In this Figure, it is observable that DLSDL for all values of the scalar t successfully solves all tests, and DLSDL-T2 is best in 36.4 % of the test functions in comparison to DLSDL-T1 (18.2%), DLSDL-T3 (4.5%), DLSDL-T4 (18.2%), DLSDL-T5 (13.6%) and DLSDL-T6 (9.1%). From Figure 17, it is observed that DLSDL-T2 is the best with respect to FE.

    Figure 17.  Performance profiles of DLSDL (T1, T2, T3, T4, T5, T6) method based on FE.

    Figure 18 shows the performance profiles of the CPU time of the DLSDL method depending on t . The graphs in this figure indicate that DLSDL method solved all the problems for all values of t , and the DLSDL-T4 method is superior in 31.8 % of the test problems compared to DLSDL-T1 (13.6%), DLSDL-T2 (22.7%), DLSDL-T3 (18.2%), DLSDL-T5 (9.1%) and DLSDL-T6 (4.6%). We also observed that at the beginning, DLSDL-T2 does not perform well; but as the number of problems increases, its graph crosses other graphs & d comes to the top which signifies that, with respect to CPU, the DLSDL-T2 is the best.

    Figure 18.  Performance profile of DLSDL (T1, T2, T3, T4, T5, T6) method based on CPU.

    Based on figures (1618) analysis, we come to the conclusion that the DLSDL method has achieved best responses for t\equiv T2 = 0.05 .

    In Subsection 6.5.1, we have defined two questions. Our aim in this subsection is to give answers. In order to answer properly to the first question, in Tables 11, 12, 13 are collected average values for all three considered criteria.

    Table 11.  Average IT values for 22 test functions tested on 10 numerical experiments.
    Method T1 T2 T3 T4 T5 T6
    DHSDL 32980.14 31281.32 33640.45 32942.36 34448.32 33872.36
    DLSDL 30694.00 28701.14 31048.32 30594.77 31926.59 31573.05
    MHSDL 29289.73 27653.64 29660.00 29713.50 30491.18 30197.27
    MLSDL 25398.82 22941.77 24758.27 24250.68 25722.64 25032.64

     | Show Table
    DownLoad: CSV
    Table 12.  Average values of FE for 22 test functions tested on 10 numerical experiments.
    Method T1 T2 T3 T4 T5 T6
    DHSDL 1228585.50 1191960.55 1252957.09 1238044.36 1271176.59 1255710.45
    DLSDL 1131421.41 1083535.14 1149482.41 1134315.00 1167030.14 1158554.77
    MHSDL 1089700.41 1036710.32 1089777.64 1091985.41 1105299.91 1101380.18
    MLSDL 904217.14 845017.55 891669.50 879473.14 913165.68 895652.36

     | Show Table
    DownLoad: CSV
    Table 13.  Average CPU time for 22 test functions tested on 10 numerical experiments.
    Method T1 T2 T3 T4 T5 T6
    DHSDL 902.06 894.73 917.77 930.56 911.28 870.93
    DLSDL 816.08 790.63 804.69 816.28 803.84 809.67
    MHSDL 770.78 751.65 728.61 749.70 712.64 720.57
    MLSDL 573.14 587.41 581.50 576.32 582.62 580.96

     | Show Table
    DownLoad: CSV

    After the numerical testing of the compared methods and the individual analysis for each method, we can now give a global conclusion of the behavior of the observed methods. The first conclusion is that the value of the scalar t significantly affects on each of the DHSDL, DLSDL, MHSDL and MLSDL methods with respect to all three criteria. Further, a common conclusion is that all the methods give the best performance profiles for the scalar value t = 0.05 . We can also give an answer to another question. According to the previous analysis, a particular selection of the scalar value t can give priority to one of the observed method. All methods do not behave identically for the same values of the scalar t . If we observe Table 11 which contains the average number of iterations, we can notice that the difference between the smallest and the biggest average result of the observed method is in the range of 10.1% to 12.1% in relation to the minimum obtained value. Table 12 shows the average results related FE. An individual comparison of considered methods leads to the conclusion that the difference between the smallest and the biggest average results is in the range of 6.6% to 8.9% in relation to the minimum obtained value. This brings us to the same conclusion once again, that is, the value of the scalar t affects the methods in a given percentage. Also, if we observe Table 13 with the average CPU time, we can notice that the difference between the smallest and the biggest average CPU time observed method is in the range of 2.5% to 10.6% in relation to the minimum value.

    Overview of QN methods, CG methods and their classification are presented. Section 5 investigates convergence properties of CG methods, following the CG classes in accordance with the presented taxonomy of basic CG methods. Numerical experiments compare main classes of QN and CG methods. More precisely, main QN methods with constant diagonal Hessian approximation are compared as well as two classes of basic CG methods, hybrid CG methods, and finally some variants of modified Dai-Liao methods.

    The problem of defining further improvements of the CGUP \beta_k is still open. Moreover, new CG methods could be defined using appropriate updates of the parameter t . Another research stream includes various hybridizations of so far proposed CG methods. On the other hand, there are open possibilities for defining new updates of the matrices B_k and H_k , used in defining QN methods. Continuing research on some composite definitions of \beta_k based on CG and BFGS updates, it is possible to discover new three-term (or even different) CG variants.

    One of prospective fields for further research includes a generalization of the discrete-time approach to continuous-time approach, considered in [69]. Another possibility for further research is extension of gradient methods to tensor case. This possibility was exploited in [77] on solving \mathcal{M} -tensor equations.

    Moreover, an application of low rank updates used in optimization can be transferred to appropriate numerical methods for computing generalized inverses. Applications of rank-one update formula are investigated in [115,116].

    Predrag Stanimirović and Dijana Mosić are supported by the Ministry of Education, Science and Technological Development, Grants No. 174013 and 174007. Haifeng Ma is supported by the National Natural Science Foundation of China under grant 11971136 and the bilateral project between China and and Poland (no. 37-18).



    [1] S. W. Golomb, Algebraic constructions for costas arrays, J. Comb. Theory Ser. A, 37 (1984), 13-21. doi: 10.1016/0097-3165(84)90015-3
    [2] L. Qi, W. P. Zhang, On the generalization of Golomb's conjecture, Journal of Northwest University, Natural Science Edition, 45 (2015), 199-201.
    [3] Q. Sun, On primitive roots in a finite field, Journal of Sichuan University, Natural Science Edition, 25 (1988), 133-139.
    [4] T. Tian, W. Qi, Primitive normal element and its inverse in finite fields, Acta Math. Sin., 49 (2006), 657-668.
    [5] P. Wang, X. Cao, R. Feng, On the existence of some specific elements in finite fields of characteristic 2, Finite Fields Th. App., 18 (2012), 800-813.
    [6] J. P. Wang, On Golomb's conjecture, Sci. China Ser. A, 31 (1988), 152-161.
    [7] T. T. Wang, X. N. Wang, On the Golomb's conjecture and Lehmer's numbers, Open Math., 15 (2017), 1003-1009. doi: 10.1515/math-2017-0083
    [8] W. Q. Wang, W. P. Zhang, A mean aalue related to primitive roots and Golomb's conjectures, Abstr. Appl. Anal., 2014 (2014), 1-5.
    [9] W. P. Zhang, On a problem related to Golomb's conjectures, J. Syst. Sci. Complex., 16 (2003), 13-18.
    [10] S. D. Cohen, W. P. Zhang, Sums of two exact powers, Finite Fields Th. App., 8 (2002), 471-477. doi: 10.1016/S1071-5797(02)90354-0
    [11] S. D. Cohen, Pairs of primitive roots, Mathematica, 32 (1985), 276-285.
    [12] S. D. Cohen, T. Trudgian, Lehmer numbers and primitive roots modulo a prime, J. Number Theory, 203 (2019), 68-79. doi: 10.1016/j.jnt.2019.03.004
    [13] R. K. Guy, Unsolved Problems in Number Theory, Springer-Verlag, 1981.
    [14] W. P. Zhang, H. L. Li, Elementary Number Theory, Shaanxi Normal University Press, Xi'an, 2013.
    [15] T. M. Apostol, Introduction to Analytic Number Theory, Springer-Verlag, New York, 1976.
    [16] W. Narkiewicz, Classical Problems in Number Theory, Polish Scientifc Publishers, Warszawa, 1987.
    [17] J. Bourgain, Z. M. Garaev, V. S. Konyagin, On the hidden shifted power problem, SIAM J. Comput., 41 (2012), 1524-1557. doi: 10.1137/110850414
    [18] K. Gong, C. H. Jia, Shifted character sums with multiplicative coefficients, J. Number Theory, 153 (2015), 364-371. doi: 10.1016/j.jnt.2015.01.015
  • This article has been cited by:

    1. Yiyuan Qian, Kai Zhang, Jingzhi Li, Xiaoshen Wang, Adaptive neural network surrogate model for solving the implied volatility of time-dependent American option via Bayesian inference, 2022, 30, 2688-1594, 2335, 10.3934/era.2022119
    2. Branislav Ivanov, Gradimir V. Milovanović, Predrag S. Stanimirović, Accelerated Dai-Liao projection method for solving systems of monotone nonlinear equations with application to image deblurring, 2023, 85, 0925-5001, 377, 10.1007/s10898-022-01213-4
    3. Krzysztof Gdawiec, Wiesław Kotarski, Agnieszka Lisowska, On the robust Newton’s method with the Mann iteration and the artistic patterns from its dynamics, 2021, 104, 0924-090X, 297, 10.1007/s11071-021-06306-5
    4. Mohd Rizman Sultan Mohd, Juliana Johari, Fazlina Ahmat Ruslan, Noorfadzli Abdul Razak, Salmiah Ahmad, Ahmad Syahiman Mohd Shah, 2022, Chapter 47, 978-981-16-8514-9, 619, 10.1007/978-981-16-8515-6_47
    5. Jingxin Lin, Nianfeng Wang, Hao Zhu, Xianmin Zhang, Xuewei Zheng, 2021, Fabric defect detection based on multi-input neural network, 978-1-6654-3153-8, 458, 10.1109/M2VIP49856.2021.9665111
    6. Yapeng Li, Xiangzhen Wang, Wenjie Cheng, Songyang Gao, Chuntian Cheng, A combination approach for downstream plants to solve scheduling information asymmetry problem in electricity markets, 2023, 149, 01420615, 108935, 10.1016/j.ijepes.2022.108935
    7. Mei Liu, Xiaoyan Zhang, Mingsheng Shang, Computational Neural Dynamics Model for Time-Variant Constrained Nonlinear Optimization Applied to Winner-Take-All Operation, 2022, 18, 1551-3203, 5936, 10.1109/TII.2021.3138794
    8. Huiping Cao, Xiaomin An, Jing Han, Solving nonlinear equations with a direct Broyden method and its acceleration, 2022, 1598-5865, 10.1007/s12190-022-01818-8
    9. Tina Mai, Daniele Mortari, Theory of functional connections applied to quadratic and nonlinear programming under equality constraints, 2022, 406, 03770427, 113912, 10.1016/j.cam.2021.113912
    10. Saman Babaie-Kafaki, A survey on the Dai–Liao family of nonlinear conjugate gradient methods, 2023, 57, 0399-0559, 43, 10.1051/ro/2022213
    11. Branislav Ivanov, Gradimir V. Milovanović, Predrag S. Stanimirović, Aliyu Muhammed Awwal, Lev A. Kazakovtsev, Vladimir N. Krutikov, Xian-Ming Gu, A Modified Dai–Liao Conjugate Gradient Method Based on a Scalar Matrix Approximation of Hessian and Its Application, 2023, 2023, 2314-4785, 1, 10.1155/2023/9945581
    12. Sanaz Bojari, Mahmoud Paripour, Xian-Ming Gu, Solving Large-Scale Unconstrained Optimization Problems with an Efficient Conjugate Gradient Class, 2024, 2024, 2314-4785, 1, 10.1155/2024/5548724
    13. Diego Armando Perez-Rosero, Andrés Marino Álvarez-Meza, Cesar German Castellanos-Dominguez, A Regularized Physics-Informed Neural Network to Support Data-Driven Nonlinear Constrained Optimization, 2024, 13, 2073-431X, 176, 10.3390/computers13070176
    14. Naleli Jubert Matjelo, Sekhants'o Paul Lara, Moruti Kao, Limakatso Lepekola, Mampesi Thato Matobako, Madan Singh, 2024, Double Decay Parameter Estimation with Autoregressive and Differential Linear Regression, 979-8-3503-9591-4, 1, 10.1109/ICECET61485.2024.10698202
    15. O. B. Onuoha, R. O. Ayinla, G. Egenti, O. E. Taiwo, Exploring Enhanced Conjugate Gradient Methods: A Novel Family of Techniques for Efficient Unconstrained Minimization, 2024, 2581-8147, 773, 10.34198/ejms.14424.773791
    16. Long Jin, Lin Wei, Xin Lv, 2025, Chapter 4, 978-3-031-68593-4, 99, 10.1007/978-3-031-68594-1_4
    17. Nur Idalisa, Norhaslinda Zullpakkal, Mohd Rivaie, Nurul ’Aini, Nurul Hajar, Wan Khadijah, Nurul Hafawati Fadhilah, 2024, 3080, 0094-243X, 030012, 10.1063/5.0192308
    18. Elena Tovbis, Vladimir Krutikov, Predrag Stanimirović, Vladimir Meshechkin, Aleksey Popov, Lev Kazakovtsev, A Family of Multi-Step Subgradient Minimization Methods, 2023, 11, 2227-7390, 2264, 10.3390/math11102264
    19. João Miguel Ferreira, Optimal control policies for a non-eruptive population of rodents—The relevance of migration, 2023, 484, 03043800, 110477, 10.1016/j.ecolmodel.2023.110477
    20. Peng Yu, Shuping Tan, Jin Guo, Yong Song, Data-driven optimal controller design for sub-satellite deployment of tethered satellite system, 2024, 32, 2688-1594, 505, 10.3934/era.2024025
    21. Jamilu Sabi’u, Sekson Sirisubtawee, An inertial Dai-Liao conjugate method for convex constrained monotone equations that avoids the direction of maximum magnification, 2024, 70, 1598-5865, 4319, 10.1007/s12190-024-02123-2
    22. Chunming Tang, Wancheng Tan, Yongshen Zhang, Zhixian Liu, An accelerated spectral CG based algorithm for optimization techniques on Riemannian manifolds and its comparative evaluation, 2025, 462, 03770427, 116482, 10.1016/j.cam.2024.116482
  • Reader Comments
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(4741) PDF downloads(771) Cited by(4)

Article outline

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog