1. Introduction
We consider the numerical solution of the following system of nonlinear equations
$$F(x) = 0, \qquad\qquad (1.1)$$
where $F:\mathbb{R}^n \to \mathbb{R}^m$ is a continuously differentiable function.
Nonlinear equations of the form (1.1) arise as a key ingredient in simulations of many real-world problems. Classic methods for solving (1.1) include the Gauss-Newton method, the inexact Newton method, Broyden's method and the trust region method [9,13,15]. In actual computations, however, the Gauss-Newton method becomes less competitive when the Jacobian is (nearly) rank-deficient. By adopting a trust-region approach in place of the line search in the Gauss-Newton method, the Levenberg-Marquardt (LM) method circumvents this shortcoming even though it uses the same Hessian approximations as the Gauss-Newton method.
In the trial step of the LM method, one needs to solve at each iteration the linear system
$$\left(J_k^T J_k + \lambda_k I\right) d_k = -J_k^T F_k, \qquad\qquad (1.2)$$
where $\lambda_k \ge 0$, $F_k = F(x_k)$, $J_k = J(x_k)$ is the Jacobian and $I \in \mathbb{R}^{n \times n}$ stands for the identity matrix. If, in the case $m = n$, $J_k$ is nonsingular and Lipschitz continuous, the initial guess $x_0$ is close enough to the solution $x^*$ of (1.1) and the LM parameter $\lambda_k$ is updated recursively, then the LM method has a quadratic convergence rate.
For some applications, the requirement of a nonsingular Jacobian $J_k$ can be rather stringent. It is therefore desirable to devise numerical methods that do not rely on nonsingularity of the Jacobian. To this end, some efforts have been made recently; for instance, Yamashita et al. propose a local error bound condition which does not require nonsingularity of the Jacobian [19]. In what follows, we denote by $X^*$ the nonempty solution set of (1.1) and use $\|\cdot\|$ to represent the 2-norm of vectors or matrices if there is no ambiguity. Let $N(x^*,b) = \{x \in \mathbb{R}^n : \|x - x^*\| \le b\}$ be a neighborhood such that the intersection $X^* \cap N(x^*,b)$ is nonempty. The LM method is shown to have a quadratic convergence rate if there exists a positive constant $c$ satisfying the following local error bound condition [2,6,19]
$$c\,\mathrm{dist}(x, X^*) \le \|F(x)\| \quad \text{for all } x \in N(x^*,b), \qquad\qquad (1.3)$$
where $\mathrm{dist}(x, X^*)$ is the distance from $x$ to $X^*$.
In spite of the advantage of avoiding nonsingularity of the Jacobian, the local error bound condition (1.3) is not always applicable to ill-conditioned nonlinear equations arising in application fields such as biochemical systems. In light of this, Guo et al. present the Hölderian error bound condition, which is more widely applicable than (1.3) [8]. The Hölderian error bound condition is given by
$$c\,\mathrm{dist}(x, X^*) \le \|F(x)\|^{\gamma} \quad \text{for all } x \in N(x^*,b), \qquad\qquad (1.4)$$
where $c > 0$ and $\gamma \in (0,1]$. Obviously, the Hölderian error bound condition (1.4) includes the local error bound condition (1.3) as a special case: (1.4) reduces to (1.3) when $\gamma = 1$. It should be noted that the Hölderian local error bound condition is also called Hölder metric subregularity, which is closely related to the Łojasiewicz inequalities; see [14,17] for details. Under the assumption (1.4), the LM method converges at least superlinearly when $\gamma$ and the LM parameter satisfy certain conditions [1,8,18,21].
Apart from its application in solving systems of nonlinear equations, the LM method also finds its way into the numerical solution of nonlinear least squares problems. To investigate the local convergence of the LM method for nonlinear least-squares problems with possibly nonzero residual, Behling et al. [3] present a local error bound condition characterized by $\|J(x)^T F(x)\|$, i.e.,
$$c\,\mathrm{dist}(x, X^*) \le \|J(x)^T F(x)\| \quad \text{for all } x \in N(x^*,b), \qquad\qquad (1.5)$$
where $c > 0$. We stress that the local error bound condition (1.5) can also be derived from the bound (1.3) [10, Lemma 5.4]. However, the former is more practical than the latter in that it does not require the nonsingularity of the Jacobian. Under the assumption (1.5), the LM method is shown to converge at least linearly with suitable choices of the LM parameter [3].
As observed from (1.2), the LM parameter $\lambda_k$ is introduced in case $J_k^T J_k$ is (nearly) singular. This practice not only guarantees the uniqueness of the solution of (1.2) but also helps to reduce the number of iterations. In this sense, the LM parameter plays a key role in the LM method. Some promising candidates for the LM parameter have been proposed recently; for instance, Yamashita et al. [19] select $\lambda_k = \|F_k\|^2$ and show that the LM method converges quadratically under the assumption (1.3). Fan and Yuan [6] generalize this choice to $\lambda_k = \|F_k\|^{\delta}$ with $\delta \in [1,2]$ and show that quadratic convergence is retained under the assumption (1.3). Dennis and Schnabel consider the choice $\lambda_k = O(\|J_k^T F_k\|)$ [4]. Following this reasoning, Fischer employs $\lambda_k = \|J_k^T F_k\|$ in [7], which is further generalized to the form $\lambda_k = \|J_k^T F_k\|^{\delta}$ with $\delta \in (0,1]$ in [3]. Under the assumption (1.5), Behling et al. conclude that the LM method converges at least linearly to some solution of (1.1) when $\delta \in (0,1)$ and quadratically when $\delta = 1$ [3]. More recent progress in choosing the LM parameter $\lambda_k$ can be found in [5,10,11].
Instead of adopting the choice used in [3], we propose in this work the LM parameter $\lambda_k = \|J_k^T F_k\|^{\delta}$ with $\delta \in [1,2]$. Our motivation is as follows. Intuitively, if $\|J_k^T F_k\|$ is too large, then so is $\lambda_k$, and the step size $\|d_k\|$ becomes small, which may hamper fast convergence. This difficulty is avoided by the following choice:
$$\lambda_k = \begin{cases} \|J_k^T F_k\|^{\delta}, & \text{if } \|J_k^T F_k\| \le 1,\\ \|J_k^T F_k\|^{-\delta}, & \text{if } \|J_k^T F_k\| > 1, \end{cases} \qquad \delta \in [1,2]. \qquad\qquad (1.6)$$
From the convergence theory we know that $\|J_k^T F_k\|$ converges to $0$, so $\|J_k^T F_k\| > 1$ can occur only during finitely many initial iterations and is thus a special case for the numerical method. Since the choice of $\lambda_k$ in (1.6) is adaptive, the resulting variant is called an adaptive Levenberg-Marquardt method (ALMM) in this paper.
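For concreteness, the following short NumPy sketch evaluates the adaptive parameter (1.6); the function name and interface are illustrative rather than taken from the paper.

```python
import numpy as np

def lm_parameter(Jk, Fk, delta=1.0):
    """Adaptive LM parameter (1.6): ||J_k^T F_k||^delta if ||J_k^T F_k|| <= 1,
    and ||J_k^T F_k||^(-delta) otherwise, for delta in [1, 2]."""
    g = np.linalg.norm(Jk.T @ Fk)
    return g**delta if g <= 1.0 else g**(-delta)
```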
The rest of this paper is organized as follows. In Section 2, the adaptive Levenberg-Marquardt method is introduced and its convergence rate under the assumption (1.5) is examined. In Section 3, the adaptive Levenberg-Marquardt method with a Wolfe line search rule, as well as its global convergence, is investigated. In Section 4, some numerical experiments are used to verify the effectiveness of the new method. Finally, some conclusions are given in Section 5.
2. Local convergence of the adaptive LM method
In this section, we consider the adaptive LM method with unit step size and investigate its local convergence near a solution.
To begin our discussion, we present the adaptive LM method: given $x_0 \in \mathbb{R}^n$, for $k = 0, 1, 2, \dots$ compute
$$\left(J_k^T J_k + \lambda_k I\right) d_k = -J_k^T F_k, \qquad x_{k+1} = x_k + d_k, \qquad\qquad (2.1)$$
where the LM parameter $\lambda_k$ is defined in (1.6).
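A minimal NumPy sketch of this local iteration is given below. It assumes callables F and J returning the residual vector and the Jacobian, solves (2.1) with a dense direct solver (the experiments in Section 4 use MATLAB's pcg instead), and stops once $\|J_k^T F_k\|$ falls below a tolerance; names and defaults are illustrative.

```python
import numpy as np

def adaptive_lm(F, J, x0, delta=1.0, tol=1e-10, max_iter=200):
    """Adaptive LM method (2.1) with unit step size and parameter (1.6).
    F(x) returns an m-vector, J(x) the m-by-n Jacobian.  Sketch only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk                                   # J_k^T F_k
        ng = np.linalg.norm(g)
        if ng < tol:                                    # stationary point of ||F||^2
            break
        lam = ng**delta if ng <= 1.0 else ng**(-delta)  # adaptive parameter (1.6)
        d = np.linalg.solve(Jk.T @ Jk + lam*np.eye(x.size), -g)  # trial step (2.1)
        x = x + d                                       # unit step size
    return x
```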
To establish the local convergence results for the adaptive LM algorithm, we need the following assumptions throughout the paper.
Assumption 2.1. (a) The Jacobian $J(x)$ is Lipschitz continuous on a neighborhood $N(x^*,b)$, i.e., there exists a constant $L_1 > 0$ such that
$$\|J(x) - J(y)\| \le L_1 \|x - y\| \quad \text{for all } x, y \in N(x^*,b).$$
(b) We say that $\|J(x)^T F(x)\|$ provides a local error bound on $N(x^*,b)$ if there exists a constant $c > 0$ such that
$$c\,\mathrm{dist}(x, X^*) \le \|J(x)^T F(x)\| \quad \text{for all } x \in N(x^*,b).$$
To guarantee that the initial point $x_0$ is sufficiently close to $x^*$, we assume that $b > 0$ is sufficiently small.
From Assumption 2.1(a), we note that
By compactness, we have
where $L_2 > 0$ and $\beta > 0$ are constants. Therefore, it follows from the mean value inequality that
Denote by $\bar{x}_k \in X^*$ a point satisfying $\|\bar{x}_k - x_k\| = \mathrm{dist}(x_k, X^*)$.
Lemma 2.1. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. Then there exist positive constants $c_1, \tilde{c}_1$ such that
Proof. We consider two cases.
Case I: $\|J_k^T F_k\| \le 1$. Then $\lambda_k = \|J_k^T F_k\|^{\delta}$. From Assumption 2.1(b), the left-hand inequality in (2.7) follows, i.e.,
Next, we verify the right-hand inequality in (2.7).
It follows from (2.5) and (2.6) that
where $c_1 = L_2^2 + \beta L_1$. Since $\lambda_k = \|J_k^T F_k\|^{\delta}$, we obtain
Case II: $\|J_k^T F_k\| > 1$. Then $\lambda_k = \|J_k^T F_k\|^{-\delta} < 1$. From (2.8), we also have
Summarizing the above two cases, we obtain the inequality (2.7) with $\tilde{c}_1 = \min\{c^{\delta}, c_1^{-\delta}\}$. The proof is completed.
Lemma 2.2. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. If $x_k \in N(x^*, b/2)$, then there exists a constant $c_2 > 0$ such that
$$\|d_k\| \le c_2\,\mathrm{dist}(x_k, X^*).$$
Proof. From the assumption, we have
$$\|\bar{x}_k - x^*\| \le \|\bar{x}_k - x_k\| + \|x_k - x^*\| \le 2\|x_k - x^*\| \le b,$$
which indicates that $\bar{x}_k \in N(x^*, b)$. Define
$$\varphi_k(d) = \|F_k + J_k d\|^2 + \lambda_k \|d\|^2.$$
From (2.1) and the convexity of $\varphi_k(d)$, we note that $d_k$ is not only a stationary point but also a minimizer of $\varphi_k(d)$. Using the fact that $x_k, \bar{x}_k \in N(x^*, b)$, we have from (2.4) and Lemma 2.1 that
It implies that
where $c_2 = \sqrt{L_1^2/\tilde{c}_1 + 1}$. The proof is completed.
Lemma 2.3. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. If $x_k, x_{k+1} \in N(x^*, b/2)$, then
Proof. For all $x_k, x_{k+1} \in N(x^*, b/2)$, we get from (2.4) and (2.5) that
and
By the triangle inequality, the above inequality yields
For all $\bar{x}_k \in X^* \cap N(x^*, b)$, we obtain
Similarly, using the triangle inequality yields
It follows from (1.2), (2.11) and (2.13) that
From Lemma 2.2, we have $\|d_k\| \le c_2 \|\bar{x}_k - x_k\|$, which implies that
Since $\bar{x}_k \in X^* \cap N(x^*, b)$ and $\delta \in [1,2]$, we obtain from Assumption 2.1(b), (2.12), (2.14) and (2.15) that
The proof is completed.
Henceforth, according to the two branches of the LM parameter, namely $\|J_k^T F_k\| \le 1$ and $\|J_k^T F_k\| > 1$, we divide the convergence analysis into two cases.
Case 1: $\|J_k^T F_k\| \le 1$
First, we consider in this subsection the convergence rate of the adaptive LM method in the case $\|J_k^T F_k\| \le 1$.
Lemma 2.4. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. If $x_k, x_{k+1} \in N(x^*, b/2)$ and $\|J_k^T F_k\| \le 1$, then there exists a positive constant $c_3$ such that
Proof. From Lemmas 2.1 and 2.3, we have
Since $\delta \in [1,2]$, Lemma 2.4 holds with $c_3 = c^{-1}\left(L_1 L_2 (2 + 3c_2 + 2c_2^2) + L_2 c_2 c_1^{\delta} + L_1 L_2^2 (2 + c_2)(1 + c_2)^2\right)$. The proof is completed.
Lemma 2.4 shows that if $x_k \in N(x^*, b/2)$ for all $k$, then $\{\mathrm{dist}(x_k, X^*)\}$ converges to zero quadratically. Next, we show that $x_k \in N(x^*, b/2)$ indeed holds for all $k$ provided $x_0$ is sufficiently close to $x^*$. Let
Lemma 2.5. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. If $x_0 \in N(x^*, r)$ with $r$ given by (2.17), then $x_k \in N(x^*, b/2)$ for all $k$.
Proof. We prove the claim by induction. It follows from Lemma 2.2 that
This indicates that $x_1 \in N(x^*, b/2)$. Assume that $x_i \in N(x^*, b/2)$ for $i = 2, \dots, k$. It follows from Lemma 2.4 that
where the last inequality follows from $\|x_0 - x^*\| \le r$ and $r \le 1/(2c_3)$. Therefore, we have from Lemma 2.2 that
for $i = 1, \dots, k$. It then follows from (2.17) that
which indicates that $x_{k+1} \in N(x^*, b/2)$. The proof is completed.
Theorem 2.1. Let Assumption 2.1 hold and let $\{x_k\}$ be generated by the adaptive LM method with $x_0 \in N(x^*, r)$, where $r$ is given by (2.17). If $\|J_k^T F_k\| \le 1$, then the sequence $\{\mathrm{dist}(x_k, X^*)\}$ converges to zero quadratically. Moreover, $\{x_k\}$ converges to a solution of (1.1).
Proof. Lemmas 2.4 and 2.5 indicate that the sequence $\{\mathrm{dist}(x_k, X^*)\}$ converges to $0$ quadratically, so we only need to prove the second claim.
By assumption, we have $x_k \in N(x^*, b/2)$ for all $k$, and it remains to prove that $\{x_k\}$ converges to some solution $\bar{x} \in X^*$. In fact, for any $p, q \in \mathbb{N}^+$ (we assume $p \ge q$; the case $p < q$ is analogous), it follows from (2.18) that
The above inequality indicates that the sequence {xk} is a Cauchy sequence, and hence {xk} converges. The proof is completed.
Theorem 2.1 shows that the sequence $\{\mathrm{dist}(x_k, X^*)\}$ converges to zero quadratically and that $\{x_k\}$ converges to the solution set $X^*$. However, it does not quantify the rate at which the iterates $\{x_k\}$ themselves converge. In the following theorem, we show that the sequence $\{x_k\}$ converges to a solution of (1.1) and that the rate of convergence is also locally quadratic.
Theorem 2.2. Let Assumption 2.1 hold, let $\{x_k\}$ be generated by the adaptive LM method with $x_0 \in N(x^*, r)$, where $r$ is given by (2.17), and let its limit point be $\hat{x}^* \in X^* \cap N(x^*, b/2)$. If $\|J_k^T F_k\| \le 1$, then the sequence $\{x_k\}$ converges to $\hat{x}^*$ quadratically.
Proof. In view of Theorem 2.1, we have $\mathrm{dist}(x_{k+1}, X^*) \le \frac{1}{2}\,\mathrm{dist}(x_k, X^*)$ for all sufficiently large $k$. Letting $p \to \infty$ in (2.19), we deduce from Lemmas 2.2 and 2.4 that
where the last inequality follows from the definition of $\mathrm{dist}(x_k, X^*)$. Hence, the sequence $\{x_k\}$ converges to $\hat{x}^*$ quadratically. The proof is completed.
Case 2: $\|J_k^T F_k\| > 1$
We now consider the convergence rate of the adaptive LM method in the case $\|J_k^T F_k\| > 1$.
Lemma 2.6. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. Assume that $\|J_k^T F_k\| > 1$. If $x_k, x_{k+1} \in N(x^*, b/2)$ and $\mathrm{dist}(x_k, X^*) \le r < 1$, where
with $c > L_2 c_2$, then there exists a positive constant $c_4 \in (0,1)$ such that
Proof. From Lemma 2.1, we have $\lambda_k < 1$. Together with Lemma 2.3, we obtain
which indicates that Lemma 2.6 holds with $c_4 = c^{-1}\left(L_1 L_2 (2 + 3c_2 + 2c_2^2) + L_1 L_2^2 (2 + c_2)(1 + c_2)^2\right) r + c^{-1} L_2 c_2$.
The proof is completed.
Lemma 2.7. Let the sequence $\{x_k\}$ be generated by the adaptive LM method and let Assumption 2.1 hold. If $x_0 \in N(x^*, r)$ with $r$ given by (2.20), then for all $k$ we have $x_k \in N(x^*, b/2)$ and $\mathrm{dist}(x_k, X^*) \le r$.
Proof. Since the proof is analogous to that of Lemma 2.5, we only verify the inductive step; that is, we assume that the claim holds for $i = k$ and consider the next step.
It follows from Lemma 2.6 that
and
Thus, from Lemma 2.2 and (2.20), we have
which indicates that $x_{k+1} \in N(x^*, b/2)$. The proof is completed.
Theorem 2.3. Let Assumption 2.1 hold and let $\{x_k\}$ be generated by the adaptive LM method with $x_0 \in N(x^*, r)$, where $r$ is given by (2.20). If $\|J_k^T F_k\| > 1$, then the sequence $\{\mathrm{dist}(x_k, X^*)\}$ converges to zero linearly. Moreover, the sequence $\{x_k\}$ converges to a solution $\hat{x}^* \in X^* \cap N(x^*, b/2)$ linearly.
Proof. The proof is similar to the proofs of Theorems 2.1 and 2.2.
3. Global convergence of the adaptive LM method
To establish the global convergence of the adaptive LM method, we employ a line search rule such as the Armijo, Goldstein or Wolfe rule [15]. Consider the merit function
$$\Phi(x) = \frac{1}{2}\|F(x)\|^2.$$
At iteration $k$, the next iterate is computed by
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $d_k$ is the direction obtained from (2.1) and $\alpha_k$ is a step size satisfying certain line search conditions. The Wolfe line search is one of the most commonly used inexact line searches; it requires $\alpha_k > 0$ to satisfy
$$\Phi(x_k + \alpha_k d_k) \le \Phi(x_k) + \sigma_1 \alpha_k \nabla\Phi(x_k)^T d_k$$
and
$$\nabla\Phi(x_k + \alpha_k d_k)^T d_k \ge \sigma_2 \nabla\Phi(x_k)^T d_k.$$
Here $\sigma_1 \le \sigma_2$ are two constants in $(0,1)$.
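As a quick illustration, the following sketch checks the two Wolfe conditions for the merit function $\Phi$; the function name and the default parameters are ours, not the paper's.

```python
import numpy as np

def wolfe_conditions_hold(phi, grad, x, d, alpha, sigma1=1e-4, sigma2=0.9):
    """Check the Wolfe conditions for a step alpha along direction d:
    sufficient decrease of phi and the curvature condition on its gradient."""
    slope0 = np.dot(grad(x), d)                  # directional derivative at x
    decrease = phi(x + alpha*d) <= phi(x) + sigma1*alpha*slope0
    curvature = np.dot(grad(x + alpha*d), d) >= sigma2*slope0
    return decrease and curvature
```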
Algorithm 3.1 (The adaptive LM method with Wolfe line search).
Step 1: Given $x_0 \in \mathbb{R}^n$, $\delta \in [1,2]$, $\eta \in (0,1)$, $\sigma_1 \in (0, 1/2)$, $\sigma_2 \in (\sigma_1, 1)$, set $k := 0$.
Step 2: If $\|J_k^T F_k\| = 0$, stop. Set $\lambda_k$ by (1.6); determine $d_k$ by solving (2.1).
Step 3: If $d_k$ satisfies
$$\|F(x_k + d_k)\| \le \eta \|F(x_k)\|,$$
set $x_{k+1} = x_k + d_k$ and go to Step 5. Otherwise, go to Step 4.
Step 4: Set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is determined by the Wolfe line search.
Step 5: Set k:=k+1; go to Step 2.
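The sketch below assembles Algorithm 3.1 in NumPy/SciPy. It delegates the Wolfe line search to scipy.optimize.line_search (falling back to a unit step if the search fails) and uses a small tolerance in place of the exact test $\|J_k^T F_k\| = 0$; the parameter values and the fallback are our choices, not the paper's.

```python
import numpy as np
from scipy.optimize import line_search

def almm_wolfe(F, J, x0, delta=1.0, eta=0.9, sigma1=1e-4, sigma2=0.9,
               tol=1e-6, max_iter=500):
    """Algorithm 3.1 (sketch): adaptive LM direction plus a Wolfe line search.
    Phi(x) = 0.5*||F(x)||^2 is the merit function; grad Phi(x) = J(x)^T F(x)."""
    phi = lambda z: 0.5 * float(np.dot(F(z), F(z)))
    gphi = lambda z: J(z).T @ F(z)

    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        ng = np.linalg.norm(g)
        if ng < tol:                                        # Step 2 (stopping)
            break
        lam = ng**delta if ng <= 1.0 else ng**(-delta)      # parameter (1.6)
        d = np.linalg.solve(Jk.T @ Jk + lam*np.eye(x.size), -g)   # step (2.1)
        if np.linalg.norm(F(x + d)) <= eta*np.linalg.norm(Fk):    # Step 3
            x = x + d
        else:                                               # Step 4 (Wolfe)
            alpha = line_search(phi, gphi, x, d, gfk=g, c1=sigma1, c2=sigma2)[0]
            x = x + (alpha if alpha is not None else 1.0)*d
    return x
```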
Theorem 3.1. Assume F(x) is continuously differentiable. Let {xk} be a sequence generated by Algorithm 3.1. Then any accumulation point x∗ of {xk} is a stationary point of Φ.
Proof. From [20, Eq. (2.10)], the inequality (3.1) implies that
where $\sigma_3$ is some positive constant. Together with Step 3 of Algorithm 3.1, this shows that the sequence $\{\|F(x_k)\|\}$ is monotonically decreasing and bounded from below, and thus convergent. Hence any accumulation point $x^*$ of $\{x_k\}$ is a stationary point of $\Phi$. The proof is completed.
Theorem 3.2. Under Assumption 2.1, let $\{x_k\}$ be a sequence generated by Algorithm 3.1 that has an accumulation point $x^*$. If $x^*$ is a solution of the system of nonlinear equations (1.1), then the sequence $\{x_k\}$ converges to $x^*$ at least linearly.
Proof. It suffices to show that $\|F(x_k + d_k)\| \le \eta \|F(x_k)\|$ holds for all large $k$.
First consider the case $\|J_k^T F_k\| \le 1$. Since the sequence $\{x_k\}$ converges to a stationary point $x^*$ which is a solution of the system of nonlinear equations (1.1), we have that
and
hold for all sufficiently large $K \in \mathbb{N}$, where $r$ is defined by (2.17), and $c$, $c_3$ and $L_2$ are given in Section 2.
Let the sequence $\{y_l\}$ be generated by the adaptive LM method with unit step size and $y_0 = x_K$. Then, by Theorem 2.1, the sequence $\{\mathrm{dist}(y_l, X^*)\}$ converges to zero quadratically. Hence, we only need to prove that $x_{K+l} = y_l$ for all $l \in \mathbb{N}$, i.e., that the sequence $\{y_l\}$ satisfies
Let $\bar{y}_{l+1} \in X^*$ be such that $\mathrm{dist}(y_{l+1}, X^*) = \|\bar{y}_{l+1} - y_{l+1}\|$. Then we obtain from Assumption 2.1(b), Lemma 2.4, (2.6) and (3.4) that
holds for $\eta \in (0,1)$ and all $l$. The above inequality indicates that the unit step size $\alpha_k = 1$ is accepted for all large $k$ in Algorithm 3.1, i.e., (3.2) holds for all $k \ge K$. Consequently, by mathematical induction, Algorithm 3.1 reduces to the adaptive LM method for all $k \ge K$. Thus, $\{x_k\}$ converges to the solution $x^*$ quadratically.
Similarly, in the case $\|J_k^T F_k\| > 1$, we obtain that $\{x_k\}$ converges to the solution $x^*$ linearly.
The proof is completed.
4. Numerical examples
In this section, we carry out some numerical experiments to verify the effectiveness of the proposed adaptive Levenberg-Marquardt method (ALMM). The Levenberg-Marquardt method (LMM) given by Behling et al. [3] is used for comparison. The first test is a nonlinear least squares problem, while the second consists of several systems of nonlinear equations.
Example 4.1. Consider the nonlinear least squares problem [3]
$$\min_{x \in \mathbb{R}^2} \ \frac{1}{2}\|F(x)\|^2,$$
where $F(x) = (x_1^3 - x_1 x_2 + 1,\ x_1^3 + x_1 x_2 + 1)^T$.
Let $X^* = \{(0, \xi) : \xi \in \mathbb{R}\}$ be the non-isolated set of minimizers, so that $\mathrm{dist}(x, X^*) = |x_1|$. The rank of the Jacobian is $0$ at the origin, $1$ at points with $x_1 = 0$ and $x_2 \ne 0$, and $2$ when $x_1 \ne 0$. Thus the Jacobian is not always of full rank at the stationary points. The starting point is set to $x_0 = (0.008, 2)^T$. All methods terminate if $\|J_k^T F_k\| < 10^{-10}$. The results are tabulated in Table 1.
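For reproducibility, the following self-contained sketch applies the unit-step ALMM with $\delta = 1$ to this example; it is illustrative only, so the iteration counts need not match Table 1 exactly.

```python
import numpy as np

# Example 4.1: F(x) = (x1^3 - x1*x2 + 1, x1^3 + x1*x2 + 1)^T
def F(x):
    x1, x2 = x
    return np.array([x1**3 - x1*x2 + 1.0, x1**3 + x1*x2 + 1.0])

def J(x):
    x1, x2 = x
    return np.array([[3*x1**2 - x2, -x1],
                     [3*x1**2 + x2,  x1]])

x = np.array([0.008, 2.0])                  # starting point from the paper
for k in range(100):
    Fk, Jk = F(x), J(x)
    g = Jk.T @ Fk
    ng = np.linalg.norm(g)
    if ng < 1e-10:                          # stopping rule ||J_k^T F_k|| < 1e-10
        break
    lam = ng if ng <= 1.0 else 1.0/ng       # parameter (1.6) with delta = 1
    x = x + np.linalg.solve(Jk.T @ Jk + lam*np.eye(2), -g)
print(k, x, abs(x[0]))                      # iterations, final iterate, dist(x, X*)
```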
As illustrated, ALMM generally converges to the required accuracy in fewer iterations than LMM. Moreover, the distances between the iterates $x_k$ obtained from ALMM and the solution set $X^*$ are smaller than those from LMM.
Example 4.2. Consider systems of nonlinear equations adapted from the nonsingular problems given in [12,16],
$$\hat{F}(x) = F(x) - J(x^*) A (A^T A)^{-1} A^T (x - x^*),$$
where $F(x)$ is the standard nonsingular test function, $x^*$ is its root, and $A \in \mathbb{R}^{n \times k}$ has full column rank with $1 \le k \le n$. It is easy to check that $\hat{F}(x^*) = 0$ and that the rank of $\hat{J}(x^*) = J(x^*)\left(I - A(A^T A)^{-1} A^T\right)$ is $n - k$. A disadvantage of these problems is that $\hat{F}(x)$ may have roots that are not roots of $F(x)$. We present two sets of singular problems, with the rank of $\hat{J}(x^*)$ being $n-1$ and $n-2$, respectively. The corresponding matrices $A$ and $A^T$ are given by
and
Note that the size of the original problem, which has $n+2$ equations in $n$ unknowns, is reduced by eliminating the $(n-1)$st and the $n$th equations. A code sketch for assembling such rank-deficient test problems is given below.
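The helper below builds $\hat{F}$ and its Jacobian from a nonsingular pair $(F, J)$ following the construction above; the concrete matrices $A$ used in the paper are not reproduced here, so the column of ones is only an illustrative rank-one choice.

```python
import numpy as np

def make_singular_problem(F, J, x_star, A):
    """Return Fhat(x) = F(x) - J(x*) A (A^T A)^{-1} A^T (x - x*) and its Jacobian.
    At x*, the Jacobian equals J(x*)(I - A(A^T A)^{-1} A^T), so its rank drops
    by the column rank of A."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    P = A @ np.linalg.solve(A.T @ A, A.T)       # projector A(A^T A)^{-1} A^T
    JP = J(x_star) @ P
    Fhat = lambda x: F(x) - JP @ (x - x_star)
    Jhat = lambda x: J(x) - JP
    return Fhat, Jhat

# Illustrative use with the trivially nonsingular F(x) = x, root x* = 0, and a
# rank-one A (an assumption, not the paper's choice):
n = 4
F = lambda x: x
J = lambda x: np.eye(n)
Fhat, Jhat = make_singular_problem(F, J, np.zeros(n), np.ones((n, 1)))
print(np.linalg.matrix_rank(Jhat(np.zeros(n))))   # prints n - 1
```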
Several choices of the LM parameter are considered in the two LM methods. In accordance with the ranges of $\delta$ defined in LMM and ALMM, we use $\delta = 10^{-4}$, $0.5$ and $1$ in $\lambda_k = \|J_k^T F_k\|^{\delta}$ for LMM and employ $\delta = 1$, $1.5$ and $2$ for ALMM. All algorithms are terminated if $\|J_k^T F_k\| < 10^{-6}$ or if the number of iterations exceeds $100(n+1)$. Numerical results for the rank $n-1$ case and the rank $n-2$ case are listed in Table 2 and Table 3, respectively. The values 1, 10 and 100 in the third column correspond to the starting points $x_0$, $10x_0$ and $100x_0$, where $x_0$ is the choice suggested in [12]. The symbol "–" is used if the corresponding method fails to reach the required accuracy within the prescribed maximum number of iterations. To ensure numerical stability, we use the MATLAB function pcg (the preconditioned conjugate gradient method) to solve the inner linear system (1.2).
Some remarks are in order. In all tests, ALMM converges to the required accuracy within the maximum number of iterations, while LMM fails in some cases; see, for instance, the Powell badly scaled problem in Table 2 and the discrete integral equation problem in Table 3. Furthermore, the number of iterations required by ALMM is smaller than that required by LMM. We therefore conclude that ALMM is a competitive variant of the Levenberg-Marquardt method.
5. Conclusions
We present a Levenberg-Marquardt method with an adaptive LM parameter for solving systems of nonlinear equations. We have analyzed its local and global convergence under an error bound condition on $\|J(x)^T F(x)\|$, which can be derived from the local error bound condition, together with Lipschitz continuity of the Jacobian. These properties hold in many applied problems, as they are satisfied by any real analytic function. The effectiveness of the adaptive Levenberg-Marquardt method is validated by the numerical examples.
Acknowledgements
The work of Lin Zheng was supported by the Natural Science Foundation of the Higher Education Institutions of Anhui Province grant KJ2020A0017. The work of Liang Chen was supported by the Abroad Visiting of Excellent Young Talents in Universities of Anhui Province grant GXGWFX2019022 and the Natural Science Foundation of the Higher Education Institutions of Anhui Province grant KJ2020ZD008. The work of Yanfang Ma was supported by the Natural Science Foundation of Anhui Province grant 2108085MF204 and the Natural Science Foundation of the Higher Education Institutions of Anhui Province grant KJ2019A0604. Part of this work was done while Liang Chen and Yanfang Ma were visiting scholars at Department of Mathematics, the University of Texas at Arlington from August 2019 to August 2020. They would like to thank Prof. Ren-Cang Li for his hospitality during the visit.
Conflict of interest
The authors declare no conflict of interest.