1. Introduction
Consider the problem of finding the roots of the system of nonlinear equations

$$f(x) = 0,$$

where f:D⊆Rn→Rm. We assume throughout that f(x)=[f1(x),⋯,fm(x)]T is a continuously differentiable vector-valued function and x=(x1,⋯,xn)T is an n-dimensional unknown vector. In this article, we exclusively study the overdetermined (m≥n) nonlinear system, for which there exists a single solution x∗ such that f(x∗)=0. This kind of nonlinear problem arises widely in practical applications, such as machine learning [6], differential equations [26], convex optimization, and deep neural networks [15].
The Newton-Raphson (NR) method, the Broyden method [1], and the directional secant method [2] are classical iterative methods for solving nonlinear equations. The iterative formula of the Newton-Raphson method is as follows:

$$x_{k+1} = x_k - \left( f'(x_k) \right)^{\dagger} f(x_k), \quad k = 0, 1, 2, \ldots,$$

where $f'(x_k) = [\nabla f_1(x_k), \cdots, \nabla f_m(x_k)]^T \in \mathbb{R}^{m \times n}$ is the Jacobian matrix of f at xk, $\nabla f_i(x_k)^T$ is its i-th row, and $(f'(x_k))^{\dagger}$ is the Moore-Penrose pseudoinverse of f′(xk). This approach is highly disadvantageous computationally, as it requires forming the full Jacobian matrix and its Moore-Penrose pseudoinverse at every iteration, which leads to high computational costs. In recent years, the nonlinear Kaczmarz method has also been developed. Wang, Li, and Bao [28] proposed the nonlinear Kaczmarz (NK) method by generalizing the Kaczmarz method [14] to the nonlinear case, retaining its core idea (only one row of the system is used in each iteration). Furthermore, when the Jacobian matrix of the problem is singular, this method can still be computed quickly and circumvents the shortcomings of classical methods such as the Newton method.
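To make the cost concrete, the following is a minimal NumPy sketch of one such iteration (a Gauss-Newton step, since applying the pseudoinverse amounts to solving a linear least-squares problem); the small 3×2 test system is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def newton_raphson_step(f, jac, x):
    """One step x_{k+1} = x_k - pinv(f'(x_k)) f(x_k). Forming the full
    m x n Jacobian and (implicitly) its pseudoinverse is the expensive
    part that the Kaczmarz-type methods below avoid."""
    J = jac(x)
    # lstsq computes d = argmin ||J d - f(x)||_2, i.e. d = pinv(J) f(x)
    d, *_ = np.linalg.lstsq(J, f(x), rcond=None)
    return x - d

# Illustrative overdetermined system with unique root x* = (1, 2):
f = lambda x: np.array([x[0] - 1.0, x[1] - 2.0, (x[0] - 1.0) + (x[1] - 2.0) ** 2])
jac = lambda x: np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0 * (x[1] - 2.0)]])

x = np.zeros(2)
for _ in range(10):
    x = newton_raphson_step(f, jac, x)
print(x)  # approaches [1, 2]
```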
In contrast to the nonlinear Kaczmarz methods, the linear Kaczmarz methods are considerably more mature, so we sought more efficient nonlinear Kaczmarz methods by borrowing the optimization ideas of the linear ones. The Kaczmarz method [14], whose main idea is to project the current iterate onto the solution space given by one row of the coefficient matrix, has drawn a lot of attention lately due to its computational simplicity and efficiency. In 2009, Strohmer and Vershynin proposed the randomized Kaczmarz (RK) method [27], which selects row indices in a random order, rather than a cyclic order, at each iteration. They proved the linear convergence of the method, which gave a great impetus to research on Kaczmarz methods [7,8,9,10,16,18,23,35]. Some researchers have also extended the Kaczmarz method to solve systems of inequalities [21] and tensor equations [29].
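As a reference point for what follows, here is a minimal sketch of the RK update for a consistent linear system Ax = b, with rows sampled with probability proportional to their squared norms as in [27]; the surrounding harness is our own assumption.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=5000, seed=0):
    """RK: at each step, project the iterate onto the hyperplane
    a_i^T x = b_i of one row i, drawn with probability ||a_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    probs = np.sum(A ** 2, axis=1) / np.linalg.norm(A, "fro") ** 2
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        x += (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]
    return x
```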
There are two primary ways to enhance the Kaczmarz method. One is to incorporate certain criteria into the selection of the working rows so that larger residuals are eliminated more quickly. One of the most typical approaches is the greedy randomized Kaczmarz (GRK) method [4]. In addition, six typical working-row selection rules are summarized in [3]: the uniform, non-uniform, residual, distance, maximal residual, and maximal distance selection rules.
Another approach is to select multiple working rows at each iteration, which we call the block method. By adopting the block idea [5,11,25], many researchers have accelerated the convergence of the standard Kaczmarz method. The block technique uses several rows of the coefficient matrix A of the linear system Ax=b at each iteration. The block Kaczmarz method [24] can be described as follows:

$$x_{k+1} = x_k + A_{\tau_k}^{\dagger} \left( b_{\tau_k} - A_{\tau_k} x_k \right),$$
where $A_{\tau_k}^{\dagger}$ represents the Moore-Penrose pseudoinverse of the chosen submatrix $A_{\tau_k}$ and τk is a subset of indices selected according to some rule. However, the Moore-Penrose pseudoinverse must be computed at each iteration of the block Kaczmarz approach, which is typically rather costly. Necoara [22] used updated convex combinations as the new direction of the next iterate to build a unified framework for the randomized average block Kaczmarz method [30]. The Gaussian Kaczmarz method [13] can be regarded as another kind of block Kaczmarz method, that is,

$$x_{k+1} = x_k - \frac{\eta^T (A x_k - b)}{\|A^T \eta\|_2^2} A^T \eta,$$

where η is a Gaussian vector with mean 0∈Rm and covariance matrix I∈Rm×m, so that η∼N(0,I).
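For comparison with the average block methods developed later, here is a one-step sketch of the Gaussian Kaczmarz update above; note that it needs only matrix-vector products, never a pseudoinverse.

```python
import numpy as np

def gaussian_kaczmarz_step(A, b, x, rng):
    """One Gaussian Kaczmarz step: sketch the residual with eta ~ N(0, I_m),
    then move along the direction A^T eta."""
    eta = rng.standard_normal(A.shape[0])
    d = A.T @ eta                              # search direction
    return x - (eta @ (A @ x - b)) / (d @ d) * d
```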
In this paper, motivated by the above two ideas, we attempt to implement them in the nonlinear Kaczmarz method. The main contributions of this paper are as follows:
First, the common nonlinear block Kaczmarz method has the same drawback as the classical linear block Kaczmarz method: it requires the computation of matrix pseudoinverses. Computing the pseudoinverse of the Jacobian matrix is expensive, especially when the problem dimension is large. To address this issue, this paper extends the averaging technique from linear to nonlinear methods.
Second, the greedy criteria chosen in this work offer certain benefits. To avoid calculating the Frobenius norm of the entire Jacobian matrix, we use the second greedy rule of [33] as the criterion in the first method. In [32], Zhang et al. proposed a nonlinear Kaczmarz method with a greedy selection strategy, specified as follows:

$$i_k = \arg\max_{i \in [m]} |f_i(x_k)|^2,$$

which aims at choosing the maximum component of the residual vector. They also showed, through numerical experiments and theoretical analysis, that methods with greedy rules are faster than the NK method. Therefore, we use it as the second greedy criterion in this paper.
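In code, this greedy rule amounts to one nonlinear Kaczmarz projection on the row with the largest residual component; the sketch below is our own illustration, with `grad(i, x)` (returning the gradient of f_i at x) an assumed interface.

```python
import numpy as np

def greedy_nk_step(f, grad, x):
    """Nonlinear Kaczmarz step on the greedily chosen row: pick the index
    of the largest residual component, then project as in the NK method."""
    fx = f(x)
    i = int(np.argmax(np.abs(fx)))             # i_k = argmax_i |f_i(x_k)|
    g = grad(i, x)                             # gradient of f_i at x
    return x - fx[i] / (g @ g) * g
```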
In summary, inspired by [32,33], we extended the pseudoinverse-free block Kaczmarz method for solving linear equations to the nonlinear situation and incorporated greedy rules to accelerate the convergence of the algorithms. We present two pseudoinverse-free greedy block nonlinear Kaczmarz methods: the nonlinear greedy average block Kaczmarz (NGABK) method and the maximum residual nonlinear average block Kaczmarz (MRNABK) method. Detailed convergence analyses of the two algorithms are given. Numerical experiments showed that our proposed methods are more effective than previous methods; in most cases, the MRNABK method was better than the NGABK method, and both were better than several state-of-the-art solvers.
The rest of this paper is organized as follows: In Section 2, the notations and preliminaries are provided. In Section 3, we provide the two pseudoinverse-free greedy block nonlinear Kaczmarz methods. We establish their convergence theorems in Section 4. The numerical experiments are given in Section 5. Finally, we make a summary of the whole work in Section 6.
2. Notations and preliminaries
For any matrix A∈Rm×n, we use σmax(A), σmin(A), ‖A‖2, $\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$, A†, and Aτ to denote the maximum and minimum nonzero singular values of A, the spectral norm, the Frobenius norm, the Moore-Penrose pseudoinverse, and the row submatrix of A indexed by the index set τ, respectively. rk denotes the residual vector at the k-th iteration. ei∈Rm denotes the unit vector whose i-th element is 1 and whose other elements are 0. For an integer m≥1, let [m]:={1,…,m}. At the k-th iteration, |τk| is the cardinality of the set τk. For γ>0, set B(x∗,γ)≜{x∈Rn∣‖x−x∗‖2≤γ}.
Definition 1 ([28]). If for every i∈[m] and all x1,x2∈D⊆Rn there exists ξi∈[0,ξ), with $\xi = \max_i \xi_i < 1/2$, such that

$$|f_i(x_1) - f_i(x_2) - \nabla f_i(x_1)^T (x_1 - x_2)| \le \xi_i \, |f_i(x_1) - f_i(x_2)|,$$

then the function f:D⊆Rn→Rm is said to satisfy the local tangential cone condition.
Lemma 1 ([34]). If the function f satisfies the local tangential cone condition, then for all x1,x2∈D⊆Rn and any index subset τ⊆[m], we have

$$\|f_\tau(x_1) - f_\tau(x_2) - f'_\tau(x_1)(x_1 - x_2)\|_2 \le \xi \, \|f_\tau(x_1) - f_\tau(x_2)\|_2.$$
Lemma 2 ([19]). Let ρ1≥⋯≥ρn and ζ1≥⋯≥ζn be the singular values of the matrices A and B, respectively. Then

$$|\rho_i - \zeta_i| \le \|A - B\|_2, \quad i = 1, \ldots, n.$$
Lemma 3. Suppose f′(x) has full column rank for every x in the bounded closed set D⊆Rn. Then there exist $\underline{\sigma} > 0$ and $\bar{\sigma} < \infty$ such that

$$\underline{\sigma} \le \sigma_{\min}(f'(x)) \le \sigma_{\max}(f'(x)) \le \bar{\sigma} \quad \text{for all } x \in D.$$
Proof. For any x∈D, f′(x) has full column rank, so σmax(f′(x))=σ1(f′(x))≥⋯≥σn(f′(x))=σmin(f′(x))>0. By Lemma 2, each σi(f′(x)) (i=1,…,n) is a continuous function of f′(x). In addition, f′(x) depends continuously on x, so each σi(x) (i=1,…,n) is a continuous function of x.
Since D is bounded and closed and σmin(x) is continuous in x, there exists a point $\underline{x} \in D$ such that $\underline{\sigma} = \sigma_{\min}(\underline{x}) = \inf_{x \in D} \sigma_{\min}(x) > 0$. Then we have

$$\sigma_{\min}(f'(x)) \ge \underline{\sigma} > 0 \quad \text{for all } x \in D.$$

Similarly, since D is bounded and closed and σmax(x) is continuous in x, there exists a point $\bar{x} \in D$ such that $\bar{\sigma} = \sigma_{\max}(\bar{x}) = \sup_{x \in D} \sigma_{\max}(x) < \infty$. Then we have

$$\sigma_{\max}(f'(x)) \le \bar{\sigma} < \infty \quad \text{for all } x \in D.$$
□
3. Pseudoinverse-free greedy block nonlinear Kaczmarz methods
Yuan et al. developed a randomized NR method based on the sketch-and-project technique [31], which is called the sketched Newton-Raphson (SNR) method. The SNR iteration can be written as follows:

$$x_{k+1} = x_k - f'(x_k)^T S_k \left( S_k^T f'(x_k) f'(x_k)^T S_k \right)^{\dagger} S_k^T f(x_k), \tag{3.1}$$

where Sk is a random sketching matrix. When $S_k = \eta_k = \sum_{i \in \tau_k} (-f_i(x_k)) e_i$ in Eq (3.1), we get the iterative formula

$$x_{k+1} = x_k - \frac{\eta_k^T f(x_k)}{\|f'(x_k)^T \eta_k\|_2^2} f'(x_k)^T \eta_k. \tag{3.2}$$
Based on the greedy randomized Kaczmarz method [4], the block indices Ik in the distance-residual capped nonlinear Kaczmarz (DR-CNK) method [33] are chosen by a capped greedy criterion that thresholds the weighted residuals $|f_i(x_k)|^2 / \|\nabla f_i(x_k)\|_2^2$ against a bound involving the Frobenius norm $\|f'(x_k)\|_F$ of the full Jacobian;
that is, the information of the entire Jacobian matrix is required, which results in a large amount of computation. Now, by choosing τk as

$$\tau_k = \left\{ i \in [m] : |f_i(x_k)|^2 \ge \delta_k \|f(x_k)\|_2^2 \right\}$$

with

$$\delta_k = \frac{1}{2} \left( \frac{\max_{i \in [m]} |f_i(x_k)|^2}{\|f(x_k)\|_2^2} + \frac{1}{m} \right),$$
we get the nonlinear greedy average block Kaczmarz (NGABK) method, which is described in case 1 of Algorithm 1. At the k-th iteration, the vector $\eta_k = \sum_{i \in \tau_k} (-f_i(x_k)) e_i$ is formed and the search direction f′(xk)Tηk is computed; when f′(xk)Tηk=0, the step is skipped.
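Under the selection rule stated above, one NGABK iteration can be sketched as follows (a NumPy illustration of case 1 of Algorithm 1, not the authors' reference code).

```python
import numpy as np

def ngabk_step(f, jac, x):
    """One NGABK step: greedy block tau_k, averaging vector eta_k,
    and the pseudoinverse-free update (3.2)."""
    fx = f(x)
    r2 = fx ** 2
    m = fx.size
    fnorm2 = r2.sum()                          # ||f(x_k)||_2^2 (assumed > 0)
    delta = 0.5 * (r2.max() / fnorm2 + 1.0 / m)
    tau = np.flatnonzero(r2 >= delta * fnorm2) # greedy block tau_k
    eta = np.zeros(m)
    eta[tau] = -fx[tau]                        # eta_k = sum_{i in tau_k} (-f_i(x_k)) e_i
    d = jac(x).T @ eta                         # direction f'(x_k)^T eta_k
    if d @ d == 0.0:                           # degenerate direction: skip the step
        return x
    return x - (eta @ fx) / (d @ d) * d
```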
Remark 1. The computational complexity of the iterative formula of our proposed method is O(|τk|n) per iteration, which is much smaller than the O(n2) cost of the Newton-Raphson method. In conjunction with the numerical experiments reported later, the size of τk is usually about m/3.
Remark 2. The index set τk is always nonempty. Indeed, since $\|f(x_k)\|_2^2 \le m \max_{i \in [m]} |f_i(x_k)|^2$, we have

$$\delta_k \|f(x_k)\|_2^2 = \frac{1}{2} \left( \max_{i \in [m]} |f_i(x_k)|^2 + \frac{\|f(x_k)\|_2^2}{m} \right) \le \max_{i \in [m]} |f_i(x_k)|^2,$$

and then, for $i_k \in \arg\max_{i \in [m]} |f_i(x_k)|^2$,

$$|f_{i_k}(x_k)|^2 \ge \delta_k \|f(x_k)\|_2^2$$

implies ik∈τk. Therefore, the method is well-defined.
According to the maximum residual rule, we establish the maximum residual nonlinear average block Kaczmarz (MRNABK) method in case 2 of Algorithm 1. In this method, ϱ∈(0,1] is a relaxation parameter, which can be determined in the numerical experiments. The set τk in the k-th iteration of the MRNABK algorithm is also nonempty. This is because

$$|f_{i_k}(x_k)|^2 = \max_{i \in [m]} |f_i(x_k)|^2 \ge \varrho \max_{i \in [m]} |f_i(x_k)|^2.$$

That is to say, the index ik of the largest residual component is always in the set τk.
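A matching sketch of one MRNABK iteration (case 2 of Algorithm 1), again under the block rule stated in Section 5, with ϱ passed in as `rho`:

```python
import numpy as np

def mrnabk_step(f, jac, x, rho=0.2):
    """One MRNABK step: keep every row whose squared residual is within a
    factor rho of the largest one, then apply the same averaged update."""
    fx = f(x)
    r2 = fx ** 2
    tau = np.flatnonzero(r2 >= rho * r2.max()) # maximum-residual block
    eta = np.zeros(fx.size)
    eta[tau] = -fx[tau]
    d = jac(x).T @ eta
    if d @ d == 0.0:
        return x
    return x - (eta @ fx) / (d @ d) * d
```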
From the MRNABK method, we can see that the larger components of the residual are eliminated preferentially, which greatly improves the efficiency of the algorithm. We establish its convergence theorem in Section 4.
4. Convergence analysis
Lemma 4. Suppose the function f satisfies the local tangential cone condition with $\xi = \max_{i \in [m]} \xi_i < 1/2$. Then, for all x1,x2∈B(x∗,γ)⊆D and the updating formula (3.2), we have
Proof. From the updating formula (3.2), we have
According to the definition of ηk and fi(x∗)=0, we have
From Definition 1, we have
□
Remark 3. It follows from Lemma 4 that xk+1∈B(x∗,γ)⊆D whenever xk∈B(x∗,γ)⊆D. So, if f′(x) has full column rank, the iterative sequence {xk} generated by the algorithm is well-defined.
Theorem 1. Consider the nonlinear system of equations f(x)=0 with f:D⊆Rn→Rm on a bounded closed set D, and suppose there exists x∗ such that f(x∗)=0. Assume that for all x∈D the nonlinear function f satisfies the local tangential cone condition given in Definition 1 with $\xi = \max_{i \in [m]} \xi_i < 1/2$, and that f′(x) has full column rank. If x0∈B(x∗,γ)⊆D⊆Rn, then the iterations of the NGABK method in case 1 of Algorithm 1 satisfy
Proof. Let Ek∈Rm×|τk| be the matrix whose columns are the vectors ei∈Rm with i∈τk. Denote $f'_{\tau_k}(x_k) = E_k^T f'(x_k)$ and $\hat{\eta}_k = E_k^T \eta_k$; then
and
Therefore, we have
where σmax(f′τk(xk)) is the largest singular value of the submatrix f′τk(xk) of the Jacobian matrix f′(xk). From the definition of ηk, we have
From the definition of τk, we have
Further, using Lemma 4, we can obtain
In addition, we have $\sigma_{\max}^2(f'_{\tau_k}(x_k)) = \|f'_{\tau_k}(x_k)\|_2^2 \le \|f'(x_k)\|_2^2 \le \bar{\sigma}^2$ and $\delta_k = \frac{1}{2}\left( \frac{\max_{i \in [m]} |f_i(x_k)|^2}{\|f(x_k)\|_2^2} + \frac{1}{m} \right) \ge \frac{1}{m}$. Using Lemma 3, we obtain
So, the convergence of the NGABK method is proved. □
Remark 4. Since $1 - 2\xi < 1 < 1 + \xi^2$ and $\underline{\sigma}^2 < m \bar{\sigma}^2$, we have
This shows that the convergence factor of our method is strictly smaller than 1.
Now, we give the convergence theorem of the MRNABK method.
Theorem 2. Consider the nonlinear system of equations f(x)=0 with f:D⊆Rn→Rm on a bounded closed set D, and suppose there exists x∗ such that f(x∗)=0. Assume that for all x∈D the nonlinear function f satisfies the local tangential cone condition given in Definition 1 with $\xi = \max_{i \in [m]} \xi_i < 1/2$, and that f′(x) has full column rank. If x0∈B(x∗,γ)⊆D⊆Rn, then the iterations of the MRNABK method in case 2 of Algorithm 1 satisfy
Proof. Following a proof analogous to that of the NGABK method, we obtain the following formula:
The second inequality follows from the definition of τk, and the third follows from $\max_{1 \le i \le m} |f_i(x_k)|^2 \ge \frac{1}{m} \|f(x_k)\|_2^2$.
From Lemma 1, it follows that
Further, using Lemmas 4 and 3, we obtain
So, the convergence of the MRNABK method is proved. □
Remark 5. Similarly, we have
This shows that the convergence factor of our method is strictly smaller than 1.
5. Numerical examples
In this section, we primarily compare the efficiency of our new methods with the Broyden method [1], the NRK method [28], the residual-distance capped nonlinear Kaczmarz (RD-CNK) method, and the residual-based block capped nonlinear Kaczmarz (RB-CNK) method [33] for solving nonlinear systems of equations, measured in iteration steps (denoted as 'IT') and computing time in seconds (denoted as 'CPU'). The RD-CNK method and the NRK method are based on a single sample. The RB-CNK method is based on multi-sampling and uses the following iteration scheme:

$$x_{k+1} = x_k - \left( f'_{I_k}(x_k) \right)^{\dagger} f_{I_k}(x_k),$$

where Ik is the selected index subset. The target block in the MRNABK method is calculated by

$$\tau_k = \left\{ i \in [m] : |f_i(x_k)|^2 \ge \varrho \max_{j \in [m]} |f_j(x_k)|^2 \right\},$$

where ϱ∈(0,1]. The choice of ϱ here applies only to the experiments in this paper.
In the numerical experiments, IT and CPU are averaged over 10 repeated runs of the corresponding method. All experiments are terminated when the number of iterations exceeds 200,000 or when ‖f(xk)‖22<10−6. The tables below report the IT and CPU required for the algorithms to reach the stopping criterion ('–' indicates that convergence was not achieved under this criterion). Additionally, for each experiment we provide the logarithmic plot of the norm of the nonlinear residual against the number of iterations, as well as plots of CPU and IT against the number of equations. Our experiments are implemented in MATLAB (version R2018b).
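A sketch of the experimental harness implied by this setup (stopping rule and counters only, in Python rather than the paper's MATLAB; the reporting details are not reproduced):

```python
import time
import numpy as np

def run_solver(step, f, jac, x0, tol=1e-6, max_it=200_000):
    """Iterate `step` until ||f(x_k)||_2^2 < tol or max_it steps,
    returning the final iterate, IT, and CPU time as in the tables."""
    x = np.asarray(x0, dtype=float).copy()
    t0 = time.perf_counter()
    for it in range(max_it):
        if np.sum(f(x) ** 2) < tol:
            return x, it, time.perf_counter() - t0
        x = step(f, jac, x)
    return x, max_it, time.perf_counter() - t0
```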
Example 1. In this example, we consider the following equations:

$$f_i(x) = x_i - \left( 1 - \frac{c}{2N} \sum_{j=1}^{N} \frac{\mu_i x_j}{\mu_i + \mu_j} \right)^{-1} = 0, \quad i = 1, \ldots, N.$$
This system of equations is called the H-equation, which is usually used to model the exit distribution problem in radiative transfer [28]. In this problem, N represents the number of equations and μi=(i−1/2)/N. When c∈(0,1), the discrete problem has solutions. We set x0 to be the zero vector and c=0.9. In the Broyden method, we set the initial approximation of the Jacobian matrix to be the identity matrix. First, we tested the value of the parameter ϱ. In Table 1, we observe that in most cases the MRNABK method required relatively little computing time when ϱ=0.1,0.2,0.3. With the number of equations fixed, the larger ϱ was, the longer the computing time of the MRNABK method. So, in this example, we set ϱ=0.1. Next, we tested the performance of our methods and the other methods. The results of the numerical experiments are listed in Tables 2 and 3. They show that the NGABK method and the RB-CNK method, based on multiple sampling, were substantially faster than the RD-CNK method and the NRK method, based on single sampling, as shown in Figure 1. Figure 2 plots the running time (CPU) and iteration steps (IT) of the different methods for different N. According to Figure 2, the NGABK method outperformed the RB-CNK method in terms of CPU. From Table 4, the MRNABK method converged faster than the NGABK method in both computing time and iteration steps. For the H-equation with f:Rn→Rn, the Broyden method had better numerical results.
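For reproducibility, a sketch of the residual and Jacobian of the discretized H-equation in the form assumed above (x0 and c as in the text):

```python
import numpy as np

def h_equation(c=0.9, N=400):
    """Residual f and Jacobian f' of the discretized Chandrasekhar
    H-equation, assuming f_i(x) = x_i - (1 - (c/2N) sum_j mu_i x_j / (mu_i + mu_j))^{-1}."""
    mu = (np.arange(1, N + 1) - 0.5) / N
    M = (c / (2 * N)) * mu[:, None] / (mu[:, None] + mu[None, :])
    def f(x):
        return x - 1.0 / (1.0 - M @ x)
    def jac(x):
        w = 1.0 / (1.0 - M @ x)                # w_i = (1 - (Mx)_i)^{-1}
        return np.eye(N) - (w ** 2)[:, None] * M
    return f, jac

# e.g.: f, jac = h_equation(); x, it, cpu = run_solver(ngabk_step, f, jac, np.zeros(400))
```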
Example 2. In this example, we consider the Brown almost linear function [20],

$$f_i(x) = x_i + \sum_{j=1}^{n} x_j - (n+1), \quad i = 1, \ldots, n-1, \qquad f_n(x) = \prod_{j=1}^{n} x_j - 1.$$
In this experiment, we set the initial value x0=0.5∗ones(n,1). The solution of this problem is x∗=(1,1,...,1)T. The numbers of equations and unknowns are set to 50×50, 100×100, 150×150, 200×200, 250×250, 300×300, 350×350, and 400×400. We list the computing time and iteration numbers of these methods in Tables 5 and 6, respectively. The outcomes demonstrate that our new approaches perform much better than the NRK method. Table 6 shows that the NGABK method and the MRNABK method require almost identical iteration counts, and both are superior to the RB-CNK method.
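A sketch of this test problem in the form assumed above (the gradient of the product row uses prod(x)/x_j, valid away from zero components, which holds along iterates started at 0.5·ones):

```python
import numpy as np

def brown_almost_linear(n=100):
    """Residual and Jacobian of the Brown almost linear function:
    f_i = x_i + sum(x) - (n+1) for i < n, and f_n = prod(x) - 1."""
    def f(x):
        out = x + x.sum() - (n + 1.0)
        out[-1] = np.prod(x) - 1.0
        return out
    def jac(x):
        J = np.eye(n) + np.ones((n, n))        # rows i < n: delta_ij + 1
        J[-1] = np.prod(x) / x                 # row n: prod of all x_k, k != j
        return J
    return f, jac
```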
Example 3. In this example, we consider the Singular Broyden problem [12],
In this experiment, we set the initial value x0=−0.5∗ones(n,1), where n is the number of equations. The Singular Broyden problem [12] is a square nonlinear system of equations whose Jacobian matrix is singular at the solution. First, we conducted an experiment on the values of the parameter ϱ. From Table 7, we can see that for a fixed number of equations, the CPU time of the MRNABK method was better when ϱ∈[0.1,0.3], so we set ϱ=0.2 in this example. Tables 8 and 9 demonstrate that, in terms of both computation time and iteration steps, the NGABK method converged more quickly than the other three methods. The MRNABK method's residual fell the fastest, whereas the NRK method's declined the slowest, as seen in Figure 3. It is evident from Figure 4 that all five approaches obtain comparable approximate solutions.
Example 4. Consider the following problem, which is a chained serpentine overdetermined problem [17],
where m represents the number of equations. In this experiment, we set the initial value x0=0.5∗ones(n,1), with n=100, 300, 500, 1000, and 2000. From Table 10, we set ϱ=0.2. As Tables 11 and 12 show, the NGABK method produced better numerical results than the other four approaches. Furthermore, we note that the NGABK and MRNABK methods required fewer iterations as the dimension of the overdetermined problems grew, and the NRK method's iteration time was nearly equal to that of the NGABK method. The Broyden method took too long when the problem dimension grew: with 1998 equations, it needed more than 10 minutes.
6. Conclusion
Based on the Gaussian Kaczmarz method and the RD-CNK method, we introduced a new class of block nonlinear Kaczmarz methods for solving nonlinear equations and investigated their convergence theory. By employing an averaging technique, these methods avoid computing the Moore-Penrose pseudoinverse of the Jacobian matrix at each iteration, significantly reducing the computational cost. Experimental results demonstrated that the NGABK and MRNABK methods performed better in terms of CPU and IT than the NRK method (and other methods) on the singular-Jacobian problem and the overdetermined problems. For problems like the H-equation, several traditional methods, including the Broyden method and the Newton method, perform well, but our proposed methods still worked better than the other Kaczmarz methods. Additionally, selecting a suitable parameter ϱ is a crucial matter. In the future, we plan to continue working on pseudoinverse-free approaches and more effective greedy rules.
Use of AI tools declaration
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conflict of interest
The authors declare that there are no conflicts of interest.