1. Introduction
Consider solving the large-scale system of linear inequalities
Ax ≤ b, (1.1)
where A∈Rm×n is an m-by-n (m>n) real coefficient matrix, b∈Rm is an m-dimensional right-hand side vector, and x∈Rn is an unknown n-dimensional vector. We confine the scope of this work to the regime m≫n, where iterative methods are typically employed. We denote the feasible region of Eq (1.1) by S={x∈Rn | Ax≤b}. Throughout this paper, we assume that the coefficient matrix A has no zero rows and that S≠∅.
The Kaczmarz method [1], also known as the algebraic reconstruction technique (ART) [2], is one of the most popular solvers for linear systems of equations. Originally proposed by the Polish mathematician Stefan Kaczmarz in 1937, it has found a wide range of applications in fields such as image reconstruction, medical scanners, computerized tomography, and digital signal processing. At each iteration, the Kaczmarz method selects a row of the matrix in cyclic order and projects the current iterate onto the corresponding hyperplane. Let Ai,: stand for the i-th row of the coefficient matrix A and b=(b1,b2,⋯,bm)T; the Kaczmarz (ART) iteration then reads
xk+1 = xk + ((bi − ⟨Ai,:, xk⟩)/‖Ai,:‖22)(Ai,:)T, (1.2)
where i=(k mod m)+1, ⟨⋅,⋅⟩ is the Euclidean inner product, and ‖⋅‖2 is the corresponding norm in Rn. Many applications have demonstrated that selecting the working row of A at random, rather than cyclically, can significantly enhance the convergence rate. Strohmer and Vershynin [3] proposed the randomized Kaczmarz (RK) method, which selects the working row with probability proportional to its squared Euclidean norm, and proved its expected linear convergence rate in 2009. Bai and Wu [4] then discussed an improved estimate of the convergence rate of the RK method. Subsequently, many iterative methods based on the RK method have been proposed to further improve its convergence rate and computational efficiency; for instance, variants of the RK method with different selection rules for the working rows, as well as various generalizations of the RK method, have been explored (see [5, 6, 7, 8] for more details). In particular, the study of block Kaczmarz algorithms has deepened. The block Kaczmarz method was first proposed by Elfving [9]. Unlike the single-projection Kaczmarz method, the block Kaczmarz method solves several equations at each iteration. Needell and Tropp [10] proposed the randomized block Kaczmarz (RBK) method. The randomized (block) Kaczmarz methods need to traverse all rows of the coefficient matrix, and for large matrices this entails a considerable amount of computation and storage. To avoid computing pseudoinverses, Gower and Richtárik [11] proposed the Gaussian Kaczmarz (GK) method. The GK method, defined by
xk+1 = xk + (ηT(b − Axk)/‖ATη‖22)ATη, (1.3)
can be regarded as another kind of block Kaczmarz method, one that writes the increment directly as a linear combination of all columns of AT at each iteration, where η is a Gaussian vector with mean 0∈Rm and covariance matrix I∈Rm×m, i.e., η∼N(0,I). Here I denotes the identity matrix. In Eq (1.3), all columns of AT are used. The expected linear convergence rate was analyzed in [11] in the case that A has full column rank. Recently, Chen and Huang [12] proposed a fast deterministic block Kaczmarz (FDBK) method, in which a set Uk is first computed according to the greedy index selection strategy [13] and then the vector ηk is constructed by
ηk = ∑i∈Uk(bi − Ai,:xk)ei. (1.4)
Its relaxed version was given in [14]. In these methods [11, 12, 13, 14], calculations of pseudoinverses are not needed; however, to compute Uk, one has to scan the residual vector from scratch at each iteration. In fact, Eq (1.3) can be viewed as a special case of the following prototype projection iteration (for more details, see Section 5.1.2 in [15]),
for k=0,1,2,⋯, where V and W are two parameter matrices; this iteration was proposed by Saad based on Petrov-Galerkin (PG) conditions. It is easy to see that, with different choices of V and W, one obtains various popular iterations as special cases, including multiple row-action schemes. For example, letting W=η and V=ATW with η∈Rm being any non-zero vector, the iteration step (1.5) reduces to the form (1.3).
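To make this reduction concrete, assume that the prototype step (1.5) takes the common Petrov-Galerkin form xk+1 = xk + V(WTAV)−1WT(b − Axk); this explicit form is our reading of [15] and is written here only for illustration. Substituting W = η and V = ATη gives
xk+1 = xk + ATη(ηTAATη)−1ηT(b − Axk) = xk + (ηT(b − Axk)/‖ATη‖22)ATη,
which is exactly the Gaussian Kaczmarz step (1.3).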
For solving the linear feasibility problem (1.1), Leventhal and Lewis [16] extended the randomized Kaczmarz method. At iteration k, if the inequality of the selected row i is already satisfied, then xk+1=xk; otherwise, the current iterate is projected onto the hyperplane {x|⟨Ai,:,x⟩=bi}. The update rule of this algorithm is thus
xk+1 = xk − ((⟨Ai,:, xk⟩ − bi)+/‖Ai,:‖22)(Ai,:)T. (1.6)
One can see that xk+1 in Eq (1.6) is indeed the projection of xk onto the set {x|⟨Ai,:,x⟩≤bi}. Leventhal and Lewis [16] (Theorem 4.3) proved that this randomized projection (RP) method converges to a feasible solution linearly in expectation. Recently, by combining the ideas of the Kaczmarz and Motzkin methods [17, 18], De Loera et al. [19] proposed the sampling Kaczmarz-Motzkin (SKM) method for solving the linear feasibility problem (1.1). Later, Morshed et al. [20] developed a generalized framework, namely the generalized sampling Kaczmarz-Motzkin (GSKM) method, which extends the SKM algorithm and proves the existence of a family of SKM-type methods. In addition, they proposed a Nesterov-type acceleration scheme for the SKM method, called probably accelerated sampling Kaczmarz-Motzkin (PASKM), which provides a bridge between Nesterov-type acceleration in machine learning and sampling Kaczmarz methods for solving linear feasibility problems.
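For illustration, the following minimal MATLAB sketch implements the RP iteration (1.6); the row-sampling probabilities (proportional to the squared row norms) and the stopping test are plausible choices made here for the sketch, not prescriptions taken from [16].
% A minimal sketch of the randomized projection (RP) iteration (1.6) for Ax <= b.
% The sampling probabilities and the stopping test are illustrative choices.
function x = rp_sketch(A, b, x0, maxit, tol)
    rownorm2 = sum(A.^2, 2);                 % squared Euclidean norms of the rows
    cumprob = cumsum(rownorm2 / sum(rownorm2));
    x = x0;
    for k = 1:maxit
        i = find(cumprob >= rand, 1);        % sample a row index
        res = A(i,:)*x - b(i);               % violation of the i-th inequality
        if res > 0                           % project only when the inequality is violated
            x = x - (res / rownorm2(i)) * A(i,:)';
        end
        if norm(max(A*x - b, 0)) <= tol      % stop when all inequalities (nearly) hold
            return;
        end
    end
end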
In this paper, inspired by [13, 20], we develop a randomized multiple row-action (RMR) method for the linear feasibility problem (1.1). Instead of the full residual used in [13], only the partial rows indexed by Uk are involved, which reduces the computation on the residual vector in the RMR method. Moreover, by using history information in the current update, we establish a generalized version of the randomized multiple row-action method (GRMR). This general framework provides an ideal platform for researchers to experiment with a wide range of iterative projection methods and to design efficient algorithms for solving optimization problems in areas such as artificial intelligence and machine learning. We emphasize that our algorithms are pseudoinverse-free and therefore differ from projection-based block algorithms. We prove the linear convergence of our algorithms in the mean-square sense. Numerical results are presented to illustrate the efficiency of our algorithms.
The rest of the paper is organized as follows. In Section 2, the notations and preliminaries are provided. In Section 3, we introduce the RMR method and analyze its convergence properties. Experimental results on both randomly generated and real-world data are reported and discussed in Section 4. Finally, we present some conclusions in Section 5.
2. Preliminaries and notations
Throughout the paper, for any random variables ξ and ζ, we use E[ξ] and E[ξ|ζ] to denote the expectation of ξ and the conditional expectation of ξ given ζ, respectively. For an integer m≥1, let [m]:={1,⋯,m}. For any real matrix A, we use ai, aj, ai,j, AT, A†, ‖A‖2, ‖A‖F, and Range(A) to denote the i-th row, the j-th column, the (i,j)-th entry, the transpose, the Moore-Penrose pseudoinverse, the spectral norm, the Frobenius norm, and the column space of A, respectively. The nonzero singular values of a matrix A are σmax(A)=σ1(A)≥σ2(A)≥⋯≥σr(A):=σmin(A)>0, where r is the rank of A, and we use σmax(A) and σmin(A) to denote the largest and smallest nonzero singular values of A. We have ‖A‖2=σ1(A) and ‖A‖F=√(∑ri=1σ2i(A)). For a matrix A∈Rm×n, the corresponding compact singular value decomposition is denoted by A=UDVT, where U and V have orthonormal columns of appropriate sizes and D is the nonsingular diagonal matrix with the singular values on its diagonal. Let PS(x) be the projection of x onto the nonempty closed convex set S; that is, PS(x) is the vector y that is the optimal solution of minz∈S‖x−z‖2. Additionally, define the distance from x to a set S by d(x,S)=minz∈S‖x−z‖=‖x−PS(x)‖, as in [21]. For any c∈R and u∈Rn, we define c+=max{0,c} and u+=((u1)+,⋯,(un)+)T. For an index set τ, we use A(τ,:), A(:,τ), v(τ), and |τ| to denote the row submatrix of A indexed by τ, the column submatrix of A indexed by τ, the subvector of v with component indices listed in τ, and the cardinality of the set τ, respectively. We use In to denote the identity matrix of order n, and simply I when its size is clear from the context. We use ei=I(:,i) to represent the i-th column of the identity matrix.
Lemma 1 (Hoffman [22]). Let x∈Rn and let S be the feasible region of the linear feasibility problem (1.1). Then there exists a constant L>0 such that the following inequality holds:
d(x, S) ≤ L‖(Ax − b)+‖2.
Lemma 1 is a famous result from Hoffman's work on systems of linear inequalities. The constant L is called the Hoffman constant. For a consistent system of equations (i.e., there exists a unique x∗ such that Ax∗=b), L can be expressed in terms of the smallest singular value of the matrix A, i.e., L = 1/σmin(A).
Lemma 2 ([20]). For any x∈Rn and x̄∈S, the following identity holds:
In the paper, for any ϕ1,ϕ2≥0, the following parameters are defined:
Lemma 3 ([20]). Let {Gk} be a non-negative real sequence satisfying the following relation:
if ϕ1,ϕ2≥0 and ϕ1+ϕ2<1, then the following bounds hold:
1. Let ϕ be the largest root of ϕ² + ϕ1ϕ − ϕ2 = 0; then
2. Define ρ=ϕ+ϕ1, then we have the following:
where 0≤ϕ<1 and 0<ρ<1.
Lemma 4 ([20]). Let the real sequences Hk≥0 and Fk≥0 satisfy the following recurrence relation:
where Π1,Π2,Π3,Π4≥0 such that the following relations
hold. Then the sequences {Hk} and {Fk} converge and the following result holds:
where
and Π1,Π3≥0 and 0≤ρ1≤ρ2<1.
3. The randomized multiple row-action methods and convergence analysis
In this paper, for solving the linear feasibility problem (1.1), by combining the iteration schemes (1.3) and (1.6) with the parameter choice W=ηk from Eq (1.4), we propose a randomized multiple row-action (RMR) method and its generalized variant (GRMR), which are described in Algorithms 1 and 2, respectively.
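Since Algorithm 1 itself is not reproduced here, the following MATLAB sketch records one possible reading of the RMR iteration, assembled from the block vector ηk of Lemma 5 and the update direction appearing in the definition of zk later in this section; in particular, the uniform choice of the block Uik is an assumption made only for illustration and need not coincide with the sampling rule of Algorithm 1.
% A sketch of one reading of the RMR iteration built from eta_k (Lemma 5)
% and the update direction in the definition of z_k; the uniform block
% selection is an assumption made for illustration.
function x = rmr_sketch(A, b, x0, blocks, alpha, maxit)
    % blocks: cell array {U_1, ..., U_s} partitioning the row indices 1:m
    s = numel(blocks);
    x = x0;
    for k = 1:maxit
        U = blocks{randi(s)};                    % pick a block U_{i_k} (assumed uniform)
        rU = max(A(U,:)*x - b(U), 0);            % (A_{U,:}x - b_U)_+, i.e., eta_k restricted to U
        d = A(U,:)' * rU;                        % A^T eta_k
        if d'*d > 0
            % eta_k^T(Ax - b)_+ = ||rU||_2^2 because eta_k vanishes outside U
            x = x - alpha * (rU'*rU) / (d'*d) * d;
        end
    end
end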
Before we delve into the main theorems, we give an important lemma as follows:
Lemma 5. Assume that ηk = ∑i∈Uik(Ai,:x − bi)+ei, where Uik is a block from the partition {Uj}sj=1, and let βUmax := maxj∈[s]{σ2max(AUj,:)/‖AUj,:‖2F} and βUmin := minj∈[s]{σ2min(AUj,:)/‖AUj,:‖2F}. Then, for any x∈Rn, the following relation holds:
where 0 < μ1 = 1/(βUmax‖A‖2F L2) ≤ μ2 = min{1, σ2max(A)/(βUmin‖A‖2F)} ≤ 1.
Proof. From the definition of ηk and the fact that AUik,:x − bUik ≤ AUik,:(x − PS(x)), we obtain
Since (AUik,:x − bUik)+ ∈ Range(AUik,:), the inequality follows from ‖(AUik,:)T(AUik,:x − bUik)+‖22 ≥ σ2min(AUik,:)‖(AUik,:x − bUik)+‖22. Thus, we have the following relation,
Now, taking the expectation conditional on the first k iterations on both sides of Eq (3.3), we have
where the third inequality comes from APS(x)≤b and the last inequality follows from the fact that ‖Ax‖22≤σ2max(A)‖x‖22.
Meanwhile, from Eq (3.2), it is easy to see that the following relation exists,
Therefore, with the use of Eqs (3.4) and (3.5), there holds that
Similarly, we have
Then, taking the conditional expectation on the above inequalities, we obtain
where the last inequality is obtained by Lemma 1.
Hence, from Eqs (3.6) and (3.7), the conclusion is obtained.
Using the above lemma, we have the following theorem that provides the convergence of the RMR method.
Theorem 1. Assume that the linear feasibility problem (1.1) is consistent and that the stepsize satisfies 0<α<2. Then the iteration sequence {xk} generated by the RMR method for arbitrary x0∈Rn satisfies
where h(α) := 1 − (2α − α2)/(L2βUmax‖A‖2F) < 1 with βUmax := maxj∈[s]{σ2max(AUj,:)/‖AUj,:‖2F}.
Proof. By direct calculation, we obtain
Then, it follows that
Taking the conditional expectation on the first k iterations and using Lemma 5, we have
By the law of total expectation, there holds that
Finally, unrolling the recurrence gives the desired result, i.e.
Remark 1. It can be seen from Theorem 1 that the convergence rate of the RMR algorithm is governed by h(α) = 1 − (2α − α2)/(L2βUmax‖A‖2F), which attains its minimum value h(1) = 1 − 1/(L2βUmax‖A‖2F) at α = 1.
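To verify this choice of α, note that 2α − α² = 1 − (1 − α)² ≤ 1 for all α, with equality exactly at α = 1; hence (2α − α²)/(L²βUmax‖A‖2F) is maximized, and h(α) is minimized, at α = 1. Moreover, 2α − α² > 0 for every 0 < α < 2, which gives h(α) < 1 on this interval.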
Next, to improve the RMR method, we propose a generalized version of the RMR method (GRMR) in which history information is used. Here, we take two successive iterates, xk−1 and xk, generated by the RMR iteration, and form the next iterate, xk+1, as an affine combination of the previous two updates; i.e., starting with x0=x1, z0=z1∈Rn,
where zk = xk − α(ηkT(Axk − b)+/‖ATηk‖22)ATηk is the k-th update of the GRMR algorithm, which is described in Algorithm 2. It is easy to see that, when ξ=0, the GRMR algorithm reduces to the original RMR algorithm.
Before the convergence analysis of the proposed GRMR method is presented, the following sets are defined. For any ξ∈R, we denote the sets Q, Q1, and Q2 by
Theorem 2. Suppose that the linear feasibility problem (1.1) is consistent. For arbitrary x1=x0∈Rn, the sequence of iterates {xk} generated by the GRMR method converges for 0<α<2 and 0≤ξ≤1 (ξ∈Q1). The following results hold:
1. Take ρ = ϕ1 + ϕ with ϕ = (−ϕ1 + √(ϕ1² + 4ϕ2))/2; then
2. Take R1,R2,R3, and R4 as in Eq (2.3), then
where 0≤ϕ,ϕ1,ϕ2<1, 0<ρ=ϕ1+ϕ<1, and h(α) := 1 − (2α − α2)/(L2βUmax‖A‖2F) < 1 with βUmax := maxj∈[s]{σ2max(AUj,:)/‖AUj,:‖2F}.
Proof. Since (1−ξ)PS(xk) + ξPS(xk−1) ∈ S for any ξ∈Q1, straightforward calculations give
By taking the conditional expectation of Eq (3.12) with respect to the indices k and k−1, we have
Meanwhile, with the use of Eqs (3.9) and (3.10), there holds that
and
Taking the total expectation on Eq (3.13) and combining Eqs (3.14) and (3.15), we can get
which satisfies the condition of Lemma 3 with ϕ1=(1−ξ)h(α) and ϕ2=ξh(α). Therefore, using the first part of Lemma 3, we have
Furthermore, using the second part of Lemma 3 and Eq (3.16), we can get the second part of Theorem 2.
Remark 2. When 0≤ξ≤1, we obtain a global linear rate for the GRMR method:
Since 0<h(α)<1 and 0≤ξ≤1, it follows that ρ ≥ h(α). Therefore, the theoretical convergence rate of the GRMR method with 0≤ξ≤1 is never better than that of the RMR method.
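This comparison can be checked directly. Using ϕ1 = (1−ξ)h(α) and ϕ2 = ξh(α) from the proof of Theorem 2, and taking ϕ as the largest root of ϕ² + ϕ1ϕ − ϕ2 = 0, we get
ρ = ϕ1 + ϕ = ((1−ξ)h(α) + √((1−ξ)²h(α)² + 4ξh(α)))/2.
The inequality ρ ≥ h(α) is then equivalent to (1−ξ)²h(α)² + 4ξh(α) ≥ (1+ξ)²h(α)², i.e., to 4ξh(α) ≥ 4ξh(α)², which holds since 0 < h(α) < 1 and ξ ≥ 0, with equality only when ξ = 0, the case in which GRMR coincides with RMR.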
In the next theorem, we will investigate a global linear rate for the GRMR method with ξ∈Q2.
Theorem 3. Assume that the linear feasibility problem (1.1) is consistent. Let {xk} and {zk} be the sequences of random iterates generated by the GRMR method with 0<α<2 and ξ∈Q2. Define
and Γ1,Γ2,Γ3,ρ1,ρ2 as in Eq (2.6) with the parameter choice of Eq (3.17). Then, the following results hold,
where Γ1,Γ3≥0 and 0≤ρ1≤ρ2<1.
Proof. From Step 5 of the GRMR method and Eq (3.14), it follows that
Taking the total expectation on Eq (3.18), we have
Then using the update formula for zk+1 in Step 5 of the GRMR method and Lemma 5, we have
Taking the total expectation on Eq (3.20) and using Eq (3.19), we have
Combining Eqs (3.19) and (3.21), we can deduce the following inequality:
From the definition, we check that Π1,Π2,Π3,Π4≥0. Since ξ∈Q2, we have
We have
From the above formula (3.24), we have Π1+Π4 < 1+|ξ|√h(α) = 1+min{1, |ξ|√h(α)} = 1+min{1, Π1Π4−Π2Π3}. With the use of Eq (3.23), we deduce that Π2Π3−Π1Π4 ≤ 0, which is precisely the condition provided in Eq (2.5).
Let the sequences Fk=E[‖zk−zk−1‖] and Hk=E[‖xk−PS(xk)‖]. Then, by using Lemma 4, we have
where Γ1,Γ2,Γ3,ρ1,ρ2 can be derived from Eq (2.6) using the parameter choice of Eq (3.17).
Since x1=x0 and z1=z0, it follows that F1=E[‖z1−z0‖]=0 and H1=E[‖x1−PS(x1)‖]=H0. Thus, formula (3.25) becomes:
From Lemma 4, we have Γ1,Γ3≥0 and 0≤ρ1≤ρ2<1, which proves the conclusion.
4. Numerical experiments
In this section, we discuss the numerical experiments performed to show the computational efficiency of the proposed algorithms (Algorithms 1 and 2). As mentioned before, we limit our focus to the over-determined regime (i.e., m≫n), where iterative methods are competitive in general. We present numerical examples on both synthetic and real-world data to demonstrate the convergence of the RMR and GRMR methods.
We suppose that the subsets {Ui}si=1 are computed by
where τ=20 is the size of each Ui. All experiments are carried out using MATLAB (version R2021b) on a laptop with a 2.50-GHz Intel Core i9-12900H processor, 16 GB memory, and the Windows 11 operating system.
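The formula defining these subsets is not reproduced above; one simple rule consistent with the stated block size τ = 20 is to split the row indices 1:m into consecutive blocks of τ rows, as in the following MATLAB sketch (this particular partitioning rule is an assumption made only for illustration):
% Build blocks {U_i} of size tau = 20 from consecutive row indices
% (an assumed partitioning rule, shown only for illustration).
% A is the coefficient matrix generated beforehand.
tau = 20;
m = size(A, 1);
s = ceil(m / tau);                               % number of blocks
blocks = cell(s, 1);
for i = 1:s
    blocks{i} = (i-1)*tau + 1 : min(i*tau, m);   % indices of the i-th block
end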
In testing synthetic data and the SuiteSparse Matrix Collection, the stopping criterion is
or the maximum number of 300,000 iteration steps is reached. Besides, we use the symbol "−" to indicate that the corresponding iteration method either cannot reach the stopping criterion RSE≤10−6 within 300,000 iteration steps or requires more than 1800 seconds of computing time.
In testing the sparse Netlib LP data, we set the stopping criterion to be
where γ is the tolerance gap.
In this section, IT and CPU denote the number of iteration steps and the computing time (in seconds), respectively; both are reported as medians over 20 repeated runs of the corresponding method. The SKM, GSKM, and PASKM algorithms involve the selection of several parameters as well, and we selected a set of parameters with good performance based on the literature [20]. To ensure that system (1.1) is consistent, we randomly generate vectors y1, y2 ∈ Rn with the MATLAB function "randn" and set the right-hand side to b = 0.5Ay1 + 0.5Ay2.
4.1. Experiments on synthetic data
Example 1. For the coefficient matrix A, we mainly consider two types, namely dense and sparse matrices. The dense matrices are generated randomly by the MATLAB function "randn". The sparse matrices are generated randomly by the MATLAB function "sprandn" with a density of 1/(2log(mn)) for the nonzero elements. We compare RMR and GRMR with SKM, GSKM, PASKM1, and PASKM2, with the initial vectors x0, z0 ∈ Rn generated randomly by the MATLAB function "randn".
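For reference, the following MATLAB sketch sets up one such consistent random test instance from the descriptions above; the matrix sizes are illustrative, and the density formula is written as we interpret it here.
% Sketch of a consistent random instance Ax <= b for Example 1.
m = 10000; n = 100;                      % illustrative sizes, not taken from the tables
% A = randn(m, n);                       % dense case
density = 1 / (2*log(m*n));              % sparse case: nonzero density as interpreted above
A = sprandn(m, n, density);
y1 = randn(n, 1); y2 = randn(n, 1);
b = 0.5*A*y1 + 0.5*A*y2;                 % guarantees that the feasible region S is nonempty
x0 = randn(n, 1); z0 = randn(n, 1);      % random initial vectors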
In Tables 1–4, we list the IT and CPU of the SKM, GSKM, PASKM1, PASKM2, RMR, and GRMR methods for the consistent linear feasibility problem Ax≤b in Example 1. We set ξ=−0.2 and α=1.05 in these tables. The performance of these algorithms is tested on both dense and sparse coefficient matrices: Tables 1 and 2 report results for a fixed number of rows and an increasing number of columns, while Tables 3 and 4 show the performance for coefficient matrices of different orders. The convergence behavior of the different methods on consistent systems is depicted in Figure 1. From Tables 1–4, it can be observed that the RMR and GRMR methods outperform the SKM, GSKM, PASKM1, and PASKM2 methods on consistent systems. Furthermore, Figure 1 demonstrates that, compared with the other methods, the RMR and GRMR approaches achieve higher accuracy with fewer iterations (IT) and less computing time (CPU).
In Table 5, we list the IT and CPU of the RMR method for m×n dense matrices A with different α for Example 1. From Table 5, we can see that the RMR algorithm performs best with α=1.05 among the tested values. As shown in Figure 2, the choice of the parameter α affects the IT and CPU required by the RMR method to achieve the desired accuracy levels; when α is appropriately selected, the IT and CPU needed by the RMR method are significantly reduced.
Example 2. For given m, n, r, and κ>1, we construct a matrix A by A = UDVT, where U∈Rm×r, D∈Rr×r, and V∈Rn×r. These matrices are generated by [U,~] = qr(randn(m,r),0), [V,~] = qr(randn(n,r),0), and D = diag(1+(κ−1).*rand(r,1)).
This example gives us the flexibility to adjust the input parameters m, n, r, and κ. We consider two types of rank-deficient cases by setting m=30n, r=n/2, and κ=n/10 in Table 6, and m=5000, r=n/2, and κ=n/10 in Table 7. We compare RMR and GRMR with SKM, GSKM, PASKM1, and PASKM2, with the initial vectors x0, z0 ∈ Rn generated randomly by the MATLAB function "randn".
In Tables 6 and 7, we report the numerical results of the SKM, GSKM, PASKM1, PASKM2, RMR, and GRMR algorithms with β=100, δ=0.5, ξ=−0.2, and α=1 for two rank-deficient consistent linear systems. We observe that the relative solution error of GRMR decays faster than those of SKM, GSKM, PASKM1, PASKM2, and RMR as the number of iteration steps and the computing time increase.
In Table 8, we list the IT and CPU of the GRMR method for m×n dense matrices A with m=5000, r=n/2, κ=n/10, α=0.95, and different ξ for Example 2. From Table 8 and Figure 3, we find that ξ=−0.4 is the best choice for the GRMR method.
4.2. Experiments on real-world data
We consider the following two types of real-world test data: the SuiteSparse Matrix Collection and the sparse Netlib LP instances.
Example 3. The SuiteSparse Matrix Collection [23]. In Table 9, the coefficient matrices A are chosen from the SuiteSparse Matrix Collection. In testing the SuiteSparse Matrix Collection, we take τ=⌈‖A‖22⌉. We compare RMR and GRMR with SKM, GSKM, PASKM1, and PASKM2 with the initial vectors x0=z0=0∈Rn. For details, we list the sizes, densities, condition numbers (i.e., cond(A)), and squared Euclidean norms of these matrices in Table 10, where the density of a matrix is defined by
density = (number of nonzero elements)/(number of rows × number of columns).
In Table 9, we list the IT and CPU of the SKM, GSKM, PASKM1, PASKM2, RMR, and GRMR methods for the linear feasibility problem Ax≤b in Example 3 with β=100, δ=0.3, ξ=−0.2, and α=0.95. From Table 9, we observe that the GRMR method outperforms the SKM, GSKM, PASKM, and RMR methods in terms of both iteration counts and CPU time.
Example 4. Netlib LP instances. We follow the standard framework used by De Loera et al. [19] and Morshed et al. [20] in their work on linear feasibility problems. The problems are transformed from standard LP problems (i.e., min cTx subject to Ax=b, l≤x≤u, with optimum value p∗) to an equivalent linear feasibility formulation (i.e., Ax≤b, where A=[AT −AT I −I c]T and b=[bT −bT uT −lT p∗]T). In testing the sparse Netlib LP instances, we take τ=5 and the initial vectors x0=z0=0∈Rn.
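Written out, the transformation above simply stacks the constraints Ax ≤ b, −Ax ≤ −b, x ≤ u, −x ≤ −l, and cTx ≤ p∗; the coefficient matrix and right-hand side of the resulting feasibility problem are the block matrix [A; −A; I; −I; cT] and the stacked vector [b; −b; u; −l; p∗], respectively. Any point satisfying this system is feasible for the original LP and attains the optimum value p∗.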
In Table 11, we list the IT and CPU of the SKM, GSKM, PASKM2, RMR, and GRMR methods for the linear feasibility problem Ax≤b in Example 4. From Table 11, we see that the GRMR algorithm takes less computing time than the other algorithms.
5. Conclusions
In this paper, based on partial rows of the residual vector, the RMR method and its general framework (GRMR) are proposed for solving linear feasibility problems. The GRMR method unifies various RMR-type algorithms through the relaxation parameter ξ. Linear convergence results are proved for both methods. Numerical examples on synthetic data and real-world applications demonstrate that the two methods often outperform the existing methods; in particular, the GRMR method with ξ∈Q2 takes less computing time. This shows that GRMR is a competitive row-action-type variant for solving linear feasibility problems. Moreover, the numerical results indicate that an appropriate choice of the parameters leads to more effective methods for different types of problems. In future work, we intend to investigate the optimal choice of ξ.
Author contributions
Hui Song: Writing the original draft and deriving the convergence. Wendi Bao: Conceptualization, Methodology. Lili Xing: Visualization, Software. Weiguo Li: Review, Conceptualization.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [grant numbers 61931025 and 42374156].
Conflict of interest
The authors declare there is no conflict of interest.