Research article

An efficient variant of the greedy block Kaczmarz algorithm for solving large linear systems

  • Received: 06 November 2023 Revised: 11 December 2023 Accepted: 19 December 2023 Published: 25 December 2023
  • MSC : 65F10, 65F20, 15A06

  • By exploiting the concept of row partitioning, we propose an efficient variant of the greedy block Kaczmarz algorithm for solving consistent large linear systems. The number of blocks is determined a priori through numerical experiments. The new algorithm works with a reduced linear system, which dramatically diminishes the computational overhead per iteration. The theoretical result validates that this method converges to the unique least-norm solution of the linear system. The effectiveness of the proposed algorithm is also justified by comparing it with some block Kaczmarz algorithms in extensive numerical experiments.

    Citation: Ke Zhang, Hong-Yan Yin, Xiang-Long Jiang. An efficient variant of the greedy block Kaczmarz algorithm for solving large linear systems[J]. AIMS Mathematics, 2024, 9(1): 2473-2499. doi: 10.3934/math.2024122




    A mainstay of scientific computing is solving large linear systems of the form

    Ax=b, (1.1)

    where A\in\mathbb{R}^{m\times n} , b\in\mathbb{R}^{m} and x\in\mathbb{R}^{n} .

    In what follows, we assume that the linear system is consistent and consider only the least-norm solution x_{*} . For linear systems of extremely large size, iterative methods are often the only option. These methods date back to the early 19th century, initiated by the work of Gauss, and activity in this area has exploded ever since, driven by demands from engineering and the sciences. Among those candidates, the Kaczmarz algorithm, proposed in 1937 by Stefan Kaczmarz, remains very appealing today [22]. Built on BLAS level-1 subroutines, it has found its way into a variety of applications in image reconstruction [16], computed tomography [20] and signal processing [9], to mention a few. In its simplest form, the classical Kaczmarz algorithm is given as

    x_{k+1} = x_{k}+\dfrac{b^{(i_{k})}-A^{(i_{k})}x_{k}}{\|A^{(i_{k})}\|_{2}^{2}}(A^{(i_{k})})^{T}, \qquad (1.2)

    where A^{(i)} denotes the i th row of the matrix A , b^{(i)} represents the i th entry of the vector b and (A^{(i)})^{T} stands for the transpose of A^{(i)} . At each iteration, the current iterate x_{k} is projected onto the hyperplane A^{(i_{k})}x = b^{(i_{k})} , where the working row in (1.2) is selected cyclically by applying i_{k} = (k \bmod m)+1 .
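
    For readers who prefer code, a minimal MATLAB sketch of the cyclic iteration (1.2) is given below; the function name kaczmarz_cyclic and the fixed number of sweeps are illustrative choices rather than part of the original method.

        % Minimal sketch of the classical (cyclic) Kaczmarz iteration (1.2).
        % Assumes A has no zero rows; maxit is an illustrative stopping rule.
        function x = kaczmarz_cyclic(A, b, x0, maxit)
            m = size(A, 1);
            x = x0;
            for k = 0:maxit-1
                ik = mod(k, m) + 1;                          % cyclic row selection
                ai = A(ik, :);                               % working row A^(ik)
                x  = x + ((b(ik) - ai*x) / (ai*ai')) * ai';  % project onto the hyperplane
            end
        end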

    The philosophy behind the classical Kaczmarz algorithm is to apply sequences of orthogonal projections onto hyperplanes. Its convergence is guaranteed, but its convergence rate is not easy to analyze. For more than a decade, the classical Kaczmarz algorithm was in oblivion. Fortunately, it resurged sporadically in the works of [8,15,30]. In 1970, it was rediscovered in the field of electron microscopy and X-ray photography, termed the algebraic reconstruction technique [16]. A breakthrough in the research of Kaczmarz-type algorithms is the randomized Kaczmarz algorithm, which adopts a non-uniform selection rule by picking the working row with probability proportional to the squared norms of the rows [29]. Agaskar et al. [1] provide a complete characterization of the randomized Kaczmarz algorithm, which includes an exact formula for the mean squared error and the annealed error exponent portraying its decay rate. Several approaches for determining working rows have been proposed ever since; for instance, Bai and Wu propose a greedy randomized Kaczmarz (GRK) method which selects the working row by applying the residual rule [5,6]. Gower et al. analyze adaptive sampling rules in the sketch-and-project setting, with an emphasis on the distance rule [17]. Griebel and Oswald investigate the maximal residual rule that greedily chooses the subspace with the largest residual norm for the next update [19]. Eldar and Needell consider a Johnson-Lindenstrauss-type projection when implementing the maximal distance rule for solving linear systems [13]. More about the selection rules for working rows and convergence analysis can be found in [3,4,7,31,34] and the references therein.

    While the selection rules for working rows are of great importance in the implementation of the Kaczmarz-type algorithms, the idea of exploiting blocks also matters. To put it simply, the block Kaczmarz method is an iterative scheme for solving linear systems. At each iteration step, the block Kaczmarz algorithm projects the current iterate onto the solution space of a subset of the constraints [27]. In [14], the coefficient matrix is partitioned by blocks of rows whose generalized inverse is then applied in Jacobi or successive overrelaxation iterative schemes. Needell and Tropp [27] put forward a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset per iteration. It paves the way for the research on "row paving" and is the first block Kaczmarz method with an expected linear rate of convergence that is expressed in the geometric properties of the matrix and its submatrices. By using the block Meany inequality, Bai and Liu [2] improve some existing results and develop new convergence theorems for row-action iterative schemes like the block Kaczmarz and the Householder-Bauer methods for solving large linear systems and least-squares problems. Necoara presents a group of randomized block Kaczmarz algorithms that involve a subset of the constraints and extrapolated step sizes [26]. Niu and Zheng investigate a block Kaczmarz algorithm which incorporates the greedy row-selection rule [28]. To capture large block entries of the residual vector, Liu and Gu propose a greedy randomized block Kaczmarz algorithm for solving consistent linear systems [24]. Inspired by the work of [5], Miao and Wu present a greedy randomized average block Kaczmarz method for solving linear systems [25]. Motivated by the Gaussian Kaczmarz method [18] and the greedy rule for selecting working rows in [5,6], Chen and Huang consider a fast deterministic block Kaczmarz method [10]. Based on the Kaczmarz-Motzkin method, Zhang and Li develop several efficient block Kaczmarz-Motzkin variants by exploiting different sampling strategies [36,37]. Recently, block Kaczmarz-type algorithms have been generalized to solve least squares problems and linear systems with multiple right-hand sides; for instance, Du et al. [12] present a randomized extended average block Kaczmarz algorithm that is suitable for solving least squares problems. Wu and Chang propose semi-randomized block Kaczmarz methods with simple random sampling for linear systems with multiple right-hand sides [32].

    In this work, we propose an efficient variant of the greedy block Kaczmarz (VGBK) algorithm for solving consistent linear systems in the form of (1.1). It proceeds with a partition of rows which can dramatically reduce the computational complexity per iteration. Numerical experiments show that the new algorithm is competitive with some existing block Kaczmarz-type methods. Throughout this work, we denote by \|A\|_{F} the Frobenius norm of a matrix and by \|\cdot\|_{2} the 2-norm of a matrix or vector if there is no ambiguity. The notation A^{\dagger} denotes the Moore-Penrose pseudoinverse of A . The symbol (\cdot)^{T} stands for the transpose of a matrix or vector. The smallest nonzero and the largest singular values of a matrix are represented by \sigma_{\min}(\cdot) and \sigma_{\max}(\cdot) , respectively. Unless otherwise stated, matrices are denoted by capital letters, while vectors are usually represented by lowercase ones; capital letters with superscripts are occasionally used to indicate a vector, e.g., A^{(i_{k})} for the i_{k} th row of a matrix A . Scalars are usually signified by Greek letters, except those, such as m , n , i , j , k , s and p , which are used for indexing. The set \{1, \ldots, m\} is abbreviated as [m] . An empty set is denoted by \emptyset . An index set is usually indicated by a calligraphic capital letter like \mathcal{I} , whose cardinality reads as |\mathcal{I}| .

    This work is organized as follows. In Section 2, we review three related GBK-type algorithms. In Section 3, we first make clear the motivation for this paper, and then we develop a new GBK algorithm with analysis that includes the convergence rate and complexity. In Section 4, we present some numerical experiments to justify the effectiveness of the proposed algorithm. In Section 5, some conclusions and the future work are given.

    In this section, we recapitulate the GBK algorithm [28], the fast deterministic block Kaczmarz (FDBK) algorithm [10] and the fast greedy block Kaczmarz (FGBK) algorithm [33]. These three algorithms facilitate the delineation of our new algorithm in Section 3.

    In the classical Kaczmarz algorithm, the current iterate xk is projected onto a single hyperplane. In its block counterpart, however, the current iterate xk is projected onto a combination of several hyperplanes, i.e.,

    x_{k+1} = x_{k}+A_{\mathcal{I}_{k}^{G}}^{\dagger}(b_{\mathcal{I}_{k}^{G}}-A_{\mathcal{I}_{k}^{G}}x_{k}), \qquad (2.1)

    where \mathcal{I}_{k}^{G} is a set consisting of the selected row indices [14,27]. In general, block Kaczmarz algorithms take advantage of BLAS level-2 operations and benefit more from exploiting multiple rows simultaneously.

    In [28], the block technique is integrated with the idea of maximal distance, which yields the GBK algorithm. In that setting, an index set \mathcal{I}_{k}^{G} collects the rows for which the squared distance of the current iterate to the hyperplane A^{(i_{k})}x = b^{(i_{k})} is no less than \varepsilon_{k} , a threshold that varies from zero to the maximal squared distance. In what follows, we do not elaborate on the algorithmic implementation, but refer to Algorithm 1 for details.

    Algorithm 1 The GBK algorithm
    Input: A , b , x_{0} , l and \alpha\in(0, 1]
    Output: x_{l+1}
    1: for k = 0, 1, \ldots, l do
    2:   Compute
           \varepsilon_{k} = \alpha\cdot\mathop{\max}\limits_{1\leq i\leq m}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|^{2}_{2}}\right\rbrace.
    3:   Determine the index set of positive integers
          \mathcal{I}_{k}^{G} = \left\lbrace i_{k}\Big|\:|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}\geq\varepsilon_{k}\|A^{(i_{k})}\|^{2}_{2}\right\rbrace.
    4:   Set x_{k+1} = x_{k}+A_{\mathcal{I}_{k}^{G}}^{\dagger}(b_{\mathcal{I}_{k}^{G}}-A_{\mathcal{I}_{k}^{G}}x_{k}).
    5: end for
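
    For illustration only, one GBK iteration may be coded in MATLAB as follows; pinv is used merely for transparency (in practice a least-squares solver such as CGLS would replace it [28]), and the function name gbk_step is ours.

        % One iteration of the GBK algorithm (Algorithm 1), written for clarity.
        % The pseudoinverse is formed explicitly only for illustration.
        function x = gbk_step(A, b, x, alpha)
            r    = b - A*x;                        % residual
            d2   = abs(r).^2 ./ sum(A.^2, 2);      % squared distances to the hyperplanes
            epsk = alpha * max(d2);                % threshold epsilon_k
            idx  = find(d2 >= epsk);               % greedy index set
            x    = x + pinv(A(idx, :)) * r(idx);   % block projection as in (2.1)
        end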

    The convergence result of the GBK algorithm is detailed in the following theorem.

    Theorem 1. For the consistent linear system (1.1), the iterative sequence \{x_k\}_{k = 0}^{\infty} , as generated by Algorithm 1, converges to the unique least-norm solution x_{*} = A^{\dagger}b . For k\geq 0 , the norm of error satisfies

    \|x_{k+1}-x_{*}\|_{2}^{2}\leq\left(1-\beta_{k}\dfrac{\sigma_{\min}^{2}(A)}{\|A\|_{F}^{2}}\right)\|x_{k}-x_{*}\|_{2}^{2},

    where \alpha\in(0, 1] , \beta_{k} = \alpha\dfrac{\|A\|_{F}^{2}\,\|A_{\mathcal{I}_{k}^{G}}\|_{F}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}^{G}})\left(\|A\|_{F}^{2}-\|A_{\mathcal{I}_{k-1}^{G}}\|_{F}^{2}\right)} with \beta_{0} = \alpha\dfrac{\|A_{\mathcal{I}_{0}^{G}}\|_{F}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{0}^{G}})} , and \sigma_{\min}(\cdot) and \sigma_{\max}(\cdot) denote the smallest nonzero and largest singular values of a matrix, respectively.

    Another way to implement the block Kaczmarz algorithm is to make use of the Gaussian Kaczmarz method [18,p. 1677] defined by

    x_{k+1} = x_{k}+\dfrac{g^{T}(b-Ax_{k})}{\|A^{T}g\|_{2}^{2}}A^{T}g,

    where g\in\mathbb{R}^{m} is a random Gaussian vector with zero mean and identity covariance matrix.
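
    As a minimal illustration (not taken from [18]), one Gaussian Kaczmarz update can be written in MATLAB as follows.

        % One Gaussian Kaczmarz update for Ax = b, with g drawn from N(0, I_m).
        function x = gaussian_kaczmarz_step(A, b, x)
            g  = randn(size(A, 1), 1);                       % random Gaussian direction
            Ag = A' * g;
            x  = x + ((g' * (b - A*x)) / (Ag' * Ag)) * Ag;
        end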

    Motivated by the Gaussian Kaczmarz method and the greedy rule for selecting working rows in [5,6], Chen and Huang propose the FDBK algorithm [10]. Recall from Algorithm 1 that updating the iterate requires the Moore-Penrose pseudoinverse of submatrices of A . In FDBK, by contrast, one only needs to compute a vector of the form A^{T}c_{k} , i.e., a linear combination of the columns of A^{T} . The algorithmic implementation of FDBK is presented in Algorithm 2; see [10,Algorithm 1] for more detail.

    Algorithm 2 The FDBK algorithm
    Input: A , b , x_{0} and l
    Output: x_{l+1}
    1: for k = 0, 1, \ldots, l do
    2:   Compute
          \varepsilon_{k} = \dfrac{1}{2}\left(\dfrac{1}{\|b-Ax_{k}\|_{2}^{2}}\mathop{\max}\limits_{1\leq i\leq m}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|^{2}_{2}}\right\rbrace+\dfrac{1}{\|A\|_{F}^{2}}\right),
        where A^{(i)} denotes the i th row of A and b^{(i)} the i th entry of b .
    3:   Determine the index set of positive integers
           \mathcal{I}_{k}^{FD} = \left\lbrace i_{k}\Big|\:|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}\geq\varepsilon_{k}\|b-Ax_{k}\|_{2}^{2}\|A^{(i_{k})}\|^{2}_{2}\right\rbrace.
    4:   Compute c_{k} = \mathop{\sum}\limits_{i_{k}\in\mathcal{I}_{k}^{FD}}\left(b^{(i_{k})}-A^{(i_{k})}x_{k}\right)e_{i_{k}} , where e_{i_{k}} is the i_{k} th canonical basis vector in \mathbb{R}^{m} .
    5:   Set
          x_{k+1} = x_{k}+\dfrac{c_{k}^{T}(b-Ax_{k})}{\|A^{T}c_{k}\|_{2}^{2}}A^{T}c_{k}.
    6: end for
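
    A compact MATLAB sketch of one FDBK iteration, mirroring Steps 2–5 of Algorithm 2, reads as follows; the function name fdbk_step is ours and the code is intended purely as an illustration.

        % One iteration of the FDBK algorithm (Algorithm 2); pseudoinverse-free.
        function x = fdbk_step(A, b, x)
            r        = b - A*x;                               % residual
            rownorm2 = sum(A.^2, 2);                          % squared row norms
            d2       = abs(r).^2 ./ rownorm2;                 % squared distances
            epsk     = 0.5 * (max(d2) / (r'*r) + 1 / norm(A, 'fro')^2);
            idx      = find(abs(r).^2 >= epsk * (r'*r) .* rownorm2);   % index set
            ck       = zeros(size(A, 1), 1);
            ck(idx)  = r(idx);                                % c_k keeps the selected residual entries
            Ack      = A' * ck;
            x        = x + ((ck' * r) / (Ack' * Ack)) * Ack;  % Gaussian-Kaczmarz-type update
        end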

    The convergence result of the FDBK algorithm is wrapped up in Theorem 2.

    Theorem 2. Suppose that the linear system (1.1) is consistent and there is no row with entries all zero in A . For any initial guess x_{0} in the column space of A^{T} , the iterative sequence \{x_k\}_{k = 0}^{\infty} , as generated by Algorithm 2, converges to the unique least-norm solution x_{*} = A^{\dagger}b , with the error satisfying

    \|x_{k+1}-x_{*}\|_{2}^{2}\leq\left(1-\gamma_{k}\dfrac{\|A_{\mathcal{I}_{k}^{FD}}\|_{F}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}^{FD}})}\cdot\dfrac{\sigma_{\min}^{2}(A)}{\|A\|_{F}^{2}}\right)\|x_{k}-x_{*}\|_{2}^{2}, \quad k\geq 0,

    where

    \gamma_{k} = \dfrac{1}{2}\left(\dfrac{\|A\|_{F}^{2}}{\theta_{k}\|A_{\xi_{k}}\|_{F}^{2}+(1-\theta_{k})\|A_{\mathcal{I}_{k}^{FD}}\|_{F}^{2}}+1\right)\geq 1, \quad k\geq 0,

    with

    \xi_{k} = \left\lbrace i\:\big|\:b^{(i)}-A^{(i)}x_{k}\neq 0,\ i\in[m]\right\rbrace, \quad k\geq 0,

    and

    \theta_{k} = \begin{cases}\dfrac{\mathop{\max}\limits_{i\in\xi_{k}\setminus\mathcal{I}_{k}^{FD}}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}\right\rbrace}{\mathop{\max}\limits_{1\leq i\leq m}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}\right\rbrace} < 1, & \text{if }\mathcal{I}_{k}^{FD}\neq[m]\text{ and }\xi_{k}\setminus\mathcal{I}_{k}^{FD}\neq\emptyset,\ k\geq 0,\\[2mm] 1, & \text{otherwise}.\end{cases}

    As explained in the previous two subsections, the GBK algorithm captures hyperplanes onto which the current iterate is projected with relatively large distance, while the FDBK algorithm follows a Gaussian Kaczmarz practice that makes use of all rows of A and b simultaneously.

    In order to speed up the convergence of the GBK and FDBK algorithms, Xiao et al. [33] propose an FGBK algorithm that exploits the advantages of the GBK and FDBK algorithms. In particular, a parameter p is introduced to allow for more flexibility in implementing the algorithm. More algorithmic details can be found in Algorithm 3.

    Algorithm 3 The FGBK algorithm
    Input: A , b , x_{0} , \alpha\in(0, 1] , p\in[1, +\infty)
    1: for k = 0, 1, 2, \ldots do
    2:   Compute
         \varepsilon_{k} = \alpha\cdot\mathop{\max}\limits_{1\leq i\leq m}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{p}}{\|A^{(i)}\|^{p}_{p}}\right\rbrace.
    3:   Determine the index set of positive integers
          \mathcal{I}_{k}^{FG} = \left\lbrace i\:\big|\:|b^{(i)}-A^{(i)}x_{k}|^{p}\geq\varepsilon_{k}\|A^{(i)}\|^{p}_{p}\right\rbrace.
    4: Compute c_{k} = \mathop{\sum}\limits_{i\in\mathcal{I}_{k}^{FG}}\left(b^{(i)}-A^{(i)}x_{k}\right)e_{i}.
    5: Set
          x_{k+1} = x_{k}+\dfrac{c_{k}^{T}(b-Ax_{k})}{\|A^{T}c_{k}\|_{2}^{2}}A^{T}c_{k}.
    6: end for
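
    The FGBK iteration differs from the FDBK sketch above only in the p -dependent selection rule; a MATLAB sketch (the function name fgbk_step is ours) is given below for completeness.

        % One iteration of the FGBK algorithm (Algorithm 3) for a general exponent p.
        function x = fgbk_step(A, b, x, alpha, p)
            r       = b - A*x;                                % residual
            rowp    = sum(abs(A).^p, 2);                      % ||A^(i)||_p^p for each row
            epsk    = alpha * max(abs(r).^p ./ rowp);         % threshold epsilon_k
            idx     = find(abs(r).^p >= epsk .* rowp);        % index set
            ck      = zeros(size(A, 1), 1);
            ck(idx) = r(idx);
            Ack     = A' * ck;
            x       = x + ((ck' * r) / (Ack' * Ack)) * Ack;
        end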

    The convergence result of the FGBK algorithm is characterized by the following theorem [33].

    Theorem 3. Suppose that the linear system (1.1) is consistent. For any initial guess x_{0} in the column space of A^{T} , the iterative sequence \{x_k\}_{k = 0}^{\infty} , as generated by Algorithm 3, converges to the unique minimum norm solution x_{*} = A^{\dagger}b , with the error satisfying

    \|x_{k+1}-x_{*}\|_{2}^{2}\leq\left(1-\beta_{k}(\alpha, p)\,\sigma_{\min}^{2}(A)\right)\|x_{k}-x_{*}\|_{2}^{2}, \quad k\geq 0,

    where \beta_{k}(\alpha, p) = \dfrac{\alpha^{2/p}}{\mathop{\sum}\limits_{i\in[m]\setminus\mathcal{I}_{k-1}^{FG}}\|A^{(i)}\|_{p}^{2}}\cdot\dfrac{\mathop{\sum}\limits_{i\in\mathcal{I}_{k}^{FG}}\|A^{(i)}\|_{p}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}^{FG}})} , \alpha\in(0, 1] and p\in[1, +\infty) .

    Remark 1. We are in a position to give some remarks on the origins of the above-mentioned block Kaczmarz-type algorithms. In [5], Bai and Wu propose a novel GRK algorithm that outperforms the prototypical randomized Kaczmarz algorithm in many scenarios. The GRK algorithm exploits the greedy selection strategy prioritizing large entries of the residual vector with higher probabilities. As shown in Algorithms 1–3, the GBK, FDBK and FGBK algorithms fall into the category of block Kaczmarz-type algorithms. In fact, they can be regarded as extensions of the GRK algorithm with some effective modifications. For instance, the GBK algorithm combines the maximum-distance rule and a modified index set originating from the GRK algorithm. Following the line of the GRK algorithm, the FDBK algorithm employs the same index set as in GRK and incorporates the idea of Gaussian-Kaczmarz iteration.

    In this section, we first clarify the initial rationale for developing the new algorithm, and then we give an in-depth analysis of the algorithmic implementation as well as the theoretical results. Recall from Section 2 that two existing algorithms, i.e., GBK and FDBK, offer two different ways to incorporate block techniques; the GBK algorithm takes advantage of rows selected from the index set, while the FDBK algorithm avoids the computation of the Moore-Penrose pseudoinverse and updates the current iterate in a manner analogous to that of the Gaussian Kaczmarz method. In the actual computations, however, both algorithms require one to find the maximum squared distance to m hyperplanes; see Step 2 in Algorithms 1 and 2. This may pose a challenge to the numerical performance when the size of A is huge. Now we switch tacks. Intuitively, it would be desirable to seek the maximum distance over merely a small fraction of the hyperplanes, since the computational cost of working out the maximum distance would then be reduced. Therefore, it is interesting to investigate the case in which these desirable attributes are merged. Motivated by this intuition, we construct an efficient variant of the greedy block Kaczmarz (VGBK) algorithm for solving the large linear system (1.1). A complete coverage of its algorithmic implementation is displayed in Algorithm 4.

    Now, let us present the details of the VGBK algorithm. To reduce the computational complexity in computing the maximum distance, a user-prescribed row partition of the set [m] is imposed at the outset, namely,

    \bigcup\limits_{j = 1}^{s}\tau_{j} = [m], \ \text{and} \ \tau_{i}\cap\tau_{j} = \emptyset \ \text{for} \ i\neq j,

    where \tau_{j}\subset[m] is a subset comprising the row indices of the j th partition for j = 1, \ldots, s . As for the choices of partition before the iteration (Step 1 in Algorithm 4), several candidates are available; see, for instance, [21,27,35]. In the numerical experiments, we adopt a simple approach by partitioning [m] with \tau_{j}^{T} = [j:s:m] , where the MATLAB colon operator ":" is used to create a vector with evenly spaced increments for j = 1, \ldots, s . It should be stressed that this practice is not necessarily the optimal one, but it turns out to be quite successful as compared with other block Kaczmarz algorithms; see the numerical examples in Section 4. The following toy example gives the reader a sense of the appearance of the partitioning. Assume that m = 11 and s = 3 . It follows from the above discussion that

    \tau_{1} = \{1, 4, 7, 10\}, \quad \tau_{2} = \{2, 5, 8, 11\}, \quad \tau_{3} = \{3, 6, 9\}.

    The corresponding partitions of A and b can thus be given; for example, if A\in\mathbb{R}^{11\times 5} , then A_{\tau_{3}}\in\mathbb{R}^{3\times 5} , assembled from A(3,:) , A(6,:) and A(9,:) , is the third submatrix (block) from the partition.
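
    In MATLAB, this cyclic partition can be generated directly with the colon operator; the snippet below reproduces the toy case m = 11 and s = 3 .

        % Cyclic row partition of [m] into s blocks via the MATLAB colon operator.
        m = 11;  s = 3;
        tau = cell(s, 1);
        for j = 1:s
            tau{j} = j:s:m;   % tau{1} = [1 4 7 10], tau{2} = [2 5 8 11], tau{3} = [3 6 9]
        end
        % The corresponding blocks of A and b are A(tau{j}, :) and b(tau{j}).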

    After the partition, we have at hand s submatrices A_{\tau_{1}}, \ldots, A_{\tau_{s}} that together make up the coefficient matrix A . Assume that there are no rows with entries all zero in A ; thus, \|A_{\tau_{j}}\|_{2}\neq 0 for j = 1, \ldots, s . We select cyclically the index for the working submatrix A_{\tau_{j_{k}}} and vector b_{\tau_{j_{k}}} with j_{k} = (k \bmod s)+1 . It turns out that the maximum squared distance

    \mathop{\max}\limits_{i\in\tau_{j_{k}}}\left\lbrace\dfrac{|b^{(i)}_{\tau_{j_{k}}}-A^{(i)}_{\tau_{j_{k}}}x_{k}|^{2}}{\|A^{(i)}_{\tau_{j_{k}}}\|^{2}_{2}}\right\rbrace

    is now sought over the subset \tau_{j_{k}} instead of [m] . Then, a set \mathcal{I}_k is used to collect the indices i_{k} of rows for which

    \dfrac{|b^{(i_{k})}_{\tau_{j_{k}}}-A^{(i_{k})}_{\tau_{j_{k}}}x_{k}|^{2}}{\|A^{(i_{k})}_{\tau_{j_{k}}}\|^{2}_{2}}\geq\alpha\cdot\mathop{\max}\limits_{i\in\tau_{j_{k}}}\left\lbrace\dfrac{|b^{(i)}_{\tau_{j_{k}}}-A^{(i)}_{\tau_{j_{k}}}x_{k}|^{2}}{\|A^{(i)}_{\tau_{j_{k}}}\|^{2}_{2}}\right\rbrace

    for \alpha\in(0, 1] . It is easy to verify that \mathcal{I}_k is a non-empty subset of \tau_{j_{k}} . Inspired by the Gaussian Kaczmarz method, we update the iterate x_{k+1} by

    x_{k+1} = x_{k}+\dfrac{c_{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}A_{\tau_{j_{k}}}^{T}c_{k},

    where c_{k} = \mathop{\sum}\limits_{i_{k}\in\mathcal{I}_k}\left(b^{(i_{k})}_{\tau_{j_{k}}}-A^{(i_{k})}_{\tau_{j_{k}}}x_{k}\right)e_{i_{k}} , with e_{i_{k}} being the i_{k} th canonical basis vector in \mathbb{R}^{|\tau_{j_{k}}|} .

    Remark 2. We are ready to give some remarks. As introduced in Section 2, the novelty of the related FGBK algorithm lies in the choice of exponent p , a parameter encountered in \varepsilon_{k} and the index set. At each iteration, however, the FGBK algorithm requires one to compute the p th power of the maximum distance to m hyperplanes, which may be increasingly time-consuming as the size of A increases; see Algorithm 3 for more details. Such difficulty is circumvented in our VGBK algorithm since one only needs to figure out the maximum distance over a much smaller subset \tau_{j_{k}} per iteration. Numerical experiments in Section 4 also corroborate that the VGBK algorithm improves upon the FGBK method in terms of CPU time.

    Algorithm 4 A VGBK algorithm
    Input: A , b , x_{0} , \alpha\in(0, 1] , l and the number of blocks s
    Output: x_{l+1}
    1: Partition before iteration:
    Assume that \tau_{1}, \ldots, \tau_{s} is a user-prescribed partition of [m] , i.e., \bigcup_{j = 1}^{s}\tau_{j} = [m] and \tau_{i}\cap\tau_{j} = \emptyset for i\neq j , where \tau_{j}\subset[m] is a subset comprising the row indices of the j th block for j = 1, \ldots, s .
    2: for k = 0, 1, \ldots, l do
    3:   Compute the index for the working block with j_{k} = (k\mod s)+1 .
    4:   Compute
           \varepsilon_{k} = \alpha\cdot\mathop{\max}\limits_{i\in\tau_{j_{k} }}\left\lbrace\dfrac{|b^{(i)}_{\tau_{j_{k}}}-A^{(i)}_{\tau_{j_{k}}}x_{k}|^{2}}{\|A^{(i)}_{\tau_{j_{k}}}\|^{2}_{2}} \right\rbrace,
      where A_{\tau_{j_{k}}}\in\mathbb{R}^{|\tau_{j_{k}}|\times n} and b_{\tau_{j_{k}}}\in\mathbb{R}^{|\tau_{j_{k}}|} denote the submatrix and subvector with rows indexed in the set \tau_{j_{k}} , respectively.
    5:   Determine the index set of positive integers
           \mathcal{I}_k = \left\lbrace i_{k}\Big|\:|b^{(i_{k})}_{\tau_{j_{k}}}-A^{(i_{k})}_{\tau_{j_{k}}}x_{k}|^{2} \geq\varepsilon_{k}\|A^{(i_{k})}_{\tau_{j_{k}}}\|^{2}_{2}\right\rbrace.
    6:   Compute c_{k} = \mathop{\sum}\limits_{i_{k}\in \mathcal{I}_k }\left(b_{\tau_{j_{k}}}^{(i_{k})}-A_{\tau_{j_{k}}}^{(i_{k})}x_{k} \right)e_{i_k}\in\mathbb{R}^{|\tau_{j_{k}}|} , where e_{i_k} is the i_k th canonical basis vector in \mathbb{R}^{|\tau_{j_{k}}|} .
    7:   Set
           x_{k+1} = x_{k}+\dfrac{c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}A_{\tau_{j_{k}}}^{T}c_{k}.
    8: end for
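
    To make the above steps concrete, a MATLAB sketch of Algorithm 4 is given below; it assumes the cyclic partition of Section 3 and the RSE-based stopping test described in Section 4, and the function name vgbk is ours.

        % Sketch of the VGBK algorithm (Algorithm 4) with the cyclic partition tau_j = j:s:m.
        % xstar is used only for the RSE-based stopping test described in Section 4.
        function [x, k] = vgbk(A, b, x0, alpha, s, maxit, xstar)
            m = size(A, 1);
            x = x0;
            for k = 0:maxit-1
                jk   = mod(k, s) + 1;                      % index of the working block
                tau  = jk:s:m;                             % row indices of the block
                Atau = A(tau, :);  btau = b(tau);
                r    = btau - Atau*x;                      % block residual
                d2   = abs(r).^2 ./ sum(Atau.^2, 2);       % squared distances within the block
                if max(d2) == 0, continue; end             % block equations already satisfied
                idx  = find(d2 >= alpha * max(d2));        % index set I_k
                ck   = zeros(numel(tau), 1);
                ck(idx) = r(idx);
                Ack  = Atau' * ck;
                x    = x + ((ck' * r) / (Ack' * Ack)) * Ack;
                if norm(x - xstar)^2 / norm(xstar)^2 < 1e-6   % RSE stopping rule
                    break;
                end
            end
        end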

    The theorem below establishes the convergence result of the VGBK algorithm.

    Theorem 4. For the consistent linear system (1.1), assume that there are no zero rows in A . Then, the iterative sequence \{x_k\}_{k = 0}^{\infty} , as generated by Algorithm 4, converges to the unique least-norm solution x_{*} = A^{\dagger}b for any initial guess x_0 that is in the column space of A^T . Moreover, the norm of error satisfies

    \begin{equation} \|x_{k+1}-x_{*}\|_{2}^{2}\leq(1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\cdot\dfrac{\|A_{\mathcal{I}_{k}}\|_{F}^{2}}{\|A_{\tau_{j_{k}}}\|_{F}^{2}})\|x_{k}-x_{*}\|_{2}^{2}, \end{equation} (3.1)

    where \tau_{j_k} is the partition block chosen at the k th iteration, \mathcal{I}_k is the index set defined in Algorithm 4, \alpha\in (0, 1] and \omega_k\in (0, 1] for k = 0, 1, \ldots.

    Proof. The proof is partly inspired by [10,Theorem 3.1]. It follows from Steps 6 and 7 of Algorithm 4 that

    \begin{eqnarray} x_{k+1} = x_{k}+\dfrac{c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}A_{\tau_{j_{k}}}^{T}c_{k}, \end{eqnarray} (3.2)

    where c_{k} = \mathop{\sum}\limits_{i_k\in \mathcal{I}_{k} }\left(b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k} \right)e_{i_k}\in\mathbb{R}^{|\tau_{j_{k}}|} . Subtracting x_* from both sides of (3.2) yields

    \begin{eqnarray} x_{k+1}-x_{*} = x_{k}-x_{*}+\dfrac{c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}A_{\tau_{j_{k}}}^{T}c_{k} = (I-P_{k})(x_{k}-x_{*}), \end{eqnarray} (3.3)

    where P_{k} = \dfrac{A_{\tau_{j_{k}}}^{T}c_{k}c _{k}^{T}A_{\tau_{j_{k}}}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}\in\mathbb{R}^{n\times n} is an orthogonal projection matrix, namely, P_k^T = P_k and P_k^2 = P_k . From (3.2) and (3.3), we have

    \begin{eqnarray*} (x_{k+1}-x_{*})^{T}(x_{k+1}-x_{k}) & = &(x_{k}-x_{*})^{T}\cdot(I-P_{k})^{T}\cdot\dfrac{c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}A_{\tau_{j_{k}}}^{T}c_{k}\\ & = &-(x_{k}-x_{*})^{T}\cdot(I-P_{k})^{T}\cdot P_{k} (x_{k}-x_{*})\\ & = &-(x_{k}-x_{*})^{T}\cdot (P_{k}-P_{k}^{2})\cdot(x_{k}-x_{*}) \\ & = &0, \end{eqnarray*}

    which implies that x_{k+1}-x_{*} is orthogonal to x_{k+1}-x_{k} .

    Computing the squared 2-norm of x_{k+1}-x_{*} in (3.3), together with the use of the Pythagorean theorem, renders

    \begin{eqnarray} \|x_{k+1}-x_{*}\|_{2}^{2} & = &\|(I-P_{k})(x_{k}-x_{*})\|_{2}^{2} \\ & = &\|x_{k}-x_{*}\|_{2}^{2}-\|P_{k}(x_{k}-x_{*})\|_{2}^{2} \\ & = &\|x_{k}-x_{*}\|_{2}^{2}-\|\dfrac{A_{\tau_{j_{k}}}^{T}c_{k}c _{k}^{T}A_{\tau_{j_{k}}}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}} (x_{k}-x_{*})\|_{2}^{2} \\ & = &\|x_{k}-x_{*}\|_{2}^{2}-\dfrac{|c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})|^{2}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}. \end{eqnarray} (3.4)

    Let |\mathcal{I}_{k}| be the number of elements in the index set \mathcal{I}_{k} and E_{k}\in \mathbb{R}^{|\tau_{j_{k}}|\times |\mathcal{I}_{k}|} be a matrix consisting of the canonical basis vectors e_{i_k}\in\mathbb{R}^{|\tau_{j_{k}}|} for i_k\in\mathcal{I}_{k} . Therefore, we get

    \begin{equation} \hat c_{k} = E_{k}^{T}c_{k}\in \mathbb{R}^{|\mathcal{I}_{k}|} \end{equation} (3.5)

    and

    \begin{equation} A_{\mathcal{I}_{k}} = E_{k}^{T}A_{\tau_{j_{k}}}\in \mathbb{R}^{|\mathcal{I}_{k}|\times n}. \end{equation} (3.6)

    In a nutshell, \hat{c}_k and A_{\mathcal{I}_{k}} are condensed counterparts of {c}_k and A_{\tau_{j_{k}}} by encapsulating rows indexed in the set \mathcal{I}_{k} . A straightforward computation using (3.5) and (3.6) yields

    \begin{eqnarray} \|\hat c_{k}\|_{2}^{2} & = &c_{k}^{T}E_{k}E_{k}^{T}c_{k} = \|c_{k}\|_{2}^{2} = \mathop{\sum}\limits_{i_k\in \mathcal{I}_{k} }|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k} |^{2}, \end{eqnarray} (3.7)
    \begin{eqnarray} \|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}& = &\hat c_{k}^{T}E_{k}^{T}A_{\tau_{j_{k}}}A_{\tau_{j_{k}}}^{T}E_{k}\hat c_{k} = \hat c_{k}^{T}A_{\mathcal{I}_{k}}A_{\mathcal{I}_{k}}^{T}\hat c_{k} = \|A_{\mathcal{I}_{k}}^{T}\hat c_{k}\|_{2}^{2}\leq\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})\|\hat c_{k}\|_{2}^{2}, \end{eqnarray} (3.8)

    where \sigma_{\max}(A_{\mathcal{I}_{k}}) denotes the largest singular value of A_{\mathcal{I}_{k}} . Besides, it follows from the definition of c_{k} and (3.7) that

    \begin{equation} c_{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}) = \mathop{\sum}\limits_{i_k\in \mathcal{I}_{k} }|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k} |^{2} = \|\hat c_{k}\|_{2}^{2}.\\ \end{equation} (3.9)

    Combining (3.7)–(3.9), we have

    \begin{eqnarray} \dfrac{|c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})|^{2}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}& = &\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k}|^{2}\|\hat c_{k}\|_{2}^{2}}{\|A_{\mathcal{I}_{k}}^{T}\hat c_{k}\|_{2}^{2}}\\ &\geq&\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k} }|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k} |^{2}\|\hat c_{k}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})\|\hat c_{k}\|_{2}^{2}}\\ & = &\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k} }|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k} |^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\\ & = &\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}(\dfrac{|b_{\tau_{j_{k}}}^{(i_k)}-A_{\tau_{j_{k}}}^{(i_k)}x_{k}|^{2}}{\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}\cdot\dfrac{\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}) \\ &\geq&\alpha\cdot\mathop{\max}\limits_{i\in \tau_{j_{k}} }\left\lbrace\dfrac{|b_{\tau_{j_{k}}}^{(i)}-A_{\tau_{j_{k}}}^{(i)}x_{k}|^{2}}{\|A_{\tau_{j_{k}}}^{(i)}\|^{2}_{2}} \right\rbrace\cdot\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})} . \end{eqnarray} (3.10)

    Moreover, \|b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}\|_{2}^{2} can be recast as

    \begin{eqnarray} \|b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}\|_{2}^{2} = \mathop{\sum}\limits_{i\in \tau_{j_{k}}}\dfrac{|b_{\tau_{j_{k}}}^{(i)}-A_{\tau_{j_{k}}}^{(i)}x_{k} |^{2}}{\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}\leq\mathop{\max}\limits_{i\in \tau_{j_{k}}}\left\lbrace{\dfrac{|b_{\tau_{j_{k}}}^{(i)}-A_{\tau_{j_{k}}}^{(i)}x_{k} |^{2}}{\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}}\right\rbrace\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}, \end{eqnarray}

    or, equivalently,

    \begin{eqnarray} \dfrac{\|b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}\|_{2}^{2}}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}} \leq \mathop{\max}\limits_{i\in \tau_{j_{k}}}\left\lbrace{\dfrac{|b_{\tau_{j_{k}}}^{(i)}-A_{\tau_{j_{k}}}^{(i)}x_{k} |^{2}}{\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}}\right\rbrace. \end{eqnarray} (3.11)

    Denote u_k = x_k-x_* and u_k = u_{k1}+u_{k2} , where u_{k1} is in the column space of A_{\tau_{j_{k}}}^T and u_{k2} is in the subspace orthogonal to the column space of A_{\tau_{j_{k}}}^T , i.e., A_{\tau_{j_{k}}} u_{k2} = 0 . Let \|u_{k1}\|_2^2 = \omega_k\|u_{k}\|_2^2 with \omega_k\in(0, 1] . Therefore,

    \begin{eqnarray*} \|b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}\|_{2}^{2} = \|A_{\tau_{j_{k}}}u_{k}\|_{2}^{2} = \|A_{\tau_{j_{k}}}u_{k1}\|_{2}^{2}\geq \sigma_{\min}^{2}(A_{\tau_{j_{k}}})\|u_{k1}\|_2^2 = \omega_k \sigma_{\min}^{2}(A_{\tau_{j_{k}}})\|u_{k}\|_2^2. \end{eqnarray*}

    Multiplying both sides of (3.11) by \alpha , we obtain

    \begin{eqnarray} \alpha\cdot\mathop{\max}\limits_{i\in \tau_{j_{k}} }\left\lbrace\dfrac{|b_{\tau_{j_{k}}}^{(i)}-A_{\tau_{j_{k}}}^{(i)}x_{k}|^{2}}{\|A_{\tau_{j_{k}}}^{(i)}\|^{2}_{2}} \right\rbrace\geq \dfrac{\alpha\cdot\|b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k}\|_{2}^{2}}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}} \geq\dfrac{\alpha\omega_k\sigma_{\min}^{2}(A_{\tau_{j_{k}}})\cdot\|u_k\|_{2}^{2}}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}, \end{eqnarray} (3.12)

    where \sigma_{\min}(A_{\tau_{j_{k}}}) denotes the smallest nonzero singular value of A_{\tau_{j_{k}}} .

    Substituting the inequality (3.12) into (3.10) gives

    \begin{eqnarray} \dfrac{|c_{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})|^{2}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}}\geq \dfrac{\alpha\omega_k\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}\cdot\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\|u_k\|_{2}^{2}. \end{eqnarray} (3.13)

    Finally, it follows from (3.4) and (3.13) that

    \begin{eqnarray*} \|u_{k+1}\|_{2}^{2} & = &\|u_k\|_{2}^{2}-\dfrac{|c _{k}^{T}(b_{\tau_{j_{k}}}-A_{\tau_{j_{k}}}x_{k})|^{2}}{\|A_{\tau_{j_{k}}}^{T}c_{k}\|_{2}^{2}} \nonumber\\ &\leq&\|u_k\|_{2}^{2}-\dfrac{\alpha\omega_k\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}\cdot\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\|u_k\|_{2}^{2}\nonumber\\ & = &(1-\dfrac{\alpha\omega_k\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\mathop{\sum}\limits_{i\in \tau_{j_{k}}}\|A_{\tau_{j_{k}}}^{(i)}\|_{2}^{2}}\cdot\dfrac{\mathop{\sum}\limits_{i_k\in \mathcal{I}_{k}}\|A_{\tau_{j_{k}}}^{(i_k)}\|_{2}^{2}}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})})\|u_k\|_{2}^{2}\nonumber\\& = &(1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\cdot\dfrac{\|A_{\mathcal{I}_{k}}\|_{F}^{2}}{\|A_{\tau_{j_{k}}}\|_{F}^{2}})\|u_k\|_{2}^{2}, \end{eqnarray*}

    which completes the proof.

    Remark 3. Some remarks are in order. Since \|A_{\mathcal{I}_k}\|_F^2\geq\sigma_{\max}^2(A_{\mathcal{I}_k}) , it follows that

    \begin{eqnarray} 1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\cdot\dfrac{\|A_{\mathcal{I}_{k}}\|_{F}^{2}}{\|A_{\tau_{j_{k}}}\|_{F}^{2}} \leq1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\|A_{\tau_{j_{k}}}\|_{F}^{2}} < 1, \end{eqnarray} (3.14)

    where \alpha\in(0, 1] and \omega_k\in(0, 1] . Therefore, the VGBK algorithm converges to the least-norm solution under the assumption of Theorem 4. A special case of the VGBK algorithm exists when the number of blocks s = 1 , i.e., A_{\tau_{j_{k}}} = A . If so, then \omega_k = 1 ; thus (3.14) is simplified to

    \begin{eqnarray*} 1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\cdot\dfrac{\|A_{\mathcal{I}_{k}}\|_{F}^{2}}{\|A_{\tau_{j_{k}}}\|_{F}^{2}} \leq1-\alpha\cdot\frac{\sigma_{\min}^{2}(A)}{\|A\|_{F}^{2}}. \end{eqnarray*}

    More generally, if \sigma_{\min}^{2}(A_{\tau_{j_{k}}})/\|A_{\tau_{j_{k}}}\|_{F}^2\geq\sigma_{\min}^{2}(A)/\|A\|_{F}^2 holds in (3.14), then the inequality (3.14) reduces to

    \begin{eqnarray*} 1-\alpha\omega_k\cdot\dfrac{\sigma_{\min}^{2}(A_{\tau_{j_{k}}})}{\sigma_{\max}^{2}(A_{\mathcal{I}_{k}})}\cdot\dfrac{\|A_{\mathcal{I}_{k}}\|_{F}^{2}}{\|A_{\tau_{j_{k}}}\|_{F}^{2}} \leq1-\alpha\omega_k\cdot \frac{\sigma_{\min}^{2}(A)}{\|A\|_{F}^2}, \end{eqnarray*}

    which resembles those error bounds given in Theorems 1–3. It should be noted that the condition \sigma_{\min}^{2}(A_{\tau_{j_{k}}})/\|A_{\tau_{j_{k}}}\|_{F}^2\geq\sigma_{\min}^{2}(A)/\|A\|_{F}^2 is not too stringent; for instance, if A is a matrix of normally distributed random numbers, then \|A_{\tau_{j_{k}}}\|_{F}^2/\|A\|_{F}^2\approx s^{-1} for sufficiently large m , where s is the number of blocks and m is the number of rows in A . Consequently, the condition \sigma_{\min}^{2}(A_{\tau_{j_{k}}})/\|A_{\tau_{j_{k}}}\|_{F}^2\geq\sigma_{\min}^{2}(A)/\|A\|_{F}^2 holds as long as \sigma_{\min}^{2}(A_{\tau_{j_{k}}})/\sigma_{\min}^{2}(A)\gtrsim s^{-1} . In the numerical experiments, s is always chosen as s = \lfloor 0.8\%m\rfloor or s = \lfloor 4\%m\rfloor , which implies that s^{-1} approaches 0 if m is large enough. In the implementation of the VGBK algorithm, we only work on a small fraction of A and b , instead of exploiting the full information of A and b as done in the GBK, FDBK and FGBK methods. For this reason, the VGBK algorithm generally requires more iteration steps than these three algorithms to reach the required accuracy. Nevertheless, this does not necessarily imply that the VGBK algorithm is numerically inferior to the three counterparts. On the contrary, as we will explain soon, the VGBK algorithm consumes far less computational cost than the other three algorithms per iteration step, which can readily reduce the total computing time. Numerical examples in the next section also validate that the VGBK algorithm enjoys noticeable gains in CPU time as compared to the other three algorithms.
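
    The assumption on Gaussian matrices in the preceding paragraph can be probed numerically with a short MATLAB script; the matrix size below is an arbitrary illustrative choice.

        % Empirical check of sigma_min^2(A_tau)/||A_tau||_F^2 >= sigma_min^2(A)/||A||_F^2
        % for a Gaussian random matrix and the cyclic partition tau_j = j:s:m.
        m = 5000;  n = 500;  s = floor(0.008*m);
        A = randn(m, n);
        rhs = min(svd(A))^2 / norm(A, 'fro')^2;
        holds = 0;
        for j = 1:s
            Atau  = A(j:s:m, :);
            holds = holds + (min(svd(Atau))^2 / norm(Atau, 'fro')^2 >= rhs);
        end
        fprintf('condition holds for %d of %d blocks\n', holds, s);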

    Before ending this section, we investigate the computational overhead of VGBK at each iteration step. The complexities of the block Kaczmarz algorithms outlined in Section 2, including the GBK, FDBK and FGBK algorithms, are also analyzed for comparison. For each method, the total computational overhead per iteration consists of the cost of determining the pseudoinverse of a matrix, finding the maximum of a vector, updating the iterate, and other costs. To keep the exposition simple, we assume that the coefficient matrix A in (1.1) is dense and p = 2 in the FGBK algorithm. The results are tabulated in Table 1.

    Table 1.  Comparison of computational complexities for GBK, FDBK, FGBK and VGBK with m -D standing for m -dimensional.
    Method Pseudoinverse-free No. of finding maximums Iteration update (FLOPS) Others (FLOPS)
    GBK No 1 ( m -D vector) (2|\mathcal{I}_k^{G}|+1)n (2n+3)m+1
    FDBK Yes 1 ( m -D vector) (2|\mathcal{I}_k^{FD}|+4)n+2|\mathcal{I}_k^{FD}|+1 (2n+5)m+4
    FGBK Yes 1 ( m -D vector) (2|\mathcal{I}_k^{FG}|+4)n+2|\mathcal{I}_k^{FG}|+1 (2n+3)m+1
    VGBK Yes 1 ( |\tau_{j_k}| -D vector) (2|\mathcal{I}_k|+4)n+2|\mathcal{I}_k|+1 (2n+3)|\tau_{j_k}|+1


    The second column of Table 1 indicates whether the implementation of the corresponding method involves computing the pseudoinverse. We see that the FDBK, FGBK and VGBK algorithms are pseudoinverse-free, while the GBK method involves computing the pseudoinverse of a submatrix of A . The computation involving the pseudoinverse of a matrix is expensive and often constitutes the bulk of the total computational cost. A standard work-around is to use iterative methods such as CGLS [27]. In Table 1, we do not specify the method for addressing the pseudoinverse in GBK, as we just qualitatively point out that the GBK algorithm is not pseudoinverse-free; a detailed analysis can be made available if the method is prescribed, such as CGLS [28,p. 5]. The third column shows the number of maximums to be found. Take the FDBK algorithm as an example. We need to figure out \max_{1\leq i\leq m}\{{|b^{(i)}-A^{(i)}x_{k}|^{2}}/{\|A^{(i)}\|_{2}^{2}}\} at Step 2 of Algorithm 2, which is equivalent to finding the maximum of an m -dimensional vector. The fourth column displays the FLOPS needed when updating the iterate x_k . The last column collects FLOPS that are not counted in the previous three columns; for instance, in the case of FDBK, it includes the cost of computing the parameter \varepsilon_k , determining the index set \mathcal{I}_k^{FD} and assembling the vector c_k .

    In the VGBK algorithm, recall that \mathcal{I}_k \subset \tau_{j_{k}} and \tau_{j_{k}}\subset [m] . It follows that |\mathcal{I}_k| < |\tau_{j_{k}}|\ll m . Furthermore, it generally holds that |\mathcal{I}_k|\ll\min\{|\mathcal{I}^{G}_k|, |\mathcal{I}^{FD}_k|, |\mathcal{I}^{FG}_k|\} since \mathcal{I}_k is a subset of \tau_{j_{k}} , which itself is a subset of [m] while \mathcal{I}^{G}_k , \mathcal{I}^{FD}_k , \mathcal{I}^{FG}_k are subsets of [m] . Therefore, the total cost per iteration for the VGBK algorithm can be much less than that for its three counterparts. To see this intuitively, we consider the number of blocks s = \lfloor 0.8\%m\rfloor , which is a practical choice for the overdetermined linear system given in Section 4. Thus each block \tau_{j_{k}} has about m/s\approx125 rows on average. In light of this, the computational cost per iteration for the VGBK algorithm roughly accounts for 1/s of those for the GBK, FDBK and FGBK methods, which greatly reduces the computing time and offsets the excessive iterations due to the initial partitioning. Such an advantage becomes more pronounced when m is large since 1/s = 1/\lfloor 0.8\%m\rfloor goes to 0 as m increases; see Section 4 for more details.

    In this section, some numerical experiments are presented to verify the effectiveness of the VGBK algorithm in comparison with three block Kaczmarz counterparts, i.e., GBK [28], FDBK [10] and FGBK [33]. All experiments were run on a laptop with an AMD R5-5600H CPU @ 3.30 GHz and 16 GB RAM, running MATLAB 2022a on a Windows 11 system. The numerical performance of the four algorithms is assessed by means of the CPU time in seconds (CPU) and the number of iteration steps (IT). To demonstrate the effectiveness, we determine the speed-up values of VGBK against GBK, FDBK and FGBK, which are respectively defined by

    \begin{equation*} \text {speed}-\text {up1} = \frac{\text { CPU time of GBK }}{\text { CPU time of VGBK }}, \end{equation*}
    \begin{equation*} \text {speed}-\text {up2} = \frac{\text { CPU time of FDBK }}{\text { CPU time of VGBK }}, \end{equation*}
    \begin{equation*} \text {speed}-\text {up3} = \frac{\text { CPU time of FGBK }}{\text { CPU time of VGBK }}. \end{equation*}

    As stated in Section 1, we are only concerned with the solution of consistent linear system (1.1) in this work. To this end, the right-hand side vector b is set to be Ax_* in all examples, where x_{*} is generated by the MATLAB built-in function \texttt{randn}. The coefficient matrices A are either given by \texttt{randn} or taken from real-world applications [11,23]; see Table 2 for detailed properties of these matrices. As such, we are guaranteed to work on the consistent linear system (1.1). The initial guess is given by x_0 = 0 . All computations are terminated once the relative solution error (RSE) at the current iteration satisfies that \text{RSE} < 10^{-6} or when the number of iteration steps exceeds the maximum of 200000 , where

    \begin{equation*} \mathrm{RSE} = \dfrac{\|x_{k}-x_{*}\|_{2}^{2}}{\|x_{*}\|_{2}^{2}}. \end{equation*}
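
    The setup just described can be reproduced with a few lines of MATLAB; the matrix below is an illustrative \texttt{randn} example, and a SuiteSparse matrix would simply be loaded in its place.

        % Consistent test problem and stopping quantities used in Section 4.
        m = 10000;  n = 5000;
        A = randn(m, n);                    % or a matrix from the SuiteSparse collection
        xstar = randn(n, 1);
        b = A * xstar;                      % consistency: b lies in the range of A
        x0 = zeros(n, 1);
        rse = @(x) norm(x - xstar)^2 / norm(xstar)^2;   % relative solution error
        % run any of the methods until rse(x) < 1e-6 or 200000 iterations are reached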
    Table 2.  Properties of the test matrices.
    Matrix Size Density cond( A ) Application field
    \texttt{ash958} 958\times 292 0.6849% 3.2014 Least Squares Problem
    \texttt{abtaha1} 14596\times209 1.6819% 12.2284 Combinatorial Problem
    \texttt{abtaha2} 37932\times331 1.0930% 12.2199 Combinatorial Problem
    \texttt{ch7-9-b2} 17640\times1512 0.1984% 1.6077\cdot10^{15} Combinatorial Problem
    \texttt{ch8-8-b2} 18816\times1568 0.1913% 1.6326\cdot10^{15} Combinatorial Problem
    \texttt{Franz10} 19588\times4164 0.1195% 1.2694\cdot10^{16} Combinatorial Problem
    \texttt{mk11-b3} 17325\times6930 0.0577% 5.8553\cdot10^{15} Combinatorial Problem
    \texttt{bibd_49_3} 1176\times18424 0.2551% 1.7701 Combinatorial Problem
    \texttt{cat_ears_3_4} 5226\times13271 0.0571% 17.9503 Combinatorial Problem
    \texttt{GL7d25} 2798\times21074 0.1385% 1.4241\cdot10^{19} Combinatorial Problem
    \texttt{df2177} 630\times 10358 0.3423% 2.0066 Linear Programming Problem
    \texttt{lp22} 2958\times 16392 0.1413% 25.7823 Linear Programming Problem
    \texttt{lp_maros_r7} 3136\times9408 0.4910% 2.3231 Linear Programming Problem


    In Algorithm 4, recall that the implementation of VGBK involves two parameters, i.e., the number of blocks s and the tuning parameter \alpha . Numerical experiments show that the VGBK algorithm often offers sound performance with appropriate choices of s and \alpha . The experimental parameters in the GBK, FDBK, FGBK and VGBK algorithms are set as follows:

    I. GBK [28]: The parameter \alpha in Algorithm 1 is chosen to be

    \begin{eqnarray*} \alpha = \frac{1}{2}+\frac{\|b-Ax_k\|_2^2}{2\|A\|_F^2}\left(\mathop{\max}\limits_{1\leq i\leq m}\left\lbrace\dfrac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|^{2}_{2}} \right\rbrace\right)^{-1}, \end{eqnarray*}

    which is recommended in the original work on the GBK [28].

    II. FDBK [10]: Algorithm 2 is parameter-free.

    III. FGBK [33]: In [33], FGBK( p ) with p = 2 requires the least CPU time for most tests. Therefore, we set p = 2 . Besides, the parameter \alpha is set to be 0.05 or 0.1 , which are preferred in [33].

    IV. VGBK: The tuning parameter \alpha and the number of blocks s are required in Algorithm 4. We set s = \lfloor 0.8\%m\rfloor (overdetermined linear system), s = \lfloor 4\%m\rfloor (underdetermined linear system) and \alpha = 0.1 , the reason for which is given in Examples 1–2. Here, \lfloor \cdot \rfloor denotes the floor function.

    The purpose of this section is two-fold. In Section 4.1, we look into choices of the two parameters s and \alpha for various types of test matrices. In Section 4.2, the numerical performance of the VGBK algorithm is appraised by comparing it with that of the GBK, FDBK and FGBK algorithms for abundant experiments.

    Theoretically identifying the optimal values of s and \alpha in the VGBK method, if they exist, is not available for now, and we shall not pursue it here. Instead, we attempt to experimentally determine practical choices of s and \alpha in this subsection. In Example 1, we empirically analyze the influence of s on VGBK. In Example 2, we tackle the problem of choosing \alpha .

    Example 1. This example serves to show how the number of blocks s impacts the numerical performance of the VGBK algorithm. For the moment, \alpha is selected to be 0.1 ; numerical experiments in Example 2 indicate that setting \alpha = 0.1 turns out to be practical. We consider linear system (1.1) with thin ( m\geq n ) or fat ( m\leq n ) coefficient matrices A which are either generated by the MATLAB function \texttt{randn} or adapted from the SuiteSparse Matrix Collection [23]. Numerical results with varying values of s are illustrated in Figures 1–3.

    Figure 1.  CPU and IT results for thin coefficient matrices \texttt{randn}(m, 1000) with varying s for the VGBK algorithm, where m = 2000 , 3000 , 4000 and 5000 , respectively.
    Figure 2.  CPU and IT results for fat coefficient matrices \texttt{randn}(1000, n) with varying s for the VGBK algorithm, where n = 2000 , 3000 , 4000 and 5000 , respectively.
    Figure 3.  CPU and IT results for thin (\texttt{abtaha1, ash958}) and fat coefficient matrices (\texttt{df2177, lp_maros_r7}) with varying s for the VGBK algorithm.

    We are now in a position to give some remarks. For either thin or fat coefficient matrices, as demonstrated in Figures 1–3, the CPU time (though with wiggles) exhibits a roughly V -shaped curve, while the number of IT increases in general as s increases. This can be explained as follows. As for the CPU time, if s is small, or, put another way, if each submatrix/vector associated with \tau_{j} from the partition in the VGBK algorithm is of large size, then the computational overhead per iteration tends to increase, which in turn may consume more CPU time. On the contrary, if s is large, i.e., each submatrix/vector indexed in \tau_{j} is of small size, then only a linear system of small size is handled per iteration; more iterations are required in exchange for such inexpensive computation, which probably drives up the total CPU time. This explains why the CPU time first drops and then bounces up as s grows. Nevertheless, the scenario for the number of IT differs. In fact, small values of s often lead to large blocks which maintain most information of the original linear system; computation with such large blocks tends to be less demanding in the number of iterations, which is quite in line with the observations reported in Figures 1–3.

    In practice, it is often the CPU time that counts. Therefore, we restrict ourselves to diminishing the CPU time here. As portrayed in Figures 1–3, one needs to strike a balance when choosing the number of blocks s . To this end, we pinpoint the "experimental minimum" of the CPU time by tracking the MATLAB data tips. Take the thin matrix \texttt{abtaha1} as an example; the first subplot in Figure 3 illustrates that the "experimental minimum" of the CPU time is achieved at s = 121 , which is approximately 0.82\% of the row size of \texttt{abtaha1} ( m = 14596 ). For all experiments in this example, the percentage s/m ranges from 0.37\% to 1.14\% for thin matrices, while that for fat matrices ranges from 2.10\% to 5.71\% . Heuristically, any value in the intervals [\lfloor 0.37\%m\rfloor, \lfloor 1.14\%m\rfloor] and [\lfloor 2.10\%m\rfloor, \lfloor 5.71\%m\rfloor] , respectively, can be a reasonable choice for thin and fat matrices, where \lfloor \cdot\rfloor is the floor function and m is the row size of A . In light of this, we pick two intermediate values by setting s = \lfloor 0.8\%m\rfloor for overdetermined linear systems and s = \lfloor 4\%m\rfloor for underdetermined ones in Examples 2–5.

    Example 2. In this example, we examine how the tuning parameter \alpha influences the numerical performance of the VGBK algorithm. In Algorithm 4, if the value of \alpha is large, then only a small fraction of rows are extracted from A and b to update the iterate x_{k+1} . To ameliorate the possible loss of information, more iterations (and thus CPU time) are required to reach the prescribed accuracy; see Steps 5–7 in Algorithm 4. Conversely, if \alpha is chosen to be small, more rows will be retained, which amounts to updating the iterate x_{k+1} for a relatively large linear system. As a result, the computational overhead per iteration increases, which results in more CPU time. Such an observation tallies with Figures 4–6 for various test matrices. Therefore, one should exercise caution in picking the value of \alpha . As shown in Figures 4–6, values of \alpha that entail the least CPU time for either thin or fat matrices lie in some neighborhood of 0.1 . For this reason, the choice \alpha = 0.1 can be practical for the VGBK algorithm. In what follows, we assume that \alpha = 0.1 in all experiments. It should also be stressed that the value 0.1 is also a candidate of choice for \alpha in the FGBK algorithm [33,Section 3].

    Figure 4.  CPU and IT results for thin coefficient matrices \texttt{randn} (m, 3000) , where m = 6000 , 8000 , 10000 and 12000 , with varying \alpha for the VGBK algorithm.
    Figure 5.  CPU and IT results for fat coefficient matrices \texttt{randn} (3000, n) , where n = 6000 , 8000 , 10000 and 12000 , with varying \alpha for the VGBK algorithm.
    Figure 6.  CPU and IT results for thin (\texttt{abtaha2, ash958}) and fat coefficient matrices (\texttt{df2177, lp_maros_r7}), with varying \alpha for the VGBK algorithm.

    In this subsection, the effectiveness of the VGBK algorithm is justified by comparing it with the GBK, FDBK and FGBK algorithms. As in Section 4.1, we consider the consistent linear system (1.1) with coefficient matrices generated by the MATLAB built-in function \texttt{randn} or taken from the SuiteSparse Matrix Collection [23]. The structure of this subsection is as follows. In Example 3, we consider solving overdetermined linear system (1.1). In Example 4, we look at solving underdetermined linear system (1.1). In Example 5, we touch upon solving linear system (1.1) with large coefficient matrices from real-world problems.

    Example 3. In this example, we are concerned with solving linear systems with thin coefficient matrices generated by the MATLAB built-in function \texttt{randn}. As recommended in Examples 1 and 2, we set s = \lfloor 0.8\%m\rfloor and \alpha = 0.1 for the VGBK algorithm. The number of IT and CPU time for the GBK, FDBK, FGBK and VGBK algorithms are tabulated in Table 3. A rough conclusion is that FDBK, FGBK and VGBK outperform their prototype GBK in terms of CPU time; see, e.g., the case of 10000\times 5000 , where the speed-up of VGBK against GBK is up to 13.8089. Moreover, VGBK surpasses all of its block Kaczmarz peers regarding CPU time, albeit with more iteration steps. This can be interpreted as follows. In VGBK, we only need to work on a small linear system with roughly m/s rows once the partition is applied, which dramatically reduces the computational cost (and thus CPU time) per iteration. In general, the gain in CPU time reduction becomes more pronounced as m grows. This suggests that the VGBK algorithm is suitable for solving large overdetermined linear systems. Finally, we explain how the comparison with the FGBK method is made. In fact, speed-up3 is computed using the smaller of the two FGBK CPU times ( \alpha = 0.05 or \alpha = 0.1 ); for example, speed-up3 = 2.2912 is the ratio of 3.4606 to 1.5104 in the case of 10000\times 5000 .

    Table 3.  Numerical results for overdetermined linear systems in Example 3.
    m\times n m 10000 12000 14000 16000 18000 20000
    n 5000 5000 5000 5000 5000 5000
    GBK IT 466 289 231 185 158 131
    CPU 20.8569 14.0318 11.7329 10.0255 8.9900 8.2728
    FDBK IT 489 307 229 177 152 136
    CPU 9.0201 6.6980 5.5091 4.7936 4.5030 4.5538
    FGBK ( \alpha =0.05) IT 71 46 36 30 25 22
    CPU 4.3284 3.2734 3.0066 2.7094 2.6355 2.6696
    FGBK ( \alpha =0.1) IT 74 47 37 31 26 23
    CPU 3.4606 2.6517 2.3205 2.2693 2.0589 1.9840
    VGBK IT 1522 1107 968 867 813 744
    CPU 1.5104 1.0828 0.8782 0.8355 0.7402 0.6693
    s 80 96 112 128 144 160
    speed-up1 13.8089 12.9588 13.3602 11.9994 12.1454 12.3604
    speed-up2 5.9720 6.1858 6.2732 5.7374 6.0835 6.8038
    speed-up3 2.2912 2.4489 2.6423 2.7161 2.7815 2.9643


    Example 4. In this example, we consider solving linear systems with fat coefficient matrices generated by \texttt{randn}. The number of IT and CPU time for the GBK, FDBK, FGBK and VGBK algorithms are listed in Table 4. We choose s = \lfloor 4\%m\rfloor and \alpha = 0.1 for VGBK, as suggested in Examples 1 and 2. For all test matrices, VGBK outperforms the other three algorithms regarding CPU time, with the speed-up ranging from 1.0887 to 18.0378 . Besides, the speed-up becomes noticeable as m increases; for instance, the speed-up of VGBK against GBK reaches as high as 18.0378 for the case 12000\times15000 . It should be noted that FGBK renders a performance that is comparable to that of the VGBK when m is relatively small. As m increases, however, the VGBK algorithm has a remarkable advantage over FGBK; see the results for speed-up3 in Table 4.

    Table 4.  Numerical results for underdetermined linear systems in Example 4.
    m\times n m 2000 4000 6000 8000 10000 12000
    n 15000 15000 15000 15000 15000 15000
    GBK IT 70 158 329 735 1628 5285
    CPU 4.6013 17.3493 48.2193 136.6090 323.2486 1156.3538
    FDBK IT 71 157 338 746 1664 5292
    CPU 1.2994 4.8180 13.8537 38.0042 104.2648 373.6110
    FGBK (\alpha =0.05) IT 13 24 45 90 189 573
    CPU 0.5065 1.8356 5.0438 13.7018 34.3899 124.1716
    FGBK ( \alpha =0.1) IT 15 26 47 92 195 589
    CPU 0.4837 1.6001 4.3786 11.3237 29.5134 106.1159
    VGBK IT 628 1756 4218 10190 24687 82799
    CPU 0.4443 1.3651 3.2769 8.9984 19.1812 64.1071
    s 80 160 240 320 400 480
    speed-up1 10.3563 12.7092 14.7149 15.1815 16.8524 18.0378
    speed-up2 2.9246 3.5294 4.2277 4.2234 5.4358 5.8279
    speed-up3 1.0887 1.1721 1.3362 1.2584 1.5387 1.6553


    Example 5. In this example, we shall compare the performance of the VGBK algorithm with those of GBK, FDBK and FGBK for solving linear systems with coefficient matrices from the SuiteSparse Matrix Collection whose properties are listed in Table 2. The corresponding numerical results are displayed in Tables 5 and 6. The effectiveness of VGBK is again verified regarding CPU time; for instance, the speed-up of VGBK against GBK, FDBK and FGBK varies from 2.0647 to 15.0433 , 2.2771 to 7.2730 and 1.2906 to 9.3184 , respectively. In this sense, the VGBK algorithm is very appealing for solving linear systems from real-world problems.

    Table 5.  Numerical results for overdetermined linear systems in Example 5.
    Test Matrix \texttt{abtaha1} \texttt{abtaha2} \texttt{ch7-9-b2} \texttt{ch8-8-b2} \texttt{Franz10} \texttt{mk11-b3}
    GBK IT 22 18 23 23 104 90
    CPU 0.0351 0.1121 0.6100 0.6007 5.3835 9.3379
    FDBK IT 101 83 39 43 198 95
    CPU 0.0954 0.2982 0.3535 0.4401 5.2831 4.0183
    FGBK ( \alpha =0.05) IT 82 72 5 6 47 14
    CPU 0.1400 0.4799 0.1658 0.1873 4.6478 1.6653
    FGBK ( \alpha =0.1) IT 89 92 9 8 49 20
    CPU 0.1245 0.4771 0.1701 0.1900 3.7400 1.8330
    VGBK IT 467 965 502 540 990 736
    CPU 0.0170 0.0512 0.0935 0.1017 0.7264 1.0474
    s 116 303 141 150 156 138
    speed-up1 2.0647 2.1895 6.5241 5.9066 7.4112 8.9153
    speed-up2 5.6118 5.8242 3.7807 4.3274 7.2730 3.8365
    speed-up3 7.3235 9.3184 1.7733 1.8417 5.1487 1.5899

    Table 6.  Numerical results for underdetermined linear systems in Example 5.

    Test Matrix                \texttt{bibd_49_3}  \texttt{cat_ears_3_4}  \texttt{df2177}  \texttt{GL7d25}  \texttt{lp22}  \texttt{lp_maros_r7}
    GBK                  IT    36                  4045                   41               57               1457           91
                         CPU   2.4013              317.1875               0.6528           8.0641           61.2920        4.7440
    FDBK                 IT    37                  4044                   42               276              2447           92
                         CPU   0.6679              112.6446               0.2235           7.7782           45.7603        1.4195
    FGBK (\alpha=0.05)   IT    10                  465                    11               277              589            18
                         CPU   0.3229              29.8907                0.1087           7.4476           25.3378        0.7734
    FGBK (\alpha=0.1)    IT    11                  580                    12               275              657            18
                         CPU   0.3400              30.8552                0.0978           7.0040           23.5825        0.6171
    VGBK                 IT    276                 25568                  159              3790             15019          912
                         CPU   0.2502              21.0849                0.0686           3.4158           11.4409        0.3425
                         s     47                  209                    25               111              118            125
    speed-up1                  9.5975              15.0433                9.5160           2.3608           5.3573         13.8511
    speed-up2                  2.6695              5.3424                 3.2580           2.2771           3.9997         4.1445
    speed-up3                  1.2906              1.4176                 1.4257           2.0505           2.0612         1.8018


    By imposing a row partitioning, we have put forth the VGBK algorithm for solving consistent linear systems and established theoretical results that characterize its convergence. Numerical experiments demonstrate that the new algorithm performs well on a broad class of problems and outperforms other block Kaczmarz-type counterparts in terms of CPU time for large linear systems. The choice of the partitioning strategy is crucial to the performance of the proposed algorithm, and investigating it further is an important direction for future work.
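    For readers who would like a concrete starting point, the snippet below is a minimal Python sketch of two generic ingredients mentioned above: a uniform row partitioning into s blocks and a classical block Kaczmarz projection step computed with \texttt{numpy.linalg.lstsq}. It is only a schematic baseline under these assumptions; it implements neither the greedy block selection of the GBK-type methods compared in the experiments nor the specific construction that makes VGBK efficient.

```python
import numpy as np

def uniform_row_partition(m, s):
    """Split the row indices {0,...,m-1} into s blocks of (nearly) equal size."""
    return np.array_split(np.arange(m), s)

def block_kaczmarz_step(A, b, x, block):
    """One classical block Kaczmarz projection onto the equations indexed by 'block'."""
    A_blk, b_blk = A[block, :], b[block]
    # Minimum-norm least-squares solution of A_blk * delta = residual,
    # i.e., delta = pinv(A_blk) @ (b_blk - A_blk @ x).
    delta, *_ = np.linalg.lstsq(A_blk, b_blk - A_blk @ x, rcond=None)
    return x + delta

# Toy usage: cyclic sweeps over the blocks of a small consistent system.
rng = np.random.default_rng(0)
m, n, s = 200, 50, 8
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star

x = np.zeros(n)
blocks = uniform_row_partition(m, s)
for _ in range(200):
    for blk in blocks:
        x = block_kaczmarz_step(A, b, x, blk)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))   # relative residual
```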

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors are grateful to Miss Xiang-Xiang Chen of Shanghai Maritime University for helpful discussions. The authors are grateful to the referees for their insightful comments and valuable suggestions, which greatly improved the original version of this paper.

    This work is supported by the National Natural Science Foundation of China (No. 12271342).

    The authors declare that they have no conflict of interest.



    © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).