The emergence of massive data has driven recent interest in statistical learning and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), which was originally carried out by aggregating all local solutions through a simple average, thereby reducing the computational burden arising from communication costs. Aiming at low computation cost with acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and develops a weighted majority voting aggregation scheme, BVSaC. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, each holding a disjoint subset of the data. On each node, variable selection and coefficient estimation are performed through a hierarchical Bayes formulation; BVSaC then combines the local results by weighted majority voting so that good overall performance is retained. The proposed approach requires only a small fraction of the computational cost on each local dataset, which eases the burden of Bayesian computation in particular, while sacrificing only a little accuracy; this in turn makes the analysis of extraordinarily large datasets feasible. Simulations and a real-world example show that the proposed approach performs as well as the whole-sample hierarchical Bayes method in terms of variable selection and estimation accuracy.
Citation: Xuerui Li, Lican Kang, Yanyan Liu, Yuanshan Wu. Distributed Bayesian posterior voting strategy for massive data[J]. Electronic Research Archive, 2022, 30(5): 1936-1953. doi: 10.3934/era.2022098
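The weighted majority voting step described in the abstract can be made concrete with a minimal sketch. The code below assumes each node has already returned a binary inclusion vector from its local hierarchical Bayes fit, together with a nonnegative weight; the function name `bvsac_vote`, the `threshold` parameter, and the toy weights are hypothetical illustrations, and the paper's actual weighting scheme and posterior sampler are not reproduced here.

```python
import numpy as np

def bvsac_vote(local_selections, weights, threshold=0.5):
    """Aggregate per-node variable-selection results by weighted majority vote.

    local_selections : (K, p) binary array; row k is node k's inclusion
        indicator over the p candidate predictors.
    weights : length-K array of nonnegative node weights (an assumed
        input here, e.g., some local goodness-of-fit score).
    threshold : a predictor is retained when its normalized weighted
        vote share exceeds this value (0.5 gives a simple majority).
    """
    S = np.asarray(local_selections, dtype=float)  # K x p indicator matrix
    w = np.asarray(weights, dtype=float)
    vote_share = w @ S / w.sum()  # weighted inclusion frequency per predictor
    return vote_share > threshold

# Toy usage: 3 nodes voting on 5 candidate predictors.
sel = np.array([[1, 1, 0, 0, 1],
                [1, 0, 0, 0, 1],
                [1, 1, 0, 1, 0]])
w = np.array([0.5, 0.3, 0.2])
print(bvsac_vote(sel, w))  # [ True  True False False  True]
```

Because only the short indicator vectors and weights leave each node, the aggregation itself is essentially free in both communication and computation; the expensive Bayesian sampling happens once per local subsample.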