Research article

Randomized symmetric Gauss-Seidel method for solving linear least squares problems

  • Received: 30 March 2024 Revised: 08 May 2024 Accepted: 14 May 2024 Published: 23 May 2024
  • MSC : 65F10

  • We introduced a randomized symmetric Gauss-Seidel (RSGS) method designed to handle large-scale linear least squares problems involving tall coefficient matrices. This RSGS method projected the approximate residual onto the subspace spanned by two symmetric columns at each iteration. These columns were sampled from the coefficient matrix based on an effective probability criterion. Our theoretical analysis indicated that RSGS converged when the coefficient matrix had full column rank. Furthermore, numerical experiments demonstrated that RSGS outperformed the baseline algorithms in terms of iteration steps and CPU time.

    Citation: Fan Sha, Jianbing Zhang. Randomized symmetric Gauss-Seidel method for solving linear least squares problems[J]. AIMS Mathematics, 2024, 9(7): 17453-17463. doi: 10.3934/math.2024848




    We consider solving the linear least squares problems given by

    \begin{align} \min\limits_{x\in \mathbb{R}^n}\|{Ax-b}\|_2, \end{align} (1.1)

    where A\in \mathbb{R}^{m\times n} has full column rank and m\geq n . Problem (1.1) is prevalent in several fields, including ridge regression, machine learning, optimal control, and others. To address this problem efficiently, a substantial body of research has been conducted on iterative methods [1,2,3]. Notably, the randomized Kaczmarz method [4] and the randomized Gauss-Seidel method [5,6,7,8,9] have amassed considerable interest due to their capacity to handle large volumes of data.

    Strohmer and Vershynin [4] put forth a randomized version of the Kaczmarz method. This method selects each row with probability proportional to its squared Euclidean norm, which yields fast convergence. Inspired by this, Leventhal and Lewis [10] developed a randomized Gauss-Seidel (RGS) method that samples a column of A based on an appropriately chosen probability. Subsequently, Ma et al. [11] established a comprehensive convergence theory.

    To expedite convergence, researchers have extensively investigated the two-step Gauss-Seidel method. Liu et al. [12] proposed the 2SGS method, a deterministic iteration scheme based on the maximum residual rule. Liao et al. [13] introduced RGS2, a method that merges two single-step iterations into one, sampling two distinct indices simultaneously. In the same literature, the TRGS method, an enhanced two-step approach, projects the approximate solution onto the solution space using two random columns. Mustafa and Saha [14] devised D2RGS, a two-dimensional coordinate descent method employing uniform sampling to randomly select two distinct columns of the coefficient matrix in each iteration.

    In this paper, we build on the work of Niu and Zheng [8], which presents a novel randomized Gauss-Seidel (NRGS) method given by

    \begin{align*} x_{k+1} = x_k+\frac{A_{i_k}^\top (b-Ax_k)}{\|{A_{i_k}}\|_2^2}e_{i_k},\quad k = 0,1,2,\ldots, \end{align*}

    where A_{i_k} is the i_k -th column of A , e_{i_k} is the i_k -th column of the n\times n identity matrix, r_k = b-Ax_k , and the column index is chosen with probability p_{i_k} = \frac{|A_{i_k}^\top r_k|^2}{\|{A^\top r_k}\|_2^2} . The convergence result was given by

    \begin{align} \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2 \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right) \left(1-\frac{\lambda_{\min}(A^\top A)}{\beta}\right)^{k}\|{x_{0}-A^\dagger b}\|_{A^\top A}^2,\quad k = 0,1,2,\ldots, \end{align} (1.2)

    where \beta = \|{A}\|_{F}^2-\min_{1\leq i\leq n}\|{A_i}\|_2^2 , A^\dagger is the Moore-Penrose pseudoinverse of A , and A^\dagger b is the unique solution of (1.1).
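
    For readers who prefer code, the following is a minimal NumPy sketch of the NRGS iteration and probability criterion recalled above; the function name, loop structure, and stopping test are our illustrative choices and are not taken from [8].

      import numpy as np

      def nrgs(A, b, x0=None, max_iter=100000, tol=1e-8, rng=None):
          """Illustrative sketch of NRGS: draw a column index i_k with probability
          |A_{i_k}^T r_k|^2 / ||A^T r_k||_2^2, then update the i_k-th coordinate of x
          so that A_{i_k}^T r_{k+1} = 0."""
          rng = np.random.default_rng() if rng is None else rng
          m, n = A.shape
          x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
          col_norms2 = np.sum(A**2, axis=0)   # ||A_i||_2^2 for every column
          r = b - A @ x                       # residual r_k = b - A x_k
          for _ in range(max_iter):
              g = A.T @ r                     # A^T r_k
              gg = g @ g                      # ||A^T r_k||_2^2
              if np.sqrt(gg) <= tol:          # A^T r = 0 characterizes the LS solution
                  break
              i = rng.choice(n, p=g**2 / gg)  # NRGS probability criterion
              step = g[i] / col_norms2[i]
              x[i] += step                    # coordinate update
              r -= step * A[:, i]             # O(m) residual update
          return x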

    To solve (1.1) more efficiently, we consider projecting the updated residual onto two randomly chosen symmetric columns. Borrowing the probability criterion of NRGS [8], we introduce the randomized symmetric Gauss-Seidel (RSGS) method.

    The organization of this paper is as follows. In Section 2, we present the related work. The RSGS method and its convergence analysis are presented in Section 3. Several numerical examples are displayed in Section 4 to show that our proposed RSGS method performs better than NRGS and other methods compared. Finally, the conclusion is drawn in Section 5.

    A large number of methods have been designed to solve linear least squares problems. Here, we review only the most relevant and recent research works.

    Niu and Zheng [8] proposed a single-step Gauss-Seidel method, namely NRGS. This algorithm adopts an effective probabilistic criterion, allowing it to efficiently capture the larger elements within the residual vector. Inspired by its probabilistic criterion, we extend this algorithm to the two-dimensional scenario. While keeping the sampling scheme simple, we achieve a faster convergence rate.

    To accelerate the convergence, numerous researchers have conducted studies on the two-step Gauss-Seidel method. Liu et al. [12] proposed a two-step iteration Gauss-Seidel deterministic method named the 2SGS method, which is based on the maximum residual rule. As a deterministic algorithm, 2SGS is particularly suitable for medium-sized problems. However, for large-scale datasets, it is necessary to investigate stochastic algorithms. 2SGS is a highly effective deterministic approach and serves as a foundation that inspired our design of an efficient stochastic algorithm for large-scale datasets.

    Liao et al. [13] introduced the two-step iteration method RGS2, which effectively combines two single-step iterations into one step, where two different indices are sampled simultaneously. The advantage of this approach is that it avoids repeating the same index in two consecutive single-step iterations, thereby improving convergence efficiency. In the same work, TRGS is an improved two-step method that projects the approximate solution onto the solution space using two random columns. In contrast, each iteration of our method solves a least squares problem in a randomly chosen two-dimensional coordinate space. Intuitively, under the same conditions, this two-dimensional minimization outperforms the two-step minimization approach (i.e., the combination of two separate single steps).

    Mustafa and Saha [14] developed a two-dimensional coordinate descent method D2RGS, which employs uniform sampling to randomly select two different columns of the coefficient matrix in each iteration. While two-dimensional uniform sampling can save time compared to non-uniform sampling, it fails to consider the importance of different columns. The challenge lies in balancing the fast convergence of the algorithm with the efficiency of sampling. We aim to achieve local minimization in the two-dimensional coordinate space through symmetric sampling. Our method samples only one index non-uniformly at each iteration yet obtains two column indices. This approach is both simple to implement and efficient.

    In this section, we introduce the randomized symmetric Gauss-Seidel method (RSGS), which can also be explained as the randomized symmetric coordinate projection method. The iteration scheme is given by

    \begin{align} x_{k+1} = x_k+\alpha_k e_{i_k}+\beta_k e_{n-i_k+1}, \end{align} (3.1)

    where αk and βk are parameters which are chosen dynamically such that

    \begin{align} A_{i_k}^\top r_{k+1} = A_{n-i_k+1}^\top r_{k+1} = 0, \end{align} (3.2)

    where r_{k+1} = b-Ax_{k+1} . Substituting (3.1) into (3.2), we have

    \begin{align} \left\{ \begin{array}{l} \alpha_k\|{A_{i_k}}\|_2^2+\beta_k A_{i_k}^\top A_{n-i_k+1} = A_{i_k}^\top r_k,\\[.1cm] \alpha_k A_{n-i_k+1}^\top A_{i_k}+\beta_k\|{A_{n-i_k+1}}\|_2^2 = A_{n-i_k+1}^\top r_k. \end{array} \right. \end{align} (3.3)

    Solving this 2\times 2 linear system (by Cramer's rule when i_k\neq n-i_k+1 , and taking \alpha_k = \beta_k when i_k = n-i_k+1 , since the two equations then coincide), we obtain

    \begin{align} \alpha_k = \left\{ \begin{array}{ll} \frac{A_{i_k}^\top r_k}{2\|{A_{i_k}}\|_2^2}, & \text{if}\; i_k = n-i_k+1,\\[.3cm] \frac{A_{i_k}^\top r_k\,\|{A_{n-i_k+1}}\|_2^2-A_{i_k}^\top A_{n-i_k+1}\,A_{n-i_k+1}^\top r_k}{\|{A_{i_k}}\|_2^2\|{A_{n-i_k+1}}\|_2^2-(A_{i_k}^\top A_{n-i_k+1})^2}, & \text{if}\; i_k\neq n-i_k+1, \end{array} \right. \end{align} (3.4)

    and

    \begin{align} \beta_k = \left\{ \begin{array}{ll} \frac{A_{i_k}^\top r_k}{2\|{A_{i_k}}\|_2^2}, & \text{if}\; i_k = n-i_k+1,\\[.3cm] \frac{\|{A_{i_k}}\|_2^2\,A_{n-i_k+1}^\top r_k-A_{i_k}^\top r_k\,A_{n-i_k+1}^\top A_{i_k}}{\|{A_{i_k}}\|_2^2\|{A_{n-i_k+1}}\|_2^2-(A_{i_k}^\top A_{n-i_k+1})^2}, & \text{if}\; i_k\neq n-i_k+1. \end{array} \right. \end{align} (3.5)

    Here, A_j and e_j denote the j -th columns of A and of the n\times n identity matrix I_n , respectively. The sampled index i_k\in [n]\triangleq \{1, 2, \cdots, n\} is chosen with the probability

    \begin{align} & \mathbb{P}(i = i_k) = \frac{|A_{i_k}^\top r_k|^2+|A_{n-i_k+1}^\top r_k|^2}{2\|{A^\top r_k}\|_2^2}. \end{align} (3.6)

    Based on this construction and criterion, we give Algorithm 1.

    Algorithm 1 Randomized Symmetric Gauss-Seidel (RSGS)
      Input: A, b, x_0 and r_0 = b-Ax_0 ;
      Output: x_k .
      for k = 0, 1, 2, \ldots do
        Pick the index i_k\in [n] with probability (3.6).
        Choose \alpha_k, \beta_k as (3.4) and (3.5).
        Update the approximate solution and residual,
                      x_{k+1} = x_k+\alpha_k e_{i_k}+\beta_k e_{n-i_k+1},
                      r_{k+1} = r_{k}-\alpha_k A_{i_k}-\beta_k A_{n-i_k+1},
        until termination criterion is satisfied.
      end for
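
    As a companion to Algorithm 1, the following NumPy sketch implements the probability criterion (3.6) and the coefficients (3.4) and (3.5); the function name, the stopping test on \|{A^\top r_k}\|_2 , and the zero-based index mapping j = n-1-i (so that column j corresponds to A_{n-i_k+1} ) are our illustrative choices.

      import numpy as np

      def rsgs(A, b, x0=None, max_iter=100000, tol=1e-8, rng=None):
          """Sketch of Algorithm 1 (RSGS): sample i_k by (3.6), then update the
          coordinates i_k and n-i_k+1 with alpha_k, beta_k from (3.4)-(3.5)."""
          rng = np.random.default_rng() if rng is None else rng
          m, n = A.shape
          x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
          norms2 = np.sum(A**2, axis=0)              # ||A_i||_2^2
          r = b - A @ x                              # r_k = b - A x_k
          for _ in range(max_iter):
              g = A.T @ r                            # A^T r_k
              gg = g @ g
              if np.sqrt(gg) <= tol:
                  break
              p = (g**2 + g[::-1]**2) / (2.0 * gg)   # probabilities (3.6)
              i = rng.choice(n, p=p)
              j = n - 1 - i                          # zero-based index of column n-i_k+1
              if i == j:                             # middle column (n odd): alpha_k = beta_k
                  alpha = beta = g[i] / (2.0 * norms2[i])
              else:
                  c = A[:, i] @ A[:, j]              # A_{i_k}^T A_{n-i_k+1}
                  det = norms2[i] * norms2[j] - c**2 # > 0 since A has full column rank
                  alpha = (g[i] * norms2[j] - c * g[j]) / det
                  beta = (norms2[i] * g[j] - c * g[i]) / det
              x[i] += alpha
              x[j] += beta                           # when i == j this adds alpha + beta, as in (3.1)
              r -= alpha * A[:, i] + beta * A[:, j]  # residual update of Algorithm 1
          return x

    In this straightforward sketch, forming A^\top r_k for the sampling probabilities dominates the per-iteration cost, just as for the NRGS criterion; the coordinate and residual updates themselves cost only O(m) operations.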

    To discuss the convergence, we first give two Lemmas.

    Lemma 1. Let \alpha_k, \beta_k satisfy (3.4), (3.5) and (3.3). Then it holds that

    \begin{align} &\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2\geq\frac{(A_{i_k}^\top r_k)^2}{\|{A_{i_k}}\|_2^2}, \end{align} (3.7)

    and

    \begin{align} \alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2\geq\frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A_{n-i_k+1}}\|_2^2}, \end{align} (3.8)

    where if i_k = n-i_k+1 , equalities hold in (3.7) and (3.8).

    Proof. Substituting the first equality of (3.3) into the right-hand side of (3.7), we obtain

    \begin{align*} \frac{(A_{i_k}^\top r_k)^2}{\|{A_{i_k}}\|_2^2} = \alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\frac{(A_{i_k}^\top A_{n-i_k+1})^2}{\|{A_{i_k}}\|_2^2}. \end{align*}

    According to the Cauchy-Schwarz inequality (A_{i_k}^\top A_{n-i_k+1})^2\leq \|{A_{i_k}}\|_2^2\|{A_{n-i_k+1}}\|_2^2 , (3.7) is proved.

    Similarly, substituting the second equality of (3.3) into the right-hand side of (3.8) and applying the Cauchy-Schwarz inequality, (3.8) is proved.

    Note that if i_k = n-i_k+1 , then A_{i_k} = A_{n-i_k+1} and A_{i_k}^\top A_{n-i_k+1} = \|{A_{i_k}}\|_2\|{A_{n-i_k+1}}\|_2 , so equalities hold in (3.7) and (3.8).

    The proof is finished.

    Lemma 2. [15] The following inequality holds

    \begin{align*} \sum\limits_{i = 1}^n \frac{x_i^{l+1}}{y_i^{l}}\geq \frac{\left(\sum\limits_{i = 1}^{n} x_i\right)^{l+1} }{\left(\sum\limits_{i = 1}^{n} y_i\right)^{l}}, \end{align*}

    for x_i\geq 0, y_i > 0, l > 0, i = 1, 2, \ldots, n. The equality holds if and only if \frac{x_1}{y_1} = \cdots = \frac{x_n}{y_n} .
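
    In the proof of Theorem 1, this lemma is applied with l = 1 , x_i = (A_{i}^\top r_k)^2 and y_i = \|{A_{i}}\|_2^2 . As a quick numerical check of the inequality with l = 1 and two terms, take x = (1, 3) and y = (1, 2) :

    \begin{align*} \frac{1^2}{1}+\frac{3^2}{2} = \frac{11}{2}\;\geq\;\frac{(1+3)^2}{1+2} = \frac{16}{3}. \end{align*}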

    The convergence of Algorithm 1 is given in the following theorem.

    Theorem 1. Let A\in \mathbb{R}^{m\times n} be of full column rank with m\geq n . For the linear least squares problem (1.1), the iteration sequence \{x_{k}\}_{k = 0}^{\infty} generated by Algorithm 1 converges in expectation to the unique solution x_* = A^{\dagger}b . Furthermore, we have the following results:

    If n is even, for k\geq 0 , we have

    \begin{align*} \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2 \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^k \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2. \end{align*}

    If n is odd, we have

    \begin{align*} &\quad \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2\\ &\leq \left\{\begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^{\frac{k}{2}}\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)^{\frac{k}{2}} \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \mathit{\text{if}}\; k\; \mathit{\text{is}}\; \mathit{\text{even}}\; \mathit{\text{and}}\; k\geq0,\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^{\frac{k-1}{2}}\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)^\frac{k+1}{2}\left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \mathit{\text{if}}\; k\; \mathit{\text{is}}\; \mathit{\text{odd}}\; \mathit{\text{and}}\; k\geq 1, \end{array} \right. \end{align*}

    where \gamma_1 = \|{A}\|_{F}^2-\min_{1\leq i\leq n}\|{A_i}\|_2^2 , and \gamma_2 = \|{A}\|_{F}^2-2\min_{1\leq i\leq n}\|{A_i}\|_2^2 .

    Proof. According to Algorithm 1, the k+1 -th (k\geq 0) iteration can be represented as the optimization problem:

    \begin{align*} x_{k+1} = \text{argmin}_{x-x_k\in \text{span}\{e_{i_{k}},e_{n-i_{k}+1}\}}\|{b-Ax}\|_2^2, \end{align*}

    which is equivalent to the following orthogonal projection,

    \begin{equation} b-Ax_{k+1}\bot \text{span}\{Ae_{i_k},Ae_{n-i_k+1}\},\quad x_{k+1}\in x_k+\text{span}\{e_{i_k}, e_{n-i_k+1}\},\; k\geq 0. \end{equation} (3.9)

    Then, according to (3.3)–(3.5), for k\geq 0 , we have

    \begin{align*} &\quad\|{x_{k+1}-A^{\dagger}b }\|_{A^\top A}^2-\|{x_{k}-A^{\dagger}b }\|_{A^\top A}^2\\ & = \|{b-Ax_{k+1}}\|_2^2 -\|{b-Ax_k}\|_2^2\\ & = \|{r_k-\alpha_k A_{i_k}-\beta_{k} A_{n-i_k+1}}\|_2^2-r_k^\top r_k\\ & = -2\alpha_k A_{i_k}^\top r_k-2\beta_k r_k^\top A_{n-i_k+1}+\alpha_k^2\|{A_{i_k}}\|_2^2+\beta_k^2\|{A_{n-i_k+1}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}\\ & = -\alpha_k^2\|{A_{i_k}}\|_2^2-\beta_k^2\|{A_{n-i_k+1}}\|_2^2-2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}\\ & \leq -\alpha_k^2\|{A_{i_k}}\|_2^2-\beta_k^2\|{A_{n-i_k+1}}\|_2^2+2|\alpha_k|\cdot|\beta_k| \cdot|A_{i_k}^\top A_{n-i_k+1}|\\ & \leq -\alpha_k^2\|{A_{i_k}}\|_2^2-\beta_k^2\|{A_{n-i_k+1}}\|_2^2+2|\alpha_k|\cdot|\beta_k| \cdot\|{A_{i_k}}\|_2\|{A_{n-i_k+1}}\|_2\\ & = -(|\alpha_k|\|{A_{i_k}}\|_2-|\beta_k|\|{A_{n-i_k+1}}\|_2)^2\\ &\leq 0, \end{align*}

    where the second inequality is due to the fact that, for k\geq 0 , |A_{i_k}^\top A_{n-i_k+1}|\leq\|{A_{i_k}}\|_2\|{A_{n-i_k+1}}\|_2 . Taking the conditional expectation and according to (3.9), for k\geq 0 , it holds that

    \begin{align*} &\quad \mathbb{E}_k\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2\\ & = \|{x_{k}-A^\dagger b}\|_{A^\top A}^2- \mathbb{E}_k\|{A(x_{k+1}-x_{k}) }\|_{2}^2\\ & = \|{x_{k}-A^\dagger b}\|_{A^\top A}^2- \mathbb{E}_k(\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2 ). \end{align*}

    Now, we estimate a lower bound for the second term in the last equality above. By (3.3)–(3.5), for k\geq 1 , we have

    \begin{align*} &\quad \mathbb{E}_k(\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2 )\\ & = \sum\limits_{i_k = 1}^{n}\left(\frac{1}{2}\frac{(A_{i_k}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}+\frac{1}{2}\frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}\right)(\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2 )\\ & = \frac{1}{2}\sum\limits_{i_k = 1}^{n}\frac{(A_{i_k}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}\left(\frac{(A_{i_k}^\top r_k)^2}{\|{A_{i_k}}\|_2^2}-\frac{(A_{i_k}^\top r_k)^2}{\|{A_{i_k}}\|_2^2}+\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2\right)\\ &\quad+\frac{1}{2}\sum\limits_{i_k = 1}^{n} \frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}\bigg(\frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A_{n-i_k+1}}\|_2^2}-\frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A_{n-i_k+1}}\|_2^2}\\ &\quad+\alpha_k^2\|{A_{i_k}}\|_2^2+2\alpha_k\beta_k A_{i_k}^\top A_{n-i_k+1}+\beta_k^2\|{A_{n-i_k+1}}\|_2^2 \bigg)\\ & \geq \frac{1}{2}\sum\limits_{i_k = 1}^{n}\frac{(A_{i_k}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}\frac{(A_{i_k}^\top r_k)^2}{\|{A_{i_k}}\|_2^2}+\frac{1}{2}\sum\limits_{i_k = 1}^{n} \frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A^\top r_k}\|_2^2}\frac{(A_{n-i_k+1}^\top r_k)^2}{\|{A_{n-i_k+1}}\|_2^2}\\ & = \frac{1}{\|{A^\top r_k}\|_2^2}\left(\frac{1}{2}\sum\limits_{i_k = 1}^{n}\frac{(A_{i_k}^\top r_k)^4}{\|{A_{i_k}}\|_2^2}+\frac{1}{2}\sum\limits_{i_k = 1}^{n}\frac{(A_{n-i_k+1}^\top r_k)^4}{\|{A_{n-i_k+1}}\|_2^2}\right)\\ & = \frac{1}{\|{A^\top r_k}\|_2^2}\sum\limits_{i_k\in [n] \diagdown \{i_{k-1},n-i_{k-1}+1\}}\left(\frac{1}{2}\frac{(A_{i_k}^\top r_k)^4}{\|{A_{i_k}}\|_2^2}+\frac{1}{2}\frac{(A_{n-i_k+1}^\top r_k)^4}{\|{A_{n-i_k+1}}\|_2^2}\right)\\ &\geq \frac{1}{\|{A^\top r_k}\|_2^2}\left(\frac{1}{2}\frac{\left(\sum_{i_k\in [n] \diagdown \{i_{k-1},n-i_{k-1}+1\}}(A_{i_k}^\top r_k)^2\right)^2}{\sum_{i_k\in [n] \diagdown \{i_{k-1},n-i_{k-1}+1\}}\|{A_{i_k}}\|_2^2} +\frac{1}{2}\frac{\left(\sum_{i_k\in [n] \diagdown \{i_{k-1},n-i_{k-1}+1\}}(A_{n-i_k+1}^\top r_k)^2\right)^2}{\sum_{i_k\in [n] \diagdown \{i_{k-1},n-i_{k-1}+1\}}\|{A_{n-i_k+1}}\|_2^2}\right)\\ &\geq\left\{ \begin{array}{ll} \frac{\|{A^\top r_k}\|_2^2}{\|{A}\|_F^2-2\min\{\|{A_i}\|_2^2,i\in[n] \} } & \text{if} \quad i_{k-1}\neq n-i_{k-1}+1,\\[.2cm] \frac{\|{A^\top r_k}\|_2^2}{\|{A}\|_F^2-\min\{\|{A_i}\|_2^2,i\in[n] \} } & \text{if} \quad i_{k-1} = n-i_{k-1}+1, \end{array} \right.\\ &\geq\left\{ \begin{array}{ll} \frac{\lambda_{\min}(A^\top A)\|{r_k}\|_2^2}{\|{A}\|_F^2-2\min\{\|{A_i}\|_2^2,i\in[n] \} } & \text{if} \quad i_{k-1}\neq n-i_{k-1}+1,\\[.2cm] \frac{\lambda_{\min}(A^\top A)\|{ r_k}\|_2^2}{\|{A}\|_F^2-\min\{\|{A_i}\|_2^2,i\in[n] \} } & \text{if} \quad i_{k-1} = n-i_{k-1}+1, \end{array} \right. \end{align*}

    where the first inequality is obtained by Lemma 1, and the second inequality is due to Lemma 2 together with \text{span}\{A_{i_{k-1}}, A_{n-i_{k-1}+1}\} \perp r_k . Thus,

    \begin{equation} \mathbb{E}_k\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2 \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\tilde{\gamma}_k}\right) \|{x_{k}-A^\dagger b}\|_{A^\top A}^2, \; k\geq 1, \end{equation} (3.10)

    where

    \begin{align} &\tilde{\gamma}_k = \left\{ \begin{array}{ll} \|{A}\|_F^2-\min\{\|{A_i}\|_2^2, i\in[n]\}\triangleq \gamma_1, & \text{if}\; i_{k-1} = n-i_{k-1}+1,\\[.2cm] \|{A}\|_F^2-2\min\{\|{A_i}\|_2^2, i\in[n]\}\triangleq\gamma_2, & \text{if}\; i_{k-1}\neq n-i_{k-1}+1. \end{array} \right. \end{align} (3.11)

    Note that, if n is odd and k\geq 2 , we have

    \begin{align*} &\quad \mathbb{E}_{k-1}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2\\ &\leq \mathbb{E}_{k-1}\left(\left(1-\frac{\lambda_{\min}(A^\top A)}{\tilde{\gamma}_k}\right)\|{x_{k}-A^\dagger b}\|_{A^\top A}^2\right)\\ & = \sum\limits_{i_{k-1}\neq \frac{n+1}{2}}\left(\left(1-\frac{\lambda_{\min}(A^\top A)}{\tilde{\gamma}_k}\right) \|{x_{k}-A^\dagger b}\|_{A^\top A}^2 \frac{1}{2}\left(\frac{(A_{i_{k-1}}^\top r_{k-1})^2}{\|{A^\top r_{k-1}}\|_2^2}+ \frac{(A_{n-i_{k-1}+1}^\top r_{k-1})^2}{\|{A^\top r_{k-1}}\|_2^2} \right ) \right) \\ &+\frac{(A_{\frac{n+1}{2}}^\top r_{k-1})^2}{\|{A^\top r_{k-1}}\|_2^2}\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1} \right)\|{x_{k}-A^\dagger b}\|_{A^\top A}^2\\[.2cm] &\leq\left\{ \begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)\sum\limits_{i_{k-1} = 1}^{n}\left(\|{x_{k}-A^\dagger b}\|_{A^\top A}^2\frac{1}{2}\left(\frac{(A_{i_{k-1}}^\top r_{k-1})^2}{\|{A^\top r_{k-1}}\|_2^2}+ \frac{(A_{n-i_{k-1}+1}^\top r_{k-1})^2}{\|{A^\top r_{k-1}}\|_2^2} \right )\right) & \text{if}\; i_{k-2} = n-i_{k-2}+1 = \frac{n+1}{2},\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right) \mathbb{E}_{k-1}\|{x_{k}-A^\dagger b}\|_{A^\top A}^2& \text{if}\; i_{k-2}\neq n-i_{k-2}+1, \end{array} \right.\\[.2cm] &\leq\left\{ \begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right) \mathbb{E}_{k-1}\|{x_{k}-A^\dagger b}\|_{A^\top A}^2 & \text{if}\; i_{k-2} = n-i_{k-2}+1 = \frac{n+1}{2},\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right) \mathbb{E}_{k-1}\|{x_{k}-A^\dagger b}\|_{A^\top A}^2& \text{if}\; i_{k-2}\neq n-i_{k-2}+1, \end{array} \right.\\ & \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)\|{x_{k-1}-A^\dagger b}\|_{A^\top A}^2, \end{align*}

    where the two inequalities preceding the last one are due to the fact that r_{k-1}\bot A_{\frac{n+1}{2}} , while the last inequality is obtained by (3.11). If n is even, then

    \begin{align*} \mathbb{E}_{k-1}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2 \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^2\|{x_{k-1}-A^\dagger b}\|_{A^\top A}^2, \; k\geq 2. \end{align*}

    By taking the full expectation on both sides of (3.10), for k\geq 2 , we get that

    \begin{align*} &\quad \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2\\ &\leq \left\{\begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^2 \mathbb{E}\|{x_{k-1}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; n\; \text{is}\; \text{even},\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right) \mathbb{E}\|{x_{k-1}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; n\; \text{is}\; \text{odd}. \end{array} \right. \end{align*}

    Note that

    \begin{align*} &\quad \mathbb{E}\|{x_1-A^\dagger b}\|_{A^\top A}^2\\ & = \sum\limits_{i_0 = 1}^{n} \left( \min\limits_{x_1-x_0\in \text{span}\{e_{i_0},e_{n-i_0+1}\}}\|{x_1-A^\dagger b}\|_{A^\top A}^2 \frac{1}{2}\left(\frac{(A_{i_0}^\top r_0)^2 }{\|{A^\top r_0}\|^2 } +\frac{(A_{n-i_0+1}^\top r_0)^2 }{\|{A^\top r_0}\|^2 }\right)\right) \\ &\leq\frac{1}{2} \sum\limits_{i_0 = 1}^{n} \left( \min\limits_{x_1-x_0\in \text{span}\{e_{i_0}\}}\|{x_1-A^\dagger b}\|_{A^\top A}^2\frac{(A_{i_0}^\top r_0)^2 }{\|{A^\top r_0}\|^2 }+\min\limits_{x_1-x_0\in \text{span}\{e_{n-i_0+1}\}}\|{x_1-A^\dagger b}\|_{A^\top A}^2\frac{(A_{n-i_0+1}^\top r_0)^2 }{\|{A^\top r_0}\|^2 }\right) \\ &\leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right) \|{x_{0}-A^\dagger b}\|_{A^\top A}^2, \end{align*}

    where the last inequality is obtained by Theorem 2.2 in [8]. Consequently,

    \begin{align*} &\quad \mathbb{E}\|{x_{2}-A^\dagger b}\|_{A^\top A}^2\\ &\leq \left\{\begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right) \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; n\; \text{is}\; \text{even},\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right) \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right) \mathbb{E}\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; n\; \text{is}\; \text{odd}. \end{array} \right. \end{align*}

    By induction on k , if n is even, for k\geq 0 , we have

    \begin{align*} \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2 \leq \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^k \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, \end{align*}

    if n is odd, we have

    \begin{align*} &\quad \mathbb{E}\|{x_{k+1}-A^\dagger b}\|_{A^\top A}^2\\ &\leq \left\{\begin{array}{ll} \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^{\frac{k}{2}}\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)^{\frac{k}{2}} \left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; k\; \text{is}\; \text{even}\; \text{and}\; k\geq 0,\\ \left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2}\right)^{\frac{k-1}{2}}\left(1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1}\right)^\frac{k+1}{2}\left(1-\frac{\lambda_{\min}(A^\top A)}{\|{A}\|_F^2}\right)\|{x_{0}-A^\dagger b}\|_{A^\top A}^2, & \text{if}\; k\; \text{is}\; \text{odd}\; \text{and}\; k\geq 1, \end{array} \right. \end{align*}

    the proof is completed.

    Remark 1. Since \|{A}\|_F^2-2\min\{\|{A_i}\|_2^2, i\in[n]\} < \|{A}\|_F^2-\min\{\|{A_i}\|_2^2, i\in[n]\} , our method is superior to NRGS in theory.
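
    For a concrete illustration, suppose all n columns of A have the same norm, so that \|{A_i}\|_2^2 = \|{A}\|_F^2/n for every i (with n > 2 ). Then

    \begin{align*} \gamma_1 = \left(1-\frac{1}{n}\right)\|{A}\|_F^2,\qquad \gamma_2 = \left(1-\frac{2}{n}\right)\|{A}\|_F^2, \end{align*}

    so the contraction factor 1-\frac{\lambda_{\min}(A^\top A)}{\gamma_2} governing most RSGS iterations is strictly smaller than the corresponding factor 1-\frac{\lambda_{\min}(A^\top A)}{\beta} = 1-\frac{\lambda_{\min}(A^\top A)}{\gamma_1} in the NRGS bound (1.2).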

    In this section, we present several examples that utilize two groups of real coefficient matrices. These examples are designed to compare the effectiveness of RSGS with the NRGS, RGS2 [13], TRGS [13], and D2RGS [14] methods when solving Problem (1.1). The first group of matrices is chosen from the Florida sparse matrix collection [16] and is listed in Table 1, where for the matrices nemsafm, df2177 and bibd_16_8, we set A as their transpose. The second group consists of randomly generated dense matrices, produced using MATLAB's randn function. The inconsistent linear system is constructed by setting b = Ax+r , where x is a vector with entries generated from a standard normal distribution, and the residual r belongs to the null space of A^\top , which is derived using the null function in MATLAB. For all computations, the initial point is set as x_0 = 0 , and the stopping criterion is:

    \text{err} = \frac{\|{x_k-x_*}\|_2}{\|{x_*}\|_2} \leq 10^{-6},\quad \text{where} \quad x_* = A^{\dagger}b.
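
    To make the experimental setup concrete, the following NumPy/SciPy sketch mirrors the construction of the dense test problems and the stopping quantity described above; scipy.linalg.null_space plays the role of MATLAB's null , and all names are our illustrative choices.

      import numpy as np
      from scipy.linalg import null_space

      def make_dense_test_problem(m, n, rng=None):
          """Build an inconsistent system b = A x_* + r with r in null(A^T),
          so that x_* = A^+ b is the unique least squares solution."""
          rng = np.random.default_rng() if rng is None else rng
          A = rng.standard_normal((m, n))            # dense Gaussian matrix (full column rank w.h.p.)
          x_star = rng.standard_normal(n)
          N = null_space(A.T)                        # orthonormal basis of null(A^T), size m x (m-n)
          r = N @ rng.standard_normal(N.shape[1])    # residual component orthogonal to range(A)
          b = A @ x_star + r
          return A, b, x_star

      def relative_error(x, x_star):
          """err = ||x_k - x_*||_2 / ||x_*||_2, the stopping quantity above."""
          return np.linalg.norm(x - x_star) / np.linalg.norm(x_star)

      # Example usage with the RSGS sketch given after Algorithm 1 (illustrative):
      # A, b, x_star = make_dense_test_problem(2000, 100)
      # x = rsgs(A, b)
      # print(relative_error(x, x_star))
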
    Table 1.  The properties of different sparse matrices.
    Name ash958 nemsafm df2177 bibd_16_8
    m\times n 958\times 292 334\times 2348 630\times 10358 120\times 12870
    rank 292 334 630 120
    Density 0.68% 0.36% 0.34% 23.33%
    Condition number 3.20 4.77 2.01 9.54


    In order to evaluate and compare the performance of RSGS, NRGS, and the other baseline algorithms, we plot the relative error (err) against the number of iterations (IT) and the CPU time; the results for the sparse and dense matrix groups are depicted in Figures 1 and 2, respectively. As can be observed from the figures, RSGS is more efficient than the NRGS, RGS2 [13], TRGS [13], and D2RGS [14] methods.

    Figure 1.  Comparisons of different baselines in terms of iterations and running time on Florida sparse matrices.
    Figure 2.  Comparisons of different baselines in terms of iterations and running time on dense matrices.

    We proposed a randomized symmetric Gauss-Seidel method for solving linear least squares problems with nonuniform sampling based on the probability criterion (3.6). Our theoretical analysis indicates that RSGS converges when the coefficient matrix has full column rank. Furthermore, numerical experiments demonstrate that RSGS outperforms the baseline algorithms.

    Fan Sha: Writing the original draft, study conception, formal analysis; Jianbing Zhang: Supervision, revision. All authors have read and approved the final version of the manuscript for publication.

    The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research is supported by National Key R&D Program of China (Grant No. 2022YFA1004403) and by funding from oppo: Multi-modal information enhanced Image Caption.

    All authors declare no conflicts of interest in this paper.



    [1] G. H. Golub, C. F. Van Loan, Matrix computations, Johns Hopkins University Press, 2013.
    [2] A. Hefny, D. Needell, A. Ramdas, Rows versus columns: Randomized Kaczmarz or Gauss-Seidel for ridge regression, SIAM J. Sci. Comput., 39 (2017), S528–S542. https://doi.org/10.1137/16M1077891 doi: 10.1137/16M1077891
    [3] Y. Saad, Iterative methods for sparse linear systems, Society for Industrial and Applied Mathematics, 2003.
    [4] T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl., 15 (2009), 262–278. https://doi.org/10.1007/s00041-008-9030-4 doi: 10.1007/s00041-008-9030-4
    [5] Z. Z. Bai, W. T. Wu, On greedy randomized coordinate descent methods for solving large linear least-squares problems, Numer. Linear Algebra Appl., 26 (2019), e2237. https://doi.org/10.1002/nla.2237 doi: 10.1002/nla.2237
    [6] K. Du, Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss-Seidel algorithms, Numer. Linear Algebra Appl., 26 (2019), e2233. https://doi.org/10.1002/nla.2233 doi: 10.1002/nla.2233
    [7] R. M. Gower, P. Richt{á}rik, Randomized iterative methods for linear systems, SIAM J. Matrix Anal. A., 36 (2015), 1660–1690. https://doi.org/10.1137/15M1025487 doi: 10.1137/15M1025487
    [8] Y. Q. Niu, B. Zheng, A new randomized Gauss-Seidel method for solving linear least-squares problems, Appl. Math. Lett., 116 (2021), 107057. https://doi.org/10.1016/j.aml.2021.107057 doi: 10.1016/j.aml.2021.107057
    [9] J. Nutini, M. Schmidt, I. Laradji, M. Friedlander, H. Koepke, Coordinate descent converges faster with the Gauss-Southwell rule than random selection, Proc. Int. Conf. Mach. Learn., 2015, 1632–1641. http://proceedings.mlr.press/v37/nutini15.pdf
    [10] D. Leventhal, A. S. Lewis, Randomized methods for linear constraints: Convergence rates and conditioning, Math. Oper. Res., 35 (2010), 641–654. https://doi.org/10.1287/moor.1100.0456 doi: 10.1287/moor.1100.0456
    [11] A. Ma, D. Needell, A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM J. Matrix Anal. Appl., 36 (2015), 1590–1604. https://doi.org/10.1137/15M1014425 doi: 10.1137/15M1014425
    [12] Y. Liu, X. L. Jiang, C. Q. Gu, On maximum residual block and two-step Gauss-Seidel algorithms for linear least-squares problems, Calcolo, 58 (2021), 1–32. https://doi.org/10.1007/s10092-021-00404-x doi: 10.1007/s10092-021-00404-x
    [13] Y. M. Liao, T. X. Lu, F. Yin, A two-step randomized Gauss-Seidel method for solving large-scale linear least squares problems, Electron. Res. Arch., 30 (2022), 755–779. https://doi.org/10.3934/era.2022040 doi: 10.3934/era.2022040
    [14] A. Mustafa, M. Saha, A two-dimensional randomized extended Gauss-Seidel algorithm for solving least squares problems, Numer. Algor., (2023), 1–22. https://doi.org/10.1007/s11075-023-01661-3
    [15] Y. J. Guan, W. G. Li, L. L. Xing, T. T. Qiao, A note on convergence rate of randomized Kaczmarz method, Calcolo, 57 (2020), 1–11. https://doi.org/10.1007/s10092-020-00376-4 doi: 10.1007/s10092-020-00376-4
    [16] T. A. Davis, Y. F. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), 1–25. https://doi.org/10.1145/2049662.2049663 doi: 10.1145/2049662.2049663
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)