1. Introduction
The singular value decomposition (SVD) is a key tool in matrix theory and numerical linear algebra, and it plays an important role in many areas of scientific computing and engineering, such as least squares problems [1], data mining [2], pattern recognition [3], image and signal processing [4,5], statistics, engineering and physics (see [1,2,3,4,5,6]).
Research on efficient numerical methods for computing the singular values of a matrix has long been a hot topic, and many practical algorithms have been proposed for this problem. By applying the symmetric QR method to A^T A, Golub and Kahan [7] derived an efficient algorithm known as the Golub–Kahan SVD algorithm. Gu and Eisenstat [8] introduced a stable and efficient divide-and-conquer algorithm, as well as a bisection algorithm, for computing the SVD of a lower bidiagonal matrix; see also [1]. Drmac and Veselic [9] gave a superior variant of the Jacobi algorithm and proposed a new one-sided Jacobi SVD algorithm for triangular matrices computed by rank-revealing QR factorizations. Researchers such as Zha [10], Bojanczyk [11], Shirokov [12] and Novaković [13] developed methods for the hyperbolic SVD problem. A cross-product-free (CPF) Jacobi–Davidson type method, referred to as the CPF-JDGSVD method, was proposed in [14] to compute a partial generalized singular value decomposition (GSVD) of a large matrix pair (A,B). Good references for these topics include [1,2,7,8,9,15,16,17,18,19,20,21,22,23,24,25,26] and the references therein.
There are important relationships between the SVD of a matrix A and the Schur decompositions of the symmetric matrices A^T A, A A^T and [0, A^T; A, 0]. These connections to the symmetric eigenproblem allow us to adapt the mathematical and algorithmic developments for the eigenproblem to the singular value problem, and most of the algorithms mentioned above are indeed analogues of algorithms for computing eigenvalues of symmetric matrices. All of the algorithms mentioned above, except for the Jacobi algorithm, must first reduce A to bidiagonal form, which becomes very costly when the matrix is large. On the other hand, the Jacobi algorithm converges rather slowly, even though various modifications have been proposed (see [9]).
In some applications, such as compressed sensing and matrix completion [3,27], or computing the 2-norm of a matrix, only a few singular values of a large matrix are required. In these cases, the methods mentioned above, which compute the full SVD, are not very suitable. If only the largest singular value and the corresponding singular vectors of A are needed, the power method, which approximates a dominant eigenpair of an n×n symmetric matrix, is more appropriate.
Computing the largest singular value and the corresponding singular vectors of a matrix is one of the most important algorithmic tasks underlying many applications, including low-rank approximation, PCA, spectral clustering, dimensionality reduction, matrix completion and topic modeling. This paper considers this problem. We propose an alternating direction method, a fast general-purpose method for computing the largest singular vectors of a matrix, applicable even when the target matrix can only be accessed through inexact matrix-vector products. In other words, the proposed method is analogous to the well-known power method but has much better numerical behaviour. Numerical experiments show that the new method is more effective than the power method in some cases.
The rest of the paper is organized as follows. Section 2 contains notation and general results that are used in subsequent sections. In Section 3 we propose the alternating direction power-method in detail and give its convergence analysis. In Section 4, we use some experiments to show the effectiveness of the new method. Finally, we end the paper with concluding remarks in Section 5.
2. Preliminaries
The following are some notations and definitions we will use later.
We use R^{m×n} to denote the set of all real m×n matrices, and R^n the set of real n×1 vectors. The symbol I denotes the n×n identity matrix. For a vector x ∈ R^n, ‖x‖_2 denotes the 2-norm of x. For a matrix A ∈ R^{m×n}, A^T denotes the transpose of A, rank(A) the rank of A, ‖A‖_2 the 2-norm of A, and ‖A‖_F the Frobenius norm of A. diag(a_1, a_2, …, a_n) represents the diagonal matrix with diagonal elements a_1, a_2, …, a_n.
If A ∈ R^{m×n}, then there exist two orthogonal matrices

U = (u_1, u_2, …, u_m) ∈ R^{m×m}  and  V = (v_1, v_2, …, v_n) ∈ R^{n×n}

such that

A = U [Σ_r, 0; 0, 0] V^T,   (2.1)

where Σ_r = diag(σ_1, σ_2, …, σ_r), σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0, and r = rank(A) ≤ min{m,n}. The σ_i are the singular values of A, and the vectors u_i and v_i are the ith left singular vector and the ith right singular vector, respectively. We also have the SVD expansion

A = ∑_{i=1}^{r} σ_i u_i v_i^T.

This is the well-known singular value decomposition (SVD) theorem [2].
Lemma 2.1. (see Lemma 1.7 and Theorem 3.3 of [1]) Let A ∈ R^{m×n}. Then ‖A‖_2 = ‖A^T‖_2 = √‖A A^T‖_2 = σ_1, where σ_1 is the largest singular value of A.
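As a quick numerical check of Lemma 2.1 in MATLAB (a sketch; the matrix size is an arbitrary choice):

    A = randn(7, 4);                 % random test matrix
    s = svd(A);                      % singular values, largest first
    disp([norm(A), norm(A'), sqrt(norm(A*A')), s(1)])   % all four values agree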
Lemma 2.2. (see Theorem 3.3 of [1]) Let A ∈ R^{m×n} and let σ_i, u_i, v_i, i = 1, 2, …, r, be the singular values and the corresponding singular vectors of A, respectively. Then

A v_i = σ_i u_i  and  A^T u_i = σ_i v_i,  i = 1, 2, …, r.
Lemma 2.3. (refer to Section 2.4 of [2] or Theorem 3.3 of [1]) Assume that the matrix A ∈ R^{m×n} has rank r > k and that the SVD of A is (2.1). Then the matrix approximation problem

min_{rank(Z)=k} ‖A − Z‖_F

has the solution

Z = U_k Σ_k V_k^T,

where U_k = (u_1, …, u_k), V_k = (v_1, …, v_k) and Σ_k = diag(σ_1, …, σ_k).
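The following MATLAB fragment illustrates Lemma 2.3 numerically (a sketch; the sizes and the rank k are arbitrary choices). The optimal Frobenius error equals √(σ_{k+1}² + ⋯ + σ_r²):

    m = 8; n = 6; k = 2;                        % arbitrary sizes for illustration
    A = randn(m, n);
    [U, S, V] = svd(A);
    Zk = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';     % Z = U_k * Sigma_k * V_k'
    s = diag(S);
    disp([norm(A - Zk, 'fro'), sqrt(sum(s(k+1:end).^2))])   % the two numbers agree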
Let A ∈ R^{n×n}. The power method for computing the eigenvalue of largest modulus of A is as follows (see Algorithm 4.1 of [1]): given an initial vector x_0 with ‖x_0‖_2 = 1, iterate

y_k = A x_{k-1},  x_k = y_k / ‖y_k‖_2,  λ_k = x_k^T A x_k,  k = 1, 2, ….
The power method is very simple and easy to implement, and it is used in many applications, for example the PCA problem (see [28]).
To compute the largest singular value and the corresponding singular vectors of A, we can apply the power method to A A^T or A^T A, without actually computing A A^T or A^T A.
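A minimal MATLAB sketch of this idea is given below; the initial guess v0, the tolerance tol and the iteration cap maxit are assumed parameters, not taken from the paper.

    function [sigma, u, v] = pm_svd(A, v0, tol, maxit)
    % Power method applied implicitly to A'*A; the product A'*A is never formed.
    v = v0 / norm(v0);
    sigma = 0;
    for k = 1:maxit
        w = A * v;                  % matrix-vector product with A
        z = A' * w;                 % matrix-vector product with A'
        sigma_old = sigma;
        sigma = sqrt(norm(z));      % ||A'*A*v||_2 -> sigma_1^2 for unit v
        v = z / norm(z);            % normalized power step
        if abs(sigma - sigma_old) < tol
            break;
        end
    end
    u = (A * v) / sigma;            % left singular vector, since A*v ~ sigma*u
    end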
However, the power method costs extra operations if both singular vectors are needed and, in some cases, it converges very slowly. So, in the next section, we propose a new iterative method for computing the largest singular value and the corresponding singular vectors of a matrix, which is similar to the power method but needs fewer operations per iteration.
3. An alternating direction power-method
In this section, we will introduce an alternating direction power iteration method. The new method is based on an important property of the SVD.
From Lemma 2.3, it is known that the largest singular value σ_1 and the corresponding singular vectors u_1, v_1 of A satisfy the following condition:

‖A − σ_1 u_1 v_1^T‖_F = min_{u,v} ‖A − u v^T‖_F,

where u ∈ R^m and v ∈ R^n. Thus the problem of computing σ_1, u_1 and v_1 is equivalent to solving the optimization problem

min_{u∈R^m, v∈R^n} ‖A − u v^T‖_F.
Let f(u,v) = (1/2)‖A − u v^T‖_F². For the sake of simplicity, we solve the equivalent optimization problem

min_{u,v} f(u,v) = (1/2)‖A − u v^T‖_F²,   (3.1)

where u ∈ R^m and v ∈ R^n. However, problem (3.1) is difficult to solve directly since it is not jointly convex in u and v. Fortunately, it is convex in u for fixed v, and convex in v for fixed u, so we can use the alternating direction method to solve it. Alternating minimization is widely used in optimization problems due to its simplicity, low memory requirements and flexibility (see [20,29]). In the following we apply an alternating method to solve the unconstrained optimization problem (3.1).
Suppose v_k is known. Then (3.1) reduces to the unconstrained optimization problem

min_{u∈R^m} f(u, v_k) = (1/2)‖A − u v_k^T‖_F².   (3.2)

Problem (3.2) can be solved by many efficient methods, such as the steepest descent method, Newton's method, the conjugate gradient (CG) method and so on (see [29]). Because Newton's method is simple and converges fast, we choose it. By direct calculation, we get

∇_u f(u, v_k) = ‖v_k‖_2² u − A v_k,   ∇²_u f(u, v_k) = ‖v_k‖_2² I.

Then, applying Newton's method, we get

u_{k+1} = u_k − (‖v_k‖_2² I)^{-1} (‖v_k‖_2² u_k − A v_k) = A v_k / ‖v_k‖_2².
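A quick numerical check of this closed-form minimizer in MATLAB (a sketch with arbitrary sizes): for fixed v, the value of f at u_* = A v/‖v‖_2² should not exceed its value at any perturbed point.

    m = 6; n = 4;
    A = randn(m, n);  v = randn(n, 1);
    ustar = (A * v) / (v' * v);               % the closed-form Newton minimizer
    f = @(u) 0.5 * norm(A - u * v', 'fro')^2;
    d = randn(m, 1);                          % random perturbation direction
    disp([f(ustar), f(ustar + 0.1 * d)])      % the first value is the smaller one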
Alternatively, when u_{k+1} is known, problem (3.1) reduces to

min_{v∈R^n} f(u_{k+1}, v) = (1/2)‖A − u_{k+1} v^T‖_F².   (3.3)

Also, by direct calculation,

∇_v f(u_{k+1}, v) = ‖u_{k+1}‖_2² v − A^T u_{k+1},   ∇²_v f(u_{k+1}, v) = ‖u_{k+1}‖_2² I,

so applying Newton's method we obtain

v_{k+1} = A^T u_{k+1} / ‖u_{k+1}‖_2².
Solving (3.2) and (3.3) to high accuracy is both computationally expensive and of limited value when u_k and v_k are far from stationary points. So, in the following, we apply the two Newton steps alternately. Thus the alternating direction Newton method for solving (3.1) is

u_{k+1} = A v_k / ‖v_k‖_2²,
v_{k+1} = A^T u_{k+1} / ‖u_{k+1}‖_2²,   k = 0, 1, 2, …,   (3.4)

where u_0 ∈ R^m and v_0 ∈ R^n are initial guesses. At each iteration, only two matrix-vector multiplications are required; each product costs about 2mn flops, so one iteration costs about 4mn operations, which is less than the cost of the power method.
Next, we provide the convergence analysis of (3.4).
Theorem 3.1. Let A ∈ R^{m×n} and let σ_1, u, v be the largest singular value and the corresponding singular vectors of A, respectively. If u_0 ∈ R^m and v_0 ∈ R^n are initial guesses whose projections on u and v are not zero, then the iteration (3.4) is convergent with

lim_{k→∞} u_k/‖u_k‖_2 = ±u,   lim_{k→∞} v_k/‖v_k‖_2 = ±v,

and

lim_{k→∞} ‖u_k‖_2 ‖v_k‖_2 = σ_1.
Proof. From (3.4) we can deduce that u_{k+1} is a positive scalar multiple of A A^T u_k and that v_{k+1} is a positive scalar multiple of A^T A v_k; that is, up to normalization, the u_k and v_k are the iterates of the power method applied to A A^T and A^T A, respectively. As in the proof of the power method (see [2]), if the projections of u_0 on u and of v_0 on v are not zero, then we have

lim_{k→∞} u_k/‖u_k‖_2 = ±u,   lim_{k→∞} v_k/‖v_k‖_2 = ±v.

On the other hand, since v_k = A^T u_k/‖u_k‖_2² by (3.4), we have

‖u_k‖_2 ‖v_k‖_2 = ‖A^T u_k‖_2 / ‖u_k‖_2 ≤ ‖A^T‖_2 = σ_1.

Thus the sequence {‖u_k‖_2 ⋅ ‖v_k‖_2} is bounded from above. By

‖u_k‖_2 ‖v_k‖_2 = ‖A^T (u_k/‖u_k‖_2)‖_2,

we can conclude that, when k → ∞,

‖A^T (u_k/‖u_k‖_2)‖_2 → ‖A^T u‖_2 = σ_1.

Therefore, lim_{k→∞} ‖u_k‖_2 ‖v_k‖_2 = σ_1. □
Based on the discussions above, we now state the alternating direction power-method (ADPM) for computing the largest singular value and the corresponding singular vectors of a matrix.
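A minimal MATLAB sketch of ADPM based on iteration (3.4) follows. The stopping rule below, a residual of the singular-value relation A v = σ u with normalized vectors, is our assumption; the residual RES actually used in the experiments is discussed in Section 4.

    function [sigma, u, v] = adpm(A, u0, v0, tol, maxit)
    % Alternating direction power-method, iteration (3.4).
    u = u0;  v = v0;
    for k = 1:maxit
        u = (A * v)  / (v' * v);     % u_{k+1} = A v_k / ||v_k||_2^2
        v = (A' * u) / (u' * u);     % v_{k+1} = A' u_{k+1} / ||u_{k+1}||_2^2
        sigma = norm(u) * norm(v);   % ||u_k||_2 ||v_k||_2 -> sigma_1 (Theorem 3.1)
        res = norm(A * (v / norm(v)) - sigma * (u / norm(u)));
        if res < tol
            break;
        end
    end
    u = u / norm(u);  v = v / norm(v);
    end

For example, the call [sigma, u, v] = adpm(A, rand(m,1), rand(n,1), 1e-12, 9000) matches the setup used in Section 4. The residual check adds one extra product A*v per iteration; in practice it can be evaluated only every few iterations so as to preserve the 4mn cost per iteration.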
4. Numerical experiments
Here, we use several examples to show the effectiveness of the alternating direction power-method (ADPM). We compare ADPM with the power method (PM) and present numerical results in terms of the number of iterations (IT), the CPU time in seconds (CPU) and the residual (RES). The CPU time is measured with the MATLAB functions tic/toc and averaged over multiple runs.
The initial vectors u_0 and v_0 are chosen randomly by the MATLAB statements u0 = rand(m,1) and v0 = rand(n,1). In our implementation, all iterations are performed in MATLAB (R2016a) on the same workstation, with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz, 16 GB of memory and a 32-bit operating system, and are terminated when the current iterate satisfies RES < 10^{-12}, or when the number of iterations exceeds 9000, which is denoted by '-'.
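The timing protocol can be sketched as follows; the number of repetitions nruns is an arbitrary choice, since the paper only states that times are averaged over multiple runs, and adpm refers to the sketch in Section 3.

    u0 = rand(m, 1);  v0 = rand(n, 1);    % random initial vectors
    nruns = 10;                           % assumed number of repetitions
    t = 0;
    for r = 1:nruns
        tic;
        [sigma, u, v] = adpm(A, u0, v0, 1e-12, 9000);
        t = t + toc;
    end
    cpu = t / nruns;                      % averaged CPU time in seconds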
Experiment 4.1. In the first experiment, we generate random matrices with uniformly distributed elements by the MATLAB statement A = rand(m,n).
For different sizes m and n, we apply the power method and the alternating direction power-method; numerical results are reported in Table 1.
Experiment 4.2. In this experiment, we generate random matrices with normally distributed elements by the MATLAB statement A = randn(m,n).
For different sizes m and n, we apply the power method and the alternating direction power-method to A.
Numerical results are reported in Table 2.
Experiment 4.3. In this experiment, we use some n×n test matrices from the University of Florida Sparse Matrix Collection [30]. Numerical results are reported in Table 3.
In particular, compared with the power method, the cost of the alternating direction power-method is reduced by up to 23.28%. The quantity "RATIO" reported in Tables 1–3, i.e., the ratio of the cost of ADPM to that of the power method, shows this effectiveness.
To illustrate the numerical behaviour of the two methods, their cost curves are plotted in Figures 1–3.
From Tables 1–3, we can conclude that ADPM needs fewer iterations and less CPU time than the power method, so it is feasible and effective in some cases.
5. Conclusions
In this study, we have proposed an alternating direction power-method for computing the largest singular value and the corresponding singular vectors of a matrix. It is analogous to the power method but needs fewer operations per iteration thanks to the alternating technique. Convergence of the alternating direction power-method is proved under suitable conditions. Numerical experiments show that the alternating direction power-method is feasible and, in some cases, more effective than the power method.
Acknowledgments
The authors are very much indebted to the anonymous referees for their helpful comments and suggestions, which greatly improved the original manuscript of this paper. This work was supported by the NSF of Shanxi Province (201901D211423) and the Scientific Research Project of Taiyuan University, China (21TYKY02).
Conflict of interest
The authors declare that they have no conflict of interest.