This paper addresses the problem of finding the low-rank and sparse components of a given matrix. The problem involves two conflicting objectives: reducing the rank of one part and the sparsity of the other simultaneously. Previous methods combine the two objectives into a single penalized objective function and solve it with traditional numerical optimization approaches. The main contribution of this paper is a multiobjective method that decomposes the given matrix into a low-rank component and a sparse part. We optimize the two objective functions with the evolutionary multiobjective algorithm MOEA/D. As a further contribution, a modified low-rank and sparse matrix model is proposed, which simplifies the decision variable of the objective functions and improves the efficiency of the multiobjective optimization. The proposed method obtains a set of solutions with different trade-offs between the low-rank and sparse objectives, from which decision makers can directly choose one or more satisfactory decomposition results according to their requirements. Experiments conducted on artificial datasets and natural images show that the proposed method consistently obtains satisfactory results, with acceptable convergence, stability, and robustness.
Citation: Tao Wu, Yu Lei, Jiao Shi, Maoguo Gong. 2017: An evolutionary multiobjective method for low-rank and sparse matrix decomposition, Big Data and Information Analytics, 2(1): 23-37. doi: 10.3934/bdia.2017006
Matrix decomposition can capture characteristic information of a matrix, among which the low-rank and sparse components are two of the most interesting. In the past decades, the theory of matrix decomposition has developed rapidly; in particular, low-rank or sparse matrix decomposition has been used in many popular research areas, such as machine learning, image/signal processing, and network analysis [19,22,24,20]. The problem of low-rank and sparse matrix decomposition (LRSMD) has recently been highlighted and intensively studied in [4,6], especially in robust principal component analysis (PCA).
As described in [4], we are given a matrix $D \in \mathbb{R}^{m \times n}$ that can be written as

$$D = L_0 + S_0, \qquad (1)$$

where $L_0$ has low rank and $S_0$ is sparse. The decomposition can be formulated as the optimization problem

$$\begin{aligned} \text{minimize}\quad & (\operatorname{rank}(L_0),\ \operatorname{card}(S_0)) \\ \text{subject to}\quad & L_0 + S_0 = D. \end{aligned} \qquad (2)$$
It is known that the LRSMD problem is in general ill-posed and NP-hard [4], and the most common remedy is to find a convex optimization problem that approximates the original one. The heuristics of using the nuclear norm as a surrogate for the rank [10,17] and the $\ell_1$ norm as a surrogate for the cardinality lead to the convex relaxation

$$\text{minimize}\ \|L\|_* + \lambda \|S\|_1 \quad \text{subject to}\ L + S = D, \qquad (3)$$

where $\|L\|_*$ denotes the nuclear norm of $L$ (the sum of its singular values), $\|S\|_1$ denotes the $\ell_1$ norm of $S$ viewed as a long vector, and $\lambda$ is a positive weighting parameter.
Nowadays, many methods based on different strategies have been proposed. Iterative thresholding (IT) algorithms [1,2] regularize problem (3), construct a Lagrange function, and then update the two objective variables alternately. The iterative form of IT algorithms is simple and convergent, but the rate of convergence is slow and it is difficult to select an appropriate step length. The accelerated proximal gradient (APG) algorithm [12] is another iterative approach, which builds a partial quadratic approximation using the Lipschitz gradient to solve the Lagrange function. Although it is similar to IT algorithms, it reduces the number of iterations greatly. The dual method [12] exploits the fact that the nuclear norm is dual to the spectral norm, so the dual of problem (3) is easier to solve. Compared with the APG algorithm, it has better scalability because it does not need a complete singular value decomposition in each iteration. One of the most influential methods is the augmented Lagrange multipliers (ALM) method [13]. The ALM method first constructs the augmented Lagrange function and then optimizes the problem with an alternating iteration scheme. It is called EALM when exact augmented Lagrange multipliers are used and IALM with inexact multipliers; IALM is also known as the alternating direction method (ADM) [21]. The ALM method is more efficient than the APG method and can achieve higher accuracy with lower storage requirements.
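To make the alternating updates concrete, the sketch below shows the two proximal operators these solvers are built around: entrywise soft-thresholding for the $\ell_1$ term and singular value thresholding for the nuclear-norm term. The function names are ours, and the snippet is an illustrative sketch rather than any particular method's implementation.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```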
For the low-rank and sparse matrix decomposition problem, most methods treat the ratio between the low-rank degree and the sparsity as a fixed parameter; however, different given matrices or requirements generally call for different values of the parameter $\lambda$.
The remainder of this paper is organized as follows. Section 2 introduces the related background and the motivation behind the proposed method. In Section 3, the proposed model and algorithm are presented in detail. The experimental study is described in Section 4. Finally, concluding remarks are given in Section 5.
In this section, in order to make it easier to understand the proposed method, we will give a brief introduction to multiobjective optimization. Then we will describe the motivation of multiobjective low-rank and sparse matrix decomposition.
In general, an optimization problem is called a multiobjective optimization problem (MOP) when more than one objective function needs to be optimized simultaneously [8,7,14]. A MOP can be stated as:
$$\min\ F(x) = (f_1(x), \cdots, f_m(x))^T \quad \text{subject to}\ x \in \Omega, \qquad (4)$$

where $\Omega$ is the decision space, $x \in \Omega$ is a decision variable, and $F : \Omega \to \mathbb{R}^m$ consists of $m$ objective functions.

Let $x_A, x_B \in \Omega$; $x_A$ is said to dominate $x_B$ if

$$\begin{aligned} &\forall i = 1, 2, \cdots, m, \quad f_i(x_A) \le f_i(x_B), \\ &\exists j = 1, 2, \cdots, m, \quad f_j(x_A) < f_j(x_B), \end{aligned} \qquad (5)$$

and we mark this as $x_A \prec x_B$. A solution $x^* \in \Omega$ is Pareto optimal if no other solution dominates it, and the image of the set of all Pareto optimal solutions in objective space is called the Pareto front (PF).
In practical applications, most MOPs have many or even infinitely many Pareto optimal solutions, and we cannot obtain all of them [23]. The purpose of multiobjective optimization is therefore to find a limited set of solutions, uniformly distributed along the Pareto front, that can represent the whole front. Evolutionary algorithms are among the most popular methods for solving MOPs [11,18,25,9,23], such as VEGA [18], SPEA-II [25], NSGA-II [9] and MOEA/D [23]. Zhang and Li [23] proposed MOEA/D, which performs well on MOPs. MOEA/D decomposes a MOP into a series of scalar optimization subproblems with different weights. Because the optimal solution of each subproblem is close to those of neighboring subproblems, each solution can be improved using information from its neighbors. In this paper, since the LRSMD problem can be decomposed naturally with different weights between the two objectives, we adopt the framework of MOEA/D.
For matrix decomposition with rank and sparsity constraints, as described in problem (2), most methods convert the NP-hard problem into a convex optimization problem. However, these algorithms have to fix the value of the parameter $\lambda$ in advance, which predetermines the trade-off between the two components.
Multiobjective optimization is a popular way to avoid this difficulty of parameter selection [3,15,16].
For LRSMD, we focus on the more original formulation and treat LRSMD as a multiobjective problem. The essence of MOO is to optimize several conflicting objectives simultaneously. As described in problem (2), there are two conflicting objectives, which represent the low-rank degree and the sparsity, respectively. We can obtain different solutions with an evolutionary multiobjective algorithm such as MOEA/D, and each solution in the Pareto front represents a trade-off between reducing the rank of the low-rank component and decreasing the cardinality of the sparse part, which can satisfy different requirements of decision makers.
First, we introduce the model of multiobjective low-rank and sparse matrix decomposition based on matrix singular value decomposition. Then the multiobjective low-rank and sparse matrix decomposition algorithm is described in detail, which shows how the MOO approach is applied to matrix decomposition.
As mentioned in Section 1, problem (2) is a standard multiobjective problem, but it is hard to optimize even with an evolutionary algorithm. The reasons can be summarized as follows: 1) both objective functions are NP-hard to optimize, and the low-rank and sparse components are tightly correlated; 2) the decision variable of problem (2) is high-dimensional, whether it is taken to be the low-rank component $L$ or the sparse component $S$, since either is an $m \times n$ matrix. Therefore, we modify the model so that it becomes tractable for evolutionary multiobjective optimization.
Let the input matrix have the singular value decomposition $D = U \Sigma V^T$, and let the singular value vector $x = (x_1, \cdots, x_n)^T$ of the low-rank component, expressed in this basis, be the decision variable. The low-rank objective is defined as

$$f_1(x) = \sum_{i=1}^{n} |x_i|. \qquad (6)$$

Similarly, the sparsity of the sparse component is defined as

$$f_2(x) = \sum_{i=1}^{m} \sum_{j=1}^{n} |S_{ij}(x)|, \qquad (7)$$

where $S(x) = D - U \operatorname{diag}(x) V^T$ is the sparse component induced by $x$ and $S_{ij}(x)$ is its $(i,j)$ entry.
At this point, the two objective functions can be combined into a MOP. It has the following form:
$$\min_x F(x) = \min_x (f_1(x), f_2(x))^T \quad \text{subject to}\ x \in \Omega. \qquad (8)$$
The low-rank component is hard to measure and optimize directly, but the singular value vector provides a good surrogate: $f_1(x)$ equals the nuclear norm of the candidate low-rank component, and when $x$ is sparse, the recovered matrix $U \operatorname{diag}(x) V^T$ has low rank, since the rank equals the number of nonzero singular values.
Algorithm 1 Multiobjective LRSMD Model
Input: the given matrix $D \in \mathbb{R}^{m \times n}$.
Output: the decomposed components $L$ and $S$.
1: Step 1) Compute the singular value decomposition $D = U \Sigma V^T$.
2: Step 2) Generate a set of nondominated solutions ($x^1, \cdots, x^N$) by optimizing problem (8) with MOEA/D.
3: Step 3) Recovery:
4: for each nondominated solution $x$ do
5: Step 3.1) $L = U \operatorname{diag}(x) V^T$;
6: Step 3.2) $S = D - L$;
7: Step 3.3) output the pair $(L, S)$.
8: end for
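For concreteness, the following minimal Python sketch mirrors Algorithm 1 under the reading above; the helper names and the use of NumPy are our assumptions, not part of the original method description.

```python
import numpy as np

def decompose_basis(D):
    """Step 1: SVD of D. U and Vt stay fixed; only the singular
    value vector is evolved."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U, s, Vt

def objectives(x, D, U, Vt):
    """Evaluate f1 (Eq. (6)) and f2 (Eq. (7)) for a candidate x."""
    S = D - U @ np.diag(x) @ Vt        # induced sparse component
    f1 = np.sum(np.abs(x))             # l1 norm of the singular value vector
    f2 = np.sum(np.abs(S))             # l1 surrogate for card(S)
    return f1, f2

def recover(x, D, U, Vt):
    """Step 3: recover the pair (L, S) from a Pareto-optimal x."""
    L = U @ np.diag(x) @ Vt
    return L, D - L
```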
In this paper, the framework of MOEA/D [23] is applied in the proposed MOLRSMD because MOEA/D decomposes a MOP into several scalar optimization subproblems. In MOLRSMD we use the Tchebycheff approach to decompose the multiobjective optimization problem (8), and each scalar optimization subproblem has the form

$$\min\ g^{te}(x \mid w) = \max\{w_1 |f_1 - z_1^*|,\ w_2 |f_2 - z_2^*|\} \quad \text{subject to}\ x \in \Omega, \qquad (9)$$

where $w = (w_1, w_2)^T$ is a weight vector with $w_1, w_2 \ge 0$ and $w_1 + w_2 = 1$, and $z^* = (z_1^*, z_2^*)^T$ is the reference point, i.e., $z_i^* = \min\{f_i(x) \mid x \in \Omega\}$ for $i = 1, 2$.
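For illustration, the Tchebycheff scalarization (9) can be evaluated as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def tchebycheff(f, w, z_star):
    """g^te(x | w) of Eq. (9) for an objective vector f = (f1, f2)."""
    return float(np.max(np.asarray(w) * np.abs(np.asarray(f) - np.asarray(z_star))))
```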
The procedure of the proposed algorithm for low-rank and sparse matrix decomposition is given in Algorithm 2. The MOLRSMD problem is decomposed into $N$ scalar subproblems associated with $N$ weight vectors, which are optimized simultaneously in a single run.
Algorithm 2 Procedure of MOLRSMD based on MOEA/D
Input: the given matrix $D$; the number of subproblems $N$; the neighborhood size $T$; the stopping criterion: a maximum number of generations $G_{max}$.
Output: Pareto front solutions; matrix decomposition results.
1: Step 1) Initialization:
2: Step 1.1) Generate a uniformly distributed set of $N$ weight vectors $w^1, \cdots, w^N$ and, for each $i$, compute the index set $B(i)$ of the $T$ weight vectors nearest to $w^i$.
3: Step 1.2) Generate an initial population $x^1, \cdots, x^N$ randomly.
4: Step 1.3) Initialize the two reference points $z^l$ and $z^u$ from the objective values of the initial population.
5: Step 1.4) For each $i = 1, \cdots, N$, set $FV^i = F(x^i)$.
6: Step 2) Set the generation counter $t = 0$.
7: Step 3) Update:
8: for $i = 1, \cdots, N$ do
9: Step 3.1) Randomly select two indexes $k, l$ from $B(i)$ and generate a new solution $y$ from $x^k$ and $x^l$ by the crossover operator (12).
10: Step 3.2) Apply a thresholding method to improve $y$, yielding $y'$.
11: Step 3.3) For each objective, update the reference points $z^l$ and $z^u$ with $F(y')$.
12: Step 3.4) For each index $j \in B(i)$: if $g^{te}(y' \mid w^j) \le g^{te}(x^j \mid w^j)$, set $x^j = y'$ and $FV^j = F(y')$.
13: end for
14: Step 4) Stopping criterion: if $t \ge G_{max}$, stop and output the results; otherwise set $t = t + 1$ and go to Step 3.
In this algorithm, ideas from Tchebycheff decomposition and tailored genetic operators are used, which make the solutions more uniformly distributed and the algorithm converge faster. Next, these ideas are described in detail.
In this paper, because the two objective function values differ by orders of magnitude, a normalization strategy is employed. In general, when the ranges of the two objectives differ greatly, the Pareto front becomes non-uniform and most points cluster in one corner. A simple normalization of each objective has the form

$$\bar{f}_i = \frac{f_i - z_i^l}{z_i^u - z_i^l}, \qquad (10)$$

where $z_i^l$ and $z_i^u$ are the minimum and maximum values of the $i$th objective observed in the current population. The Tchebycheff subproblem (9) is then applied to the normalized objectives:

$$\bar{g}^{te}(x \mid w, z^l, z^u) = \max\{w_1 |\bar{f}_1 - \bar{z}_1^*|,\ w_2 |\bar{f}_2 - \bar{z}_2^*|\} = \max\left\{ w_1 \frac{|f_1 - z_1^l|}{z_1^u - z_1^l},\ w_2 \frac{|f_2 - z_2^l|}{z_2^u - z_2^l} \right\}, \qquad (11)$$

where the normalized reference point $\bar{z}^*$ is the origin.
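A sketch of the normalized subproblem (10)-(11) follows; the small epsilon guarding against a zero objective range is our addition.

```python
import numpy as np

def tchebycheff_normalized(f, w, z_l, z_u, eps=1e-12):
    """Normalized Tchebycheff value (11); z_l and z_u are the per-objective
    minima and maxima observed in the current population."""
    f_bar = (np.asarray(f, float) - z_l) / (z_u - z_l + eps)   # Eq. (10)
    return float(np.max(w * np.abs(f_bar)))                    # reference = origin
```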
Genetic operators are the key components of an evolutionary algorithm; when they are combined with characteristics of the optimization problem, the convergence speed of the algorithm can be improved. In this paper, we use a differential strategy and a double local mutation method to speed up the algorithm.
In the crossover operator, a differential optimization strategy is applied. We randomly select two individuals $x^k$ and $x^l$ from the neighborhood of $x^i$ and generate a new solution $y$ componentwise:

$$y_j = \begin{cases} x_j^i + F(x_j^k - x_j^l) & \text{if } r < p_c, \\ x_j^i & \text{otherwise}, \end{cases} \qquad (12)$$

where $r$ is a uniform random number in $[0, 1]$ drawn for each component, $F$ is the differential multiplier, and $p_c$ is the crossover probability.
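A sketch of the crossover (12) with the Table 1 defaults; the function name and the per-component random draws are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_crossover(x_i, x_k, x_l, F=0.5, pc=0.8):
    """Differential crossover (12): each component of x_i is perturbed by
    the scaled difference of two neighbors with probability pc."""
    r = rng.random(x_i.shape)
    return np.where(r < pc, x_i + F * (x_k - x_l), x_i)
```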
We apply a scale-controllable mutation that combines two different mutations to balance the diversity and precision of the population. In this paper, we select Gaussian mutation and polynomial mutation. Gaussian mutation is simpler than polynomial mutation and works well for global search, while polynomial mutation performs local search and can improve the quality of solutions. Therefore, if a random number is smaller than the selection probability $p_s$, Gaussian mutation is applied; otherwise polynomial mutation is used:

$$y' = \begin{cases} \mathrm{Gauss}(y, \sigma) & \text{if } rand < p_s, \\ \mathrm{Polynomial}(y) & \text{otherwise}, \end{cases} \qquad (13)$$

where $\sigma$ is the standard deviation of the Gaussian mutation and $rand$ is a uniform random number in $[0, 1]$.
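A sketch of the two-stage mutation (13); the per-gene mutation probability $p_m$, the variable bounds, and the polynomial distribution index $\eta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def double_mutation(y, ps=0.85, pm=0.2, sigma=0.1, eta=20.0, lo=0.0, hi=1.0):
    """Two-stage mutation (13): Gaussian mutation (global search) with
    probability ps, polynomial mutation (local search) otherwise."""
    mask = rng.random(y.shape) < pm            # mutate each gene with prob. pm
    if rng.random() < ps:                      # Gaussian branch: Gauss(y, sigma)
        y_new = np.where(mask, y + rng.normal(0.0, sigma, y.shape), y)
    else:                                      # polynomial branch: Polynomial(y)
        u = rng.random(y.shape)
        delta = np.where(u < 0.5,
                         (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,
                         1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)))
        y_new = np.where(mask, y + delta * (hi - lo), y)
    return np.clip(y_new, lo, hi)
```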
In this section, taking the singular value vector as the decision variable makes the problem simpler and easier to optimize. On the one hand, the dimension of the decision variable is reduced from $m \times n$ (a full matrix) to $n$ (a singular value vector); on the other hand, the constraint $L + S = D$ is satisfied by construction, since $S$ is always computed as $D - L$.
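Putting the pieces together, the following compact sketch mirrors Algorithm 2 using the helper sketches given earlier in this section. It is an illustration under our assumptions (for instance, the thresholding improvement of Step 3.2 is replaced by the mutation operator for brevity), not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def molrsmd(D, N=100, T=20, G_max=300):
    # Step 1: fixed SVD basis (see the model sketch above).
    U, s, Vt = decompose_basis(D)
    n = s.shape[0]
    # Step 1.1: uniformly spread weight vectors and T nearest neighborhoods.
    w1 = np.linspace(0.0, 1.0, N)
    w = np.stack([w1, 1.0 - w1], axis=1)
    B = np.argsort(((w[:, None, :] - w[None, :, :]) ** 2).sum(-1), axis=1)[:, :T]
    # Step 1.2: random initial population of shrunken singular value vectors.
    pop = rng.random((N, n)) * s
    F_pop = np.array([objectives(x, D, U, Vt) for x in pop])
    # Step 1.3: reference points from the initial population.
    z_l, z_u = F_pop.min(axis=0), F_pop.max(axis=0)
    # Steps 2-4: main evolutionary loop.
    for _ in range(G_max):
        for i in range(N):
            k, l = rng.choice(B[i], size=2, replace=False)   # Step 3.1
            y = de_crossover(pop[i], pop[k], pop[l])
            y = double_mutation(y, hi=float(s.max()))        # stand-in for Step 3.2
            f_y = np.array(objectives(y, D, U, Vt))
            z_l = np.minimum(z_l, f_y)                       # Step 3.3
            z_u = np.maximum(z_u, f_y)
            for j in B[i]:                                   # Step 3.4
                if (tchebycheff_normalized(f_y, w[j], z_l, z_u)
                        <= tchebycheff_normalized(F_pop[j], w[j], z_l, z_u)):
                    pop[j], F_pop[j] = y.copy(), f_y
    return pop, F_pop
```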
In this section, we present experiments on artificially generated datasets and some natural images to evaluate the performance of the proposed method. The experiments focus on how the multiobjective low-rank and sparse matrix decomposition algorithm works on different types of test data, and we analyze the performance in terms of convergence, stability, and robustness.
Let $D^* = L^* + S^* \in \mathbb{R}^{m \times n}$ denote an artificial test matrix, where $L^*$ is the ground-truth low-rank component and $S^*$ is the ground-truth sparse component.
In order to illustrate the experimental results intuitively, we define some indexes to measure the results and the performance of the proposed method. The relative error to the original low-rank matrix $L^*$ is

$$Err_{LR} := \frac{\|L - L^*\|_F}{\|L^*\|_F}. \qquad (14)$$

Similarly, the relative error to the original sparse matrix $S^*$ is

$$Err_{SP} := \frac{\|S - S^*\|_F}{\|S^*\|_F}. \qquad (15)$$

The total error to the original matrix is

$$Err_{T} := \frac{\|L + S - D^*\|_F}{\|D^*\|_F}. \qquad (16)$$
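These indexes translate directly into code; a minimal sketch (the function name is ours):

```python
import numpy as np

def recovery_errors(L, S, L_star, S_star):
    """Relative errors (14)-(16) against the ground truth L*, S*."""
    D_star = L_star + S_star
    err_lr = np.linalg.norm(L - L_star, 'fro') / np.linalg.norm(L_star, 'fro')
    err_sp = np.linalg.norm(S - S_star, 'fro') / np.linalg.norm(S_star, 'fro')
    err_t = np.linalg.norm(L + S - D_star, 'fro') / np.linalg.norm(D_star, 'fro')
    return err_lr, err_sp, err_t
```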
Due to the nature of the multiobjective matrix decomposition method, there are many different decomposition results, and some of them differ widely from the original matrices. On the one hand, to analyze the accuracy of matrix decomposition on the artificial datasets, we focus on the solution with the smallest recovery error and on the distribution of the Pareto front. On the other hand, for the natural images, all decomposed solutions can be displayed as images, so a solution can be selected directly according to the actual demand.
The parameters in the proposed method are given in Table 1.
Parameter | Meaning | Value
$N$ | The number of subproblems | 100
$T$ | The number of neighbors | 20
$G_{max}$ | The maximum number of generations | 300
$p_c$ | The probability of crossover | 0.8
$F$ | The differential multiplier | 0.5
$p_m$ | The probability of mutation | 0.2
$p_s$ | The probability of mutation selection | 0.85
For the artificially generated datasets, we select three test matrices of different sizes.
Furthermore, we also conduct experiments on noisy data to test the robustness of the proposed method, constructing five noisy situations that differ in the number of noise points and in the noise magnitude.
Figure 4 displays box-plots of the recovery errors in the five noisy situations described above. Both the low-rank and the sparse recovery errors vary only within a small range under the different types of noise. As the number of noise points or the noise magnitude increases, the errors grow only slightly, which indicates that the robustness of the proposed method is acceptable.
The proposed method also works well on natural images. The images used are Lena and the orl-faces dataset. Since there are no ground-truth decompositions for these datasets, we analyze the Pareto fronts and the practical results to evaluate the performance of the proposed method. Some results are displayed in Figures 6 and 7: Figure 6 concerns the image Lena, and Figure 7 shows the result for one random face image from orl-faces. We can see that the Pareto front of the proposed method is smooth and evenly distributed, which means the method converges to the PF stably. Part (b) of Figures 6 and 7 shows three different decomposition results taken from different locations on the PF. In the first row of part (b) the sparse component dominates; on the contrary, the third row gives more weight to the low-rank part; the middle row is a balance between the low-rank and sparse components.
The performance of the proposed MOLRSMD has been examined in the previous subsection. In this subsection, a simple comparison with some advanced LRSMD algorithms is conducted on the artificial datasets. ADM [21] and ALM [13] are two mainstream and efficient algorithms in existing research. Both of them require the parameter $\lambda$ to be set in advance.
Methods | $Err_{LR}$ | $Err_{SP}$ | $Err_{T}$
MOLRSMD | 0.0100 | 0.1986 | 0
ADM | 0.0299 | 0.1331 | 1.9135e-17
ALM | 0.0148 | 0.0661 | 2.2412e-10
Each row of Figure 8 presents the Pareto fronts obtained by the different algorithms. The Pareto front of MOLRSMD is complete and smooth compared with those of ADM and ALM. The results of ADM and ALM are unevenly distributed: most of their Pareto optimal solutions cluster in the low-sparsity region, even in the region with sparsity close to zero. In particular, in the second row of Figure 8, the solutions of ADM and ALM show a sudden change and cluster in two regions (with objective values of about (7500, 2500) and (4500, 0), respectively), while the solutions of the proposed MOLRSMD are uniformly and smoothly distributed. Table 2 reports the smallest recovery errors among the 100 solutions of the proposed MOLRSMD together with the results of ADM and ALM on an artificial dataset (size: $50 \times 50$). The proposed MOLRSMD achieves the smallest low-rank recovery error, while ALM achieves the smallest sparse recovery error.
In this paper, an evolutionary multiobjective low-rank and sparse matrix decomposition method has been proposed for finding optimal trade-off solutions between the rank of the low-rank component and the sparsity of the sparse part. We focus on the more original formulation of low-rank and sparse matrix decomposition and, using matrix singular value decomposition, modify the model by taking the singular value vector as the decision variable. Experiments on different types of datasets, including artificial datasets, datasets with noise, and natural images, indicate that the proposed MOLRSMD always shows good convergence, reliable stability, and acceptable robustness, and that it can be applied to practical problems.
Future work will focus on further reducing the recovery error. In the proposed method, the orthogonal matrices $U$ and $V$ are fixed by the singular value decomposition of the input matrix, which limits the accuracy of the recovered components.
[1] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2 (2009), 183-202. doi: 10.1137/080716542
[2] J.-F. Cai, E. J. Candès and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20 (2010), 1956-1982. doi: 10.1137/080738970
[3] Z. Cai and Y. Wang, A multiobjective optimization-based evolutionary algorithm for constrained optimization, IEEE Transactions on Evolutionary Computation, 10 (2006), 658-675. doi: 10.1109/TEVC.2006.872344
[4] E. J. Candès, X. Li, Y. Ma and J. Wright, Robust principal component analysis?, Journal of the ACM, 58 (2011), Art. 11, 37 pp. doi: 10.1145/1970392.1970395
[5] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9 (2009), 717-772. doi: 10.1007/s10208-009-9045-5
[6] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo and A. S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM Journal on Optimization, 21 (2011), 572-596. doi: 10.1137/090761793
[7] C. A. C. Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Genetic Algorithms and Evolutionary Computation, 5, Kluwer Academic/Plenum Publishers, New York, 2002. doi: 10.1007/978-1-4757-5184-0
[8] K. Deb, Multi-objective Optimization Using Evolutionary Algorithms, John Wiley & Sons, Ltd., Chichester, 2001.
[9] K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, 6 (2002), 182-197. doi: 10.1109/4235.996017
[10] M. Fazel, H. Hindi and S. P. Boyd, A rank minimization heuristic with application to minimum order system approximation, in Proceedings of the American Control Conference, IEEE, 6 (2001), 4734-4739. doi: 10.1109/ACC.2001.945730
[11] M. Gong, L. Jiao, H. Du and L. Bo, Multiobjective immune algorithm with nondominated neighbor-based selection, Evolutionary Computation, 16 (2008), 225-255. doi: 10.1162/evco.2008.16.2.225
[12] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen and Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix, Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), vol. 61, 2009.
[13] Z. Lin, M. Chen and Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, arXiv preprint, arXiv:1009.5055, 2010.
[14] K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers, Boston, MA, 1999.
[15] C. Qian, Y. Yu and Z.-H. Zhou, Pareto ensemble pruning, in AAAI, (2015), 2935-2941.
[16] C. Qian, Y. Yu and Z.-H. Zhou, Subset selection by Pareto optimization, in Advances in Neural Information Processing Systems, (2015), 1774-1782.
[17] B. Recht, M. Fazel and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), 471-501. doi: 10.1137/070697835
[18] J. D. Schaffer, Multiple objective optimization with vector evaluated genetic algorithms, in Proceedings of the 1st International Conference on Genetic Algorithms, L. Erlbaum Associates Inc., (1985), 93-100.
[19] J. L. Starck, M. Elad and D. L. Donoho, Image decomposition via the combination of sparse representations and a variational approach, IEEE Transactions on Image Processing, 14 (2005), 1570-1582. doi: 10.1109/TIP.2005.852206
[20] J. Yan, J. Liu, Y. Li, Z. Niu and Y. Liu, Visual saliency detection via rank-sparsity decomposition, in IEEE International Conference on Image Processing, IEEE, (2010), 1089-1092. doi: 10.1109/ICIP.2010.5652280
[21] X. Yuan and J. Yang, Sparse and low-rank matrix decomposition via alternating direction methods, Pacific Journal of Optimization, 9 (2013), 167-180.
[22] C. Zhang, J. Liu, Q. Tian, C. Xu, H. Lu and S. Ma, Image classification by non-negative sparse coding, low-rank and sparse decomposition, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2011), 1673-1680. doi: 10.1109/CVPR.2011.5995484
[23] Q. Zhang and H. Li, MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Transactions on Evolutionary Computation, 11 (2007), 712-731.
[24] M. Zibulevsky and B. A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural Computation, 13 (2001), 863-882.
[25] E. Zitzler, M. Laumanns and L. Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, in Eurogen, 3242 (2001), 95-100.