1. Introduction
With the rapid development of technologies for acquiring and processing information, a great deal of real-world data exists naturally in tensor form, e.g., electroencephalography signals, video volumes, hyperspectral images, color images, and functional magnetic resonance imaging data. Such large and intricate data are ubiquitous in real-world application scenarios; see [1,2,3]. Typical dimensionality reduction methods, such as singular value decomposition [4], local linear embedding [5,6], vector quantization [7], principal component analysis [8,9], nonnegative matrix factorization (NMF) [10,11,12], etc., extract low-dimensional representations from high-dimensional data. A common drawback of these low-rank approximation methods is that reshaping the data matrices (tensors) of the samples into a single large matrix may substantially corrupt the structure of the sample data space. To preserve the intrinsic structure of tensor data and learn low-dimensional representations of high-order data, many tensor decomposition methods [13,14,15] have been developed. Kim et al. introduced nonnegative Tucker decomposition (NTD) [16,17] by combining Tucker decomposition with nonnegativity constraints on the core tensor and factor matrices. With the nonnegativity constraints, NTD not only obtains a parts-based representation like NMF, but also improves the uniqueness of Tucker decomposition. Recent efforts have extended NTD to improve computational efficiency and meet the demands of practical applications by incorporating suitable constraints [18,19,20], including smoothness, graph Laplacian, sparsity, orthogonality, and supervision, to name a few.
A model that adds effective and feasible constraints to the NTD model is called an NTD-like model. For example, Liu et al. [21] proposed a graph regularized Lp smooth NTD method, adding graph regularization and an Lp smoothness constraint to NTD to obtain smoother and more accurate solutions of the objective function. Qiu et al. [22] proposed a graph Laplacian-regularized NTD (GNTD) method, which extracts a low-dimensional parts-based representation while simultaneously preserving the geometrical information of the high-dimensional tensor data. Subsequently, Qiu et al. [23] developed an alternating proximal gradient descent method to solve the GNTD framework. Chen et al. [24] designed an adaptive graph regularized NTD model, which adaptively learns the optimal graph to capture local manifold information. Li et al. [25] proposed a manifold regularized NTD (MR-NTD), which employs a manifold regularization term on the core tensor to preserve the geometric information in tensor data. Huang et al. [26] gave a dynamic hypergraph regularized NTD method by incorporating the hypergraph structure and NTD in a unified framework. Jing et al. [27] proposed a label-constrained NTD that uses partial labels to construct a label matrix, embedding the label term and a graph regularization term into NTD to guide the algorithm toward more accurate categories in clustering tasks. To exploit the available label information of sample data, Qiu et al. [28] built a semi-supervised NTD (SNTD) model that propagates the limited label information while learning the nonnegative tensor representation. This overview can only cover a small subset of the many important and interesting ideas that have emerged; see [29,30,31] for other related studies.
However, the aforementioned NTD-like models do not take the orthogonality constraint into account. In fact, the orthogonality of the factor matrices is meaningful in practice; the equivalence between the orthogonal nonnegative matrix factorization (ONMF) problem and K-means clustering has been well discussed in [32]. To retain this property, Pan et al. [33] developed an orthogonal NTD model that imposes orthogonality on each factor matrix; it obtains the clustering information from the factor matrices and their joint connection weights from the core tensor. The orthogonal NTD model not only helps to keep the inherent tensor structure but also performs well in data compression. Lately, drawing on the idea of approximate orthogonality from [34], Qiu et al. [35] proposed a flexible multi-way clustering model called approximately orthogonal NTD for soft clustering. This model provides extra flexibility to handle crossed memberships while making the orthogonality of the factor matrices adjustable. However, these orthogonality-based methods still have a shortcoming: they fail to pay attention to the geometrical structure of the data space and the potential connections between samples. In clustering problems, it is common to regard the factor matrix of the last mode as an approximation of the original data, and applying a graph regularization constraint to this factor matrix can capture more of the intrinsic manifold information of the data tensor.
The geometrical structure of the data space and flexible orthogonality are both indispensable. To balance the two goals, in this paper we propose two approximately orthogonal NTD with graph regularization (AOGNTD) models by jointly blending graph regularization and approximately orthogonal constraints into the NTD framework. The proposed models are essentially NTD-like models. The main contributions of this paper are fourfold:
● Coupling the graph regularization term, the approximately orthogonal term, and the objective function of NTD, we construct a novel tensor-based framework. Depending on whether the approximately orthogonal constraint is also imposed on the N-th factor matrix, two models are naturally generated.
● By regulating the quality of approximation and picking the appropriate graph for the learning task, the models allow us to detect more complex, latent, and structural correlations among sample data.
● The algorithm is sensitive to the choice of the parameters, which largely determines the clustering performance. To overcome this issue, we use a grid method to select suitable parameters.
● Numerical experiments on widely used datasets are conducted to illustrate the feasibility of the proposed methods for image clustering and classification tasks. Extensive experimental results show that the AOGNTD models significantly improve clustering performance.
The rest of this paper is organized as follows. In Section 2, we review the related models. In Section 3, the two AOGNTD methods are proposed. In Section 4, we discuss the optimization procedure, its theoretical convergence, and the computational complexity. Finally, experiments on clustering tasks are presented in Section 5, and conclusions are drawn in Section 6.
2. A brief review of NTD and GNTD
NTD can be considered a special case of NMF with sparser and multi-linear basis vectors. Given a nonnegative data tensor $\mathcal{X}\in\mathbb{R}_{+}^{I_{1}\times I_{2}\times\cdots\times I_{N}}$, NTD aims at decomposing $\mathcal{X}$ into a nonnegative core tensor $\mathcal{G}\in\mathbb{R}_{+}^{J_{1}\times J_{2}\times\cdots\times J_{N}}$ multiplied by $N$ nonnegative factor matrices $\mathbf{A}^{(r)}\in\mathbb{R}_{+}^{I_{r}\times J_{r}}$, $r=1,\dots,N$, along each mode. To achieve this goal, NTD minimizes the sum of squared residuals between the data tensor $\mathcal{X}$ and the multi-linear product of the core tensor $\mathcal{G}$ and the factor matrices $\mathbf{A}^{(r)}$, which can be formulated as
$$\min_{\mathcal{G}\geq 0,\ \mathbf{A}^{(r)}\geq 0}\ \big\|\mathcal{X}-\mathcal{G}\times_{1}\mathbf{A}^{(1)}\times_{2}\mathbf{A}^{(2)}\cdots\times_{N}\mathbf{A}^{(N)}\big\|^{2},\tag{2.1}$$
where, in the following, the operator $\times_{r}$ denotes the $r$-mode product. The $r$-mode product of a tensor $\mathcal{Y}\in\mathbb{R}^{J_{1}\times\cdots\times J_{N}}$ and a matrix $\mathbf{U}\in\mathbb{R}^{I_{r}\times J_{r}}$, denoted by $\mathcal{Y}\times_{r}\mathbf{U}$, is
$$(\mathcal{Y}\times_{r}\mathbf{U})_{j_{1}\cdots j_{r-1}\,i\,j_{r+1}\cdots j_{N}}=\sum_{j_{r}=1}^{J_{r}}y_{j_{1}j_{2}\cdots j_{N}}\,u_{ij_{r}}.$$
Equation (2.1) can be represented in matrix form as
$$\min\ \big\|\mathbf{X}_{(N)}-\mathbf{A}^{(N)}\mathbf{G}_{(N)}\big(\mathbf{A}^{(N-1)}\otimes\cdots\otimes\mathbf{A}^{(1)}\big)^{\top}\big\|^{2},$$
where $\otimes$ denotes the Kronecker product, and $\mathbf{X}_{(N)}$ and $\mathbf{G}_{(N)}$ are the mode-$N$ unfolding matrices of the data tensor $\mathcal{X}$ and the core tensor $\mathcal{G}$, respectively. Therefore, NTD can be regarded as NMF with the encoding matrix $\mathbf{A}^{(N)}$ and the basis matrix $\mathbf{G}_{(N)}\big(\mathbf{A}^{(N-1)}\otimes\cdots\otimes\mathbf{A}^{(1)}\big)^{\top}$, where $I_{N}$ and $J_{N}$ can be regarded as the number of samples and the dimension of the low-dimensional representation of $\mathcal{X}$, respectively.
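For illustration, the following minimal NumPy sketch builds a small third-order tensor from a core tensor and three factor matrices and checks the mode-N unfolding identity stated above. The helper names `unfold` and `mode_product` and the chosen dimensions are ours, not from the paper; the unfolding uses the usual column-major convention under which the identity holds.

```python
import numpy as np

def unfold(T, n):
    # Mode-n unfolding X_(n): mode n becomes the rows; the remaining modes are
    # flattened in Fortran order so that lower-numbered modes vary fastest.
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def mode_product(T, U, n):
    # r-mode product T x_n U for U of shape (I_n, J_n), contracting mode n of size J_n.
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
J1, J2, J3 = 4, 3, 5            # core tensor size
I1, I2, I3 = 8, 7, 10           # data tensor size (I3 plays the role of the number of samples)
G = rng.random((J1, J2, J3))
A1, A2, A3 = rng.random((I1, J1)), rng.random((I2, J2)), rng.random((I3, J3))

# X = G x_1 A1 x_2 A2 x_3 A3
X = mode_product(mode_product(mode_product(G, A1, 0), A2, 1), A3, 2)

# Mode-3 (mode-N) unfolding identity: X_(3) = A3 G_(3) (A2 kron A1)^T
lhs = unfold(X, 2)
rhs = A3 @ unfold(G, 2) @ np.kron(A2, A1).T
print(np.allclose(lhs, rhs))    # True
```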
The local geometric structure can be effectively modeled through a nearest-neighbor graph on the data points, an idea that originates from spectral graph theory and manifold learning. A graph built from the $p$ nearest neighbors of each sample is generated and employed, and the most commonly used weighting schemes are the heat kernel and 0-1 weighting. The heat kernel weights are
$$W_{ij}=\begin{cases}e^{-\frac{\|y_{i}-y_{j}\|^{2}}{\sigma}}, & y_{j}\in N(y_{i})\ \text{or}\ y_{i}\in N(y_{j}),\\ 0, & \text{otherwise},\end{cases}$$
where $N(y_{i})$ is composed of the $p$ nearest neighbors of sample $y_{i}$, and $\sigma$ is the Gaussian kernel parameter controlling the similarity values; in our experiments it was set to 1. Alternatively, 0-1 weighting gives a binary weight definition:
$$W_{ij}=\begin{cases}1, & y_{j}\in N(y_{i})\ \text{or}\ y_{i}\in N(y_{j}),\\ 0, & \text{otherwise}.\end{cases}$$
This 0-1 weighting is used in the subsequent image experiments.
To measure the dissimilarity of the data points $z_{i},z_{j}$ in the low-dimensional representation $\mathbf{A}^{(N)}$ (whose rows are the $z_{i}$), the smoothness term
$$\frac{1}{2}\sum_{i,j}\|z_{i}-z_{j}\|^{2}W_{ij}=\operatorname{Tr}\big(\mathbf{A}^{(N)\top}\mathbf{L}\mathbf{A}^{(N)}\big)$$
is considered, where $\mathbf{W}$ is the weight matrix, $\mathbf{L}=\mathbf{D}-\mathbf{W}$ is called the graph Laplacian, and $\mathbf{D}$ is a diagonal matrix whose entries are the column sums of $\mathbf{W}$, i.e., $D_{ii}=\sum_{j}W_{ij}$. If $y_{i}$ and $y_{j}$ are close in the original space, the value of $W_{ij}$ is relatively large, so minimizing the above term encourages $z_{i}$ and $z_{j}$ to be close as well. The trace formulation effectively characterizes the smoothness of the low-rank representation.
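As a concrete illustration, the sketch below constructs a symmetric p-nearest-neighbor heat-kernel graph, its Laplacian, and verifies the equivalence between the pairwise smoothness term and the trace form. The helper name `heat_kernel_graph` and the random data are ours and only illustrative.

```python
import numpy as np

def heat_kernel_graph(Y, p=5, sigma=1.0):
    # Y: (n_samples, n_features). Symmetric p-nearest-neighbor graph with heat kernel weights.
    n = Y.shape[0]
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:p + 1]                  # p nearest neighbors, excluding the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma)
    return np.maximum(W, W.T)                              # keep an edge if either endpoint selects the other

rng = np.random.default_rng(0)
Y = rng.random((60, 1024))              # e.g., 60 vectorized 32x32 images
W = heat_kernel_graph(Y, p=5, sigma=1.0)
D = np.diag(W.sum(axis=1))              # degree matrix
L = D - W                               # graph Laplacian

A = rng.random((60, 10))                # rows of A play the role of the low-dimensional z_i
pairwise = 0.5 * sum(W[i, j] * np.sum((A[i] - A[j]) ** 2)
                     for i in range(60) for j in range(60))
print(np.allclose(pairwise, np.trace(A.T @ L @ A)))        # True: smoothness term = Tr(A^T L A)
```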
GNTD is obtained by minimizing the following objective function:
$$\min_{\mathcal{G}\geq 0,\ \mathbf{A}^{(r)}\geq 0}\ \big\|\mathcal{X}-\mathcal{G}\times_{1}\mathbf{A}^{(1)}\times_{2}\mathbf{A}^{(2)}\cdots\times_{N}\mathbf{A}^{(N)}\big\|^{2}+\lambda\operatorname{Tr}\big(\mathbf{A}^{(N)\top}\mathbf{L}\mathbf{A}^{(N)}\big),$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix, $\mathbf{L}$ is the graph Laplacian matrix, and $\lambda$ is a nonnegative parameter balancing the importance of the graph regularization term and the reconstruction error term. It is worthwhile to apply the graph regularization to the mode-$N$ low-dimensional representation $\mathbf{A}^{(N)}$.
3. Approximately orthogonal NTD with graph regularization
In this section, the approximately orthogonal NTD with graph regularization model is proposed; the specific updating rules, the convergence of the stated algorithms, and their computational complexity are presented in Section 4.
3.1. Approximately orthogonal term
As an orthogonal version of NMF, orthogonal nonnegative matrix factorization has been proven to be closely related to K-means clustering. For a given nonnegative data matrix $\mathbf{X}\in\mathbb{R}_{+}^{M\times N}$, ONMF seeks nonnegative factors $\mathbf{U}\in\mathbb{R}_{+}^{M\times R}$ and $\mathbf{V}\in\mathbb{R}_{+}^{N\times R}$ through the following optimization:
$$\min_{\mathbf{U}\geq 0,\ \mathbf{V}\geq 0}\ \big\|\mathbf{X}-\mathbf{U}\mathbf{V}^{\top}\big\|^{2}\quad\text{s.t.}\quad\mathbf{V}^{\top}\mathbf{V}=\mathbf{I}.$$
The rigid orthogonality constraint on $\mathbf{V}$ actually consists of two parts: the unit-norm constraints
$$\|v_{i}\|_{2}=1,\quad i=1,\dots,R,$$
and the orthogonality (zero) constraints
$$v_{i}^{\top}v_{j}=0,\quad i\neq j,$$
where $v_{i}$ denotes the $i$-th column of the encoding matrix $\mathbf{V}$. It is reasonable to generalize the orthogonality of NMF to the orthogonality of each directional factor matrix in NTD:
$$\mathbf{A}^{(r)\top}\mathbf{A}^{(r)}=\mathbf{I},\quad r=1,\dots,N.$$
Directly imposing the orthogonality constraints on the factor matrices makes the model more difficult to solve, so we look for an approximately orthogonal form. The unit-norm constraints are not obligatory, and in this paper we do not impose them. Hence, we only need to focus on the orthogonality constraints, which express the independence between columns. Under the nonnegativity constraints, we have $a_{i}^{\top}a_{j}\geq 0$ for any $i$ and $j$. The trace form
$$\operatorname{Tr}\big(\mathbf{A}\mathbf{Q}\mathbf{A}^{\top}\big)=\sum_{i\neq j}a_{i}^{\top}a_{j},\qquad \mathbf{Q}=\mathbf{1}_{R}\mathbf{1}_{R}^{\top}-\mathbf{I},$$
is a convenient mathematical expression of the partial and approximate orthogonality of a matrix $\mathbf{A}$ with columns $a_{1},\dots,a_{R}$, where
$\mathbf{1}_{R}$ is an all-one $R\times 1$ column vector, $\mathbf{I}$ is the identity matrix, and $R$ is assigned different values depending on the corresponding factor matrix. The smaller the value of this term, the greater the degree of orthogonality of the matrix.
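A small NumPy check of the trace expression reconstructed above (the notation Q = 1_R 1_R^T − I and the variable names are ours): the trace form equals the sum of cross-column inner products, and it vanishes exactly when the nonnegative columns have disjoint supports, i.e., when they are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 8))                      # a nonnegative factor matrix with R = 8 columns
R = A.shape[1]
Q = np.ones((R, R)) - np.eye(R)               # Q = 1_R 1_R^T - I

cross = sum(A[:, i] @ A[:, j] for i in range(R) for j in range(R) if i != j)
trace_form = np.trace(A @ Q @ A.T)
print(np.allclose(cross, trace_form))         # True

# A column-orthogonal nonnegative matrix (disjoint column supports) makes the penalty vanish.
B = np.zeros((100, 8))
for j in range(8):
    B[j * 10:(j + 1) * 10, j] = rng.random(10)
print(np.isclose(np.trace(B @ Q @ B.T), 0.0))  # True
```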
3.2. The proposed AOGNTD model
Notice that
$$\mathbf{X}_{(N)}\approx\mathbf{A}^{(N)}\mathbf{G}_{(N)}\big(\mathbf{A}^{(N-1)}\otimes\cdots\otimes\mathbf{A}^{(1)}\big)^{\top}=\mathbf{U}\mathbf{S}\mathbf{V}^{\top}$$
holds, where $\mathbf{U}=\mathbf{A}^{(N)}$, $\mathbf{S}=\mathbf{G}_{(N)}$, and $\mathbf{V}=\mathbf{A}^{(N-1)}\otimes\cdots\otimes\mathbf{A}^{(1)}$. We see that this nonnegative tri-factorization gives a good framework for clustering the rows and columns of $\mathbf{X}_{(N)}$. We consider the uni-sided and bi-sided approximately orthogonal cases of the tri-factorization, respectively, so two models arise naturally:
● Uni-sided approximate orthogonality: $\mathbf{V}^{\top}\mathbf{V}=\mathbf{I}$ is weakened to minimizing $\sum_{i\neq j}v_{i}^{\top}v_{j}$ ($v_{i}$ and $v_{j}$ standing for different columns of $\mathbf{V}$), or simply to penalizing $\sum_{r=1}^{N-1}\operatorname{Tr}\big(\mathbf{A}^{(r)}\mathbf{Q}\mathbf{A}^{(r)\top}\big)$ for convenience of calculation, since the original expression contains a mass of Kronecker product operations and is too expensive.
● Bi-sided approximate orthogonality: $\mathbf{U}^{\top}\mathbf{U}=\mathbf{I}$ and $\mathbf{V}^{\top}\mathbf{V}=\mathbf{I}$ are weakened to minimizing $\sum_{i\neq j}u_{i}^{\top}u_{j}$ and $\sum_{i\neq j}v_{i}^{\top}v_{j}$, or simply to penalizing $\sum_{r=1}^{N}\operatorname{Tr}\big(\mathbf{A}^{(r)}\mathbf{Q}\mathbf{A}^{(r)\top}\big)$ for convenience of calculation.
It is easy to see that the uni-sided situation is a special case of the bi-sided situation. With this inspiration, we embed the approximately orthogonal regularization into the regular GNTD to develop AOGNTD (or bi-AOGNTD), whose objective function is
$$F\big(\mathcal{G},\mathbf{A}^{(1)},\dots,\mathbf{A}^{(N)}\big)=\big\|\mathcal{X}-\mathcal{G}\times_{1}\mathbf{A}^{(1)}\times_{2}\mathbf{A}^{(2)}\cdots\times_{N}\mathbf{A}^{(N)}\big\|^{2}+\sum_{r=1}^{N}\mu_{r}\operatorname{Tr}\big(\mathbf{A}^{(r)}\mathbf{Q}\mathbf{A}^{(r)\top}\big)+\lambda\operatorname{Tr}\big(\mathbf{A}^{(N)\top}\mathbf{L}\mathbf{A}^{(N)}\big),\tag{3.2}$$
subject to $\mathcal{G}\geq 0$ and $\mathbf{A}^{(r)}\geq 0$, $r=1,\dots,N$, where $\mathbf{L}$ is the graph Laplacian matrix, $\mathbf{Q}=\mathbf{1}\mathbf{1}^{\top}-\mathbf{I}$, and $\mu_{r}$, $\lambda$ are positive regularization parameters. When $\mu_{N}=0$, the bi-AOGNTD model degenerates into uni-AOGNTD.
In the AOGNTD model,
● $\big\|\mathcal{X}-\mathcal{G}\times_{1}\mathbf{A}^{(1)}\times_{2}\mathbf{A}^{(2)}\cdots\times_{N}\mathbf{A}^{(N)}\big\|^{2}$ is the main part, which is the key to feature extraction.
● $\operatorname{Tr}\big(\mathbf{A}^{(r)}\mathbf{Q}\mathbf{A}^{(r)\top}\big)$, $r=1,\dots,N$, is used to ensure that the factor matrices are partially orthogonal (in uni-AOGNTD only the first $N-1$ are penalized, i.e., $\mu_{N}=0$), which makes the global structure more accurate.
● $\operatorname{Tr}\big(\mathbf{A}^{(N)\top}\mathbf{L}\mathbf{A}^{(N)}\big)$ is the basis for graph optimization, achieved by the Laplacian regularizer to preserve the smooth structure of the low-rank representation.
We expect that the proposed AOGNTD model can capture more global structure and local manifold information.
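To fix ideas, the sketch below evaluates the three terms of the objective (3.2) as reconstructed above for given $\mathcal{G}$, $\mathbf{A}^{(r)}$, and $\mathbf{L}$. The helper names `mode_product` and `aogntd_objective` are ours, written under our reconstruction; the paper's exact weighting of the terms may differ. Setting `mu[-1] = 0` corresponds to the uni-AOGNTD objective, while a positive `mu[-1]` gives bi-AOGNTD.

```python
import numpy as np

def mode_product(T, U, n):
    # r-mode product T x_n U for U of shape (I_n, J_n).
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

def aogntd_objective(X, G, A, L, mu, lam):
    # Reconstruction error ||X - G x_1 A(1) ... x_N A(N)||^2
    Xhat = G
    for r, Ar in enumerate(A):
        Xhat = mode_product(Xhat, Ar, r)
    fit = np.sum((X - Xhat) ** 2)
    # Approximate-orthogonality penalties on the factor matrices
    orth = 0.0
    for r, Ar in enumerate(A):
        R = Ar.shape[1]
        Q = np.ones((R, R)) - np.eye(R)
        orth += mu[r] * np.trace(Ar @ Q @ Ar.T)
    # Graph regularization on the last-mode representation A^(N)
    graph = lam * np.trace(A[-1].T @ L @ A[-1])
    return fit + orth + graph
```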
4. Optimization algorithm
Finding the global optimal solution of (3.2) is difficult, so we adopt a block coordinate descent framework that updates the core tensor or one factor matrix at a time while fixing the others; the resulting updates take the form of multiplicative update rules. This is a well-validated compromise between speed and ease of implementation.
We use the Lagrange multiplier method and apply the mode-n unfolding form. The Lagrange function is given in (4.1), where Φn and Ψr are the Lagrange multiplier matrices associated with G(n) and A(r), respectively. The function (4.1) can be rewritten in mode-n unfolded form.
The entire updating procedure is divided into three parts: the solutions of the factor matrices A(n) (n=1,2,…,N−1), the solution of the factor matrix A(N), and the solution of the core tensor G.
4.1. Updating factor matrix
To update A(r), r=1,…,N, we fix G and the other factor matrices.
The partial derivatives of L in (4.1) with respect to A(n),(n=1,2,…,N−1) are
By using the Karush-Kuhn-Tucker (KKT) condition, i.e.,
where in the following, ⊙ denotes the Hadamard product. According to ∂L/∂A(n)=0, we obtain
By calculating
which together with
yields the following updating rule for A(n)(n=1,2,…,N−1):
where
The partial derivative of L in (4.1) with respect to A(N) is
Similarly, we consider the KKT condition
As a result, we obtain the following updating rule for A(N):
When only the uni-sided approximate orthogonality of the tri-factorization is considered, the iterative format changes slightly, and the updating rule for A(N) becomes
4.2. Updating core tensor
To update G, we fix A(r), r=1,…,N.
The objective function in (4.1) with respect to G can be rewritten as
where
vec(X) stacks the tensor X into a column vector, and vec(Φ) represents the Lagrange multiplier of vec(G). The partial derivative of L in (4.5) with respect to vec(G) is
By applying
we obtain the following updating rule:
At this point, the optimization problem has been solved. According to the above iteration rules, we summarize the process of the AOGNTD method in Algorithm 1.
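For orientation, the following sketch implements the block coordinate descent loop with the standard multiplicative updates of plain NTD. It is not the paper's Algorithm 1: the paper's rules (4.2), (4.3), and (4.6) additionally carry the contributions of the approximately orthogonal term and the graph term in their numerators and denominators. The sketch only illustrates the update order (factor matrices first, then the core tensor) and is written with our own helper names and conventions.

```python
import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def mode_product(T, U, n):
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

def ntd_multiplicative(X, ranks, n_iter=200, eps=1e-10, seed=0):
    """Plain NTD via multiplicative updates (illustrative baseline, not AOGNTD)."""
    rng = np.random.default_rng(seed)
    N = X.ndim
    A = [rng.random((X.shape[n], ranks[n])) for n in range(N)]
    G = rng.random(ranks)
    history = []
    for _ in range(n_iter):
        # Update each factor matrix with the others and the core tensor fixed.
        for n in range(N):
            B = G
            for r in range(N):
                if r != n:
                    B = mode_product(B, A[r], r)   # B collects all modes except n
            Bn, Xn = unfold(B, n), unfold(X, n)    # so that X_(n) ~ A^(n) B_(n)
            A[n] *= (Xn @ Bn.T) / (A[n] @ (Bn @ Bn.T) + eps)
        # Update the core tensor with all factor matrices fixed.
        num, den = X, G
        for r in range(N):
            num = mode_product(num, A[r].T, r)
            den = mode_product(den, A[r].T @ A[r], r)
        G *= num / (den + eps)
        # Track the reconstruction error (useful for convergence curves, cf. Section 5.5).
        Xhat = G
        for r in range(N):
            Xhat = mode_product(Xhat, A[r], r)
        history.append(np.sum((X - Xhat) ** 2))
    return G, A, history
```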
4.3. Theoretical investigation
From [11], we know that if $G(u,u^{\prime})$ is an auxiliary function for $F(u)$, i.e., $G(u,u^{\prime})\geq F(u)$ and $G(u,u)=F(u)$, then $F(u)$ is nonincreasing under the updating rule
$$u^{t+1}=\arg\min_{u}G(u,u^{t}),\tag{4.7}$$
since $F(u^{t+1})\leq G(u^{t+1},u^{t})\leq G(u^{t},u^{t})=F(u^{t})$. The equality $F(u^{t+1})=F(u^{t})$ holds only if $u^{t}$ is a local minimum of $G(u,u^{t})$. Each subproblem has an optimal solution, which is the fundamental basis and theoretical support of our algorithm. The construction and design of the auxiliary functions are of utmost importance in this part. Before stating the local convergence theorem, we present three lemmas about the corresponding auxiliary functions.
For any element $a^{(n)}_{ij}$ in $\mathbf{A}^{(n)}$ $(n=1,2,\dots,N-1)$, let $F_{ij}(a^{(n)}_{ij})$ denote the part of the objective function in (3.2) relevant to $a^{(n)}_{ij}$. The iterative process is element-wise, so it suffices to show that each $F_{ij}(a^{(n)}_{ij})$ is nonincreasing under the iteration rule. The first-order derivative of $F_{ij}(a^{(n)}_{ij})$ with respect to $a^{(n)}_{ij}$ is
and
is the second-order derivative of $F_{ij}(a^{(n)}_{ij})$ evaluated at $a^{(n)t}_{ij}$.
Lemma 1. The function
is an auxiliary function for $F_{ij}(a^{(n)}_{ij})$, with the matrix
Proof. Basically, $G(u,u^{\prime})$ is an auxiliary function for $F(u)$ if the conditions $G(u,u^{\prime})\geq F(u)$ and $G(u,u)=F(u)$ are satisfied. The condition $G(a^{(n)}_{ij},a^{(n)}_{ij})=F_{ij}(a^{(n)}_{ij})$ clearly holds, so we only need to verify that $G(a^{(n)}_{ij},a^{(n)t}_{ij})\geq F_{ij}(a^{(n)}_{ij})$.
The Taylor expansion of $F_{ij}(a^{(n)}_{ij})$ at $a^{(n)t}_{ij}$ is
Comparing (4.8) with (4.9), we can see that $G(a^{(n)}_{ij},a^{(n)t}_{ij})\geq F_{ij}(a^{(n)}_{ij})$
is equivalent to
We have
thus, the inequality
$G(a^{(n)}_{ij},a^{(n)t}_{ij})\geq F_{ij}(a^{(n)}_{ij})$ holds. □
Let $F_{ij}(a^{(N)}_{ij})$ denote the part of the objective function in (3.2) relevant to $a^{(N)}_{ij}$ in $\mathbf{A}^{(N)}$.
Lemma 2. The function
is an auxiliary function for $F_{ij}(a^{(N)}_{ij})$.
Lemma 3. Let $g_{i}$ denote the $i$-th element of $\operatorname{vec}(\mathcal{G})$ and $F_{i}(g_{i})$ denote the part of the objective function in (4.1) relevant to $g_{i}$. The function
is an auxiliary function for $F_{i}(g_{i})$.
Due to the similarity between the proofs of Lemmas 1–3, they are omitted here.
Theorem 4. The objective function in (4.1) is nonincreasing under the updating rules in (4.2), (4.3), and (4.6). The objective function is invariant under these updates if, and only if, $\mathbf{A}^{(r)}$, $r=1,2,\dots,N$, and $\mathcal{G}$ are at a stationary point.
Proof. Replacing G(u,u′) in (4.7) by (4.8), we obtain
which yields
According to Lemma 1, $F_{ij}(a^{(n)}_{ij})$ is nonincreasing under the updating rule (4.2) for $\mathbf{A}^{(n)}$. Then, putting $G(a^{(N)}_{ij},a^{(N)t}_{ij})$ of (4.10) into (4.7), we obtain
From Lemma 2, $F_{ij}(a^{(N)}_{ij})$ is nonincreasing under the updating rule (4.3) for $\mathbf{A}^{(N)}$. Finally, putting $G(g_{i},g^{t}_{i})$ of (4.11) into (4.7),
In accordance with Lemma 3, $F_{i}(g_{i})$ is nonincreasing under the updating rule (4.6).
The proof of Theorem 4 is completed. □
4.4. Computational complexity analysis
In this subsection, we discuss the computational cost of our proposed algorithms in comparison with other methods. The common way to express the complexity of an algorithm is big O notation, but it is not always precise enough to differentiate complexities. Thus, we additionally count the arithmetic operations of each algorithm, including fladd (a floating-point addition), flmlt (a floating-point multiplication), and fldiv (a floating-point division). Specifically, we provide the computational complexity analysis of the uni-AOGNTD and bi-AOGNTD models. There are many Kronecker product operations in our update rules, which incur a high time cost, so we replace the Kronecker products with tensor operations during the calculation.
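The saving from avoiding explicit Kronecker products can be seen in a small NumPy experiment (an illustrative sketch with our own dimensions and helper names, not the paper's implementation): both routes produce the same mode-3 unfolding, but the Kronecker route materializes an (I1·I2) x (J1·J2) matrix, whereas the tensor route only forms successive mode products.

```python
import time
import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def mode_product(T, U, n):
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
I1, I2, I3, J1, J2, J3 = 64, 64, 400, 10, 10, 20
A1, A2, A3 = rng.random((I1, J1)), rng.random((I2, J2)), rng.random((I3, J3))
G = rng.random((J1, J2, J3))

t0 = time.perf_counter()                      # Kronecker route: A3 G_(3) (A2 kron A1)^T
X_kron = A3 @ unfold(G, 2) @ np.kron(A2, A1).T
t_kron = time.perf_counter() - t0

t0 = time.perf_counter()                      # tensor route: successive mode products
X_tens = unfold(mode_product(mode_product(mode_product(G, A1, 0), A2, 1), A3, 2), 2)
t_tens = time.perf_counter() - t0

print(np.allclose(X_kron, X_tens), t_kron, t_tens)
```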
To facilitate the estimation of the computational complexity, we consider the third-order case (N = 3) in the models. Taking uni-AOGNTD as an example, the update rules are, respectively,
and
where
and so on.
The frequently used expressions and the specific operation counts for the two methods are provided in Table 1. By calculating the computational complexity of the four update formulas and using the fact that $I_{i}\gg J_{i}$, we obtain the complexity of the proposed uni-AOGNTD as $O(I_{1}I_{2}I_{3}J)$, in which $J=\max\{J_{1},J_{2},J_{3}\}$.
Similarly, the complexity of the bi-AOGNTD method is also $O(I_{1}I_{2}I_{3}J)$. According to the analyses of the compared methods in the corresponding literature, Table 2 summarizes the complexities of the compared algorithms. The new algorithm inevitably adds a certain amount of computation, but it is negligible in terms of the overall computational complexity.
5. Experiments
In this section, following [36], two indicators, accuracy (AC) and normalized mutual information (NMI), are used to evaluate the clustering performance. The AC is defined as
$$\mathrm{AC}=\frac{\sum_{i=1}^{n}\delta\big(k_{i},\mathrm{map}(c_{i})\big)}{n},$$
where $c_{i}$ denotes the label obtained from the clustering method, $k_{i}$ is the true label of object $x_{i}$, $\delta(x,y)$ is the indicator function that equals 1 if $x=y$ and 0 otherwise, $n$ is the total number of objects, and $\mathrm{map}(c_{i})$ is the optimal mapping function that matches the clustering labels to the true labels. The NMI is defined as
where $n_{uv}$ is the number of objects shared by the $u$-th true cluster and the $v$-th obtained cluster, and $n_{i}$ and $n^{\prime}_{i}$ are the numbers of data points in the $i$-th true cluster and the $i$-th obtained cluster, respectively.
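Both indicators can be computed as follows (a minimal Python sketch; `clustering_accuracy` is our hypothetical helper that uses the Hungarian algorithm from SciPy for the optimal label mapping, and scikit-learn's `normalized_mutual_info_score` for NMI — the paper's MATLAB implementation may use a different NMI normalization).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true_labels, pred_labels):
    """AC: fraction of samples correctly assigned under the best one-to-one
    mapping between predicted clusters and true classes (Hungarian method)."""
    true_labels, pred_labels = np.asarray(true_labels), np.asarray(pred_labels)
    classes, clusters = np.unique(true_labels), np.unique(pred_labels)
    overlap = np.zeros((len(clusters), len(classes)), dtype=int)
    for a, cu in enumerate(clusters):
        for b, cl in enumerate(classes):
            overlap[a, b] = np.sum((pred_labels == cu) & (true_labels == cl))
    row, col = linear_sum_assignment(-overlap)     # maximize the number of matched samples
    return overlap[row, col].sum() / len(true_labels)

true = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]                          # a relabeled but perfect clustering
print(clustering_accuracy(true, pred))             # 1.0
print(normalized_mutual_info_score(true, pred))    # 1.0
```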
All the experiments are run with MATLAB on a PC equipped with Intel i5-10210U and 8GB of RAM.
5.1. Compared methods
To evaluate the effectiveness of the proposed AOGNTD methods, we compare their clustering performance with six state-of-the-art and previously mentioned methods: k-means [36], NMF [11], graph regularized NMF (GNMF) [36], graph dual regularized NMF (GDNMF) [37,38], NTD [17], and GNTD [22,23]. To distinguish the DNMF of the original articles [37,38] from the DNMF with label information [27], we write DNMF [37,38] as GDNMF.
5.2. Datasets description
Three databases, illustrated in Figure 1, are used to conduct the experiments; the details of these public datasets are as follows (COIL20 denotes the Columbia Object Image Library dataset, and ORL denotes the Olivetti Research Laboratory face dataset):
5.2.1. COIL20 dataset
The COIL20 dataset collects 1440 grayscale images of 20 objects. The objects were placed on a motorized turntable, which was rotated through 360 degrees to vary the object pose with respect to a fixed camera. Images of the objects were taken at pose intervals of 5 degrees, corresponding to 72 images per object. In our experiments, each image was resized to 32 \times 32 pixels, and all images are stacked into a tensor \mathcal{X}\in\mathbb{R}^{32\times32\times1440} .
5.2.2. Yale dataset
The Yale faces dataset contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration. Each image was resized into 32 \times 32 pixels. In the experiments, all images form a tensor \mathcal{X}\in\mathbb{R}^{32\times32\times165} .
5.2.3. ORL dataset
The ORL dataset collects 400 grayscale 112 \times 92 face images, consisting of 40 different subjects with 10 distinct images each. For some subjects, the images were taken under different lighting, at different times, and with different facial expressions. Each image was resized to 32 \times 32 pixels in our experiments. These images construct a tensor \mathcal{X}\in\mathbb{R}^{32\times32\times400} .
5.3. Parameter selection
Identifying and tuning the parameter combination of an algorithm has always been a striking issue in machine learning. The size of the core tensor plays a prominent role in our algorithms. For the NTD-based methods, the size of the core tensor is set to \{10\times10\times s, 20\times20\times s, 30\times30\times s\} on the three datasets, where s is a positive integer.
There are multiple regularization parameters \lambda and \mu_{r}, r = 1, 2, \dots, N , balancing the effects of the reconstruction term, the graph Laplacian term, and the approximately orthogonal term in our model. To reduce the burden of parameter tuning, we study the influence on the clustering performance when the parameters \lambda and \mu_{r} = \mu and the sampling ratio vary. Since it is infeasible to examine all parameter combinations at once, we use a grid method to determine the relatively optimal parameters.
To begin, the numerical range of the two parameters is set to [5e-5, 1e+4] ; we then select t values within this range and evaluate the resulting t^{2} combinations to achieve the best clustering effect. For the COIL20 dataset, we set \{1e-1, 1, 1e+1, 1e+2\} as the candidate values of the parameters \lambda and \mu in the uni-AOGNTD format, where t = 4 , and choose the best among the 16 combinations. It is crucial to weigh the two measurement criteria reasonably when comparing results.
In principle, the numerical value of AC (accuracy) is the main factor, supplemented by NMI. More details can be found in Figures 2–7, which show the specific parameter selection, with the optimal parameter setting circled in black. The symbol dataset- k , say COIL20-4, represents k categories extracted from the corresponding dataset. During parameter selection, we found that the AOGNTD models favor strong graph regularization and weak orthogonality.
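The grid procedure can be sketched as follows (illustrative Python; `evaluate` stands for one complete run of AOGNTD followed by K-means and the AC/NMI computation described above, and is not a function from the paper).

```python
import itertools

def grid_search(evaluate, grid=(1e-1, 1e0, 1e+1, 1e+2)):
    """Return (AC, NMI, lambda, mu) of the best grid point.
    `evaluate(lam, mu)` must return a pair (AC, NMI); AC is the primary
    criterion and NMI the secondary one, as described in Section 5.3."""
    best = None
    for lam, mu in itertools.product(grid, grid):      # t values per parameter -> t^2 combinations
        ac, nmi = evaluate(lam, mu)
        if best is None or (ac, nmi) > (best[0], best[1]):
            best = (ac, nmi, lam, mu)
    return best
```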
5.4. Experiments for effectiveness and analysis
Tables 3–8 show the average clustering results of the compared algorithms on the three datasets. We mark the best results in bold for each clustering number k on each dataset. In each experiment, we randomly selected k categories as the evaluated data. We construct the graph Laplacian using p -nearest neighbors, with the neighborhood size p set to 5. We run each experiment and apply K -means 10 times on the low-dimensional representation matrix of each method. To validate the experimental results, we repeat the above operations 10 times and report the averages, so there are 100 runs in total.
From Tables 3–8, our AOGNTD algorithms present better performance on the three datasets with varying numbers of categories. From the comparison, the following phenomena can also be observed:
(1) The performance on the COIL20 dataset is the most outstanding. The proposed uni-AOGNTD method attains 9\%, 8.27\%, 7.66\%, 6.69\%, 8.48\%, and 3.39\% improvement in AC and 12.39\%, 10.92\%, 10.17\%, 9.14\%, 12.51\%, and 3.15\% improvement in NMI over K-means, NMF, GNMF, GDNMF, NTD, and GNTD, respectively. In the same way, the proposed bi-AOGNTD algorithm attains 8.65\%, 7.92\%, 7.31\%, 6.34\%, 8.13\%, and 3.04\% improvement in AC and 12.41\%, 10.96\%, 10.22\%, 9.18\%, 12.55\%, and 3.19\% improvement in NMI over K-means, NMF, GNMF, GDNMF, NTD, and GNTD, respectively.
(2) For the Yale dataset, the uni-AOGNTD method attains 4.87\%, 3.71\%, 3.4\%, 2.85\%, 2.72\%, and 2.02\% improvement in AC and 5.11\%, 4.63\%, 4.33\%, 3.09\%, 3.38\%, and 2.38\% improvement in NMI over K-means, NMF, GNMF, GDNMF, NTD, and GNTD, respectively. The uni-AOGNTD and bi-AOGNTD models achieve comparable clustering performance.
(3) For the ORL dataset, the uni-AOGNTD method attains 7.94\%, 2.48\%, 2.02\%, 1.86\%, 12.33\%, and 0.72\% improvement in AC and 9.48\%, 4.31\%, 3.88\%, 3.72\%, 15.29\%, and 1.1\% improvement in NMI over K-means, NMF, GNMF, GDNMF, NTD, and GNTD, respectively. The clustering effects of the uni-AOGNTD and bi-AOGNTD methods are roughly the same.
As can be seen, the uni-AOGNTD and bi-AOGNTD methods are superior to the compared methods, owing to the combination and balancing of the approximately orthogonal regularization and the graph regularization within the unified NTD framework for tensor data representation. Under the adjustment of the parameters, the local minimum solutions are confined to a better region of the solution space; intuitively, the nonnegativity of the factor matrices and the model constraints shrink the feasible domain toward favorable local minima.
The clustering results of the methods containing graph regularization terms on {\mathbf{A}}^{(N)} are further compared in Figure 8. In short, the proposed approaches produce the highest and most stable AC and NMI.
5.5. Convergence study
As described in Section 4, the convergence of the proposed algorithms has been theoretically proved. In this subsection, we study the convergence of the proposed algorithms experimentally by examining the relationship between the number of iterations and the value of the objective function in Figure 9. The trend intuitively suggests that the objective function converges effectively under the multiplicative update rules, which corroborates Theorem 4.
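Curves of this kind can be reproduced in spirit by recording the objective value at every iteration and plotting it. In the small sketch below, `history` is assumed to be the list of objective values collected inside the main update loop (for instance, as in the NTD sketch of Section 4); the function name is ours.

```python
import matplotlib.pyplot as plt

def plot_convergence(history, name="uni-AOGNTD"):
    # history[t] is the objective value after iteration t+1 of the multiplicative updates.
    plt.plot(range(1, len(history) + 1), history)
    plt.xlabel("Number of iterations")
    plt.ylabel("Objective function value")
    plt.title(f"Convergence curve of {name}")
    plt.tight_layout()
    plt.show()
```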
6. Conclusions
In this paper, we built two novel AOGNTD methods for tensor data representation and provided a convergence analysis. Through two adjustable regularization parameters, the AOGNTD methods provide a more competent representation and achieve better clustering performance on publicly available real-world datasets. Although the AOGNTD methods perform well in image clustering tasks, they can be further improved in two possible directions. First, the graph in our algorithm is fixed; it is worth investigating whether joint information from multiple graphs can be exploited or the optimal graph can be learned. In addition, a large number of NTD-based methods rely on K-means, and designing stand-alone classification methods that require no additional clustering procedure is a main direction for further research.
Author contributions
Xiang Gao: material preparation, data collection and analysis, writing the first draft, commenting; Linzhang Lu: material preparation, data collection and analysis, commenting; Qilong Liu: material preparation, data collection and analysis, commenting. All authors have read and approved the final version of the manuscript for publication.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
Linzhang Lu is supported by the National Natural Science Foundation of China under Grant (12161020 and 12061025), and Guizhou Normal University Academic Talent Foundation under Grant (QSXM[2022]04). Qilong Liu is supported by Guizhou Provincial Basis Research Program (Natural Science) under Grant (QKHJC-ZK[2023]YB245).
Conflict of interest
The authors declare that they have no conflicts of interest in this paper.