A deep clustering framework integrating pairwise constraints and a VMF mixture model

He Ma; Weipeng Wu

doi:10.3934/era.2024177

Electronic Research Archive

2024, Volume 32, Issue 6: 3952-3972. doi: 10.3934/era.2024177


Research article


He Ma ^{1}, Weipeng Wu ^{2}

1. College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150000, China
2. College of Software, Harbin Institute of Information Technology, Harbin 150431, China

Received: 23 April 2024 Revised: 17 May 2024 Accepted: 07 June 2024 Published: 14 June 2024

We presented a novel deep generative clustering model called Variational Deep Embedding based on Pairwise constraints and the Von Mises-Fisher mixture model (VDEPV). VDEPV consists of fully connected neural networks capable of learning latent representations from raw data and accurately predicting cluster assignments. Under the assumption of a genuinely non-informative prior, VDEPV adopted a von Mises-Fisher mixture model to depict the hyperspherical interpretation of the data. We defined and established pairwise constraints by employing a random sample mining strategy and applying data augmentation techniques. These constraints enhanced the compactness of intra-cluster samples in the spherical embedding space while improving inter-cluster samples' separability. By minimizing Kullback-Leibler divergence, we formulated a clustering loss function based on pairwise constraints, which regularized the joint probability distribution of latent variables and cluster labels. Comparative experiments with other deep clustering methods demonstrated the excellent performance of VDEPV.


Citation: He Ma, Weipeng Wu. A deep clustering framework integrating pairwise constraints and a VMF mixture model[J]. Electronic Research Archive, 2024, 32(6): 3952-3972. doi: 10.3934/era.2024177



1. Introduction

The rapid development of computing, mobile computing and big data technologies has ushered in the new media era ^[1,2]. As natives of the mobile Internet, college students have a strong desire to speak freely and a strong willingness to discuss public topics ^[3,4]. They are now one of the most active user groups on mobile social media ^[5,6]. At the same time, the growing number of information channels has challenged the information monopoly of mainstream media to an unprecedented degree ^[7,8]. Negative and false information is mixed in with the mass of online content and can easily cause emotional fluctuations among college students ^[9,10]. A typical example is the impact of film and television works ^[11]. The film and television works available on the Internet are highly varied, and they influence college students' attitudes towards life, positively or negatively, to a certain extent ^[12]. This may even affect the mental health of college students and the harmony of campuses ^[13,14]. Therefore, effective sentiment analysis of campus public opinion on mobile social media ^[15] is of great importance to ideological management in colleges and universities ^[16].

In the past few years, deep neural networks have been widely applied because of their powerful information processing ability ^[17], and they have become the mainstream technology for sentiment analysis ^[18]. The most popular deep sentiment analysis methods are built upon recurrent neural networks (RNN) ^[19]. The RNN emphasizes the dependency among the word-level units of a text sequence and builds a global feature representation on this basis ^[20]. From a linguistic perspective, the characters of a text often depend on their context, which makes RNN-based methods a natural fit for such scenarios ^[21]. Typical models include long short-term memory (LSTM) and its bidirectional version ^[22]. However, existing methods consider only the vertical features of the text and ignore the potential horizontal features. From the perspective of model structure, they concentrate more on learning depth than on learning width ^[23,24].

To fill this gap, this paper extends RNN-based models from a single scale to multiple scales. With the depth held fixed, the number of parallel units is increased to widen the learning field ^[25]. The aim is to obtain a more comprehensive semantic feature representation and to address the imbalance between model depth and model width. Taking English-text campus public opinion as the object of study, this paper proposes a novel sentiment analysis method based on multi-scale deep learning for college public opinions (MSLSTM-CPO for short). To match the bidirectional dependency of semantic sequences, Bi-LSTM is selected as the basic networking unit. Vertically, a single English word is used as the minimum processing unit. Horizontally, more than two basic networking units are adopted to form a parallel computing structure. As a result, the model is enhanced in terms of the number of horizontal layers, realizing a trade-off between depth and breadth. The main contributions of this paper are summarized as follows:

● This work discusses and demonstrates semantic characteristics from both the vertical and the horizontal direction.

● This paper develops MSLSTM-CPO, a novel sentiment analysis method via multi-scale deep learning for college public opinions.

● This work conducts experiments to comprehensively assess the performance of the proposal.

2. Related works

In recent years, sentiment analysis has received increasing attention as an important research direction in natural language processing and has gradually become a research hotspot. Historically, sentiment analysis has gone through three main stages: lexicon-based methods, machine learning methods and deep learning methods.

In the early stages of sentiment analysis, the main method was lexicon-based counting of sentiment terms. Lexicon-based statistical methods are easy to understand but generalise poorly. With the development of natural language processing, research on machine learning methods has also yielded many effective results ^[26,27,28]. Rashmi et al. ^[29] proposed a soft voting classifier that integrates five baseline models (logistic regression, balanced random forest, eXtreme Gradient Boosting, random forest and support vector machine) for the task of classifying mixed Indian languages. The method classifies positive, negative, neutral, mixed-emotion and unemotional states with better results. Maipraditj et al. ^[30] proposed a machine learning-based approach that processes text from three different datasets, extracts features with the N-gram IDF method and then uses automated machine learning to classify positive, neutral and negative emotions. However, traditional machine learning methods cannot combine semantic information with textual context, so deep learning is gradually becoming the research trend ^[31,32,33,34,35,36]. Huang et al. ^[37] explored changes in emotion in blended learning using LSTM-based text mining methods and epistemic network analysis. Jia et al. ^[38] constructed a sentiment classification model using BERT, CNN and an attention mechanism to mine text for contextual connections and features. Harendranath et al. ^[39] proposed a modeling method based on a recurrent neural network to classify the emotions of political comments.

To sum up, almost all the related works deal with semantic analysis problems from the perspective of vertical semantics. Semantic modeling from both the vertical and the horizontal direction still needs deeper discussion. The next section therefore presents the proposed technical framework from this point of view.

3. Methodology

3.1. Overview

Currently, almost all effective natural language models are built on modelling natural utterances as sequences. The data for this experiment is English comment text, and in most cases the semantic information of each keyword in a comment is closely related to the preceding text. A neural network that can remember context is therefore better suited to the sentiment analysis of the comment text in this experiment. The traditional LSTM model can remember earlier text and can handle the internal information complexity and overload caused by long sequential data. To match the bidirectional dependency of semantic sequences, Bi-LSTM is chosen as the basic networking unit for this experiment. On this basis, a novel sentiment analysis method for college public opinions via multi-scale deep learning is developed in this work. As shown in Figure 1, the workflow of the MSLSTM-CPO model consists of two parts: word-level semantic encoding and sentence-level semantic encoding.

Figure 1. The major workflow of the proposed MSLSTM-CPO.


3.2. Word-level semantic encoding

The data for this experiment is English text, and the classification model cannot be trained directly on the input text. Each English comment therefore needs to be split on space characters, and the resulting words are converted into a vector representation. If the corpus contains many words, the vector for each word has a very high dimensionality, so plain one-hot encoding would make the word vectors very sparse. Word embedding is a way of representing words in natural language by representing each word as a vector in a high-dimensional space. In this way, natural language computation is converted into vector computation. The detailed steps of the word embedding are shown in Figure 2.

Figure 2. The workflow diagram for the computation of word embedding.


Step 1: Dictionary lookup. The text is first split on spaces, which divides long sentences into individual words. Each word in the sentence is then converted into a fixed integer ID by querying the dictionary.

Step 2: One-hot encoding. If the dictionary has $P$ words in its word list, each word can be represented by a $P$-dimensional vector, which converts each ID into a fixed-length vector. For a word with ID $x$ , the $x$-th element of the vector is 1 and the remaining $P-1$ elements are 0. This process is one-hot encoding. For the word with ID $x = 1$ , for example, it can be expressed as:

$\begin{equation} {v_{x = 1}} = \left[ {1, \underbrace {0, 0, \cdots , 0}_{P - 1}} \right] \end{equation}$

(3.1)

Step 3: Embedding lookup. In a real experimental scenario, texts differ in length. To keep the overall training result from being affected by inputs that are too long or too short, a parameter max_seq_len is set, and each text is truncated or padded to that length. After one-hot encoding, the sentence tensor is denoted as $V$ . This tensor $V$ is then multiplied by a dense tensor $W$ , $W \in {\mathbb{R}^{P \times l}}$ , where $P$ denotes the vocabulary size and $l$ denotes the vector size of each word. The multiplication maps $V$ to an embedding representation $X$ , which completes the representation of words as vectors.
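The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy vocabulary, the `[pad]` and `[unk]` entries, the helper names `to_ids`, `one_hot` and `embed`, and the values in `W` are all hypothetical and do not come from the paper. The sketch uses 0-based IDs, and `MAX_SEQ_LEN` is shortened from the paper's 256 for readability.

```python
# Hypothetical toy vocabulary; ID 0 is reserved for padding, 1 for unknown words.
VOCAB = {"[pad]": 0, "[unk]": 1, "the": 2, "movie": 3, "was": 4, "great": 5}
MAX_SEQ_LEN = 6  # the paper sets max_seq_len to 256; shortened here


def to_ids(text, vocab=VOCAB, max_seq_len=MAX_SEQ_LEN):
    """Step 1: split on spaces, look up integer IDs, then truncate or pad."""
    ids = [vocab.get(word, vocab["[unk]"]) for word in text.lower().split()]
    ids = ids[:max_seq_len]
    return ids + [vocab["[pad]"]] * (max_seq_len - len(ids))


def one_hot(word_id, vocab_size):
    """Step 2: a P-dimensional vector with a single 1 at position word_id."""
    return [1.0 if i == word_id else 0.0 for i in range(vocab_size)]


def embed(ids, W):
    """Step 3: multiply each one-hot row of V by the dense table W (P x l)."""
    P, l = len(W), len(W[0])
    X = []
    for word_id in ids:
        v = one_hot(word_id, P)
        X.append([sum(v[p] * W[p][j] for p in range(P)) for j in range(l)])
    return X


# Toy embedding table with P = 6 rows and l = 2 columns (made-up values).
W = [[float(p), float(p) / 10.0] for p in range(len(VOCAB))]
ids = to_ids("The movie was great")
X = embed(ids, W)
```

Because each one-hot row contains a single 1, the product simply selects row `word_id` of `W`. This is why practical frameworks implement the embedding layer as a table lookup rather than a full matrix multiplication.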

3.3. Sentence-level semantic encoding

This subsection builds sentence-level semantic encoding on top of the word-level semantic encoding. Each input sentence is truncated or padded according to the global variable max_seq_len, which turns it into a fixed-length vector. The processed information is then used to train the model proposed in this paper, which produces the sentiment classification results. As shown in Figure 3, the MSLSTM-CPO model has four layers in total: an embedding layer, an LSTM layer, an average layer and an output layer.

Figure 3. Schematic diagram of the MSLSTM-CPO network structure.


3.3.1. Bidirectional LSTM layer

The LSTM model cannot process raw text directly. It requires the word-level semantic encoding described above, which provides a vectorised representation of the input words. Let the input sequences be ${\mathbf{X}} \in {\mathbb{R}^{B \times L \times M}}$ , where $B$ is the batch size, $L$ is the length of the sequence and $M$ is the input feature dimension. The LSTM scans each sequence from left to right and, at every time step, its recurrent unit updates the internal state ${{\mathbf{C}}_t} \in {\mathbb{R}^{B \times D}}$ and the output state ${{\mathbf{H}}_t} \in {\mathbb{R}^{B \times D}}$ , where $D$ denotes the dimensionality of the hidden state vector. The computation of the LSTM consists of three steps: computing the three gates, computing the internal state and computing the output state.

Step 1: Calculating the three gates. At time step $t$ , the recurrent unit of the LSTM computes the input gate ${{\mathbf{I}}_t}$ , the forget gate ${{\mathbf{F}}_t}$ and the output gate ${{\mathbf{O}}_t}$ from the input ${{\mathbf{X}}_t} \in {\mathbb{R}^{B \times M}}$ at the current time step and the output state ${{\mathbf{H}}_{t - 1}} \in {\mathbb{R}^{B \times D}}$ at the previous time step. This experiment uses the Paddle framework to build the model. Its LSTM differs from a conventional hand-written LSTM implementation in that it has two additional biases and places the parameters in front of the input data in the matrix multiplication. The formulas are as follows:

$\begin{equation} {{\mathbf{I}}_t} = \sigma \left( {{{\mathbf{W}}_{ii}}{{\mathbf{X}}_t} + {{\mathbf{b}}_{ii}} + {{\mathbf{U}}_{hi}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_{hi}}} \right) \end{equation}$

(3.2)

$\begin{equation} {{\mathbf{F}}_t} = \sigma \left( {{{\mathbf{W}}_{if}}{{\mathbf{X}}_t} + {{\mathbf{b}}_{if}} + {{\mathbf{U}}_{hf}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_{hf}}} \right) \end{equation}$

(3.3)

$\begin{equation} {{\mathbf{O}}_t} = \sigma \left( {{{\mathbf{W}}_{io}}{{\mathbf{X}}_t} + {{\mathbf{b}}_{io}} + {{\mathbf{U}}_{ho}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_{ho}}} \right) \end{equation}$

(3.4)

where ${{\mathbf{W}}_*} \in {\mathbb{R}^{M \times D}}$ , ${{\mathbf{U}}_*} \in {\mathbb{R}^{D \times D}}$ , ${{\mathbf{b}}_{i*}} \in {\mathbb{R}^{1 \times D}}$ , ${{\mathbf{b}}_{h*}} \in {\mathbb{R}^{1 \times D}}$ are learnable parameters and $\sigma$ is the logistic function, which keeps the values of the gates in the interval $(0, 1)$ . The gates here are matrices covering all $B$ samples, with each row holding the gate vector of one sample.

Step 2: Calculating the internal state. The candidate internal state is calculated first, with the following equation:

$\begin{equation} {\widetilde {\mathbf{C}}_t} = \tanh \left( {{{\mathbf{W}}_{ic}}{{\mathbf{X}}_t} + {{\mathbf{b}}_{ic}} + {{\mathbf{U}}_{hc}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_{hc}}} \right) \end{equation}$

(3.5)

where ${{\mathbf{W}}_{ic}} \in {\mathbb{R}^{M \times D}}, {{\mathbf{U}}_{hc}} \in {\mathbb{R}^{D \times D}}, {{\mathbf{b}}_{ic}} \in {\mathbb{R}^{1 \times D}}, {{\mathbf{b}}_{hc}} \in {\mathbb{R}^{1 \times D}}$ are learnable parameters. Next, the internal state at time step $t$ is calculated from the forget gate and the input gate, using the following equation:

$\begin{equation} {{\mathbf{C}}_t} = {{\mathbf{F}}_t} \otimes {{\mathbf{C}}_{t - 1}} + {{\mathbf{I}}_t} \otimes {\widetilde {\mathbf{C}}_t} \end{equation}$

(3.6)

where $\otimes$ denotes the element-by-element product.

Step 3: Calculating the output state. Given the internal state of the LSTM from Eq (3.6), the output state at the current time step is calculated as follows:

$\begin{equation} {{\mathbf{H}}_t} = {{\mathbf{O}}_t} \otimes \tanh \left( {{{\mathbf{C}}_t}} \right) \end{equation}$

(3.7)

The recurrent unit of the LSTM takes as input the internal state vector ${{\mathbf{C}}_{t - 1}} \in {\mathbb{R}^{B \times D}}$ and the hidden state vector ${{\mathbf{H}}_{t - 1}} \in {\mathbb{R}^{B \times D}}$ at time step $t - 1$ , and outputs the internal state vector ${{\mathbf{C}}_t} \in {\mathbb{R}^{B \times D}}$ and the hidden state vector ${{\mathbf{H}}_t} \in {\mathbb{R}^{B \times D}}$ at the current time step $t$ . With this recurrent unit, the network can establish longer-range temporal dependencies, which addresses the difficulty a plain RNN has with very long sequences. In addition, by selectively ignoring or reinforcing the current memory and input information, the LSTM helps the model capture the semantic information of long sentences more fully.
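As a minimal sketch of Eqs (3.2) to (3.7), the following standard-library Python code performs one recurrent step on nested lists. It is not the Paddle implementation used in the paper. The helper names (`matmul`, `affine`, `lstm_step`) and the `params` layout are hypothetical. The products are written as ${{\mathbf{X}}_t}{\mathbf{W}}$ and ${{\mathbf{H}}_{t - 1}}{\mathbf{U}}$ (data on the left), an assumption made so that the stated shapes $B \times M$ , $M \times D$ and $D \times D$ multiply directly.

```python
import math


def matmul(A, B):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def affine(X, W, b_i, H, U, b_h):
    """Compute X W + b_i + H U + b_h for every sample (row) in the batch."""
    XW, HU = matmul(X, W), matmul(H, U)
    return [[xw + bi + hu + bh for xw, bi, hu, bh in zip(r1, b_i, r2, b_h)]
            for r1, r2 in zip(XW, HU)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(X_t, H_prev, C_prev, params):
    """One LSTM step. params maps 'i', 'f', 'o', 'c' to (W, b_i, U, b_h)."""
    def gate(name, act):
        W, b_i, U, b_h = params[name]
        return [[act(z) for z in row] for row in affine(X_t, W, b_i, H_prev, U, b_h)]

    I = gate("i", sigmoid)          # Eq (3.2): input gate
    F = gate("f", sigmoid)          # Eq (3.3): forget gate
    O = gate("o", sigmoid)          # Eq (3.4): output gate
    C_tilde = gate("c", math.tanh)  # Eq (3.5): candidate internal state
    # Eq (3.6): element-wise mix of the old state and the candidate state.
    C_t = [[f * c + i * ct for f, c, i, ct in zip(fr, cr, ir, ctr)]
           for fr, cr, ir, ctr in zip(F, C_prev, I, C_tilde)]
    # Eq (3.7): output state.
    H_t = [[o * math.tanh(c) for o, c in zip(o_row, c_row)]
           for o_row, c_row in zip(O, C_t)]
    return H_t, C_t


# Toy check with B = 1, M = 2, D = 2 and all-zero parameters: every gate is
# sigmoid(0) = 0.5 and the candidate state is tanh(0) = 0, so C_t = 0.5 * C_prev.
zero = [[0.0, 0.0], [0.0, 0.0]]
params = {name: (zero, [0.0, 0.0], zero, [0.0, 0.0]) for name in "ifoc"}
H_t, C_t = lstm_step([[1.0, 2.0]], [[0.0, 0.0]], [[1.0, -2.0]], params)
```

In the all-zero example the forget gate halves the previous internal state and the input gate contributes nothing, which makes the roles of the two gates in Eq (3.6) easy to see.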

The Paddle framework's built-in LSTM model has several parameters, including direction and num_layers. The parameter direction specifies the direction of network iteration and can be set to forward or bidirectional, with forward as the default. The parameter num_layers specifies the number of layers in the network and defaults to 1. A multi-layer bidirectional LSTM receives a sequence of vectors and updates its recurrent units in both the forward and the reverse direction. It is therefore enough to set direction to bidirectional and num_layers to the desired $n$ when defining the LSTM in order to use a multi-layer bidirectional LSTM directly.

3.3.2. Output layer

The average layer averages the hidden states at all positions of the bidirectional LSTM layer and uses the result as a representation of the whole sentence. In the experiments, an AveragePooling operator is implemented to aggregate the hidden states. First, the sequence length vector is used to generate a mask matrix, which masks the vectors at the padded placeholder positions of the text sequence. The remaining vectors of the sequence are then summed and averaged.
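The masked averaging described above can be sketched as follows. The function name `masked_average` and the toy hidden states are hypothetical, and dividing by the true sequence length (rather than by the padded length) is an assumption, since the paper does not spell out the denominator.

```python
def masked_average(hidden, seq_lens):
    """Average hidden states over the real (unpadded) positions only.

    hidden: nested lists of shape B x L x D; seq_lens: true length per sequence.
    """
    pooled = []
    for states, length in zip(hidden, seq_lens):
        # Mask is 1 for real tokens and 0 for padded placeholder positions.
        mask = [1.0 if t < length else 0.0 for t in range(len(states))]
        dim = len(states[0])
        total = [sum(m * h[d] for m, h in zip(mask, states)) for d in range(dim)]
        pooled.append([s / max(length, 1) for s in total])
    return pooled


hidden = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # true length 2, last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],  # true length 3, no padding
]
pooled = masked_average(hidden, [2, 3])
```

Without the mask, the padded vector `[9.0, 9.0]` in the first sequence would leak into the sentence representation, which is the problem the mask matrix is meant to prevent.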

The final output layer uses a Linear layer to output the classification results. Sentiment analysis is essentially a classification problem, and in practice the model usually only needs to output the logits (log-odds) of the classes. The Linear layer applies a linear transformation to the hidden state vector ${{\mathbf{H}}_L} \in {\mathbb{R}^{B \times D}}$ at the last time step and then outputs the classification logits. The formula is as follows:

$\begin{equation} {\mathbf{Y}} = {{\mathbf{H}}_L}{{\mathbf{W}}_{out}} + {{\mathbf{b}}_{out}} \end{equation}$

(3.8)

where ${{\mathbf{W}}_{out}} \in {\mathbb{R}^{D \times N}}$ and ${{\mathbf{b}}_{out}} \in {\mathbb{R}^N}$ are the learnable weight matrix and bias. $N$ indicates the number of classifications.
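Eq (3.8) can be illustrated with a few lines of Python. The weights, biases and inputs below are made-up toy values, and `predict` is a hypothetical helper that takes the class with the largest logit; the paper itself only specifies the linear map.

```python
def linear(H, W_out, b_out):
    """Eq (3.8): Y = H W_out + b_out, with H of shape B x D and W_out of shape D x N."""
    return [
        [sum(h * w for h, w in zip(row, col)) + b
         for col, b in zip(zip(*W_out), b_out)]
        for row in H
    ]


def predict(logits):
    """Hypothetical helper: pick the class index with the largest logit."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


H = [[1.0, 2.0], [0.0, -1.0]]     # B = 2 sentences, D = 2 features
W_out = [[1.0, 0.0], [0.0, 1.0]]  # D = 2, N = 2 classes (identity for readability)
b_out = [0.5, 0.0]
Y = linear(H, W_out, b_out)
labels = predict(Y)
```

Here the first sentence receives logits `[1.5, 2.0]` and is assigned class 1, while the second receives `[0.5, -1.0]` and is assigned class 0.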

4. Experiments and analysis

4.1. Dataset and setting

As far as we know, no dataset about college public opinions is publicly available. This work therefore selects IMDB ^[40], a standard dataset in the area of sentiment analysis, for evaluation. The dataset records short reviews from social media together with the associated sentiment information. The IMDB data comes from the Internet Movie Database and includes users' comment text and rating information for movies. In the current Internet era, university students are an inescapable part of the film industry's consumer base. It is easy to see from everyday entertainment life that the majority of the film industry's loyal audience comes from university students who have free time. Movie reviews can be used to determine students' emotions and capture the direction of public opinion in a timely manner, thus guiding students to develop the right values.

The IMDB dataset collects review information for many films, with a total of 50,000 English review texts. A viewer can give a score from 1 to 10 when commenting on a film. In the data processing, a review with a rating below 5 is treated as negative and labelled 0, and a review with a rating above 6 is treated as positive and labelled 1. The purpose of this experiment is to determine from the comment text whether the emotion expressed by the user is positive or negative. After the 50,000 pieces of data are processed, each sample consists of a user's comment text on a movie and a 0/1 label. There are 25,000 training samples and 25,000 test samples, and the training set and the test set each contain 12,500 positive and 12,500 negative samples.
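The labelling rule above can be written as a small function. The function name `rating_to_label` and the sample reviews are hypothetical. Returning `None` for ratings of 5 and 6 is an assumption: the text only defines labels for ratings below 5 and above 6 and does not say how the middle ratings are handled.

```python
def rating_to_label(rating):
    """Map a 1-10 rating to 0 (negative), 1 (positive) or None (no label defined)."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating < 5:
        return 0
    if rating > 6:
        return 1
    return None  # assumption: ratings of 5 and 6 are left out


# Made-up reviews for illustration only.
reviews = [("terrible plot", 2), ("it was fine", 5), ("loved it", 9)]
labelled = [
    (text, rating_to_label(rating))
    for text, rating in reviews
    if rating_to_label(rating) is not None
]
```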

In the experiments, the MSLSTM-CPO model proposed in this paper is compared with both machine learning methods and deep learning methods. The machine learning models are SVC, GaussianNB, MultinomialNB and BernoulliNB. The deep learning baselines are a CNN model, an LSTM model and a bidirectional LSTM model. The comment texts in this experiment vary in length, so the max_seq_len parameter was set to 256 to truncate and pad the text. For the other training parameters, the batch size was set to 128, the hidden layer size to 256 and the embedding size to 256. Considering the time cost of model training and the test accuracy, the number of epochs was set to 3. The learning rate was set to 0.001, 0.002 and 0.005 in turn. The num_layers of the MSLSTM-CPO model was set to 3. Following the composition of the initial dataset, the data was split evenly, with 50% for training and 50% for testing. To fully test the validity of the experiments, four common evaluation metrics for classification models were used: accuracy, precision, recall and F1-score. Their detailed descriptions can be found in references such as ^[2,41] and are omitted here.
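For readers who want the four metrics spelled out, the sketch below computes them for a binary task. Macro averaging over the two classes is an assumption, since the paper does not state the averaging mode. The assumption is at least consistent with the BernoulliNB row of Table 1 (0.5, 0.3333, 0.5, 0.25), which is exactly what a constant prediction produces on a balanced test set under macro averaging. The function name `macro_metrics` and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# A classifier that always predicts "positive" on a balanced toy test set.
y_true = [0, 0, 1, 1]
y_pred = [1, 1, 1, 1]
constant = macro_metrics(y_true, y_pred)
```

On this toy input the function returns an accuracy of 0.5, a precision of 0.25, a recall of 0.5 and an F1-score of about 0.3333.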

4.2. Results and discussion

4.2.1. Machine learning methods results and discussion

Sentiment analysis is essentially a binary classification problem, so one SVC model and three typical Naive Bayes-based methods were chosen as the machine learning comparison models for this experiment. The three Naive Bayes methods are Gaussian Naive Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes. GaussianNB is Naive Bayes with a Gaussian prior distribution, MultinomialNB is Naive Bayes with a multinomial prior distribution, and BernoulliNB is Naive Bayes with a Bernoulli prior distribution. Machine learning-based sentiment classification models generally involve only a few simple steps: data pre-processing, text vectorisation and classifier training.

As shown in Table 1, this experiment computes the four metric values for the five models. The learning rate of the MSLSTM-CPO model is 0.002, and the other parameters are kept constant. The results show that the machine learning methods reach a classification accuracy of only about 0.5 (0.4936 to 0.5206), while the MSLSTM-CPO model reaches 0.8678, far above the machine learning methods. A visualisation of the evaluation results of MSLSTM-CPO compared with the four traditional machine learning models is shown in Figure 6. Its horizontal axis indicates the four metric types and its vertical axis indicates the metric values. The MSLSTM-CPO model, indicated by the blue bars, clearly outperforms the machine learning models on all four evaluation metrics. This again confirms that deep learning models generally perform better than traditional machine learning models on sentiment analysis problems.

Table 1. Experimental results of machine learning models.

Model	Accuracy	F1-score	Recall	Precision
SVC	0.5206	0.4889	0.5275	0.5206
GaussianNB	0.5057	0.4262	0.5075	0.5128
MultinomialNB	0.4936	0.4935	0.4936	0.4936
BernoulliNB	0.5	0.3333	0.5	0.25
MSLSTM-CPO	0.8678	0.8676	0.86755	0.86905


4.2.2. Deep learning methods results and discussion

In order to discuss the effectiveness of the MSLSTM-CPO proposed in this paper, three other typical deep learning methods are chosen as comparison models: CNN model, LSTM model, and bidirectional LSTM model.

Figures 4 and 5 show the trends of accuracy and loss in the training phase for the four methods when the learning rates are set to 0.001, 0.002, 0.003 respectively. In each subplot, the x-axis represents the number of training iterations, ranging from 1 to 600. The y-axis in Figure 4 represents the training accuracy, and the y-axis in Figure 5 represents the training loss. Each subplot is plotted from 30 consecutive sample points and shows the training trend of the same model at different learning rates. The red curve, which corresponds to a learning rate of 0.002, shows the best results in the training phase for all four models, and the difference between the curves is most visible for the MSLSTM-CPO model. As the number of iterations increases, the training accuracy of the four models rises and the training loss falls, and both eventually stabilise. This indicates that the models are training properly.

Figure 4. Tendency of four methods with respect to training accuracy.


Figure 5. Tendency of four methods with respect to training loss.


Figure 6. The evaluation results of MSLSTM-CPO with four machine learning models.


The above trends only show that the models train normally. A comparative analysis of the evaluation metrics is needed to reflect the overall capability of each model. Therefore, in this experiment the learning rate was used as the variable for calculating the evaluation metrics of the four models MSLSTM-CPO, CNN, LSTM and Bi-LSTM, while the other parameters were left unchanged. As shown in Table 2, the learning rates were set to 0.001, 0.002 and 0.005, respectively. Sentiment analysis is essentially a binary classification problem, and the commonly used evaluation metrics are accuracy, precision, recall and F1-score. Reading across the table, the evaluation metrics of each model are stable under the different parameter settings. Reading down the table, the MSLSTM-CPO model works best when the learning rate is 0.002, with an accuracy of 0.8678, which is approximately 2%, 3% and 1.7% higher than that of CNN, LSTM and Bi-LSTM, respectively. The evaluation metrics of the MSLSTM-CPO model were the highest and were stable across the different learning rate settings. Figure 7 visualises the evaluation results of MSLSTM-CPO compared with the other three deep learning models. It has three subfigures that present the comparison from two angles. Figure 7(a) compares the four metric values of MSLSTM-CPO with those of the other three methods. Figures 7(b) and 7(c) take two typical metrics (accuracy and F1-score) and display the evaluation results of the four models under the three learning rate values (0.001, 0.002 and 0.005). The figure clearly shows that the proposed MSLSTM-CPO performs better than the baseline methods.
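The accuracy gains quoted above can be checked directly against the learning rate 0.002 rows of Table 2. The sketch below computes both the absolute gap in percentage points and the relative gain. Reading the quoted percentages as relative gains is an interpretation, not something the paper states, and the variable names are hypothetical.

```python
# Accuracy values copied from the learning rate 0.002 rows of Table 2.
accuracy = {"CNN": 0.8516, "LSTM": 0.8449, "Bi-LSTM": 0.8535, "MSLSTM-CPO": 0.8678}

proposed = accuracy["MSLSTM-CPO"]
gains = {}
for model, value in accuracy.items():
    if model == "MSLSTM-CPO":
        continue
    absolute_points = (proposed - value) * 100   # gap in percentage points
    relative_percent = (proposed - value) / value * 100  # gain relative to baseline
    gains[model] = (round(absolute_points, 2), round(relative_percent, 1))

for model, (points, percent) in gains.items():
    print(f"{model}: +{points} points, +{percent}% relative")
```

The relative gains come out at about 1.9%, 2.7% and 1.7% for CNN, LSTM and Bi-LSTM, which matches the rounded figures in the text, while the absolute gaps are 1.62, 2.29 and 1.43 percentage points.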

Table 2. Experimental results of deep learning models.

Learning rate	Model	Accuracy	F1-score	Recall	Precision
0.001	CNN	0.8565	0.8565	0.8565	0.8566
	LSTM	0.8465	0.84645	0.8465	0.84675
	Bi-LSTM	0.8485	0.8483	0.8487	0.85005
	MSLSTM-CPO	0.8585	0.85815	0.85555	0.85885
0.002	CNN	0.8516	0.8515	0.85165	0.85305
	LSTM	0.8449	0.84485	0.8452	0.84605
	Bi-LSTM	0.8535	0.85345	0.8537	0.8545
	MSLSTM-CPO	0.8678	0.8676	0.86755	0.86905
0.005	CNN	0.83225	0.83255	0.83225	0.8322
	LSTM	0.8411	0.8412	0.8412	0.8411
	Bi-LSTM	0.84645	0.84545	0.8471	0.85485
	MSLSTM-CPO	0.85225	0.8522	0.85235	0.8529


Figure 7. The evaluation results of MSLSTM-CPO with three deep learning models.


4.2.3. Discussion

Sentiment analysis tasks are currently well developed in both machine learning and deep learning approaches. The experimental design compares the MSLSTM-CPO model proposed in this paper with traditional machine learning models and with deep learning models. Based on the data in Tables 1 and 2, it can be concluded that the MSLSTM-CPO model performs validly and reliably overall when compared with these models.

The better performance of the MSLSTM-CPO model can be explained from the perspectives of machine learning models and deep learning models in turn. First, the focus of the sentiment analysis task is on the exploitation of textual content. Traditional machine learning methods focus on extracting features from a large amount of labelled data and derive the classification results from those features. This approach neglects the contextual coherence of the text content and leads to poor model training results. Second, RNN and LSTM models with memory capability are commonly used in sentiment analysis tasks, but of the two only the LSTM can adapt to data with long sequences for memorisation. The MSLSTM-CPO model proposed in this paper is therefore based on the LSTM. While a normal LSTM model is unidirectional and has only one layer, MSLSTM-CPO is a bidirectional and multilayer model. As a result, the model is enhanced in terms of the number of layers, achieving a certain degree of balance between depth and breadth.

In summary, the MSLSTM-CPO model proposed in this paper offers an effective new approach to college opinion analysis, and it achieves better classification performance than the machine learning and deep learning baseline methods.

5. Conclusions

This paper proposes MSLSTM-CPO, a model based on multi-scale deep learning for sentiment analysis of university students. First, the method performs a word embedding operation on the text. Second, Bi-LSTM is chosen as the basic network unit and stacked in layers, and the sentiment classification results are obtained by passing the text through an embedding layer, the LSTM layers, an average layer and a linear output layer. Combining depth and breadth in the model offsets the limitations of traditional methods. Experiments on real datasets show that the MSLSTM-CPO model performs better than traditional machine learning models and commonly used deep learning models.

In future work, users' image and audio information could be mined and fused with textual information to improve the efficiency and accuracy of sentiment analysis of campus opinion. In addition, further machine learning and deep learning methods will be explored, and their performance characteristics in real-world applications will be analysed, in order to investigate sentiment analysis tasks based on deep learning frameworks more effectively. This should help to promote a positive attitude towards life and the development of mental health among university students.

6. Acknowledgement

This work was funded by the Researchers Supporting Project (number RSPD2023R681), King Saud University, Riyadh, Saudi Arabia, and by the Humanities and Social Sciences Project of Chongqing Municipal Education Commission (22SKSZ047).

Conflict of interest

The authors declare that there are no conflicts of interest.

References

[1]	A. E. Ezugwu, A. M. Ikotun, O. O. Oyelade, L. Abualigah, J. O. Agushaka, C. I. Eke, et al., A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects, Eng. Appl. Artif. Intell., 110 (2022), 73–89. https://doi.org/10.1016/j.engappai.2022.104743 doi: 10.1016/j.engappai.2022.104743
[2]	S. Zhou, H. Xu, Z. Zheng, J. Chen, Z. li, J. Bu, et al., A comprehensive survey on deep clustering: Taxonomy, challenges, and future directions, preprint, arXiv: 2206.07579. https://doi.org/10.48550/arXiv.2206.07579
[3]	K. A. István, F. Róbert, G. Péter, Unsupervised clustering for deep learning: A tutorial survey, Acta Polytech. Hung., 15 (2018), 29–53. https://doi.org/10.12700/APH.15.8.2018.8.2 doi: 10.12700/APH.15.8.2018.8.2
[4]	T. R. Davidson, L. Falorsi, N. D. Cao, T. Kipf, J. M. Tomczak, Hyperspherical variational auto-encoders, in 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018, (2018), 856–865.
[5]	K. V. Mardia, P. E. Jupp, K. V. Mardia, Directional Statistics, John Wiley & Sons, 2000. https://doi.org/10.1002/9780470316979
[6]	J. Taghia, Z. Ma, A. Leijon, Bayesian estimation of the von-Mises Fisher mixture model with variational inference, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014), 1701–1715. https://doi.org/10.1109/TPAMI.2014.2306426 doi: 10.1109/TPAMI.2014.2306426
[7]	F. Yuan, L. Zhang, J. She, X. Xia, G. Li, Theories and applications of auto-encoder neural networks: A literature survey, Chin. J. Comput., 42 (2019), 203–230. https://doi.org/10.11897/SP.J.1016.2019.00203 doi: 10.11897/SP.J.1016.2019.00203
[8]	S. Zhang, C. You, R. Vidal, C. Li, Learning a self-expressive network for subspace clustering, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 12393–12403. https://doi.org/10.1109/CVPR46437.2021.01221
[9]	Y. Tao, K. Takagi, K. Nakata, Clustering-friendly representation learning via instance discrimination and feature decorrelation, preprint, arXiv: 2106.00131. https://doi.org/10.48550/arXiv.2106.00131
[10]	Z. Dang, C. Deng, X. Yang, K. Wei, H. Huang, Nearest neighbor matching for deep clustering, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), 13693–13702. https://doi.org/10.1109/CVPR46437.2021.01348
[11]	M. Nasrazadani, A. Fatemi, M. Nematbakhsh, Sign prediction in sparse social networks using clustering and collaborative filtering, J. Supercomput., 78 (2022), 596–615. https://doi.org/10.1007/s11227-021-03902-5 doi: 10.1007/s11227-021-03902-5
[12]	N. Alami, M. Meknassi, N. En-nahnahi, Y. E. Adlouni, O. Ammor, Unsupervised neural networks for automatic arabic text summarization using document clustering and topic modeling, Expert Syst. Appl., 172 (2021). https://doi.org/10.1016/j.eswa.2021.114652
[13]	J. Xie, R. Girshick, A. Farhad, Unsupervised deep embedding for clustering analysis, in International Conference on Machine Learning, (2016), 478–487.
[14]	X. Ye, C. Wang, A. Imakura, T. Sakurai, Spectral clustering joint deep embedding learning by autoencoder, in 2021 International Joint Conference on Neural Networks (IJCNN), 2021. https://doi.org/10.1109/IJCNN52387.2021.9533825
[15]	K. Thirumoorthy, K. Muneeswaran, A hybrid approach for text document clustering using Jaya optimization algorithm, Expert Syst. Appl., 178 (2021). https://doi.org/10.1016/j.eswa.2021.115040
[16]	J. Cai, J. Fan, W. Guo, S. Wang, Y. Zhang, Z. Zhang, Efficient deep embedded subspace clustering, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. https://doi.org/10.1109/CVPR52688.2022.00012
[17]	Y. Li, P. Hu, Z. Liu, D. Peng, J. T. Zhou, X. Peng, Contrastive clustering, in Proceedings of the AAAI Conference on Artificial Intelligence, 35 (2021), 8547–8555. https://doi.org/10.1609/aaai.v35i10.17037
[18]	K. Do, T. Tran, S. Venkatesh, Clustering by maximizing mutual information across views, in Proceedings of the AAAI Conference on Artificial Intelligence, (2021), 9928–9938. https://doi.org/10.1109/ICCV48922.2021.00978
[19]	Y. Shen, Z. Shen, M. Wang, J. Qin, P. H. S. Torr, L. Shao, You never cluster alone, Adv. Neural Inf. Process. Syst., 34 (2021), 27734–27746.
[20]	H. Zhong, J. Wu, C. Chen, J. Huang, M. Deng, L. Nie, et al., Graph contrastive clustering, in Proceedings of the AAAI Conference on Artificial Intelligence, (2021), 9224–9233. https://doi.org/10.1109/ICCV48922.2021.00909
[21]	Q. Ji, Y. Sun, J. Gao, Y. Hu, B. Yin, A decoder-free variational deep embedding for unsupervised clustering, IEEE Trans. Neural Networks Learn. Syst., 33 (2021), 5681–5693. https://doi.org/10.1109/TNNLS.2021.3071275 doi: 10.1109/TNNLS.2021.3071275
[22]	W. Wang, J. Bao, S. Guo, Neural generative model for clustering by separating particularity and commonality, Inf. Sci., 589 (2022), 813–826. https://doi.org/10.1016/j.ins.2021.12.037 doi: 10.1016/j.ins.2021.12.037
[23]	J. Mirecka, M. Famili, A. Kota'nska, N. Juraschko, B. Costa-Gomes, C. Palmer, et al., Affinity-VAE for disentanglement, clustering and classification of objects in multidimensional image data, preprint, arXiv: 2209.04517. https://doi.org/10.48550/arXiv.2209.04517
[24]	J. Xu, Y. Ren, H. Tang, X. Pu, X. Zhu, M. Zeng, et al., Multi-VAE: Learning disentangled view-common and view-peculiar visual representations for multi-view clustering, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), 9234–9243. https://doi.org/10.1109/ICCV48922.2021.00910
[25]	G. Chen, S. Long, Z. Yuan, W. Zhu, Q. Chen, Y. Wu, Ising granularity image analysis on VAE–GAN, Mach. Vision Appl., 33 (2022). https://doi.org/10.1007/s00138-022-01338-2
[26]	E. Palumbo, S. Laguna, D. Chopard, J. E. Vog, Deep generative clustering with multimodal variational autoencoders, in ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023.
[27]	L. Yang, C. Cheung, J. Li, J. Fang, Deep clustering by gaussian mixture variational autoencoders with graph embedding, in Proceedings of the AAAI Conference on Artificial Intelligence, (2019), 6440–6449.
[28]	Y. Liang, Z. Lin, F. Yuan, H. Zhang, L. Wang, W. Wang, Towards polymorphic adversarial examples generation for short text, in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. https://doi.org/10.1109/ICASSP49357.2023.10095612
[29]	K. Yonekura, Quantitative analysis of latent space in airfoil shape generation using variational autoencoders, Trans. JSME, 87 (2021). https://doi.org/10.1299/transjsme.21-00212
[30]	T. Nishida, T. Endo, Y. Kawaguchi, Zero-Shot domain adaptation of anomalous samples for semi-supervised anomaly detection, in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. https://doi.org/10.1109/ICASSP49357.2023.10095897
[31]	D. Nat, A. M. M. Pedro, G. Marta, C. H. L. Matthew, H. Salimbeni, A. Kai, et al., Deep unsupervised clustering with gaussian mixture variational autoencoders, preprint, arXiv: 1611.02648. https://doi.org/10.48550/arXiv.1611.02648
[32]	W. Wu, Y. Liu, M. Guo, Constructing training distribution by minimizing variance of risk criterion for visual category learning, in 2012 19th IEEE International Conference on Image Processing, (2012), 101–104. https//doi.org/10.1109/ICIP.2012.6466805
[33]	W. Wu, Y. Liu, W. Zeng, M. Guo, C. Wang, X. Liu, Effective constructing training sets for object detection, in 2013 IEEE International Conference on Image Processing, (2013), 3377–3380. https//doi.org/10.1109/ICIP.2013.6738696
[34]	Z. Jiang, Y. Zheng, H. Tan, B. Tang, H. Zhou, Variational deep embedding: An unsupervised and generative approach to clustering, preprint, arXiv: 1611.05148. https://doi.org/10.48550/arXiv.1611.05148
[35]	W. Liu, Y. Zhang, X. Li, Z. Liu, B. Dai, T. Zhao, et al., Deep hyperspherical learning, Adv. Neural Inf. Process. Syst., 30 (2017).
[36]	Q. Li, W. Fan, Mixture density hyperspherical generative adversarial networks, in Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence, (2022), 31–37. https://doi.org/10.1145/3529466.3529475
[37]	L. Yang, W. Fan, N. Bouguila, Deep clustering analysis via dual variational autoencoder with spherical latent embeddings, IEEE Trans. Neural Networks Learn. Syst., 34 (2021), 6303–6312. https://doi.org/10.1109/TNNLS.2021.3135460 doi: 10.1109/TNNLS.2021.3135460
[38]	W. Fan, H. Huang, C. Liang, X. Liu, S. Peng, Unsupervised meta-learning via spherical latent representations and dual VAE-GAN, Appl. Intell., 53 (2023), 22775–22788. https://doi.org/10.1007/s10489-023-04760-9 doi: 10.1007/s10489-023-04760-9
[39]	S. Basu, A. Banerjee, R. Mooney, Active semi-supervision for pairwise constrained clustering, in Proceedings of the 2004 SIAM International Conference on Data Mining, (2004), 333–344. https://doi.org/10.1137/1.9781611972740.31
[40]	K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in Proceedings of the Eighteenth International Conference on Machine Learning, 1 (2001), 577–584.
[41]	J. Goschenhofer, B. Bischl, Z. Kira, ConstraintMatch for semi-constrained clustering, in 2023 International Joint Conference on Neural Networks (IJCNN), (2023). https://doi.org/10.1109/IJCNN54540.2023.10191186
[42]	L. Manduchi, K. Chin-Cheong, H. Michel, S. Wellmann, J. E. Vogt, Deep conditional gaussian mixture model for constrained clustering, Neural Inf. Process. Syst., 34 (2021), 11303–11314.
[43]	S. E. Hajjar, F. Dornaika, F. Abdallah, Multi-view spectral clustering via constrained nonnegative embedding, Inf. Fusion, 78 (2021), 209–217. https://doi.org/10.1016/j.inffus.2021.09.009 doi: 10.1016/j.inffus.2021.09.009
[44]	J. Lv, Z. Kang, X. Lu, Z. Xu, Pseudo-Supervised deep subspace clustering, IEEE Trans. Image Process., 30 (2021), 5252–5263. https://doi.org/10.1109/TIP.2021.3079800 doi: 10.1109/TIP.2021.3079800
[45]	L. Bai, J. Liang, Y. Zhao, Self-Constrained spectral clustering, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2022), 5126–5138. https://doi.org/10.1109/TPAMI.2022.3188160 doi: 10.1109/TPAMI.2022.3188160
[46]	C. Hinojosa, E. Vera, H. Arguello, A fast and accurate similarity-constrained subspace clustering algorithm for hyperspectral image, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 14 (2021), 10773–10783. https//doi.org/10.1109/JSTARS.2021.3120071 doi: 10.1109/JSTARS.2021.3120071

© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
