
X-linked hydrocephalus is a neurodevelopmental, X-linked recessive disorder caused by mutations in the L1CAM gene. L1CAM encodes the L1CAM protein, which is essential for nervous system development, including adhesion between neurons, myelination and synaptogenesis. The present study reports a mutation in an L1 syndrome patient with hydrocephalus and adducted thumbs. Genomic DNA was extracted from patients' whole blood (n = 18). The 11 exons of the L1CAM gene were amplified using specific PCR primers. The sequence data were analysed and the pathogenicity of the mutation was predicted using the bioinformatics programs PROVEAN, PolyPhen2 and MUpro. The results revealed that the proband described here had a nonsense mutation, G1120→T, at position 1120 in exon 9, which lies in the extracellular immunoglobulin domain (Ig4) of L1CAM. This nonsense mutation is predicted to truncate the protein, with a deleterious effect on the developing brain of the child. This is the first report of this novel mutation in a patient with X-linked hydrocephalus in India.
Citation: Madhan Srinivasamurthy, Nagaraj Kakanahalli, Shreeshail V. Benakanal. A truncation mutation in the L1CAM gene in a child with hydrocephalus[J]. AIMS Molecular Science, 2021, 8(4): 223-232. doi: 10.3934/molsci.2021017
The rapid development of emerging technologies such as mobile computing and big data has ushered in the new media era [1,2]. As natives of the mobile Internet, college students strongly value free expression and are keen to discuss public topics [3,4]. They are now among the most active user groups on mobile social media [5,6]. With the growing number of information channels, however, the information monopoly of mainstream media has been challenged as never before [7,8]. Negative and false information is mixed in with the mass of online content and can easily cause emotional fluctuations among college students [9,10]. Film and television works are a typical example [11]. The film and television content available online is highly varied and influences college students' attitudes to life, positively or negatively, to some extent [12]. This may even affect students' mental health and campus harmony [13,14]. Effective sentiment analysis of campus public opinion on mobile social media [15] is therefore of great importance to ideological management in colleges and universities [16].
Over the past few years, deep neural networks have been widely explored for their powerful information processing ability [17] and have become the mainstream technology for sentiment analysis [18]. The most popular deep sentiment analysis methods are built on recurrent neural networks (RNN) [19]. The RNN emphasizes the dependency among the word-level units of a text sequence and builds a global feature representation on this basis [20]. From a linguistic perspective, the characters of a text usually depend on their context, which suits RNN-based methods well [21]. Typical models include long short-term memory (LSTM) and its bidirectional version [22]. However, existing methods consider only the vertical features of the text and ignore potential horizontal features. In terms of model structure, they concentrate more on learning depth than on learning width [23,24].
To fill this gap, this paper extends RNN-based models from a single scale to multiple scales. With the depth held fixed, the number of parallel units is increased to widen the learning field [25]. The aim is a more comprehensive semantic feature representation that addresses the imbalance between model depth and model width. Taking English-text campus public opinion as the object, this paper proposes a novel sentiment analysis method based on multi-scale deep learning for college public opinions (MSLSTM-CPO for short). To match the bidirectional dependency of semantic sequences, Bi-LSTM is selected as the basic networking unit. Vertically, a single English word is the minimum processing unit. Horizontally, two or more basic networking units are combined into a parallel computing structure. As a result, the model is strengthened in the number of horizontal layers, realizing a trade-off between depth and breadth. The main contributions of this paper are as follows:
● This work discusses and demonstrates semantic characteristics in both the vertical and the horizontal direction.
● This paper develops MSLSTM-CPO, a novel sentiment analysis method via multi-scale deep learning for college public opinions.
● This work conducts experiments to assess the performance of the proposal comprehensively.
In recent years, sentiment analysis has received growing attention as an important research direction in natural language processing and has gradually become a research hotspot. Historically, sentiment analysis has gone through three main stages: lexicon-based methods, machine learning methods and deep learning methods.
In the early stage of sentiment analysis, the main method was to count the sentiment words found in a lexicon. Lexicon-based statistical methods are easy to understand but generalise poorly. With the development of natural language processing, research on machine learning methods has also yielded many effective results [26,27,28]. Rashmi et al. [29] proposed a soft voting classifier that integrates five baseline models (logistic regression, balanced random forest, eXtreme Gradient Boosting, random forest and support vector machine) for classifying mixed Indian languages. The method classifies positive, negative, neutral, mixed and non-emotional states with better results. Maipraditj et al. [30] proposed a machine learning-based approach that processes three different datasets, extracts features with the N-gram IDF method and then uses automated machine learning to classify positive, neutral and negative emotions. However, traditional machine learning methods cannot combine semantic information with textual context, so deep learning is gradually becoming the research trend [31,32,33,34,35,36]. Huang et al. [37] explored changes in emotion in blended learning using LSTM-based text mining and epistemic network analysis. Jia et al. [38] constructed a sentiment classification model using BERT, CNN and an attention mechanism to mine contextual connections and features from text. Harendranath et al. [39] proposed a recurrent neural network-based method to classify the emotions of political comments.
To sum up, almost all related works approach semantic analysis from the perspective of vertical semantics. Semantic modelling in both the vertical and the horizontal direction still needs deeper discussion. The next section therefore presents the proposed technical framework from this point of view.
Currently, almost all effective natural language models treat natural utterances as sequences. The data for this experiment are English comment texts, and in most cases the semantic information of each keyword is closely related to the preceding text. A neural network that can remember context is therefore better suited to the sentiment analysis of these comments. The traditional LSTM model can memorise text and can handle the internal information complexity and overload caused by long sequential data. To match the bidirectional dependency of semantic sequences, Bi-LSTM is chosen as the basic networking unit for this experiment. On this basis, a novel sentiment analysis method for college public opinions via multi-scale deep learning is developed in this work. As shown in Figure 1, the workflow of the MSLSTM-CPO model consists of two parts: word-level semantic encoding and sentence-level semantic encoding.
The data for this experiment are English texts, and the classification model cannot be trained directly on raw text. Each English comment is therefore split on space characters, and the words are converted into vector representations. If the corpus contains many words, a plain one-hot encoding gives every word a very high-dimensional and very sparse vector. Word embedding instead represents each word of a natural language as a dense vector in a continuous space, so that natural language computation becomes vector computation. The detailed steps of the word embedding are shown in Figure 2.
Step 1: Dictionary lookup. The text is first cut into words using spaces to divide long sentences into a number of words. The words in the text sentence are converted into fixed ID integers by querying the dictionary.
Step 2: One-hot encoding. If the dictionary has P words in its word list, each word can be represented by a P-dimensional vector, thus converting each ID into a fixed-length vector. For a word with ID x, the xth element of the vector is 1 and the remaining P−1 elements are 0. This process is one-hot encoding; for x = 1 it can be expressed as:
$$v_{x=1} = [1, \underbrace{0, 0, \cdots, 0}_{P-1}] \tag{3.1}$$
Step 3: Embedding lookup. In a real experimental scenario, texts differ in length. To keep very long or very short inputs from affecting the overall training result, a parameter max_seq_len is set, and each text is truncated or padded to that length. After one-hot encoding, the sentence tensor is denoted as $V$. This tensor is then multiplied by a dense tensor $W \in \mathbb{R}^{P \times l}$, where $P$ denotes the vocabulary size and $l$ denotes the vector size of each word. The product is the embedding representation $X$, which completes the representation of words as vectors.
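The three steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (build_vocab, encode, one_hot, embed), the reserved padding ID 0 and the toy weight matrix W are assumptions made for the example and are not taken from the paper.

```python
def build_vocab(sentences):
    """Step 1 (dictionary): map each distinct word to an integer ID.
    ID 0 is reserved for padding (an assumption of this sketch)."""
    vocab = {"[PAD]": 0}
    for sentence in sentences:
        for word in sentence.lower().split(" "):
            if word and word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab, max_seq_len):
    """Split on spaces, look up IDs, then truncate or pad to max_seq_len."""
    ids = [vocab[w] for w in sentence.lower().split(" ") if w in vocab]
    ids = ids[:max_seq_len]
    return ids + [0] * (max_seq_len - len(ids))


def one_hot(idx, size):
    """Step 2: a vector of length `size` with a single 1 at position idx."""
    return [1.0 if j == idx else 0.0 for j in range(size)]


def embed(ids, W):
    """Step 3: multiply each one-hot row vector by W (shape P x l)."""
    P, l = len(W), len(W[0])
    rows = []
    for idx in ids:
        v = one_hot(idx, P)
        rows.append([sum(v[p] * W[p][k] for p in range(P)) for k in range(l)])
    return rows


sentences = ["a great movie", "not a great plot at all"]
vocab = build_vocab(sentences)
P = len(vocab)
# Toy embedding matrix with l = 2; real weights would be learned.
W = [[float(p), float(p) / 10] for p in range(P)]
ids = encode("a great movie", vocab, max_seq_len=5)
X = embed(ids, W)
```

Because every one-hot vector selects exactly one row of W, the product reduces to a row lookup, which is why frameworks usually implement the embedding layer as an index operation instead of a dense multiplication.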
Sentence-level semantic encoding builds on the word-level semantic encoding described above. Each input sentence is truncated or padded according to the global variable max_seq_len, turning it into a fixed-length sequence. The processed data are then used to train the model proposed in this paper, which outputs the sentiment classification. As shown in Figure 3, the MSLSTM-CPO model has four layers in total: embedding layer, LSTM layer, average layer and output layer.
The LSTM model cannot process raw text directly and requires the word-level semantic encoding (word embedding), which is a vectorised representation of the input words. Let the input sequences be $X \in \mathbb{R}^{B \times L \times M}$, where $B$ is the batch size, $L$ is the sequence length and $M$ is the input feature dimension. The LSTM scans the sequences from left to right and, at each moment, its recurrent unit updates the internal state $C_t \in \mathbb{R}^{B \times D}$ and the output state $H_t \in \mathbb{R}^{B \times D}$, where $D$ denotes the dimensionality of the hidden state vector. The computation of the LSTM consists mainly of computing the three gates, computing the internal state and computing the output state.
Step 1: Calculating the three gates. At moment $t$, the recurrent unit of the LSTM computes the input gate $I_t$, the forget gate $F_t$ and the output gate $O_t$ from the current input $X_t \in \mathbb{R}^{B \times M}$ and the previous output state $H_{t-1} \in \mathbb{R}^{B \times D}$. This experiment builds the model with the Paddle framework, whose LSTM differs from a conventional hand-written LSTM in that each gate has two bias terms and the parameters are written in front of the input data in the matrix product. The calculation formulas are as follows:
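$$I_t = \sigma(W_{ii} X_t + b_{ii} + U_{hi} H_{t-1} + b_{hi}) \tag{3.2}$$
$$F_t = \sigma(W_{if} X_t + b_{if} + U_{hf} H_{t-1} + b_{hf}) \tag{3.3}$$
$$O_t = \sigma(W_{io} X_t + b_{io} + U_{ho} H_{t-1} + b_{ho}) \tag{3.4}$$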
where $W_{i*} \in \mathbb{R}^{M \times D}$, $U_{h*} \in \mathbb{R}^{D \times D}$, $b_{i*} \in \mathbb{R}^{1 \times D}$ and $b_{h*} \in \mathbb{R}^{1 \times D}$ are learnable parameters, and $\sigma$ is the logistic function, which keeps the gate values in the interval (0, 1). Each gate here is a matrix for $B$ samples, with one row per sample holding that sample's gate vector.
Step 2: Calculating internal states. First, the candidate internal state is calculated with the following equation:
$$\tilde{C}_t = \tanh(W_{ic} X_t + b_{ic} + U_{hc} H_{t-1} + b_{hc}) \tag{3.5}$$
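where $W_{ic} \in \mathbb{R}^{M \times D}$, $U_{hc} \in \mathbb{R}^{D \times D}$, $b_{ic} \in \mathbb{R}^{1 \times D}$ and $b_{hc} \in \mathbb{R}^{1 \times D}$ are learnable parameters. Next, the internal state at moment $t$ is calculated from the forget gate and the input gate, using the following equation: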
$$C_t = F_t \otimes C_{t-1} + I_t \otimes \tilde{C}_t \tag{3.6}$$
where ⊗ denotes the element-by-element product.
Step 3: Calculating the output state. The output state of the LSTM is calculated from the internal state of Eq (3.6) as follows:
$$H_t = O_t \otimes \tanh(C_t) \tag{3.7}$$
The inputs of the LSTM recurrent cell are the internal state vector $C_{t-1} \in \mathbb{R}^{B \times D}$ and the hidden state vector $H_{t-1} \in \mathbb{R}^{B \times D}$ at moment $t-1$, and its outputs are the internal state vector $C_t \in \mathbb{R}^{B \times D}$ and the hidden state vector $H_t \in \mathbb{R}^{B \times D}$ at the current moment $t$. With the LSTM recurrent cell, the network can establish longer-range temporal dependencies, which addresses the difficulty a plain RNN has with very long sequences. In addition, by selectively ignoring or reinforcing the current memory and input information, the LSTM helps the model capture the semantic information of long sentences more fully.
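A single recurrent step following Eqs (3.2) to (3.7) can be sketched with NumPy as below. This is a minimal illustration and not the Paddle implementation used in the experiments: the function names, the params dictionary layout and the random toy weights are assumptions. The products are written as x_t @ W and h_prev @ U so that they agree with the stated shapes (inputs of size B×M, weights of size M×D and D×D).

```python
import numpy as np


def sigmoid(z):
    """Logistic function; maps every entry into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    x_t: (B, M); h_prev, c_prev: (B, D).
    params[g] = (W, b_i, U, b_h) for g in "i", "f", "o", "c", with
    W: (M, D), U: (D, D) and both biases of shape (1, D).
    """
    def affine(g):
        W, b_i, U, b_h = params[g]
        return x_t @ W + b_i + h_prev @ U + b_h

    i_t = sigmoid(affine("i"))           # input gate, Eq (3.2)
    f_t = sigmoid(affine("f"))           # forget gate, Eq (3.3)
    o_t = sigmoid(affine("o"))           # output gate, Eq (3.4)
    c_tilde = np.tanh(affine("c"))       # candidate state, Eq (3.5)
    c_t = f_t * c_prev + i_t * c_tilde   # internal state, Eq (3.6)
    h_t = o_t * np.tanh(c_t)             # output state, Eq (3.7)
    return h_t, c_t


# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
B, L, M, D = 2, 4, 3, 5
params = {
    g: (rng.normal(size=(M, D)), np.zeros((1, D)),
        rng.normal(size=(D, D)), np.zeros((1, D)))
    for g in "ifoc"
}
X = rng.normal(size=(B, L, M))
h = np.zeros((B, D))
c = np.zeros((B, D))
for t in range(L):                       # scan the sequence left to right
    h, c = lstm_step(X[:, t, :], h, c, params)
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is 0, so a step simply halves the previous internal state, which is a convenient sanity check for an implementation.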
The Paddle framework's built-in LSTM model has several parameters, including direction and num_layers. The parameter direction indicates the direction of the network iteration and can be set to forward or bidirectional, with forward as the default. The parameter num_layers indicates the number of layers in the network and defaults to 1. A multi-layer bidirectional LSTM receives a sequence of vectors and updates its recurrent units in both the forward and the reverse direction. It is therefore enough to set direction to bidirectional and num_layers to the desired n when defining the LSTM in order to use a multi-layer bidirectional LSTM directly.
The average layer averages the hidden states at all positions of the bidirectional LSTM layer and uses the result as the representation of the whole sentence. In the experiments, an AveragePooling operator is implemented to aggregate the hidden states. First, the sequence length vector is used to generate a mask matrix, which masks the vectors at the padding positions of the text sequence. The remaining vectors of the sequence are then summed and averaged.
The final output layer uses a Linear layer to output the classification results. Sentiment analysis is essentially a classification problem, and in practice the model usually only needs to output the logits (log-odds) of the classes. The Linear layer applies a linear transformation to the hidden state vector at the last moment, $H_L \in \mathbb{R}^{B \times D}$, and outputs the classification logits. The formula is as follows:
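The masking and averaging described above can be sketched with NumPy as follows. The function name masked_average and the toy tensors are assumptions for illustration; the paper's own AveragePooling operator is not reproduced here.

```python
import numpy as np


def masked_average(hidden, seq_lens):
    """Average hidden states over the real (non-padded) positions only.

    hidden: (B, L, D) hidden states for every position.
    seq_lens: length-B sequence of true lengths before padding.
    """
    B, L, _ = hidden.shape
    lens = np.asarray(seq_lens)
    # mask[b, t] is 1 for real tokens and 0 for padding positions.
    mask = (np.arange(L)[None, :] < lens[:, None]).astype(hidden.dtype)
    summed = (hidden * mask[:, :, None]).sum(axis=1)   # (B, D)
    return summed / lens[:, None].astype(hidden.dtype)


# Two sequences padded to length 3; the first has only 2 real tokens,
# so its third position (filled with 9s) must not affect the mean.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
                   [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]])
pooled = masked_average(hidden, seq_lens=[2, 3])
```

Dividing by the true length instead of the padded length is the point of the mask: a plain mean over all L positions would let padding vectors pull short sentences toward whatever values fill those positions.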
$$Y = H_L W_{out} + b_{out} \tag{3.8}$$
where $W_{out} \in \mathbb{R}^{D \times N}$ and $b_{out} \in \mathbb{R}^{N}$ are the learnable weight matrix and bias, and $N$ is the number of classes.
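As a small worked instance of Eq (3.8), the snippet below computes logits for a batch of two sentence vectors and takes the arg-max as the predicted class. The numbers and the function name linear_logits are made up for illustration, and using the arg-max as the decision rule is an assumption of this sketch.

```python
import numpy as np


def linear_logits(H, W_out, b_out):
    """Eq (3.8): H is (B, D), W_out is (D, N), b_out is (N,)."""
    return H @ W_out + b_out


H = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])          # B = 2, D = 3
W_out = np.array([[1.0, -1.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])           # D = 3, N = 2
b_out = np.array([0.1, -0.1])

Y = linear_logits(H, W_out, b_out)       # logits, shape (B, N)
pred = Y.argmax(axis=1)                  # class with the largest logit
```

Since only the ordering of the logits matters for the prediction, no softmax is needed at inference time; it is typically applied inside the loss during training.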
As far as we know, there are no publicly available datasets of college public opinions. This work therefore selects IMDB [40], a standard dataset in sentiment analysis, for evaluation. It records user reviews together with the associated sentiment information. The IMDB data come from the Internet Movie Database and include each user's comment text and rating for a movie. In the current Internet era, university students are an important part of the film industry's consumer base, and students with free time make up a considerable part of its regular audience. Movie reviews can therefore be used to gauge students' emotions and to capture the direction of public opinion in a timely manner, thus guiding students to develop the right values.
The IMDB dataset collects reviews of many films, with a total of 50,000 English review texts. A viewer can give a score from 1 to 10 when commenting on a film. In the data processing, a rating below 5 is treated as a negative review with label 0, and a rating above 6 is treated as a positive review with label 1. The purpose of this experiment is to determine from the comment text whether the emotion expressed by the user is positive or negative. After processing, each of the 50,000 samples consists of a user's comment text and a 0/1 label for a movie. There are 25,000 training samples and 25,000 test samples. The training set and the test set each contain 12,500 positive and 12,500 negative samples.
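The labelling rule can be written as a short function. This is a sketch of the rule as described in the text; the function name is hypothetical, and returning None for ratings of 5 and 6 (so that they can be filtered out) is an assumption, since the text does not say how such ratings are handled.

```python
def label_from_rating(rating):
    """Map a 1-10 rating to a binary sentiment label.

    Below 5 -> 0 (negative); above 6 -> 1 (positive).
    Ratings of 5 or 6 return None (assumed to be excluded).
    """
    if rating < 5:
        return 0
    if rating > 6:
        return 1
    return None


# Toy reviews, invented for illustration only.
reviews = [("dull and far too long", 3),
           ("a moving story with a great cast", 9),
           ("it was fine", 5)]
labelled = [(text, label_from_rating(r)) for text, r in reviews
            if label_from_rating(r) is not None]
```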
In the experiments, the MSLSTM-CPO model proposed in this paper is compared with both machine learning methods and deep learning methods. The machine learning models are SVC, GaussianNB, MultinomialNB and BernoulliNB. The deep learning models are MSLSTM-CPO and three baseline methods: a CNN model, an LSTM model and a bidirectional LSTM model. The comment texts in this experiment vary in length, so max_seq_len was set to 256 to truncate and pad the text. For the other training parameters, the batch size was set to 128, the hidden layer size to 256 and the embedding size to 256. Considering training time and test accuracy, the number of epochs was set to 3. The learning rate was set to 0.001, 0.002 and 0.005 in turn. The num_layers of the MSLSTM-CPO model was set to 3. Following the composition of the original dataset, the data were split evenly (50/50) between the training and test sets. To test the validity of the experiments fully, four common evaluation metrics for classification models were used: accuracy, precision, recall and F1-score. Their detailed descriptions can be found in references such as [2,41] and are left out here.
Sentiment analysis is treated here as a binary classification problem, so one SVC model and three typical Naive Bayes methods were chosen as the machine learning comparison models: Gaussian Naive Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes. GaussianNB assumes Gaussian-distributed features, MultinomialNB assumes multinomially distributed features and BernoulliNB assumes Bernoulli-distributed (binary) features. Machine learning-based sentiment classification models generally involve only a few simple steps: data pre-processing, text vectorisation and training the classifier.
As shown in Table 1, the four metric values are computed for the five models. The learning rate of the MSLSTM-CPO model is 0.002, and the other parameters are kept constant. The machine learning methods reach an accuracy of only about 0.5, which is chance level on this balanced binary task, whereas the MSLSTM-CPO model reaches 0.8678. A visualisation of the evaluation results of MSLSTM-CPO compared with the four traditional machine learning models is shown in Figure 6. Its horizontal axis shows the four metric types and its vertical axis shows the metric values. The MSLSTM-CPO model, indicated by the blue bars, exceeds the best machine learning model by roughly 0.34 to 0.37 in absolute terms on all four evaluation metrics (for example, an accuracy of 0.8678 versus at most 0.5206). This supports the view that deep learning models generally perform better than traditional machine learning models on sentiment analysis problems.
Model | Accuracy | F1-score | Recall | Precision |
SVC | 0.5206 | 0.4889 | 0.5275 | 0.5206 |
GaussianNB | 0.5057 | 0.4262 | 0.5075 | 0.5128 |
MultinomialNB | 0.4936 | 0.4935 | 0.4936 | 0.4936 |
BernoulliNB | 0.5 | 0.3333 | 0.5 | 0.25 |
MSLSTM-CPO | 0.8678 | 0.8676 | 0.86755 | 0.86905 |
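The four metrics can be computed as in the sketch below, which uses macro averaging over the two classes. The choice of macro averaging is an inference, not something the text states: a classifier that assigns every sample to one class on a balanced test set then scores accuracy 0.5, F1 0.3333, recall 0.5 and precision 0.25, which matches the BernoulliNB row of Table 1. The function name and the toy labels are assumptions.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0   # 0 when class never predicted
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / k,
            "recall": sum(recalls) / k,
            "f1": sum(f1s) / k}


# Balanced toy test set where the classifier always predicts class 1.
y_true = [0] * 4 + [1] * 4
y_pred = [1] * 8
m = macro_metrics(y_true, y_pred)
```

Read this way, the BernoulliNB row would indicate a degenerate classifier that predicts a single class for every review, and not merely a weak one.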
In order to discuss the effectiveness of the MSLSTM-CPO proposed in this paper, three other typical deep learning methods are chosen as comparison models: CNN model, LSTM model, and bidirectional LSTM model.
Figures 4 and 5 show the trends of accuracy and loss in the training phase for the four methods when the learning rate is set to 0.001, 0.002 and 0.005, respectively. In each subplot, the x-axis represents the number of training iterations, ranging from 1 to 600. The y-axis in Figure 4 represents the training accuracy, and the y-axis in Figure 5 represents the training loss. Each subplot is drawn from 30 consecutive sample points and shows the training trend of one model at the different learning rates. The red curve, for a learning rate of 0.002, shows the best results in the training phase of the four models, and the difference between the curves is most visible for the MSLSTM-CPO model. As the number of iterations increases, the training accuracy of the four models rises and the training loss falls, and both eventually stabilise. This indicates that the models train normally.
These trends only show that the models train normally; a comparison on each evaluation metric is needed to reflect their overall capability. In this experiment, the learning rate was therefore used as the variable for calculating the metric values of the four models MSLSTM-CPO, CNN, LSTM and Bi-LSTM, while the other parameters were left unchanged. As shown in Table 2, the learning rates were set to 0.001, 0.002 and 0.005. Sentiment analysis is treated as a binary problem, and the commonly used evaluation metrics are accuracy, precision, recall and F1-score. Reading across the rows of the table, the four metrics of each model are close to one another under every parameter setting. Reading down the columns, the MSLSTM-CPO model works best at a learning rate of 0.002, with an accuracy of 0.8678, which is 1.62, 2.29 and 1.43 percentage points higher than CNN (0.8516), LSTM (0.8449) and Bi-LSTM (0.8535), respectively. The accuracy and F1-score of the MSLSTM-CPO model are the highest among the four models at all three learning rates. Figure 7 visualises the evaluation results of MSLSTM-CPO compared with the other three deep learning models. It has three subfigures that present the comparison from two angles. Figure 7(a) compares the four metric values of MSLSTM-CPO with those of the other three methods. Figure 7(b) and (c) select two typical metrics (accuracy and F1-score) and display the results of the four models under the three learning rates (0.001, 0.002 and 0.005). The figure shows that the proposed MSLSTM-CPO performs better than the baseline methods.
Learning rate | Model | Accuracy | F1-score | Recall | Precision |
0.001 | CNN | 0.8565 | 0.8565 | 0.8565 | 0.8566 |
0.001 | LSTM | 0.8465 | 0.84645 | 0.8465 | 0.84675 |
0.001 | Bi-LSTM | 0.8485 | 0.8483 | 0.8487 | 0.85005 |
0.001 | MSLSTM-CPO | 0.8585 | 0.85815 | 0.85555 | 0.85885 |
0.002 | CNN | 0.8516 | 0.8515 | 0.85165 | 0.85305 |
0.002 | LSTM | 0.8449 | 0.84485 | 0.8452 | 0.84605 |
0.002 | Bi-LSTM | 0.8535 | 0.85345 | 0.8537 | 0.8545 |
0.002 | MSLSTM-CPO | 0.8678 | 0.8676 | 0.86755 | 0.86905 |
0.005 | CNN | 0.83225 | 0.83255 | 0.83225 | 0.8322 |
0.005 | LSTM | 0.8411 | 0.8412 | 0.8412 | 0.8411 |
0.005 | Bi-LSTM | 0.84645 | 0.84545 | 0.8471 | 0.85485 |
0.005 | MSLSTM-CPO | 0.85225 | 0.8522 | 0.85235 | 0.8529 |
Sentiment analysis is by now well developed in both machine learning and deep learning approaches. The experimental design compares the MSLSTM-CPO model proposed in this paper with traditional machine learning models and with deep learning models. Based on the data in Tables 1 and 2, the MSLSTM-CPO model achieves the highest accuracy and F1-score among the compared models.
The better performance of the MSLSTM-CPO model can be explained with respect to both the machine learning models and the deep learning models. First, the sentiment analysis task centres on exploiting textual content. Traditional machine learning methods focus on extracting features from a large amount of labelled data and classify on that basis. This approach neglects the contextual coherence of the text and leads to poor training results. Secondly, recurrent models with memory, such as the RNN and the LSTM, are commonly used in sentiment analysis tasks, and of the two the LSTM copes far better with long sequences. The MSLSTM-CPO model is therefore based on the LSTM. While a standard LSTM is unidirectional and has only one layer, MSLSTM-CPO is bidirectional and multi-layered. As a result, the model is strengthened in the number of layers and achieves a degree of balance between depth and breadth.
In summary, the MSLSTM-CPO model proposed in this paper can effectively implement a new type of college opinion analysis. Compared with the machine learning and deep learning baseline methods, MSLSTM-CPO has better classification performance.
This paper proposes MSLSTM-CPO, a model based on multi-scale deep learning for sentiment analysis of university students' opinions. First, the method performs a word embedding operation on the text. Secondly, Bi-LSTM is chosen as the basic networking unit for layer stacking, and the sentiment classification results are obtained through the embedding layer, LSTM layer, average layer and linear output layer. Combining depth and breadth in the model offsets the limitations of traditional methods. Experiments on a real dataset show that the MSLSTM-CPO model performs better than traditional machine learning models and commonly used deep learning models.
In future work, users' image and audio information could be mined and fused with textual information to improve the efficiency and accuracy of sentiment analysis of campus opinion. In addition, more machine learning and deep learning methods will be explored, and their performance characteristics in real-world applications will be analysed in order to investigate deep learning-based sentiment analysis more effectively. This will help to promote a positive attitude towards life and the development of mental health among university students.
This work was funded by the Researchers Supporting Project number (RSPD2023R681) King Saud University, Riyadh, Saudi Arabia, and also funded by Humanities and Social Sciences Project of Chongqing Municipal Education Commission (22SKSZ047).
The authors declare that there are no conflicts of interest.
[1] |
Bickers D, Adams R (1949) Hereditary stenosis of the aqueduct of Sylvius as a cause of congenital hydrocephalus. Brain 72: 246-262. doi: 10.1093/brain/72.2.246
![]() |
[2] |
Finckh U, Schroder J, Ressler B, et al. (2000) Spectrum and detection rate of L1CAM mutations in isolated and familial cases with clinically suspected L1-disease. Am J Med Genet 92: 40-46. doi: 10.1002/(SICI)1096-8628(20000501)92:1<40::AID-AJMG7>3.0.CO;2-R
![]() |
[3] |
Fransen E, Camp GV, Vits L, et al. (1997) L1-associated diseases: clinical genetics divide, molecular genetics unite. Hum Mol Genet 6: 1625-1632. doi: 10.1093/hmg/6.10.1625
![]() |
[4] |
Samatov TR, Wicklein D, Tonevitsky AG (2016) L1CAM: Cell adhesion and more. Prog Histochem Cytochem 51: 25-32. doi: 10.1016/j.proghi.2016.05.001
![]() |
[5] |
Kamiguchi H, Hlavin ML, Lemmon V (1998) Role of L1 in Neural Development: What the Knockouts Tell Us. Mol Cell Neurosci 12: 48-55. doi: 10.1006/mcne.1998.0702
![]() |
[6] |
Moos M, Tacke R, Scherer H, et al. (1988) Neural Adhesion Molecule L1 as a Member of the Immunoglobulin Superfamily with Binding Domains Similar to Fibronectin. Nature 334: 701-703. doi: 10.1038/334701a0
![]() |
[7] |
Mikulak J, Negrini S, Klajn A, et al. (2012) Dual REST-dependence of L1CAM: from gene expression to alternative splicing governed by Nova2 in neural cells. J Neurochem 120: 699-709. doi: 10.1111/j.1471-4159.2011.07626.x
![]() |
[8] | Bertolin C, Boaretto F, Barbon G, et al. (2010) Novel mutations in the L1CAM gene support the complexity of L1 syndrome. J Neurosci 294: 124-126. |
[9] |
Okamoto N, Del Maestro R, Valero R, et al. (2004) Hydrocephalus and Hirschsprung's disease with a mutation of L1CAM. J Hum Genet 49: 334-337. doi: 10.1007/s10038-004-0153-4
![]() |
[10] |
Vos YJ, De Walle HE, Bos KK, et al. (2010) Genotype-phenotype correlations in L1 syndrome: a guide for genetic counseling and mutation analysis. J Med Genet 47: 169-175. doi: 10.1136/jmg.2009.071688
![]() |
[11] | L1CAM mutation database Available from: http://www.l1cammutationdatabase.info/default.aspx. |
[12] |
Kanemura Y, Takuma Y, Kamiguchi H, et al. (2005) First case of L1CAM gene mutation identified in MASA syndrome in Asia. Congenital Anomalies 45: 67-69. doi: 10.1111/j.1741-4520.2005.00067.x
![]() |
[13] |
Marin R, Ley-Martos M, Gutierrez G, et al. (2015) Three cases with L1 syndrome and two novel mutations in the L1CAM gene. Eur J Pediatr 174: 1541-1544. doi: 10.1007/s00431-015-2560-2
![]() |
[14] |
Ochando I, Vidal V, Gascon J, et al. (2015) Prenatal diagnosis of X-linked hydrocephalus in a family with a novel mutation in L1CAM gene. J Obstet Gynaecol 36: 403-405. doi: 10.3109/01443615.2015.1086982
![]() |
[15] |
Silan F, Ozdemir I, Lissens W (2005) A novel L1CAM mutation with L1 spectrum disorders. Prenat Diagn 25: 57-59. doi: 10.1002/pd.978
![]() |
[16] |
Camp GV, Vits L, Coucke P, et al. (1993) A Duplication in the L1CAM Gene Associated with X-Linked Hydrocephalus. Nat Genet 4: 421-425. doi: 10.1038/ng0893-421
![]() |
[17] |
Fransen E, Lemmon V, Camp VG, et al. (1995) CRASH syndrome: clinical spectrum of corpus callosum hypoplasia, retardation, adducted thumbs, spastic paraparesis and hydrocephalus due to mutations in one single gene, L1. Eur J Hum Genet 3: 273-284. doi: 10.1159/000472311
![]() |
[18] |
Swarna M, Sujatha M, Usha Rani P, et al. (2004) Detection of L1 CAM mutation in a male child with Mental retardation. Indian J Clin Biochem 19: 163-167. doi: 10.1007/BF02894278
![]() |
[19] |
Jharna P, Hemabindu L, Siva Prasad S, et al. (2006) Detection of L1 (CAM) mutations in X-linked mental retardation: A study from Andhra Pradesh, India. Indian J Hum Genet 12: 82-85. doi: 10.4103/0971-6866.27791
![]() |
[20] | QIAamp DNA Mini Blood Mini Handbook Available from: https://www.qiagen.com/us/resources/resourcedetail?id=62a200d6-faf4-469b-b50f-2b59cf738962&lang=en. |
[21] | NCBI Primer-Blast Available from: https://www.ncbi.nlm.nih.gov/tools/primer-blast/. |
[22] | BioEdit tool Available from: http://www.mbio.ncsu.edu/BioEdit/bioedit.html. |
[23] |
Rosenthal A, Jouet M, Kenwrick S (1992) Aberrant splicing of neural cell adhesion molecule L1 mRNA in a family with X-linked hydrocephalus. Nat Genet 2: 107-112. doi: 10.1038/ng1092-107
![]() |
[24] |
Fransen E, Camp GV, Hooge RD, et al. (1998) Genotype-phenotype correlation in L1 associated diseases. J Med Genet 35: 399-404. doi: 10.1136/jmg.35.5.399
![]() |
[25] |
Yamasaki M, Thompson P, Lemmon V (1997) CRASH syndrome: mutations in L1CAM correlate with severity of the disease. Neuropediatrics 28: 175-178. doi: 10.1055/s-2007-973696
![]() |
[26] |
De Angelis E, Watkins A, Schafer M, et al. (2002) Disease-Associated Mutations in L1CAM Interfere with Ligand Interactions and Cell-Surface Expression. Hum Mol Genet 11: 1-12. doi: 10.1093/hmg/11.1.1
![]() |
[27] |
Kaepernick L, Legius E, Higgins J, et al. (1994) Clinical aspects of the MASA syndrome in a large family, including expressing females. Clin Genet 45: 181-185. doi: 10.1111/j.1399-0004.1994.tb04019.x
![]() |
[28] | Betts MJ, Russell RB (2003) Amino acid properties and consequences of substitutions. Bioinf Genet 14: 289-316. |
Model | Accuracy | F1-score | Recall | Precision |
SVC | 0.5206 | 0.4889 | 0.5275 | 0.5206 |
GaussianNB | 0.5057 | 0.4262 | 0.5075 | 0.5128 |
MultinomialNB | 0.4936 | 0.4935 | 0.4936 | 0.4936 |
BernoulliNB | 0.5 | 0.3333 | 0.5 | 0.25 |
MSLSTM-CPO | 0.8678 | 0.8676 | 0.86755 | 0.86905 |
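The source does not state how precision, recall and F1-score are averaged across classes. As a hedged illustration, the sketch below computes macro-averaged metrics in plain Python. The function name `macro_metrics` and the toy labels are hypothetical. The example input, a classifier that assigns every sample to one class on a balanced two-class set, yields the same values as the BernoulliNB row (accuracy 0.5, F1 about 0.3333, recall 0.5, precision 0.25). That agreement is consistent with macro averaging but does not confirm that the authors used it.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (no predictions or no true samples) are set to 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "f1": sum(f1s) / n,
        "recall": sum(recalls) / n,
        "precision": sum(precisions) / n,
    }


# Toy data: balanced two-class labels, every sample predicted as class 1.
metrics = macro_metrics([0, 0, 1, 1], [1, 1, 1, 1])
print(metrics)
```

Under this reading, a row with accuracy 0.5 and precision 0.25 would indicate a model that has collapsed to a single class, not one that has learned a weak decision boundary.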
Learning rate | Model | Accuracy | F1-score | Recall | Precision |
0.001 | CNN | 0.8565 | 0.8565 | 0.8565 | 0.8566 |
0.001 | LSTM | 0.8465 | 0.84645 | 0.8465 | 0.84675 |
0.001 | Bi-LSTM | 0.8485 | 0.8483 | 0.8487 | 0.85005 |
0.001 | MSLSTM-CPO | 0.8585 | 0.85815 | 0.85555 | 0.85885 |
0.002 | CNN | 0.8516 | 0.8515 | 0.85165 | 0.85305 |
0.002 | LSTM | 0.8449 | 0.84485 | 0.8452 | 0.84605 |
0.002 | Bi-LSTM | 0.8535 | 0.85345 | 0.8537 | 0.8545 |
0.002 | MSLSTM-CPO | 0.8678 | 0.8676 | 0.86755 | 0.86905 |
0.005 | CNN | 0.83225 | 0.83255 | 0.83225 | 0.8322 |
0.005 | LSTM | 0.8411 | 0.8412 | 0.8412 | 0.8411 |
0.005 | Bi-LSTM | 0.84645 | 0.84545 | 0.8471 | 0.85485 |
0.005 | MSLSTM-CPO | 0.85225 | 0.8522 | 0.85235 | 0.8529 |