
Citation: Christian Carpéné, Jean Galitzky, Jean Sébastien Saulnier-Blache. Short-term and rapid effects of lysophosphatidic acid on human adipose cell lipolytic and glucose uptake activities[J]. AIMS Molecular Science, 2016, 3(2): 222-237. doi: 10.3934/molsci.2016.2.222
Facial expressions are a major source of communication and one of the most important tools for interacting with others, regardless of gender, race or national borders. They belong to non-verbal communication, which conveys a person's feelings through gestures and body language [1] and allows people to infer the feelings of others. Most people can recognize key emotions from facial expressions, but some cannot [2]. Robust emotion classification relies heavily on effective facial representations. Facial expressions are limited and highly variable, so identifying discriminative facial features that characterize each emotion is challenging. Facial expression recognition (FER) systems also face other challenges, such as faces that are not properly aligned, poor lighting during image capture, occlusion, noise and over-fitting [3]. Despite these challenges, FER is a fascinating area with applications in online education, health management, market audience analysis, entertainment, security and driver drowsiness detection. S. H. Ma et al. [4] classified facial expressions into seven categories: happy, sad, angry, fearful, surprised, contemptuous and disgusted. Over the last five years, researchers have adopted different methods for FER, and some have considered different types of input data for expression recognition systems. Facial expressions are typically associated with emotions. For example, raised eyebrows signal a sudden change and typically express fear or surprise.
Images of human facial expressions are a standard and promising source of information. Algorithms such as convolutional neural networks (CNNs), artificial neural networks (ANNs) and generative adversarial networks (GANs) have produced many notable results on them. These algorithms can be trained to recognize the expressions of interest, regardless of how an image is captured or which device is used, such as a digital camera, camcorder or DSLR. Human emotions can be perceived from numerous points of view, such as physiological and psychological signs. FER could be part of a telepresence system that informs a doctor or counselor about a patient's mental condition. If something is found to be wrong, the professional can adjust his or her approach to patient care. According to [5], nonverbal communication includes facial expressions, eye contact, body language, tone of voice and personal space. Nonverbal communication accounts for 55 to 94% of all communication [6], so positive nonverbal communication can strengthen interpersonal relationships and emotional bonds. The authors of [7] proposed a deep learning method for facial recognition in distance learning that extracts features from images and produced improved results. Recent deep learning methods have also been used for recognition tasks in other fields, such as medical image classification and segmentation [8,9,10,11,12,13,14,15,16,17].
Over-fitting is a common problem in existing FER systems: a model trains without issues but fails to produce the expected predictions at test time. Many researchers aim to minimize the drawbacks of existing systems and to provide fast expression classification. Computer analysis of faces and facial expressions is currently a widely studied topic among specialists. Facial expressions are among the most significant aspects of human communication, since the face conveys not only thoughts but also emotions. Over-fitting caused by a lack of training data remains a major challenge that every deep learning FER system must address to achieve high accuracy. This study proposes SSAE-FER, a stacked sparse auto-encoder framework for facial expression recognition. The main contributions are as follows.
ⅰ. The SSAE-FER model is used to recognize seven basic facial expressions. All images within a dataset are fed to the model at the same size, which may contribute to better results than the compared methods.
ⅱ. Our model is a lightweight two-layer stacked sparse auto-encoder (SSAE) whose simple architecture keeps training and testing costs low.
ⅲ. The performance of the proposed system was assessed on two publicly available datasets, JAFFE and CK+.
ⅳ. Strong results were obtained in terms of accuracy, specificity and sensitivity.
The remainder of this paper is organized as follows: Section 2 discusses related work, Section 3 describes the materials and methods, Section 4 presents the results and discussion, and Section 5 concludes the paper.
Many machine learning and deep learning algorithms have had a great impact on facial expression recognition (FER) systems. Deep learning algorithms that play a vital role in FER include:
- generative adversarial networks (GANs);
- deep belief networks (DBNs);
- stacked sparse auto-encoders (SSAEs);
- convolutional neural networks (CNNs);
- combined CNN-RNN (recurrent neural network) architectures [18].

Similarly, machine learning offers classifiers such as support vector machines (SVM), k-nearest neighbors (KNN), decision trees (DT), the weighted hierarchical adaptive voting ensemble (WHAVE) and logistic regression (LR). These have been used for FER, social interaction analysis, intelligent transportation systems, fruit identification and anomaly detection [19,20,21,22,23,24,25,26,27,28]. Feature extraction techniques used in recent years include the active shape model (ASM) [29], which extracts features from expression contours. The deep network of [30] combines two models: a deep temporal appearance network (DTAN), which extracts temporal appearance features, and a deep temporal geometry network (DTGN), which extracts temporal geometry features from facial landmark points. The two models are combined through a novel integration method to improve expression recognition accuracy, and the combined network is known as the deep temporal appearance-geometry network (DTAGN). The selection and classification of pairwise features is discussed in [31]. Reference [32] introduced a peak-piloted deep network that uses both peak and non-peak expressions. In this work, peak expressions supervise the recognition of non-peak expressions, but only for images showing the same expression of the same subject.
The network implicitly embeds the transition from non-peak to peak expressions, which makes it invariant to expression intensity. It is trained with a back-propagation scheme called peak gradient suppression (PGS).
Automatic facial expression recognition is an interesting and important part of human-machine interaction [33]. The authors of [34] combined a CNN with landmark features for 3D facial expression recognition. Reference [35] proposed an FER model that uses color and depth information from a Kinect sensor. That system extracts facial features from the captured sensor data, computes feature vectors with a face-tracking algorithm and recognizes six facial emotions with a random forest (RF) classifier, and the RF implementation is used for real-time FER. A deep learning method for FER is introduced in [36]; the training images were divided into seven groups, one per expression, to train a sparse autoencoder network. Graph convolutional neural networks (GCNNs) have been applied successfully to object recognition, text classification and human activity recognition, and they have also been used to represent features. A convolutional neural network (CNN) operates on Euclidean, grid-structured data. A GCNN instead extends spatial-domain convolution kernels to graph convolution kernels that aggregate information over neighboring nodes, and Euclidean distance is used to find the shortest paths between joints. Pooling over neighboring nodes in the hidden layers completes the data structure at incomplete nodes, and balanced cuts and heavy edge matching (HEM) are used for graph pooling. A graph may contain redundant or blurred edges, so a mechanism for identifying the critical nodes is also used [37].
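The neighbor aggregation described above can be sketched as one graph-convolution layer. This is a minimal NumPy illustration using the common symmetric-normalized formulation: self-loops are added, each node's features are averaged with its neighbors' after degree normalization, the result is projected by a weight matrix, and a ReLU is applied. It is not the architecture of [36] or [37]. The adjacency matrix `A`, node features `H` and weights `W` are hypothetical inputs, for example a small graph over facial landmarks.

```python
import numpy as np

def normalized_adjacency(A):
    """Add self-loops, then symmetrically normalize: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(H, A, W):
    """One graph-convolution layer: aggregate neighbor features, project with W, apply ReLU."""
    return np.maximum(normalized_adjacency(A) @ H @ W, 0.0)

# Hypothetical example: 4 landmarks connected in a chain, 2 features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.arange(8, dtype=float).reshape(4, 2)
W = np.ones((2, 3))
print(graph_conv(H, A, W).shape)  # (4, 3)
```

Stacking several such layers lets information flow across multi-hop neighborhoods. The graph pooling step (balanced cuts, HEM) mentioned above is not shown.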
CNNs have also been combined with the bag-of-words (BoW) model. This approach was successful in object detection and was further improved with supervised and unsupervised methods. The objects in an image yield local feature descriptors, which are used to form histograms, and the histograms of the different images form a BoW representation that a classifier can learn. In this approach, features were extracted from images with a CNN, and spatial pyramid matching (SPM) was then applied to localize the objects. The authors used CaffeNet, which is similar to AlexNet except that pooling is performed before normalization. The t-distributed stochastic neighbor embedding (t-SNE) algorithm was used to visualize the high-dimensional histogram features obtained from the last layer of CaffeNet for each image, which also allows clustering. Because t-SNE is unsupervised, K-means clustering was applied on top of the t-SNE embedding to examine how the data are grouped. The approach proved useful for human action recognition. It achieved the best results in comparisons with other methods such as k-nearest neighbors (KNN), support vector machine (SVM), relational neighbor (RN) [38] and particle swarm optimization (PSO) [39]. Other methods have been applied efficiently to in vivo optical coherence tomography (OCT) imaging [40,41,42,43].
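The BoW step described above can be sketched in NumPy: quantize each local descriptor to its nearest codeword, then count the codewords into a normalized histogram. This is a minimal illustration, not the pipeline of the cited work. The codebook is assumed to be given here (it would normally be learned beforehand, for example with k-means), and the descriptor values are made up.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword; return an L1-normalized histogram."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    # Squared Euclidean distances between N descriptors and K codewords -> shape (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()

# Hypothetical 3-word vocabulary and 4 descriptors from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
hist = bow_histogram(descriptors, codebook)
```

Each image becomes a fixed-length vector of `len(codebook)` entries, whatever its number of descriptors. That fixed length is what allows a standard classifier to learn from the histograms.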
Video-based action recognition involves understanding actions, estimating poses and retrieving images from different viewpoints [44]. Much research on FER systems has addressed both posed and spontaneous expressions under various imaging conditions, including different head poses, image resolutions, illumination and occlusion [45]. Conditional generative adversarial networks (cGANs) [46] have been applied to generate expression images from a neutral face. Another model has many intermediate layers whose filters remain unchanged. Its layers of the same size are concatenated and combined with the last fully connected layer to classify the expressions happy, angry, fearful, neutral, sad, surprised and disgusted [47].
CNNs are used for both feature extraction and classification. An amalgam fusion method operates at two levels, the feature level first and the decision level second: features are pooled in one place, and decisions are observed at different stages. In [49], a CNN was trained on voice samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [48], and its output was fused with that of an image classifier. The decision-level results then produced the final decision. In [50], the authors proposed color channel-wise recurrent learning, which achieved an accuracy of 85.74% for facial expression recognition. The above studies report significant results for their proposed FER methods, but there is still a need for a simple and effective FER approach. We therefore introduce SSAE-FER, a new FER framework intended to address the challenges of facial expression recognition systems.
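Decision-level fusion of an audio classifier and an image classifier can be sketched as a weighted average of their per-class probabilities for one sample. The weighted-average rule and the weight `w_audio` are assumptions for illustration; the text does not state which fusion rule [49] used.

```python
import numpy as np

def decision_level_fusion(prob_audio, prob_image, w_audio=0.5):
    """Fuse two classifiers' per-class probabilities for one sample by weighted averaging.

    Returns the fused probabilities and the index of the winning class.
    """
    p_audio = np.asarray(prob_audio, dtype=np.float64)
    p_image = np.asarray(prob_image, dtype=np.float64)
    fused = w_audio * p_audio + (1.0 - w_audio) * p_image
    return fused, int(fused.argmax())

# Hypothetical outputs for 3 classes: audio alone favors class 0, fusion picks class 1.
fused, label = decision_level_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
print(label)  # 1
```

Other common decision-level rules include the product of probabilities and majority voting. Feature-level fusion would instead concatenate the two feature vectors before a single classifier.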
In this section, we present our proposed method, which consists of pre-processing, training and testing. A flow diagram of our proposed method is shown in Figure 1.
Preprocessing is applied to the 2D facial images of the input datasets to enhance them, and the same steps are used for both datasets in this experiment. First, the images are converted to grayscale. A Gaussian smoothing filter with σ = 0.5 is then applied to remove noise from the original images. The results of the preprocessing step are illustrated in Figure 2.
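The grayscale conversion and Gaussian smoothing can be sketched in NumPy as follows. Only σ = 0.5 comes from the text, and the paper itself used MATLAB. The luma weights (0.2989, 0.5870, 0.1140), the kernel radius of ⌈3σ⌉ and the replicated borders are assumptions chosen for illustration.

```python
import numpy as np

def to_grayscale(rgb):
    """H x W x 3 RGB image -> H x W grayscale using luma weights (an assumed choice)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return np.asarray(rgb, dtype=np.float64)[..., :3] @ weights

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at radius ceil(3 * sigma) (assumed)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_smooth(img, sigma=0.5):
    """Separable Gaussian smoothing with replicated borders; output keeps the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")

    def smooth_1d(v):
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(smooth_1d, 1, padded)   # smooth along each row
    return np.apply_along_axis(smooth_1d, 0, rows)     # then along each column

# Stand-in for one 48 x 48 color image.
rgb = np.random.default_rng(0).integers(0, 256, size=(48, 48, 3))
smoothed = gaussian_smooth(to_grayscale(rgb), sigma=0.5)
print(smoothed.shape)  # (48, 48)
```

Because a 2D Gaussian is separable, filtering rows and then columns with a 1D kernel gives the same result as a full 2D convolution at lower cost.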
Normalizing the whole image is preferable to normalizing only specific facial regions such as the eyes, lips, nose and eyebrows [51]. Image intensities vary, so normalization has a strong influence on the results. We therefore normalize the data to zero mean and unit variance and then rescale the values to the range [0, 1] [52]. After normalization, we balance the classes so that every class has the same number of images in the training set. This step is important for reducing overfitting [53] and proved effective in preparing the input images for training. Together, these preprocessing steps enhance the images and prepare them for training the SSAE-FER model.
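The normalization and class-balancing steps can be sketched as follows. Two choices here are assumptions: statistics are computed per image, and classes are balanced by random undersampling to the smallest class. The text does not say whether statistics were computed per image or per dataset, nor how the classes were balanced.

```python
import numpy as np

def normalize_image(img, eps=1e-8):
    """Standardize to zero mean / unit variance, then rescale to [0, 1] (per image, assumed)."""
    x = np.asarray(img, dtype=np.float64)
    z = (x - x.mean()) / (x.std() + eps)
    return (z - z.min()) / (z.max() - z.min() + eps)

def balance_classes(X, y, seed=0):
    """Randomly undersample every class to the size of the smallest class (assumed strategy)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data: row i of X holds the value i, so kept rows can be traced back to their labels.
X = np.arange(10).reshape(10, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
Xb, yb = balance_classes(X, y)
print(np.bincount(yb))  # [3 3 3]
```

Min-max rescaling undoes any shift and positive scaling applied to the same pixels. Standardizing first therefore changes the final values only when the two steps use different pixel sets, for example per-image standardization followed by dataset-wide rescaling.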
After preprocessing, the data are used to train SSAE-FER. An SSAE has two phases, an encoder and a decoder, which learn high-level features in an unsupervised manner in the first step. The input images are 48 × 48 pixels (CK+) and 256 × 256 pixels (JAFFE). The two datasets are trained individually, and the images are flattened into vectors of 2304 and 65,536 values, respectively. Training has two main steps:
- Pre-training: the data are provided without ground-truth labels.
- Fine-tuning: the whole network is trained with the ground-truth labels.

SSAE-FER stacks sparse autoencoder layers, so the output of each hidden layer is connected to the input of the next. The hidden layers are first trained without supervision, one at a time, and the features learned by each layer serve as the input for training the next. Once all hidden layers are trained, the whole network is fine-tuned in a supervised fashion with backpropagation and stochastic gradient descent (SGD). The proposed method uses two SSAE layers. The working of SSAE in our method is shown in Figure 3.
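Greedy layer-wise pre-training can be sketched in NumPy as below. This is a minimal illustration, not the authors' MATLAB implementation. Each layer is a sigmoid sparse autoencoder trained with full-batch gradient descent. Its cost is the squared reconstruction error plus L2 weight decay and a KL sparsity penalty, and the values of `beta`, `lam` and `lr` are assumptions. Only the layer sizes (100 and 100) and the sparsity target of 0.05 come from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=1.0, lam=1e-4,
                             lr=0.5, epochs=200, seed=0):
    """Train one sigmoid sparse autoencoder with full-batch gradient descent.

    X has shape (n_samples, n_features) with values in [0, 1]. The cost is half the
    squared reconstruction error averaged over samples, plus L2 weight decay (lam)
    and a KL-divergence sparsity penalty (beta) pulling mean activations towards rho.
    Returns the encoder weights W (n_features x n_hidden) and bias b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, b = rng.normal(0.0, 0.01, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.01, (n_hidden, d)), np.zeros(d)
    for _ in range(epochs):
        Z = sigmoid(X @ W + b)            # encoder, z = h(Wx + b)
        X_rec = sigmoid(Z @ W2 + b2)      # decoder, x' = g(W'z + b')
        rho_hat = np.clip(Z.mean(axis=0), 1e-6, 1 - 1e-6)
        # Gradient of the reconstruction term at the decoder pre-activation.
        delta_out = (X_rec - X) * X_rec * (1 - X_rec) / n
        # Gradient of the KL sparsity penalty with respect to each hidden activation.
        kl_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        delta_hid = (delta_out @ W2.T + kl_grad) * Z * (1 - Z)
        W2 -= lr * (Z.T @ delta_out + lam * W2)
        b2 -= lr * delta_out.sum(axis=0)
        W -= lr * (X.T @ delta_hid + lam * W)
        b -= lr * delta_hid.sum(axis=0)
    return W, b

def greedy_pretrain(X, layer_sizes, **kwargs):
    """Train one autoencoder per layer; each is trained on the previous layer's codes."""
    params, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_sparse_autoencoder(H, n_hidden, **kwargs)
        params.append((W, b))
        H = sigmoid(H @ W + b)            # codes become the next layer's input
    return params, H

# Stand-in data: 20 random "images" of 48 x 48 pixels, flattened to 2304 values.
rng = np.random.default_rng(1)
X = rng.random((20, 48, 48)).reshape(20, -1)
params, codes = greedy_pretrain(X, [100, 100], epochs=20)
print(X.shape, codes.shape)  # (20, 2304) (20, 100)
```

The example uses random data and 20 epochs only to stay fast; the text reports 200 pre-training epochs. Supervised fine-tuning (backpropagation through both encoders and the Softmax layer) is not shown.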
The model uses greedy layer-wise pre-training. For a stacked autoencoder composed of n layers, pre-training initializes the parameters of the deep network one layer at a time. The first autoencoder in the stack is trained on the input while all other parameters in the network remain fixed. Once its parameters are learned, the input can be transformed into a vector of hidden-unit activations (learned features), which serves as the input for training the next autoencoder. The autoencoder maps the input to the hidden layer through a function called the encoder [54]. The encoding step transforms high-dimensional input data into a lower-dimensional representation, and the decoding step maps the learned features from the hidden space back to a reconstruction of the input. Figure 3 shows the hidden layers of SSAE1 stacked with those of SSAE2. The output of the first SSAE layer becomes the input of the second, and the data compressed in each latent layer feed the subsequent layers. The stacked network is connected to a Softmax layer, which predicts the expression from the features produced by SSAE2. During pre-training, each autoencoder is trained without labels. Each autoencoder comprises an encoder, which maps the input $ x $ to a new representation $ z $, and a decoder, which maps $ z $ back to a reconstruction $ x' $ of the input, as given in Eqs (1) and (2).
$ z = h(Wx + b) $ | (1) |
$ x' = g(W'z + b') $ | (2) |
In Eq (1), $ h $ is the activation function of the hidden-layer neurons, and in Eq (2), $ g $ is the activation function of the output-layer neurons. $ W $ and $ W' $ are the encoder and decoder weight matrices, and $ b $ and $ b' $ are the corresponding bias vectors. These weights and biases are the parameters learned during training. Fine-tuning is then performed in a supervised manner: backpropagation is applied to the whole network to reduce the error from one epoch to the next, refining the model so that it can handle new samples.
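The text does not state the objective minimized during pre-training. A standard sparse autoencoder cost is assumed here for illustration. It combines three terms:
- the reconstruction error of Eqs (1) and (2) over $ N $ training samples;
- L2 weight decay with weight $ \lambda $;
- a Kullback-Leibler sparsity penalty with weight $ \beta $, which pushes the mean activation $ \hat{\rho}_j $ of each of the $ m $ hidden units towards the sparsity target $ \rho $.

```latex
J = \frac{1}{2N}\sum_{n=1}^{N}\lVert x_n - x'_n\rVert^2
  + \frac{\lambda}{2}\left(\lVert W\rVert_F^2 + \lVert W'\rVert_F^2\right)
  + \beta\sum_{j=1}^{m}\mathrm{KL}(\rho \,\Vert\, \hat{\rho}_j),
\qquad
\hat{\rho}_j = \frac{1}{N}\sum_{n=1}^{N} z_{n,j},
\qquad
\mathrm{KL}(\rho \,\Vert\, \hat{\rho}_j) = \rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}
```

The KL term is zero when $ \hat{\rho}_j = \rho $ and grows as the average activation of a hidden unit drifts away from the target. This is what drives most hidden units to stay inactive for a given input and makes the learned features sparse.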
After fine-tuning, a Softmax classifier on the last layer of the SSAE-FER model performs the classification. The Softmax function is well suited to multiclass classification because it maps the output values to a probability for each expression in the data. The output layer has seven nodes, one per expression class, so the classifier returns the probability of each of the seven expressions. The Softmax activation function is given below [55].
$ \mathrm{Softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j}\exp(z_j)} $ | (3) |
In Eq (3), $ z_i $ is the value of the $ i $-th output neuron, and $ \exp $ is the exponential function. Dividing each exponentiated value by the sum of the exponentials of all output neurons normalizes the outputs into probabilities that sum to 1. Applied to the final layer, Softmax thus lets the model recognize the expression with the highest probability. Figure 4 shows how the two hidden layers work during the pre-training and fine-tuning stages for the classification of expressions.
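Eq (3) can be implemented in a numerically stable way by subtracting the largest input before exponentiating. This leaves the result unchanged, because the common factor cancels in the ratio. The class order below follows Table 2 (CK+), and the output-layer values are made up for illustration.

```python
import numpy as np

# CK+ class order as listed in Table 2.
EXPRESSIONS = ["anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

def softmax(z):
    """Eq (3) along the last axis; subtracting the max avoids overflow in exp."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical output-layer values for one image.
scores = [2.0, 0.1, 0.3, -1.0, 3.5, 0.0, 1.2]
probs = softmax(scores)
print(EXPRESSIONS[int(probs.argmax())])  # happy
```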
After training, test data preprocessed in the same way are provided to the trained model. The test classes are also balanced, so each class contributes an equal number of samples when the SSAE-FER model is validated. Figure 5 gives a complete overview of the structure of the proposed model.
In this section, we present the datasets used in this experiment, the parametric settings for the SSAE-FER model and the details of the results.
JAFFE [56,57] and CK+ [58] were used to evaluate the SSAE-FER model. The JAFFE dataset contains expressions posed by 10 Japanese female subjects, covering seven expressions: happy, sad, fear, anger, surprise, neutral and disgust. It contains several images of each expression at a resolution of 256 × 256 pixels. We used 213 2D grayscale images from JAFFE: 30 for anger, 29 for disgust, 33 for fear, 31 for happiness, 30 for neutral, 31 for sadness and 29 for surprise. The CK+ dataset contains eight expressions: seven primary expressions and contempt. We used a total of 981 CK+ images: 207 happy, 84 sad, 135 anger, 75 fear, 249 surprise, 177 disgust and 54 contempt. The CK+ images have a resolution of 48 × 48 pixels. The JAFFE dataset, which was used for facial expression recognition in [59], is publicly available at https://zenodo.org/record/3451524#.YSSx1I4zaM8. The CK+ dataset is publicly available at https://www.kaggle.com/shawon10/ckplus. Sample images from the CK+ and JAFFE datasets are shown in Figure 6. This work was performed in MATLAB R2021a on a Core i7 processor (3.6 GHz CPU) with 32 GB of RAM, and the SSAE-FER model was built with the MATLAB Deep Learning Toolbox [60].
In this section, we present the results of the SSAE-FER model. 70% of the data was used to train the model, and the remaining 30% was used for validation and testing. The model has two hidden layers, SSAE1 and SSAE2, with 100 neurons each. The final layer contains 7 neurons, one per expression class, which assign each test image to an expression. Pre-training ran for 200 epochs and fine-tuning for 6000 epochs, with a mini-batch size of 32. The learning rate for both pre-training and fine-tuning was 0.0001, with a sparsity of 0.05 and a momentum of 0.9. Table 1 lists the parameter settings used to train the SSAE-FER model. Fine-tuning took 3 hours on CK+ and 13 hours on JAFFE. The training mean squared error (MSE) was 0.06 on CK+ and 0.02 on JAFFE. The training and validation curves for both datasets are shown in Figure 7.
Table 1. Parameter settings used to train the SSAE-FER model.
Parameter | Value |
Hidden layers | 2 |
Number of neurons at each layer | Layer 1 = 100, Layer 2 = 100 |
Number of epochs | 200 (pre-training), 6000 (fine-tuning) |
Learning rate | 0.0001 |
Momentum | 0.9 |
Mini-batch size | 32 |
Sparsity | 0.5 |
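The reported settings and the data split can be collected in a small configuration, sketched below. `SSAEConfig` and `stratified_split` are hypothetical names. Two parts of the sketch are assumptions: the split is done per class, and the remaining 30% is divided equally between validation and testing. The text only states that 70% of the data was used for training. The class counts in the example are the JAFFE counts from the dataset description.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class SSAEConfig:
    # Values as reported in the text and Table 1.
    hidden_sizes: tuple = (100, 100)
    n_classes: int = 7
    pretrain_epochs: int = 200
    finetune_epochs: int = 6000
    learning_rate: float = 1e-4
    momentum: float = 0.9
    batch_size: int = 32
    sparsity: float = 0.05  # stated in the text; Table 1 lists 0.5

def stratified_split(y, train_frac=0.7, seed=0):
    """Per-class split: train_frac for training, the rest halved into validation and test (assumed)."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        n_train = int(round(train_frac * len(idx)))
        n_val = (len(idx) - n_train) // 2
        train += idx[:n_train].tolist()
        val += idx[n_train:n_train + n_val].tolist()
        test += idx[n_train + n_val:].tolist()
    return np.array(train), np.array(val), np.array(test)

config = SSAEConfig()
y = np.repeat(np.arange(7), [30, 29, 33, 31, 30, 31, 29])  # JAFFE: 213 images
train_idx, val_idx, test_idx = stratified_split(y, train_frac=0.7)
print(len(train_idx), len(val_idx), len(test_idx))  # 149 29 35
```

Splitting per class keeps each expression's share roughly equal across the three subsets. This matters for small datasets such as JAFFE.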
The following metrics are used to evaluate the performance of the proposed model.
Accuracy: the proportion of correct predictions among all predictions, computed over the samples of the seven expressions. It is calculated as follows:
$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $ | (4) |
Error rate: the proportion of predictions that are incorrect, counting both positive and negative samples of the seven expressions. It is calculated as follows:
$ \mathrm{Error\ rate} = \frac{FP + FN}{TP + TN + FP + FN} $ | (5) |
Sensitivity or recall: the proportion of actual positive samples that are correctly identified, also called the true positive rate. It is calculated as follows:
$ \mathrm{Sensitivity} = \frac{TP}{TP + FN} $ | (6) |
Precision: Precision is the ratio of true positives to the total of false positives and true positives. We can calculate it by using the following equation:
$ \mathrm{Precision} = \frac{TP}{TP + FP} $ | (7) |
Specificity: the proportion of actual negative samples that are correctly identified, also known as the true negative rate. It is calculated as follows:
$ \mathrm{Specificity} = \frac{TN}{TN + FP} $ | (8) |
True positives (TP) are expressions that are correctly assigned to their class. False positives (FP) are expressions that do not belong to a class but are assigned to it by the model. True negatives (TN) are images that do not belong to a class and are correctly assigned to other classes. False negatives (FN) are expressions that belong to a class but are assigned to another class. The results of our model on the CK+ and JAFFE datasets are presented in Tables 2 and 3, respectively.
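Per-class values such as those in Tables 2 and 3 are typically computed one-vs-rest from a multiclass confusion matrix. The sketch below shows one way to do this. The confusion matrix is made up, and specificity is computed as TN / (TN + FP), the proportion of negatives correctly identified.

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest metrics from a confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as this class but belonging to another
    fn = cm.sum(axis=1) - tp   # belonging to this class but predicted as another
    tn = total - tp - fp - fn
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
    }

# Hypothetical 3-class confusion matrix.
cm = np.array([[8, 2, 0],
               [1, 9, 0],
               [0, 0, 10]])
m = per_class_metrics(cm)
print(np.trace(cm) / cm.sum())  # 0.9
```

The example shows why averaging per-class one-vs-rest accuracies differs from overall multiclass accuracy (the trace of the confusion matrix divided by the total). Here the mean of the per-class accuracies is about 0.933, while the overall accuracy is 0.9. If a class is never predicted, its precision involves a division by zero and should be handled explicitly.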
Table 2. Per-class performance of SSAE-FER on the CK+ dataset.
No. | Expression | Precision % | Sensitivity % | Specificity % | Accuracy % |
1 | Anger | 100.00 | 82.00 | 100.00 | 99.00 |
2 | Contempt | 100.00 | 100.00 | 100.00 | 100.00 |
3 | Disgust | 100.00 | 100.00 | 100.00 | 100.00 |
4 | Fear | 100.00 | 100.00 | 100.00 | 100.00 |
5 | Happy | 100.00 | 100.00 | 100.00 | 100.00 |
6 | Sadness | 95.00 | 95.10 | 100.00 | 96.00 |
7 | Surprise | 100.00 | 100.00 | 100.00 | 100.00 |
Mean | 99.28 | 96.72 | 100.00 | 99.30 |
Table 3. Per-class performance of SSAE-FER on the JAFFE dataset.
No. | Expression | Precision % | Sensitivity % | Specificity % | Accuracy % |
1 | Anger | 96.30 | 96.30 | 99.30 | 98.00 |
2 | Disgust | 85.70 | 92.30 | 97.50 | 96.70 |
3 | Fear | 91.30 | 100.00 | 97.50 | 99.40 |
4 | Happy | 100.00 | 92.30 | 100.00 | 98.90 |
5 | Neutral | 86.50 | 83.90 | 97.50 | 95.20 |
6 | Sadness | 89.70 | 89.70 | 98.10 | 96.80 |
7 | Surprise | 100.00 | 96.20 | 100.00 | 99.40 |
Mean | 92.90 | 92.96 | 98.60 | 92.50 |
SSAE-FER gives much better results on the seven expressions of the CK+ dataset, with precision, sensitivity, specificity and accuracy of 99.28, 96.72, 100 and 99.30%, respectively. The JAFFE dataset yields lower results owing to its complexity, with precision, sensitivity, specificity and accuracy of 92.90, 92.96, 98.60 and 92.50%, respectively.
In this section, we compare our results with other techniques which are given in Table 4.
Table 4. Comparison of SSAE-FER with other methods.
Methodology | Precision % | Sensitivity % | Specificity % | Error rate % | Accuracy % |
[61] | - | - | - | 10.55 | 89.45 |
[62] | - | - | - | 17.90 | 82.10 |
[49] | 88.00 | 86.00 | - | 13.64 | 86.36 |
[63] | - | - | - | 26.20 | 73.80 |
Our model (CK+) | 99.28 | 96.72 | 100.00 | 0.70 | 99.30 |
Our model (JAFFE) | 92.90 | 92.96 | 98.60 | 7.50 | 92.50 |
Reference [61] reported an error rate of 10.55% and an accuracy of 89.45% with an FER technique based on a modified classification and regression tree (M-CRT) that addresses problems in classifying expressions. In that work, the supervised descent method and local binary patterns are used to obtain global and local features. In [62], projective complex matrix factorization (proCMF) takes high-dimensional images as input and projects them onto a lower-dimensional subspace by solving an optimization problem in the complex domain. It achieved an error rate of 17.90% and an accuracy of 82.10%. A multimodal automatic emotion recognition (AER) network recognizes expressions with reasonable accuracy, achieving 86.36% accuracy and 88% precision [49]. In [63], the authors reduced the number of parameters of an FER system by using a deep neural network with a fully connected layer and global average pooling (GAP), achieving 73.80% accuracy. Our proposed method achieves comparatively high accuracy: 99.30% on CK+ and 92.50% on JAFFE. Performance on JAFFE is lower, which we attribute to the complexity of that dataset, whereas performance on CK+ is strong.
Identifying what a person may be feeling from their facial expressions is valuable for many reasons. We proposed the SSAE-FER model for the classification of facial expressions. The model learns features automatically from the input images. It is pre-trained in an unsupervised manner and then fine-tuned in a supervised manner, after which the class probabilities estimated by the Softmax layer gave effective classification of the seven basic facial expressions.
Our work was limited to training on CPU-based machines, which made training slow. In the future, we will use a framework that supports GPUs to reduce the training time. The proposed model could also be applied to binary image classification tasks such as tumor classification, and to segmentation. Its performance could be improved with a larger dataset, and it could be extended to color image datasets and real-time scenarios.
This research work was supported by Zayed University in Abu Dhabi with research fund #R20102.
The authors declare that they have no conflict of interest.
[1] | Noguchi K, Herr D, Mutoh T, et al. (2009) Lysophosphatidic acid (LPA) and its receptors. Curr Opin Pharmacol 9: 15-23. doi: 10.1016/j.coph.2008.11.010 |
[2] | Yung YC, Stoddard NC, Chun J (2014) LPA receptor signaling: pharmacology, physiology, and pathophysiology. J Lipid Res 55: 1192-1214. doi: 10.1194/jlr.R046458 |
[3] | van Meeteren LA, Ruurs P, Stortelers C, et al. (2006) Autotaxin, a secreted lysophospholipase D, is essential for blood vessel formation during development. Mol Cell Biol 26: 5015-5022. doi: 10.1128/MCB.02419-05 |
[4] | Tanaka M, Okudaira S, Kishi Y, et al. (2006) Autotaxin stabilizes blood vessels and is required for embryonic vasculature by producing lysophosphatidic acid. J Biol Chem 281: 25822-25830. doi: 10.1074/jbc.M605142200 |
[5] | Gesta S, Simon MF, Rey A, et al. (2002) Secretion of a lysophospholipase D activity by adipocytes: involvement in lysophosphatidic acid synthesis. J Lipid Res 43: 904-910. |
[6] | Ferry G, Tellier E, Try A, et al. (2003) Autotaxin is released from adipocytes, catalyzes lysophosphatidic acid synthesis, and activates preadipocyte proliferation. Up-regulated expression with adipocyte differentiation and obesity. J Biol Chem 278: 18162-18169. |
[7] | Boucher J, Quilliot D, Pradere JP, et al. (2005) Potential involvement of adipocyte insulin resistance in obesity-associated up-regulation of adipocyte lysophospholipase D/autotaxin expression. Diabetologia 48: 569-577. doi: 10.1007/s00125-004-1660-8 |
[8] | Rancoule C, Dusaulcy R, Tréguer K, et al. (2012) Depot-specific regulation of autotaxin with obesity in human adipose tissue. J Physiol Biochem 68: 635-644. doi: 10.1007/s13105-012-0181-z |
[9] | Pagés G, Girard A, Jeanneton O, et al. (2000) LPA as a paracrine mediator of adipocyte growth and function. Ann N Y Acad Sci 905: 159-164. |
[10] | Simon MF, Daviaud D, Pradere JP, et al. (2005) Lysophosphatidic acid inhibits adipocyte differentiation via lysophosphatidic acid 1 receptor-dependent down-regulation of peroxisome proliferator-activated receptor gamma2. J Biol Chem 280: 14656-14662. doi: 10.1074/jbc.M412585200 |
[11] | Dusaulcy R, Daviaud D, Pradere JP, et al. (2009) Altered food consumption in mice lacking lysophosphatidic acid receptor-1. J Physiol Biochem 65: 345-350. doi: 10.1007/BF03185929 |
[12] | Dusaulcy R, Rancoule C, Grès S, et al. (2011) Adipose-specific disruption of autotaxin enhances nutritional fattening and reduces plasma lysophosphatidic acid. J Lipid Res 52: 1247-1255. doi: 10.1194/jlr.M014985 |
[13] | Rancoule C, Attané C, Grès S, et al. (2013) Lysophosphatidic acid impairs glucose homeostasis and inhibits insulin secretion in high-fat diet obese mice. Diabetologia 56: 1394-1402. doi: 10.1007/s00125-013-2891-3 |
[14] | Rancoule C, Viaud M, Grès S, et al. (2014) Pro-fibrotic activity of lysophosphatidic acid in adipose tissue: in vivo and in vitro evidence. Biochim Biophys Acta 1841: 88-96. doi: 10.1016/j.bbalip.2013.10.003 |
[15] | Dalfó E, Hernandez M, Lizcano JM, et al. (2003) Activation of human lung semicarbazide sensitive amine oxidase by a low molecular weight component present in human plasma. Biochim Biophys Acta 1638: 278-286. doi: 10.1016/S0925-4439(03)00094-2 |
[16] | Bour S, Daviaud D, Grès S, et al. (2007) Adipogenesis-related increase of semicarbazide-sensitive amine oxidase and monoamine oxidase in human adipocytes. Biochimie 89: 916-925. doi: 10.1016/j.biochi.2007.02.013 |
[17] | Iglesias-Osma MC, Garcia-Barrado MJ, Visentin V, et al. (2004) Benzylamine exhibits insulin-like effects on glucose disposal, glucose transport, and fat cell lipolysis in rabbits and diabetic mice. J Pharmacol Exp Ther 309: 1020-1028. doi: 10.1124/jpet.103.063636 |
[18] | Mercader J, Wanecq E, Chen J, et al. (2011) Isopropylnorsynephrine is a stronger lipolytic agent in human adipocytes than synephrine and other amines present in Citrus aurantium. J Physiol Biochem 67: 443-452. doi: 10.1007/s13105-011-0078-2 |
[19] | Visentin V, Morin N, Fontana E, et al. (2001) Dual action of octopamine on glucose transport into adipocytes: inhibition via beta3-adrenoceptor activation and stimulation via oxidation by amine oxidases. J Pharmacol Exp Ther 299: 96-104. |
[20] | Iffiu-Soltesz Z, Prévot D, Carpéné C (2009) Influence of prolonged fasting on monoamine oxidase and semicarbazide-sensitive amine oxidase activities in rat white adipose tissue. J Physiol Biochem 65: 11-23. doi: 10.1007/BF03165965 |
[21] | Carpéné C, Chalaux E, Lizarbe M, et al. (1993) Beta 3-adrenergic receptors are responsible for the adrenergic inhibition of insulin-stimulated glucose transport in rat adipocytes. Biochem J 296: 99-105. doi: 10.1042/bj2960099 |
[22] | Iglesias-Osma MC, Bour S, Garcia-Barrado MJ, et al. (2005) Methylamine but not mafenide mimics insulin-like activity of the semicarbazide-sensitive amine oxidase-substrate benzylamine on glucose tolerance and on human adipocyte metabolism. Pharmacol Res 52: 475-484. doi: 10.1016/j.phrs.2005.07.008 |
[23] | Carpéné C, Bousquet-Melou A, Galitzky J, et al. (1998) Lipolytic effects of beta 1-, beta 2-, and beta 3-adrenergic agonists in white adipose tissue of mammals. Ann N Y Acad Sci 839: 186-189. doi: 10.1111/j.1749-6632.1998.tb10756.x |
[24] | Castan I, Valet P, Quideau N, et al. (1994) Antilipolytic effects of alpha 2-adrenergic agonists, neuropeptide Y, adenosine, and PGE1 in mammal adipocytes. Am J Physiol 266: R1141-1147. |
[25] | Sekharam M, Cunnick JM, Wu J (2000) Involvement of lipoxygenase in lysophosphatidic acid-stimulated hydrogen peroxide release in human HaCaT keratinocytes. Biochem J 346: 751-758. doi: 10.1042/bj3460751 |
[26] | Pizzinat N, Marti L, Remaury A, et al. (1999) High expression of monoamine oxidases in human white adipose tissue: evidence for their involvement in noradrenaline clearance. Biochem Pharmacol 58: 1735-1742. doi: 10.1016/S0006-2952(99)00270-1 |
[27] | Schimmel RJ, Honeyman TW, McMahon KK, et al. (1980) Inhibition of cyclic AMP accumulation in hamster adipocytes with phosphatidic acid: differences and similarities with alpha adrenergic effects. J Cyclic Nucleotide Res 6: 437-449. |
[28] | Lafontan M, Bousquet-Melou A, Galitzky J, et al. (1995) Adrenergic receptors and fat cells: differential recruitment by physiological amines and homologous regulation. Obes Res 3 Suppl 4: 507s-514s. |
[29] | Carpéné C, Schaak S, Guilbeau-Frugier C, et al. (2016) High intake of dietary tyramine does not deteriorate glucose handling and does not cause adverse cardiovascular effects in mice. J Physiol Biochem in press. |
[30] | Liu YB, Kharode Y, Bodine PV, et al. (2010) LPA induces osteoblast differentiation through interplay of two receptors: LPA1 and LPA4. J Cell Biochem 109: 794-800. |
[31] | Jun DJ, Lee JH, Choi BH, et al. (2006) Sphingosine-1-phosphate modulates both lipolysis and leptin production in differentiated rat white adipocytes. Endocrinology 147: 5835-5844. doi: 10.1210/en.2006-0579 |
[32] | Hirakawa M, Karashima Y, Watanabe M, et al. (2007) Protein kinase A inhibits lysophosphatidic acid-induced migration of airway smooth muscle cells. J Pharmacol Exp Ther 321: 1102-1108. doi: 10.1124/jpet.106.118042 |
[33] | Chang CL, Lin ME, Hsu HY, et al. (2008) Lysophosphatidic acid-induced interleukin-1 beta expression is mediated through Gi/Rho and the generation of reactive oxygen species in macrophages. J Biomed Sci 15: 357-363. doi: 10.1007/s11373-007-9223-x |
[34] | Lee MJ, Jeon ES, Lee JS, et al. (2008) Lysophosphatidic acid in malignant ascites stimulates migration of human mesenchymal stem cells. J Cell Biochem 104: 499-510. doi: 10.1002/jcb.21641 |
[35] | Carpéné C, Berlan M, Lafontan M (1983) Lack of functional antilipolytic alpha 2-adrenoceptor in rat fat cell: comparison with hamster adipocyte. Comp Biochem Physiol C 74: 41-45. doi: 10.1016/0742-8413(83)90145-7 |
[36] | Yea K, Kim J, Lim S, et al. (2008) Lysophosphatidic acid regulates blood glucose by stimulating myotube and adipocyte glucose uptake. J Mol Med (Berl) 86: 211-220. doi: 10.1007/s00109-007-0269-z |
[37] | Thomson FJ, Moyes C, Scott PH, et al. (1996) Lysophosphatidic acid stimulates glucose transport in Xenopus oocytes via a phosphatidylinositol 3'-kinase with distinct properties. Biochem J 316: 161-166. doi: 10.1042/bj3160161 |
[38] | Keller JN, Steiner MR, Mattson MP, et al. (1996) Lysophosphatidic acid decreases glutamate and glucose uptake by astrocytes. J Neurochem 67: 2300-2305. |
[39] | Rancoule C, Dusaulcy R, Tréguer K, et al. (2014) Involvement of autotaxin/lysophosphatidic acid signaling in obesity and impaired glucose homeostasis. Biochimie 96: 140-143. doi: 10.1016/j.biochi.2013.04.010 |
[40] | McIntyre TM, Pontsler AV, Silva AR, et al. (2003) Identification of an intracellular receptor for lysophosphatidic acid (LPA): LPA is a transcellular PPARgamma agonist. Proc Natl Acad Sci U S A 100: 131-136. doi: 10.1073/pnas.0135855100 |
[41] | Iffiu-Soltesz Z, Mercader J, Daviaud D, et al. (2011) Increased primary amine oxidase expression and activity in white adipose tissue of obese and diabetic db-/- mice. J Neural Transm (Vienna) 118: 1071-1077. doi: 10.1007/s00702-011-0586-9 |
[42] | Rancoule C, Pradere JP, Gonzalez J, et al. (2011) Lysophosphatidic acid-1-receptor targeting agents for fibrosis. Expert Opin Investig Drugs 20: 657-667. doi: 10.1517/13543784.2011.566864 |
[43] | Liu S, Umezu-Goto M, Murph M, et al. (2009) Expression of autotaxin and lysophosphatidic acid receptors increases mammary tumorigenesis, invasion, and metastases. Cancer Cell 15: 539-550. doi: 10.1016/j.ccr.2009.03.027 |
[44] | Sibon I, Mercier N, Darret D, et al. (2008) Association between semicarbazide-sensitive amine oxidase, a regulator of the glucose transporter, and elastic lamellae thinning during experimental cerebral aneurysm development: laboratory investigation. J Neurosurg 108: 558-566. doi: 10.3171/JNS/2008/108/3/0558 |
[45] | Taylor LA, Arends J, Hodina AK, et al. (2007) Plasma lyso-phosphatidylcholine concentration is decreased in cancer patients with weight loss and activated inflammatory status. Lipids Health Dis 6: 17. doi: 10.1186/1476-511X-6-17 |
[46] | Reeves VL, Trybula JS, Wills RC, et al. (2015) Serum Autotaxin/ENPP2 correlates with insulin resistance in older humans with obesity. Obesity (Silver Spring) 23: 2371-2376. doi: 10.1002/oby.21232 |
[47] | Visentin V, Prévot D, De Saint Front VD, et al. (2004) Alteration of amine oxidase activity in the adipose tissue of obese subjects. Obes Res 12: 547-555. doi: 10.1038/oby.2004.62 |
Parameter | Value |
Hidden layers | 2 |
Number of neurons at each layer | 100 (Layer 1 and Layer 2) |
Number of epochs | 200 for pre-training; 6000 for fine-tuning |
Learning rate | 0.0001 |
Momentum | 0.9 |
Mini-batch size | 32 |
Sparsity | 0.5 |
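The settings in the table above can be collected into a single configuration object. The sketch below assumes classical (heavy-ball) momentum and shuffled mini-batches. The table does not specify which momentum variant or batching scheme was used. The names `SSAEConfig`, `minibatches`, and `momentum_step` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SSAEConfig:
    """Hyperparameters copied from the table (field names are illustrative)."""
    hidden_layers: int = 2
    neurons_per_layer: int = 100
    pretrain_epochs: int = 200
    finetune_epochs: int = 6000
    learning_rate: float = 1e-4
    momentum: float = 0.9
    batch_size: int = 32
    sparsity: float = 0.5


def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index arrays; the last batch may be smaller than batch_size."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def momentum_step(param, grad, velocity, cfg):
    """Classical momentum: v <- mu * v - lr * g, then param <- param + v."""
    velocity = cfg.momentum * velocity - cfg.learning_rate * grad
    return param + velocity, velocity


cfg = SSAEConfig()
batch_sizes = [len(b) for b in minibatches(100, cfg.batch_size, np.random.default_rng(0))]
print(batch_sizes)  # [32, 32, 32, 4]

w, v = np.zeros(3), np.zeros(3)
for _ in range(2):  # two steps with a constant gradient of 1
    w, v = momentum_step(w, np.ones(3), v, cfg)
print(w)  # each entry is -(1e-4 + 1.9e-4) = -2.9e-4
```

With momentum 0.9, the second step is already 1.9 times the first. This is why such a small learning rate can still make steady progress over many epochs.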
No. | Expression | Precision % | Sensitivity % | Specificity % | Accuracy % |
1 | Anger | 100.00 | 82.00 | 100.00 | 99.00 |
2 | Contempt | 100.00 | 100.00 | 100.00 | 100.00 |
3 | Disgust | 100.00 | 100.00 | 100.00 | 100.00 |
4 | Fear | 100.00 | 100.00 | 100.00 | 100.00 |
5 | Happy | 100.00 | 100.00 | 100.00 | 100.00 |
6 | Sadness | 95.00 | 95.10 | 100.00 | 96.00 |
7 | Surprise | 100.00 | 100.00 | 100.00 | 100.00 |
Mean | | 99.28 | 96.72 | 100.00 | 99.30 |
No. | Expression | Precision % | Sensitivity % | Specificity % | Accuracy % |
1 | Anger | 96.30 | 96.30 | 99.30 | 98.00 |
2 | Disgust | 85.70 | 92.30 | 97.50 | 96.70 |
3 | Fear | 91.30 | 100.00 | 97.50 | 99.40 |
4 | Happy | 100.00 | 92.30 | 100.00 | 98.90 |
5 | Neutral | 86.50 | 83.90 | 97.50 | 95.20 |
6 | Sadness | 89.70 | 89.70 | 98.10 | 96.80 |
7 | Surprise | 100.00 | 96.20 | 100.00 | 99.40 |
Mean | | 92.90 | 92.96 | 98.60 | 92.50 |
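Per-class precision, sensitivity, specificity, and accuracy are commonly derived from a multiclass confusion matrix in a one-vs-rest fashion. The text does not state how the figures in these tables were computed, so the sketch below assumes one-vs-rest counts. It uses a hypothetical 3-class matrix rather than data from this study. Because one-vs-rest accuracy also credits true negatives, the mean of the per-class accuracies can exceed the overall multiclass accuracy.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest metrics in % from a confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # other classes predicted as this class
    fn = cm.sum(axis=1) - tp   # this class predicted as another class
    tn = total - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):  # a class may have no predictions
        precision = 100 * tp / (tp + fp)
        sensitivity = 100 * tp / (tp + fn)
        specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / total   # per-class, one-vs-rest
    overall = 100 * tp.sum() / total     # multiclass accuracy
    return precision, sensitivity, specificity, accuracy, overall


# Hypothetical 3-class confusion matrix (not data from this study).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
prec, sens, spec, acc, overall = per_class_metrics(cm)
print(round(acc.mean(), 2), overall)  # 93.33 90.0
```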
Methodology | Precision % | Sensitivity % | Specificity % | Error rate % | Accuracy % |
[58] | - | - | - | 10.55 | 89.45 |
[59] | - | - | - | 17.90 | 82.10 |
[49] | 88.00 | 86.00 | - | 13.64 | 86.36 |
[60] | - | - | - | 26.20 | 73.80 |
Our model on the CK+ dataset | 99.28 | 96.72 | 100.00 | 0.70 | 99.30 |
Our model on the JAFFE dataset | 92.90 | 92.96 | 98.60 | 7.50 | 92.50 |
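In every row of the comparison table, the error rate is the complement of accuracy (`100 - accuracy`). The short check below recomputes the error-rate column from the accuracy column. The dictionary keys are only row labels.

```python
import math

# Accuracy (%) for each row of the comparison table.
accuracy = {"[58]": 89.45, "[59]": 82.10, "[49]": 86.36, "[60]": 73.80,
            "Our model (CK+)": 99.30, "Our model (JAFFE)": 92.50}

# Error rate (%) as reported in the same table.
reported_error = {"[58]": 10.55, "[59]": 17.90, "[49]": 13.64, "[60]": 26.20,
                  "Our model (CK+)": 0.70, "Our model (JAFFE)": 7.50}


def error_rate(acc_percent):
    """Error rate as the complement of accuracy, rounded to two decimals like the table."""
    return round(100.0 - acc_percent, 2)


for name, acc in accuracy.items():
    assert math.isclose(error_rate(acc), reported_error[name]), name
print("all error rates match 100 - accuracy")
```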