Research article

Leveraging ResNet and label distribution in advanced intelligent systems for facial expression recognition

  • Received: 19 January 2023 Revised: 27 March 2023 Accepted: 13 April 2023 Published: 24 April 2023
  • With the development of artificial intelligence (AI), facial expression recognition (FER) has become a hot topic in computer vision. Many existing works employ a single label for FER, so the label distribution problem has not been considered, and some discriminative features cannot be captured well. To overcome these problems, we propose a novel framework, ResFace, for FER. It comprises the following modules: 1) a local feature extraction module, in which ResNet-18 and ResNet-50 are used to extract local features for subsequent aggregation; 2) a channel feature aggregation module, in which a channel-spatial feature aggregation method is adopted to learn high-level features for FER; 3) a compact feature aggregation module, in which several convolutional operations are used to learn label distributions that interact with the softmax layer. Extensive experiments conducted on the FER+ and Real-world Affective Faces databases demonstrate that the proposed approach achieves comparable accuracies of 89.87% and 88.38%, respectively.
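    The abstract does not give the exact loss used to make the learned label distributions "interact with the softmax layer," but label distribution learning for FER is commonly trained by minimizing the KL divergence between a per-image emotion distribution (e.g., from crowd-sourced annotations, as in FER+) and the softmax output. The sketch below is a minimal, hypothetical illustration of that standard formulation, not the paper's implementation; the 8-class layout and the `label_distribution_loss` name are assumptions.

    ```python
    import numpy as np

    def softmax(z):
        # Numerically stable softmax over the last axis.
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def label_distribution_loss(logits, target_dist, eps=1e-12):
        # Mean KL(target || predicted) over the batch: penalizes softmax
        # outputs that diverge from the annotated emotion distribution.
        p = softmax(logits)
        t = np.clip(target_dist, eps, 1.0)
        return float(np.sum(t * (np.log(t) - np.log(p + eps))) / logits.shape[0])

    # Example: one face annotated 70% "happy", 30% "surprise", 0% for the
    # remaining six FER+ classes (hypothetical values).
    logits = np.array([[2.0, 1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]])
    target = np.array([[0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
    loss = label_distribution_loss(logits, target)
    ```

    Unlike a one-hot cross-entropy target, this objective rewards the network for spreading probability mass across the ambiguous expressions, which is the motivation for distribution-based labels in FER.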

    Citation: Zhenggeng Qu, Danying Niu. Leveraging ResNet and label distribution in advanced intelligent systems for facial expression recognition[J]. Mathematical Biosciences and Engineering, 2023, 20(6): 11101-11115. doi: 10.3934/mbe.2023491




  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
