Research article Special Issues

Emotion recognition in talking-face videos using persistent entropy and neural networks


  • Received: 16 November 2021; Revised: 07 February 2022; Accepted: 14 February 2022; Published: 18 February 2022
  • The automatic recognition of a person's emotional state has become a very active research field that involves scientists specialized in areas such as artificial intelligence, computer vision, and psychology. Our main objective in this work is to develop a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos. Specifically, we combine audio-signal and image-sequence information to compute a topological signature (a 9-dimensional vector) for each video. We prove that small changes in the video produce small changes in the signature, ensuring the stability of the method. These topological signatures are fed to a neural network that distinguishes between the following emotions: calm, happy, sad, angry, fearful, disgust, and surprised. The results are promising and competitive, outperforming other state-of-the-art works found in the literature.
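
As a rough illustration of the main tool named in the abstract, the sketch below computes persistent entropy under its standard definition: the Shannon entropy of the bar lengths of a persistence barcode, normalized to sum to one. The abstract does not state the formula or how the 9-dimensional signature is assembled, so the function name `persistent_entropy`, the `(birth, death)` pair format, and the sample barcodes are assumptions made for this example only, not the authors' implementation. Bars are assumed to be finite.

```python
import math

def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `barcode` is an iterable of (birth, death) pairs with finite values
    (an assumed input format, not taken from the article).
    """
    # Keep only bars with positive length; zero-length bars carry no weight.
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0  # empty barcode: define the entropy as zero
    # p_i = l_i / L, entropy = -sum(p_i * log(p_i))
    return -sum((l / total) * math.log(l / total) for l in lengths)

# Hypothetical barcodes, for illustration only.
uniform = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
skewed = [(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

print(persistent_entropy(uniform))
print(persistent_entropy(skewed))
```

With bars of equal length the entropy reaches its maximum, log(n) for n bars, while a barcode dominated by one long bar yields a value closer to zero. A single number of this kind per barcode is what makes it plausible to stack several of them into a short fixed-length vector for a neural network.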

    Citation: Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Guillermo Aguirre-Carrazana. Emotion recognition in talking-face videos using persistent entropy and neural networks. Electronic Research Archive, 2022, 30(2): 644-660. doi: 10.3934/era.2022034



  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
Figures(8)  /  Tables(3)
