Research article

AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis


  • Received: 27 June 2023 Revised: 11 September 2023 Accepted: 13 September 2023 Published: 27 September 2023
  • Multimodal sentiment analysis is an important area of artificial intelligence. It integrates multiple modalities such as text, audio, video and images into a compact multimodal representation and extracts sentiment information from them. In this paper, we improve two modules, feature extraction and feature fusion, to enhance multimodal sentiment analysis, and we propose an attention-based two-layer bidirectional GRU (AB-GRU, where GRU denotes gated recurrent unit) multimodal sentiment analysis method. For feature extraction, we use a two-layer bidirectional GRU network followed by two attention layers to strengthen the extraction of important information. For feature fusion, we use low-rank multimodal fusion, which reduces the dimensionality of the multimodal data and improves both computational speed and accuracy. Experimental results show that the AB-GRU model achieves 80.9% accuracy on the CMU-MOSI dataset, exceeding comparable models of the same type by at least 2.5%. The AB-GRU model also exhibits strong generalization and robustness.

    Citation: Jun Wu, Xinli Zheng, Jiangpeng Wang, Junwei Wu, Ji Wang. AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 18523-18544. doi: 10.3934/mbe.2023822
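
    To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code: each modality is encoded by a two-layer bidirectional GRU whose time steps are pooled by a soft attention layer, and the pooled modality vectors are combined with low-rank multimodal fusion. The class names, feature dimensions, fusion rank and hidden sizes below are illustrative assumptions rather than the paper's exact settings.

```python
# Illustrative sketch only: a two-layer bidirectional GRU encoder with soft
# attention pooling per modality, followed by low-rank multimodal fusion (LMF).
# Dimensions and names are placeholders, not the authors' exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveBiGRU(nn.Module):
    """Two-layer bidirectional GRU followed by soft attention over time steps."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hid_dim, 1)  # one attention score per time step

    def forward(self, x):                      # x: (batch, time, in_dim)
        h, _ = self.gru(x)                     # (batch, time, 2*hid_dim)
        weights = F.softmax(self.attn(h), dim=1)
        return (weights * h).sum(dim=1)        # attention-pooled (batch, 2*hid_dim)


class LowRankFusion(nn.Module):
    """Low-rank multimodal fusion: modality-specific low-rank factors stand in
    for the full outer-product fusion tensor."""

    def __init__(self, dims, rank: int, out_dim: int):
        super().__init__()
        self.factors = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(rank, d + 1, out_dim)) for d in dims])
        self.rank_weights = nn.Parameter(torch.randn(1, rank))
        self.bias = nn.Parameter(torch.zeros(1, out_dim))

    def forward(self, reps):                   # reps: list of (batch, d_m) tensors
        fused = None
        for z, factor in zip(reps, self.factors):
            ones = torch.ones(z.size(0), 1, device=z.device)
            z1 = torch.cat([z, ones], dim=1)               # append constant 1
            proj = torch.matmul(z1, factor)                # (rank, batch, out_dim)
            fused = proj if fused is None else fused * proj  # element-wise product
        out = torch.matmul(self.rank_weights,
                           fused.permute(1, 0, 2)).squeeze(1)  # weighted sum over rank
        return out + self.bias                 # fused representation (batch, out_dim)


class ABGRUSketch(nn.Module):
    """End-to-end sketch: per-modality encoders -> low-rank fusion -> regressor."""

    def __init__(self, text_dim=300, audio_dim=74, video_dim=47,
                 hid_dim=64, rank=4, fusion_dim=32):
        super().__init__()
        self.encoders = nn.ModuleList(
            [AttentiveBiGRU(d, hid_dim) for d in (text_dim, audio_dim, video_dim)])
        self.fusion = LowRankFusion([2 * hid_dim] * 3, rank, fusion_dim)
        self.head = nn.Linear(fusion_dim, 1)   # sentiment intensity score

    def forward(self, text, audio, video):
        reps = [enc(x) for enc, x in zip(self.encoders, (text, audio, video))]
        return self.head(self.fusion(reps))
```

    On CMU-MOSI-style inputs (e.g., text = torch.randn(8, 20, 300), audio = torch.randn(8, 20, 74), video = torch.randn(8, 20, 47)), the sketch produces one sentiment score per sample; the low-rank factorization avoids materializing the full outer-product fusion tensor, which is what keeps the dimensionality and computation of the fusion step manageable.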






  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
