Research article

AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis


  • Received: 27 June 2023 Revised: 11 September 2023 Accepted: 13 September 2023 Published: 27 September 2023
  • Multimodal sentiment analysis is an important area of artificial intelligence. It integrates multiple modalities such as text, audio, video and images into a compact multimodal representation and extracts sentiment information from them. In this paper, we improve two modules, feature extraction and feature fusion, to enhance multimodal sentiment analysis, and we propose an attention-based two-layer bidirectional gated recurrent unit (AB-GRU) method for multimodal sentiment analysis. For the feature extraction module, we use a two-layer bidirectional GRU network followed by two attention layers to strengthen the extraction of important information. The feature fusion module uses low-rank multimodal fusion, which reduces the dimensionality of the multimodal data and improves both computational speed and accuracy. The experimental results demonstrate that the AB-GRU model achieves 80.9% accuracy on the CMU-MOSI dataset, exceeding comparable models of the same type by at least 2.5%. The AB-GRU model also shows strong generalization and robustness. (An illustrative sketch of this pipeline is given below.)

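As an editor's illustration of the pipeline described in the abstract, the sketch below shows one plausible PyTorch realization, not the authors' released implementation: each modality is encoded by a two-layer bidirectional GRU with an attention pooling layer, and the resulting modality vectors are combined by low-rank multimodal fusion (LMF) before a regression head. All dimensions, the fusion rank and the use of a single attention layer per encoder are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveBiGRU(nn.Module):
    """Two-layer bidirectional GRU encoder with additive attention pooling."""

    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # scores each time step

    def forward(self, x):                          # x: (batch, time, in_dim)
        h, _ = self.gru(x)                         # (batch, time, 2*hidden_dim)
        w = F.softmax(self.attn(h), dim=1)         # attention weights over time
        return (w * h).sum(dim=1)                  # weighted summary vector


class LowRankFusion(nn.Module):
    """Low-rank multimodal fusion (LMF) of several modality vectors."""

    def __init__(self, dims, rank, out_dim):
        super().__init__()
        # One rank-R factor per modality; appending a constant 1 to each input
        # keeps unimodal and lower-order interaction terms in the fused tensor.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, d + 1, out_dim)) for d in dims])
        self.fusion_weights = nn.Parameter(0.01 * torch.randn(1, rank))
        self.fusion_bias = nn.Parameter(torch.zeros(1, out_dim))

    def forward(self, xs):                         # xs: list of (batch, d_m) tensors
        batch = xs[0].size(0)
        fused = None
        for x, factor in zip(xs, self.factors):
            ones = torch.ones(batch, 1, device=x.device, dtype=x.dtype)
            proj = torch.matmul(torch.cat([x, ones], dim=1), factor)  # (rank, batch, out_dim)
            fused = proj if fused is None else fused * proj           # elementwise product
        out = torch.matmul(self.fusion_weights, fused.permute(1, 0, 2)).squeeze(1)
        return out + self.fusion_bias              # (batch, out_dim)


class ABGRU(nn.Module):
    """Per-modality attentive Bi-GRU encoders plus an LMF head for sentiment scoring."""

    def __init__(self, text_dim=300, audio_dim=74, video_dim=47,
                 hidden_dim=64, rank=4, fused_dim=32):
        super().__init__()
        self.encoders = nn.ModuleList(
            [AttentiveBiGRU(d, hidden_dim) for d in (text_dim, audio_dim, video_dim)])
        self.fusion = LowRankFusion([2 * hidden_dim] * 3, rank, fused_dim)
        self.head = nn.Linear(fused_dim, 1)        # scalar sentiment score (CMU-MOSI style)

    def forward(self, text, audio, video):         # each: (batch, time, feat_dim)
        feats = [enc(x) for enc, x in zip(self.encoders, (text, audio, video))]
        return self.head(self.fusion(feats))


if __name__ == "__main__":
    model = ABGRU()
    t, a, v = torch.randn(8, 20, 300), torch.randn(8, 20, 74), torch.randn(8, 20, 47)
    print(model(t, a, v).shape)                    # torch.Size([8, 1])

The default feature sizes (300/74/47) only echo the GloVe, COVAREP and FACET features commonly paired with CMU-MOSI; they, like the hidden size and rank, are placeholders rather than values taken from the paper.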
    Citation: Jun Wu, Xinli Zheng, Jiangpeng Wang, Junwei Wu, Ji Wang. AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 18523-18544. doi: 10.3934/mbe.2023822

  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)