Fake news has become a severe problem on social media, with more detrimental impacts on society than previously thought. Research on multi-modal fake news detection has substantial practical significance, since online fake news that includes multimedia elements is more likely to mislead users and propagate widely than text-only fake news. However, existing multi-modal fake news detection methods have the following problems: 1) They usually use traditional CNN models and their variants to extract image features, which cannot fully capture high-quality visual features. 2) They usually fuse inter-modal features by simple concatenation, leading to unsatisfactory detection results. 3) Much fake news exhibits a large disparity in feature similarity between its images and text, yet existing models do not fully exploit this cue. Thus, we propose a novel model (TGA) based on transformers and multi-modal fusion to address the above problems. Specifically, we extract text and image features with different transformers and fuse the features with attention mechanisms. In addition, we feed the degree of feature similarity between texts and images into the classifier to improve the performance of TGA. Experimental results on public datasets show the effectiveness of TGA*.
* Our code is available at https://github.com/PPEXCEPED/TGA.
Citation: Pingping Yang, Jiachen Ma, Yong Liu, Meng Liu. Multi-modal transformer for fake news detection[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 14699-14717. doi: 10.3934/mbe.2023657
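To make the pipeline described in the abstract concrete, below is a minimal, self-contained sketch in plain PyTorch of one way such an architecture could be wired together: separate transformer encoders for text tokens and image patches, cross-modal attention for fusion, and a text-image cosine-similarity score appended to the classifier input. All module names, dimensions, pooling choices and the fusion layout are illustrative assumptions, not the authors' TGA implementation; the official code is available at the repository linked above.

```python
# Hypothetical sketch of a transformer-based multi-modal fake news detector.
# Dimensions, pooling and the fusion direction are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalFusionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Separate transformer encoders stand in for the text and image transformers.
        self.text_encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.image_encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Cross-modal attention fuses the two token sequences.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Classifier takes pooled fused features plus the scalar similarity score.
        self.classifier = nn.Linear(2 * d_model + 1, n_classes)

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, T, d_model), image_tokens: (B, P, d_model)
        t = self.text_encoder(text_tokens)
        v = self.image_encoder(image_tokens)
        # Text queries attend over image patches (one possible fusion direction).
        fused, _ = self.cross_attn(query=t, key=v, value=v)
        t_pooled, v_pooled, f_pooled = t.mean(1), v.mean(1), fused.mean(1)
        # Degree of text-image feature similarity, appended as an extra classifier input.
        sim = F.cosine_similarity(t_pooled, v_pooled, dim=-1).unsqueeze(-1)
        logits = self.classifier(torch.cat([t_pooled + f_pooled, v_pooled, sim], dim=-1))
        return logits


# Usage with random embeddings standing in for pretrained text/image token features.
model = MultiModalFusionSketch()
logits = model(torch.randn(2, 32, 256), torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 2])
```

Feeding the scalar similarity to the classifier mirrors the abstract's point that a large text-image mismatch is itself a useful signal for detecting fake news.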