Topic-based automatic summarization algorithm for Chinese short text

Tinghuai Ma; Hongmei Wang; Yuwei Zhao; Yuan Tian; Najla Al-Nabhan

doi:10.3934/mbe.2020202

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 4: 3582-3600. doi: 10.3934/mbe.2020202

Research article

1. Nanjing University of Information Science and Technology, Nanjing 210044, China
2. Nanjing Institute of Technology, Nanjing 211167, China
3. King Saud University, Riyadh 11362, Saudi Arabia

Received: 23 March 2020 Accepted: 05 May 2020 Published: 12 May 2020

Abstract

Most current automatic summarization methods target English text. In Chinese text, words differ greatly from one another, parts of speech are numerous and complex, and polysemous or ambiguous words appear frequently, so extracting useful feature words is harder than in English text. Because Chinese syntax is complex, relatively few automatic summarization methods exist for Chinese text. Earlier methods could only select the important sentences from the original text and arrange them simply, which yields summaries with disordered sentences and insufficient coherence. Because Chinese short text also tends to contain redundant information and irregular sentence structure, we propose a topic-based automatic summarization method for Chinese short text. First, we propose a key sentence selection method that combines topic words and TF-IDF to score each text in the original data against its topic; the sentence with the highest score is then selected as the topic sentence of that topic. Because Weibo short texts may contain much irrelevant information and sometimes lack important components of the topic, we propose three retouching mechanisms to improve the conciseness, richness and readability of the extracted topic sentences. We validate the approach on natural disaster and social hot event datasets from Sina Weibo. The experimental results show that the polished topic summaries not only reflect the relationship between the topic sentences and the natural disasters or social hot events, but also carry rich semantic information. More importantly, the basic elements of a natural disaster or social hot event can largely be grasped from the topic sentence, which can help the government guide disaster relief and meet users' needs for quickly obtaining information about social hot events.
Keywords: Chinese short text; automatic summarization; topic sentence; natural disaster; social hot event; Sina Weibo
Citation: Tinghuai Ma, Hongmei Wang, Yuwei Zhao, Yuan Tian, Najla Al-Nabhan. Topic-based automatic summarization algorithm for Chinese short text[J]. Mathematical Biosciences and Engineering, 2020, 17(4): 3582-3600. doi: 10.3934/mbe.2020202
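
The key sentence selection step described in the abstract (scoring each text against a topic by combining topic words with TF-IDF, then taking the highest-scoring one as the topic sentence) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the function names are hypothetical, the texts are assumed to be already segmented into words, the IDF is a common smoothed variant, and the score is simply the sum of the TF-IDF weights of the topic words that occur in a text.

```python
import math
from collections import Counter


def tfidf_weights(texts):
    """Return one {term: tf-idf weight} dict per tokenized text.

    Assumption: texts are lists of already-segmented words.
    IDF uses a smoothed form, log((1 + N) / (1 + df)) + 1, chosen here
    for illustration; the paper may use a different variant.
    """
    n_texts = len(texts)
    doc_freq = Counter()
    for tokens in texts:
        doc_freq.update(set(tokens))  # count each term once per text

    weights = []
    for tokens in texts:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        weights.append({
            term: (count / total)
            * (math.log((1 + n_texts) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def select_topic_sentence(texts, topic_words):
    """Score each text by the summed TF-IDF of its topic words.

    Returns (index of the highest-scoring text, list of all scores).
    This scoring rule is a hypothetical stand-in for the paper's method.
    """
    scores = [
        sum(w for term, w in text_weights.items() if term in topic_words)
        for text_weights in tfidf_weights(texts)
    ]
    best_index = max(range(len(texts)), key=lambda i: scores[i])
    return best_index, scores


# Toy example with invented, pre-segmented microblog texts.
texts = [
    ["地震", "救援", "四川"],
    ["天气", "晴"],
    ["地震", "地震", "伤亡", "救援", "队"],
]
topic_words = {"地震", "救援", "伤亡"}

best_index, scores = select_topic_sentence(texts, topic_words)
print("".join(texts[best_index]), scores)
```

In this toy input the third text mentions all three topic words and so receives the highest score, while the unrelated second text scores zero. The retouching mechanisms mentioned in the abstract are not covered by this sketch.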



© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
