Research article

Co-occurrence word model for news media hotspot mining-text mining method design


  • Received: 08 October 2023; Revised: 30 December 2023; Accepted: 08 January 2024; Published: 08 March 2024
  • With the rapid growth of online media, more people obtain their information online. However, traditional hotspot mining algorithms cannot capture hot topics precisely or quickly. To address the poor accuracy and timeliness of current news media hotspot mining methods, this paper proposes a hotspot mining method based on a co-occurrence word model. First, a new co-occurrence word model based on word weight is proposed. Then, for key phrase extraction, a hotspot mining algorithm is designed that combines the co-occurrence word model with an improved smooth inverse frequency rank (SIFRANK). Finally, the Spark computing framework is introduced to improve computing efficiency. The experimental results show that the new word discovery algorithm found 16871 and 17921 new words in the Weibo Short News and Weibo Short Text datasets, respectively. The heat weight values of the keywords obtained by the improved SIFRANK reach 0.9356, 0.9991, and 0.6117. On the Covid19 Tweets dataset, the accuracy is 0.6223, the recall is 0.7015, and the F1 value is 0.6605. On the President-elects Tweets dataset, the accuracy is 0.6418, the recall is 0.7162, and the F1 value is 0.6767. With the Spark computing framework applied, the running speed improves significantly. The proposed text mining method for news media hotspots, based on the co-occurrence word model, improves the accuracy and efficiency of mining hot topics and is of practical significance.
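
    As a rough illustration of the co-occurrence idea the abstract refers to, the sketch below counts how often two words appear in the same short text. It is a generic, unweighted co-occurrence count, not the authors' word-weight model, which the abstract does not specify; the function name `cooccurrence_counts` and the toy documents are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how often each unordered word pair appears in the same document."""
    pair_counts = Counter()
    for tokens in docs:
        # Use the distinct words so a pair is counted once per document;
        # sorting gives every pair a single canonical (a, b) ordering.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy, already-tokenized short texts (hypothetical data, for illustration only).
docs = [
    ["vaccine", "rollout", "city"],
    ["vaccine", "rollout", "delay"],
    ["city", "election"],
]

counts = cooccurrence_counts(docs)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

    Pairs that recur across many texts, such as `("rollout", "vaccine")` here, are natural candidates for new words or key phrases. A weighted model would replace the plain increment with a per-word or per-pair weight.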

    Citation: Xinyun Zhang, Tao Ding. Co-occurrence word model for news media hotspot mining-text mining method design[J]. Mathematical Biosciences and Engineering, 2024, 21(4): 5411-5429. doi: 10.3934/mbe.2024238



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)