Research article

Are Natural Language Processing methods applicable to EPS forecasting in Poland?

  • Received: 11 September 2024 Revised: 15 December 2024 Accepted: 21 January 2025 Published: 11 February 2025
  • JEL Codes: C01, C02, C12, C14, C58, G17

  • Accurate earnings forecasts are crucial for successful investment outcomes, especially in emerging markets like Poland, where analyst coverage is limited. This study investigated the applicability of natural language processing (NLP) techniques, specifically FastText and FinBERT word embeddings, combined with a gradient-boosting decision tree (XGBoost) machine learning algorithm, to forecast earnings per share (EPS) for companies listed on the Warsaw Stock Exchange from 2010 to 2019. The performance of these models was compared with a seasonal random walk (SRW) model. The SRW model consistently demonstrated the lowest error rates, as measured by the mean arctangent absolute percentage error, and outperformed the NLP-based models across different periods and error metrics. The superior performance of the simple SRW model can be attributed to the overparameterization and overfitting tendencies of the complex NLP models, as well as the relatively straightforward dynamics of the Polish stock market. The findings suggest that the application of sophisticated NLP techniques for EPS forecasting in Poland may not be justified, and that the SRW model provides a more accurate representation of the market's behavior.

    Citation: Wojciech Kurylek. Are Natural Language Processing methods applicable to EPS forecasting in Poland?[J]. Data Science in Finance and Economics, 2025, 5(1): 35-52. doi: 10.3934/DSFE.2025003




    Applying the methods of natural science to problems in the humanities and social sciences is an interesting line of work that merits investigation. The gradual formation of the converged media environment has a great influence on the transmission and guidance of public opinion. Most existing research on public opinion guidance focuses on a single medium, and few studies address converged media. In particular, the cross-influence of different media makes the work of public opinion guidance more complicated. In this paper, the classic transmission dynamics model is extended to the transmission environment of converged media. Studying the transmission law of public opinion and controlling its transmission in this environment is of great significance. For these reasons, public opinion guidance in converged media, studied here with AHP and transmission dynamics, is a problem worth solving.

    The advancement of Internet technology has changed the way people communicate and share information. Social software and portals keep emerging, such as Facebook, Twitter, and Instagram abroad, and WeChat, Weibo, and short-video platforms in China. These media platforms have become an important part of people's lives. People have gradually changed from passive receivers of public opinion, as in the traditional media era, into creators of public opinion who actively follow and participate in the discussion of public opinion events. In this way, people with similar opinions can easily gather together.

    The development of information technology has increasingly blurred the boundary between traditional media and new media, and convergence models of the two are constantly being explored. In the converged media environment, information is transmitted more rapidly, and media reporting of public opinion has become more comprehensive and multi-layered, which further facilitates the transmission of public opinion. Correct guidance of public opinion transmission is therefore needed. This paper applies science and engineering methods to a problem in the humanities and social sciences. Following a current research hotspot, the core problems are addressed by constructing a model, and corresponding public opinion guidance strategies are provided.

    This paper extends the research foundation to the converged media communication environment and takes into account the urgent need for effective government intervention in the transmission of public opinion. A public opinion transmission guidance model (UCIR) based on converged media is proposed, which is closer to actual transmission. The equilibrium and stability of the model are proved, and the process of public opinion transmission is analyzed. Finally, countermeasures and suggestions for public opinion guidance are given from the perspectives of online media and the government.

    Foreign research on converged media started earlier, and its results are more mature than those of domestic research. Regarding the concept of converged media, Niehoff regards it as the development trend in which multiple media platforms show multi-functional integration [1]. Andrew defines converged media as a strategic, operational, and cultural alliance between audio, video, text, and other media forms [2]. Gordon summarizes converged media as the integration of ownership, strategy, structure, information collection, and news expression [3].

    The earliest domestic research on converged media dates from 2002, a little later than foreign research. Domestic scholars often adopt the term media convergence, translated directly from the English "Media Convergence". Domestic scholars have reached some understanding of the concept: in essence, converged media means combining the advantages of traditional media and new media. On converged media and public opinion, Liao Feng studied the development status and characteristics of converged media and, based on the occurrence mechanism of public emergencies, put forward strategies and suggestions for effectively guiding public opinion [4].

    Many scholars have found that the process of information transmission closely resembles the spread of infectious diseases. Establishing information transmission models based on infectious disease models has therefore become a focus of domestic and foreign scholars. The earliest SIR model was proposed by Kermack and McKendrick in 1927 when they were studying the Black Death epidemic in London [5]. The SIR model divides the population into three states. Susceptible (S) refers to those who are not infected but are easily infected after contact with an infected person. Infective (I) refers to people who have been infected and can transmit the disease to those in the S state. Removed (R) refers to people who have recovered from the illness and have immunity. Later, scholars adapted the SIR model to different research environments and derived further infectious disease models such as the SI (Susceptible-Infective) model and the SIS (Susceptible-Infective-Susceptible) model.
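
    For reference, the three-state description above is commonly written as the following system. This is the standard textbook form rather than a formula taken from this paper; the symbols λ (infection rate) and μ (removal rate) are assumptions chosen to avoid clashing with the UCIR parameters introduced later, and S, I, and R denote population fractions.

    $$\frac{dS}{dt} = -\lambda S I, \qquad \frac{dI}{dt} = \lambda S I - \mu I, \qquad \frac{dR}{dt} = \mu I$$

    Summing the three right-hand sides gives zero, so S + I + R stays constant, which is the fixed-population assumption that the later models also rely on.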

    In recent studies, most scholars have focused on improving the dynamic model of information transmission. In 2012, Zhao et al. [6] accounted for users forgetting information in their SIHR rumor propagation model. In 2015, Ding et al. [7] extended the traditional SIR model to the SCIR model by adding a new node state, the contact state C. Lin et al. [8] studied an SEIR network public opinion propagation model with population dynamics and a saturated growth rate. In 2016, You et al. [9] introduced foreign users into the SIR model to better predict the dissemination trend of public opinion. Based on the SIR model, Liu et al. [10] proposed the SHIR model with a hesitation state. In 2018, Rui et al. [11] established the SPIR model and introduced a new state, the potential propagation state (PS). Nodes in this state convert to the unknown state only when all surrounding infected nodes have stopped transmitting information. Ebadizadeh et al. [12] established the ISRC model, which can block information transmission and in which C is defined as the state of controlling information transmission. Zan [13] studied the propagation characteristics of dual opinion information in the network with the DSIR model. In 2019, Kabir [14] introduced node sensitivity into the SIR model, dividing each state into a sensitive state and an insensitive state. Taking the S state as an example, it is divided into an unknown sensitive state SA and an unknown insensitive state SU.

    Based on mean field theory and the factors influencing public opinion, many studies have proposed further improved models [15,16,17,18,19,20,21,22,23]. Yu et al. [24] analyzed the difference between the SEIR model, which includes the latent state E, and the SIR model, and compared how the two models behave in the information diffusion process when the parameters are changed.

    In the actual public opinion transmission process, users may not transmit a public opinion immediately after learning of it, and some users may refuse to engage with it at all. To capture this, a UCIR (unknown-contact-infected-recovered) dynamic model is established. In the UCIR model, the contact state is introduced to describe hesitation during information transmission. Two state transitions are also proposed: (ⅰ) unknown persons who directly refuse to understand a certain public opinion topic become immunized, and (ⅱ) some immunized persons, affected by related derivative public opinion topics, return to the contact state. Each state thus gains additional inflow and outflow probabilities, which enhances the dynamics of the UCIR model.

    The Analytic Hierarchy Process (AHP) is a comprehensive evaluation method that combines quantitative and qualitative methods to calculate index weights. Its principle [25] is to decompose the problem into different factors according to the understanding of the problem and the goal to be achieved. According to the relationships between these factors, they are combined at different levels to form a multi-level analysis structure model [26]. Radovanovic [27] presents a decision support model for choosing the most efficient rectification procedure for the optical sight of a long-range rifle. Bobar [28] presents methods for ranking and evaluating the effectiveness of social media (SM); the methodology is based on multi-criteria decision-making using the fuzzy Analytical Hierarchical Process (fuzzy AHP)-Z number model-fuzzy Multi-Attributive Border Approximation Area Comparison (fuzzy MABAC), which eliminates the traditional intuitive ratings of PR services. Lyu [29] incorporates the original analytic hierarchy process (AHP) and triangular fuzzy number-based AHP (TFN-AHP) into a geographic information system (GIS) to assess the inundation risk of the metro system in Shenzhen. Duleba [30] constructed the Parsimonious Analytic Hierarchy Process (PAHP) methodology to relieve the evaluators of an AHP survey of the numerous pairwise comparisons caused by many alternatives in decision problems. Ohta [31] presents Hesitant FAHP (HFAHP) and Intuitionistic FAHP (IFAHP). Choi [32] proposes a method of weight adjustment as a modification of AHP. Lin [33] proposes a heuristic method, the Bayesian cosine maximization method (BCCM), to rank the alternatives in AHP synthesis. Leal [34] presents in detail a simplified method for applying AHP that calculates the priorities of each alternative against a set of criteria with only n − 1 comparisons of n alternatives for each criterion, instead of (n² − n)/2 comparisons. Deng [35] proposed a method called D-AHP (AHP extended by the D numbers preference relation) to study MCDM. Zhang [36] presents an improved OWA-Fuzzy AHP decision model for multi-attribute decision-making problems. Abastante [37] presents a new parsimonious AHP methodology that assigns priorities to many objects by pairwise comparison of a few reference objects. Chen [38] proposed a diversified AHP-tree approach for multiple-criteria supplier selection. Li [39] proposed an improved analytic hierarchy process-back propagation (AHP-BP) neural network algorithm.

    Figure 1.  Overall model architecture diagram.

    The evaluation index is generally an objective evaluation of the influence of media users, media communication, and information dissemination. The evaluation index system should not only cover all aspects of media influence but also reflect, at least initially, the interrelationships among the indicators. When the system is established, the indexes should overlap as little as possible and the direction of each index should be clear. According to these requirements, the following evaluation indicators have been established:

    The transmission breadth refers to the degree of media information sharing, the scope of the content covered by the media information, and the area and population that can be affected. The following secondary indicators are included:

    ● Number of visitors: the number of visitors during the evaluation period.

    ● Number of information released: the number of information items released during the evaluation period.

    ● Content coverage: the scope of the published content type.

    ● User coverage: the number of users that the content can reach.

    The transmission intensity is used to measure the degree of interaction between people in the process of information transmission. The higher the degree of interaction, the greater the influence of the media. The following secondary indicators are included:

    ● Number of comments: the number of comments during the evaluation period.

    ● Number of reposts: the number of reposts during the evaluation period.

    ● Number of likes: the number of likes during the evaluation period.

    The transmission depth refers to users' deep impressions of the media. The following secondary indicators are included:

    ● Cognitive channel: the information channel that recognizes the media.

    ● Quality cognition: the user's subjective evaluation of the quality of the content released by the media platform.

    ● Functional cognition: Under normal circumstances, whether the media can achieve the functions required by the user.

    The transmission validity refers to the degree of user satisfaction with the various services provided by the media. The following secondary indicators are included:

    ● Platform satisfaction: easy access through different devices.

    ● Content satisfaction: the published content is fresh and diverse.

    ● Service satisfaction: Media platforms can not only communicate with others, but also satisfy self-expression.

    Transmission viscosity refers to the obvious preference behavior of users. The following secondary indicators are included:

    ● User viscosity: how often users use the media and how long they use it each time.

    ● Recommendation ratio: the likelihood of users recommending to others and their recommendation behavior.

    ● Intention to use: reasons for users to use media.

    The relevant indicators are decomposed into several levels from top to bottom according to different attributes, as shown in Figure 2.

    Figure 2.  Hierarchical model of evaluation system.
    Table 1.  1–9 Scaling method.

    | Level of importance | Meaning |
    |---|---|
    | 1 | Equally important |
    | 3 | Slightly important |
    | 5 | Obviously important |
    | 7 | Very important |
    | 9 | Extremely important |
    | 2, 4, 6, 8 | ………… |

    In this round of the analytic hierarchy process, a total of 10 experts responsible for relevant operations on various media platforms were invited to score the indicator system. For each of the 10 questionnaires, the first-level and second-level indicators were compared in pairs according to the experts' scores, and the judgment matrices were constructed, giving 60 judgment matrices in total. The index weights are calculated by the characteristic root method and checked for consistency. The calculation for one expert's first-level indicator questionnaire is shown below as an example:

    Step 1: The construction of the judgment matrix:

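    $$A = \begin{bmatrix} 1 & 1/2 & 4 & 3 & 3 \\ 2 & 1 & 7 & 5 & 5 \\ 1/4 & 1/7 & 1 & 1/2 & 1/3 \\ 1/3 & 1/5 & 2 & 1 & 1 \\ 1/3 & 1/5 & 3 & 1 & 1 \end{bmatrix}$$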

    Step 2: The calculation of the product of elements in each row:

    $$M_i = \prod_{j=1}^{n} a_{ij} \quad (i = 1, 2, \ldots, n) \tag{1}$$

    The results: M1 = 18, M2 = 350, M3 = 1/168, M4 = 2/15, M5 = 1/5.

    Step 3: The calculation of the nth root of Mi in each row:

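    $$\bar{W}_i = \sqrt[n]{M_i} \quad (i = 1, 2, \ldots, n) \tag{2}$$
    $$\bar{W}_1 = 1.783, \quad \bar{W}_2 = 3.227, \quad \bar{W}_3 = 0.359, \quad \bar{W}_4 = 0.668, \quad \bar{W}_5 = 0.725$$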

    Step 4: Normalize W to calculate the index weight:

    $$W_i = \frac{\bar{W}_i}{\sum_{j=1}^{n} \bar{W}_j} \tag{3}$$

    W1 = 0.264, W2 = 0.477, W3 = 0.053, W4 = 0.099, W5 = 0.107

    Step 5: The calculation of the consistency index CI:

    $$CI = \frac{\lambda_{\max} - n}{n - 1} = 0.018 \tag{4}$$

    Step 6: The calculation of the test coefficient CR:

    $$CR = \frac{CI}{RI} = 0.016 \tag{5}$$

    Generally speaking, if CR < 0.1, the judgment matrix is considered to have satisfactory consistency. The CR calculated for this matrix is less than 0.1, so the matrix passes the consistency test.
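
    Steps 1 to 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, λmax is estimated as the mean of (Aw)_i / w_i, and the random index RI = 1.12 for n = 5 is Saaty's commonly tabulated value, which the text does not state explicitly.

```python
import math

# Judgment matrix from Step 1 (one expert's first-level comparison).
A = [
    [1,   1/2, 4, 3,   3],
    [2,   1,   7, 5,   5],
    [1/4, 1/7, 1, 1/2, 1/3],
    [1/3, 1/5, 2, 1,   1],
    [1/3, 1/5, 3, 1,   1],
]

# Assumption: Saaty's random consistency index for n = 5.
RI_5 = 1.12


def ahp_weights(matrix):
    """Steps 2-4: nth root of each row product, then normalization."""
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [r / total for r in roots]


def consistency(matrix, weights, ri):
    """Steps 5-6: return (lambda_max, CI, CR)."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lam = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lam - n) / (n - 1)
    return lam, ci, ci / ri


weights = ahp_weights(A)
lam_max, ci, cr = consistency(A, weights, RI_5)
print([round(w, 3) for w in weights])
print(round(ci, 3), round(cr, 3))
```

    Under these assumptions the sketch reproduces the weights reported in Step 4 to three decimals, together with CI ≈ 0.018 and CR ≈ 0.016.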

    According to the above method steps, 60 judgment matrices are calculated, and the consistency test is performed through the analytic hierarchy process. The results are shown in Table 2.

    Table 2.  The weight of the impact indicator.

    | First-level indicator | Weight of first-level indicator | Secondary indicator | Weight of secondary indicator | Combination weight |
    |---|---|---|---|---|
    | Transmission breadth | 0.158 | Number of visitors | 0.178 | 0.028 |
    | | | Number of information released | 0.295 | 0.047 |
    | | | Content coverage | 0.273 | 0.043 |
    | | | User coverage | 0.254 | 0.04 |
    | Transmission intensity | 0.35 | Number of comments | 0.231 | 0.081 |
    | | | Number of reposts | 0.546 | 0.191 |
    | | | Number of likes | 0.223 | 0.078 |
    | Transmission depth | 0.162 | Cognitive channel | 0.268 | 0.043 |
    | | | Quality cognition | 0.521 | 0.085 |
    | | | Functional cognition | 0.211 | 0.034 |
    | Transmission validity | 0.172 | Platform satisfaction | 0.198 | 0.034 |
    | | | Content satisfaction | 0.546 | 0.094 |
    | | | Service satisfaction | 0.256 | 0.044 |
    | Transmission viscosity | 0.158 | User viscosity | 0.182 | 0.028 |
    | | | Recommendation ratio | 0.541 | 0.085 |
    | | | Intention to use | 0.277 | 0.045 |
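
    As a reading aid for Table 2, the combination weights appear to be the product of each first-level weight and the corresponding secondary weight. The symbols below are assumptions introduced for illustration: w_i is the weight of first-level indicator i and w_{ij} is the weight of its j-th secondary indicator.

    $$w_{ij}^{\mathrm{comb}} = w_i \, w_{ij}, \qquad \text{e.g. number of reposts: } 0.350 \times 0.546 \approx 0.191$$

    Because the published weights are rounded to three decimals, recomputing a product from the rounded inputs can differ from the tabulated value in the last digit.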

    The final media weight evaluation system includes two parts: quantitative indicators and qualitative indicators. Quantitative indicators are quantified from directly observable data, such as the number of visitors, the number of comments, and the number of reposts, and evaluate a medium's influence in terms of transmission breadth and intensity. Qualitative indicators are collected by questionnaire on five levels: 1 = disagree, 2 = slightly disagree, 3 = generally agree, 4 = relatively agree, 5 = strongly agree. To quantify the index, the levels correspond to 20, 40, 60, 80, and 100 points, respectively. The rest of the data is obtained through web crawlers, APIs, and market research data. Because traditional media such as TV and radio may have slightly different evaluation dimensions from new media, the traditional media data here comes from their related clients and websites. The final result, obtained by combining the index scores with the determined index weights, is shown in Table 3.

    Table 3.  Final score of each media.

    | Indicator | Weibo | WeChat | Short video | TV | Radio | Newspapers |
    |---|---|---|---|---|---|---|
    | Number of visitors | 1.37 | 2.8 | 0.97 | 0.6 | 0.09 | 0.04 |
    | Number of information released | 2.44 | 4.7 | 1.97 | 0.43 | 0.18 | 0.08 |
    | Content coverage | 1.26 | 2.63 | 1.34 | 3.82 | 3.8 | 4.23 |
    | User coverage | 3.33 | 4 | 3.06 | 3.61 | 3.78 | 2.94 |
    | Transmission breadth (subtotal) | 8.4 | 14.13 | 7.34 | 8.46 | 7.85 | 7.29 |
    | Number of comments | 5.86 | 8.1 | 3.58 | 0.35 | 0.37 | 0.67 |
    | Number of reposts | 9.51 | 19.1 | 14.39 | 1.05 | 1.95 | 1.03 |
    | Number of likes | 5.61 | 7.8 | 4.6 | 0.54 | 0.75 | 0.79 |
    | Transmission intensity (subtotal) | 20.98 | 35 | 22.57 | 1.94 | 3.07 | 2.49 |
    | Cognitive channel | 3.18 | 3.53 | 2.93 | 3.01 | 2.28 | 2.16 |
    | Quality cognition | 6.34 | 6.06 | 4.87 | 5.84 | 4.69 | 4.69 |
    | Functional cognition | 2.28 | 2.79 | 1.93 | 1.95 | 1.56 | 1.49 |
    | Transmission depth (subtotal) | 11.8 | 12.38 | 9.73 | 10.8 | 8.53 | 8.34 |
    | Platform satisfaction | 2.44 | 2.68 | 1.98 | 1.82 | 1.64 | 1.34 |
    | Content satisfaction | 7.42 | 6.71 | 6.55 | 6.25 | 5.08 | 5.03 |
    | Service satisfaction | 2.93 | 3.71 | 2.59 | 1.9 | 1.9 | 1.52 |
    | Transmission validity (subtotal) | 12.79 | 13.1 | 11.12 | 9.97 | 8.62 | 7.89 |
    | User viscosity | 1.68 | 2.42 | 1.27 | 1.3 | 1.01 | 0.85 |
    | Recommendation ratio | 5.38 | 7.31 | 4.23 | 4.64 | 3.86 | 3.86 |
    | Intention to use | 2.85 | 3.87 | 1.97 | 2.43 | 1.9 | 1.73 |
    | Transmission viscosity (subtotal) | 9.91 | 13.6 | 7.47 | 8.37 | 6.77 | 6.44 |
    | Final score | 61.6 | 85.42 | 56.3 | 37.59 | 33.28 | 30.96 |

    Therefore, the final media influence weight is obtained as shown in Figure 3.

    Figure 3.  Schematic diagram of the influence weight of each media.

    The UCIR model proposed in this paper is an upgrade of the classic transmission dynamics model. Two state transitions are introduced: (ⅰ) unknown persons who directly refuse to understand a certain public opinion topic become immunized, and (ⅱ) some immunized persons, affected by related derivative public opinion topics, return to the contact state. This makes the model more complete and better explains the dissemination mechanism of public opinion. On this basis, the system dynamics differential equations are constructed.

    The total number of people N is assumed to be constant, and, considering the whole public opinion transmission process, the population is divided into four types:

    U (Unknown): individuals who do not yet know the public opinion. After learning of it, they either become contacts, who consider whether to transmit it, or become directly immunized.

    C (Contact): hesitating individuals who know the public opinion but have not yet transmitted it. They either transmit it and become communicators, or become immunized without transmitting it.

    I (Infective): individuals who know the public opinion and immediately transmit it.

    R (Recovered): individuals who know the public opinion but are not interested in transmitting it. Under the influence of related derivative public opinion, the immunized may become contacts again.

    The Unknown, Contact, Infective, and Recovered populations at time t are denoted by U(t), C(t), I(t), and R(t). α is the contact rate, β is the transmission rate, θ is the transmission immunity rate, ε is the direct immunity rate, p is the direct shielding rate, and q is the derived transmission rate. The relationship between them is shown in Figure 4.

    Figure 4.  UCIR model state transition.

    According to the above figure, the dynamic differential equation of the model is as follows:

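    $$\frac{dU(t)}{dt} = -\alpha U(t) I(t) - p U(t) I(t) \tag{6}$$
    $$\frac{dC(t)}{dt} = \alpha U(t) I(t) - \beta C(t) - \varepsilon C(t) + q R(t) \tag{7}$$
    $$\frac{dI(t)}{dt} = \beta C(t) - \theta I(t) \tag{8}$$
    $$\frac{dR(t)}{dt} = p U(t) I(t) + \varepsilon C(t) + \theta I(t) - q R(t) \tag{9}$$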

    In the study of infectious diseases, the basic reproduction number is a key indicator of whether a disease will break out during its transmission. Similarly, in the transmission of public opinion, the basic reproduction number can be used to measure whether the transmission of public opinion is likely to explode.

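    Let $x(t) = (C(t), I(t), R(t), U(t))^{T}$; then the differential equations of the model can be expressed as:

    $$\frac{dx}{dt} = \tilde{F}(x) - \tilde{V}(x) \tag{10}$$
    $$\tilde{F}(x) = \begin{bmatrix} \alpha U I \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \tilde{V}(x) = \begin{bmatrix} \beta C + \varepsilon C - qR \\ -\beta C + \theta I \\ -pU - \varepsilon C - \theta I + qR \\ \alpha U I + pU \end{bmatrix}$$

    Calculating the corresponding matrices gives:

    $$F = \begin{bmatrix} 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad V = \begin{bmatrix} \beta + \varepsilon & 0 & -q & 0 \\ -\beta & \theta & 0 & 0 \\ -\varepsilon & -\theta & q & -p \\ 0 & \alpha & 0 & \alpha + p \end{bmatrix}$$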

    The basic reproduction number $R_0$ is obtained as the spectral radius of the next-generation matrix $FV^{-1}$:

    R0=ρ(FV1)=αβεθq(β+qε) (11)

    In this article, the discussion of topics related to #Qingdao epidemic# in 2020 was selected. The amount of discussion on this topic in each representative medium is given in Table 4. Data collection starts on October 11, 2020, the date on which the event first occurred, and the sampling frequency is one day.

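    Table 4.  The amount of public opinion discussed by each media in a single day.

    | Media | 10.11 | 10.12 | 10.13 | 10.14 | 10.15 | 10.16 | 10.17 | 10.18 | 10.19 |
    |---|---|---|---|---|---|---|---|---|---|
    | Weibo | 2064 | 60175 | 18731 | 17197 | 12442 | 9560 | 3788 | 2617 | 2019 |
    | WeChat | 25 | 4200 | 4037 | 5481 | 3937 | 6575 | 3231 | 2423 | 3541 |
    | Short video | 48 | 1314 | 793 | 611 | 248 | 997 | 400 | 322 | 169 |
    | Newspapers | 0 | 7 | 45 | 47 | 40 | 161 | 104 | 8 | 43 |
    | TV | 720 | 485 | 338 | 369 | 770 | 341 | 253 | 354 | 166 |
    | Radio | 500 | 743 | 345 | 248 | 555 | 366 | 322 | 237 | 172 |

    | Media | 10.20 | 10.21 | 10.22 | 10.23 | 10.24 | 10.25 | 10.26 | 10.27 | 10.28 |
    |---|---|---|---|---|---|---|---|---|---|
    | Weibo | 1007 | 630 | 395 | 215 | 240 | 224 | 250 | 279 | 232 |
    | WeChat | 2041 | 1549 | 1045 | 709 | 428 | 294 | 475 | 572 | 650 |
    | Short video | 97 | 26 | 16 | 5 | 8 | 10 | 9 | 14 | 3 |
    | Newspapers | 100 | 15 | 10 | 4 | 3 | 2 | 7 | 3 | 2 |
    | TV | 126 | 50 | 21 | 13 | 15 | 14 | 48 | 14 | 41 |
    | Radio | 124 | 79 | 56 | 58 | 30 | 25 | 22 | 37 | 25 |

    | Media | 10.29 | 10.30 | 10.31 | 11.1 | 11.2 | 11.3 | 11.4 | 11.5 | 11.6 |
    |---|---|---|---|---|---|---|---|---|---|
    | Weibo | 390 | 333 | 175 | 108 | 84 | 100 | 75 | 33 | 43 |
    | WeChat | 508 | 569 | 446 | 621 | 386 | 292 | 214 | 116 | 88 |
    | Short video | 14 | 15 | 1 | 2 | 3 | 2 | 4 | 3 | 3 |
    | Newspapers | 5 | 10 | 1 | 1 | 4 | 16 | 3 | 2 | 1 |
    | TV | 44 | 30 | 43 | 24 | 16 | 11 | 7 | 5 | 2 |
    | Radio | 31 | 27 | 32 | 33 | 27 | 7 | 4 | 2 | 1 |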

    According to the calculated influence weight of each media in the second subsection, the amount of discussion on this topic under the converged media can be calculated as shown in Table 5.
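
    The aggregation step can be read as a weighted sum of the per-media counts. The sketch below illustrates that reading under a loud assumption: the paper reports the influence weights only graphically (Figure 3), so here they are taken to be the final scores of Table 3 normalized to sum to 1. The function name `converged_volume` is hypothetical.

```python
# Final scores from Table 3, used here as raw influence scores.
final_scores = {
    "Weibo": 61.6, "WeChat": 85.42, "Short video": 56.3,
    "TV": 37.59, "Radio": 33.28, "Newspapers": 30.96,
}

# Assumption: influence weights are the final scores normalized to sum to 1.
total_score = sum(final_scores.values())
weights = {media: score / total_score for media, score in final_scores.items()}

# Discussion counts for 10.11, copied from Table 4.
counts_1011 = {
    "Weibo": 2064, "WeChat": 25, "Short video": 48,
    "Newspapers": 0, "TV": 720, "Radio": 500,
}


def converged_volume(counts, media_weights):
    """Weighted sum of per-media discussion counts for one day."""
    return sum(media_weights[media] * n for media, n in counts.items())


volume_1011 = converged_volume(counts_1011, weights)
print(round(volume_1011))
```

    Because the weights are only an assumed normalization, the result need not coincide exactly with the value the paper reports for the same day.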

    Table 5.  The amount of public opinion discussed by converged media in a single day.

    | | 10.11 | 10.12 | 10.13 | 10.14 | 10.15 | 10.16 | 10.17 | 10.18 | 10.19 |
    |---|---|---|---|---|---|---|---|---|---|
    | Converged media | 558 | 13576 | 5097 | 5156 | 3781 | 4028 | 1808 | 1323 | 1467 |

    | | 10.20 | 10.21 | 10.22 | 10.23 | 10.24 | 10.25 | 10.26 | 10.27 | 10.28 |
    |---|---|---|---|---|---|---|---|---|---|
    | Converged media | 830 | 236 | 168 | 202 | 130 | 106 | 77 | 41 | 34 |

    | | 10.29 | 10.30 | 10.31 | 11.1 | 11.2 | 11.3 | 11.4 | 11.5 | 11.6 |
    |---|---|---|---|---|---|---|---|---|---|
    | Converged media | 231 | 236 | 168 | 202 | 130 | 106 | 77 | 41 | 34 |

    Table 6.  The root mean square error comparison result of the model.

    | | SIR | UCIR |
    |---|---|---|
    | Converged media | 271.39 | 185.72 |

    Because the values estimated by the model will differ somewhat from the actual data, the effectiveness of the UCIR model is verified by calculating the root mean square error (RMSE) between the estimated values and the actual data. The smaller the error, the better the fit of the curve and the more effective the model. The RMSE is calculated as shown in Eq (12).

    $$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \mathrm{observed}_t - \mathrm{predicted}_t \right)^2} \tag{12}$$

    Here RMSE is the root mean square error, N is the number of data points, observed is the actual data value collected, and predicted is the data value estimated by the model. The value of I estimated by the classic SIR model was compared with the actual data on the topic of #Qingdao epidemic# in 2020. As shown in Figure 5, this model has a large fitting error. The value of I estimated by the UCIR model was likewise compared with the actual data. As shown in Figure 6, this model clearly reduces the curve fitting error and fits the actual data of the public opinion event better.

    Figure 5.  SIR model.
    Figure 6.  UCIR Model.

    Table 6 lists the root mean square error of the values estimated by the SIR model and the UCIR model against the actual data on the topics related to #Qingdao epidemic# in 2020. Compared with the SIR model, the RMSE of the UCIR model is reduced by 31.6%.
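
    The RMSE definition and the relative reduction quoted above can be checked with a short sketch. The `rmse` helper is a hypothetical illustration of the standard definition, and the two-point series used to exercise it is made up; only the two RMSE values are taken from Table 6.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy series, only to exercise the function.
toy_error = rmse([0.0, 0.0], [3.0, 4.0])

# RMSE values reported in Table 6.
rmse_sir, rmse_ucir = 271.39, 185.72
reduction = (rmse_sir - rmse_ucir) / rmse_sir
print(f"{reduction:.1%}")
```

    With the Table 6 values, the relative reduction evaluates to about 31.6%, which agrees with the figure stated in the text.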

    A single communication channel can no longer accurately describe the transmission process of public opinion, and the transmission of public opinion in converged media has greater research value. Research on the transmission of public opinion in converged media is an indispensable part of public opinion work. For example, in real life, people not only obtain public opinion through traditional media such as TV and radio, but also discuss and communicate with friends on new media platforms such as Weibo and WeChat. This is the most common form of interaction between traditional media and new media. The two differ slightly in information acquisition: more information can be obtained from new media, but its acceptance is generally not high, whereas the information obtained from traditional media is limited but its acceptance is relatively high. In recent years, the transmission of public opinion in traditional media and new media has gradually shown the characteristic that "the trend of public opinion transmission is expanded by new media, while mainstream public opinion transmission is led by traditional media".

    In this paper, Matlab2017a is used to verify the model experimentally and to analyze the law of public opinion transmission. The horizontal axis is the time step, and the vertical axis is the change in the proportion of each state.

    Figure 7 shows how the number of individuals in the four states of the converged media changes over time. The number of unknown individuals (U) gradually decreases and, at roughly t = 18, settles at a stable value. The number of contacts (C) first increases, peaks at about t = 12, then declines and eventually tends to zero. The infective state (I) behaves in roughly the same way: it increases first, peaks at about t = 13, then declines and eventually tends to zero. The recovered state (R) changes in the opposite direction to the unknown state: it gradually increases, reaches its maximum at about t = 18, and then remains at that value. When the entire transmission process is over, all four states (unknown, contact, infective, and recovered) have stabilized and remain unchanged.

    Figure 7.  Dynamic changes of UCIR.

    In the actual transmission of public opinion, the public opinion transmission will be affected by many factors. The influencing factors in this article mainly include: contact rate α, transmission rate β, transmission immunity rate θ, direct immunity rate ε, direct shielding rate p, and derivative transmission rate q. This section mainly studies these six parameters, and the specific experimental results and analysis will be introduced in detail.

    Therefore, the remaining parameters are held fixed and only one parameter is changed at a time to study the factors that affect the transmission of public opinion. Matlab2017a is used to simulate and analyze the model proposed in this article. The total number of users is N = 20000, and at t = 0 the initial values are set to S(0) = N−1/N, E(0) = 0, I(0) = 10/N, R(0) = 0. The main parameters are α = 3.6, β = 2.3, θ = 1.9, p = 0.4, ε = 0.1, and q = 0.5.
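    The one-parameter-at-a-time design described above can be sketched as a small generator of parameter sets. The baseline values are the ones quoted in the text. The dictionary keys, the function name `one_at_a_time`, and the sweep values for α are assumptions made for illustration, since the actual grid used in the experiments is not stated here.

```python
# Baseline values quoted in the text; the key names are hypothetical.
BASELINE = {
    "alpha": 3.6,    # contact rate
    "beta": 2.3,     # transmission rate
    "theta": 1.9,    # transmission immunity rate
    "p": 0.4,        # direct shielding rate
    "epsilon": 0.1,  # direct immunity rate
    "q": 0.5,        # derivative transmission rate
}

def one_at_a_time(baseline, name, values):
    """Yield parameter sets in which only `name` differs from the baseline."""
    if name not in baseline:
        raise KeyError(f"unknown parameter: {name}")
    for value in values:
        params = dict(baseline)  # copy so the baseline is never mutated
        params[name] = value
        yield params

# Hypothetical sweep values for the contact rate (an assumption, not the paper's grid).
alpha_runs = list(one_at_a_time(BASELINE, "alpha", [2.6, 3.6, 4.6]))
for params in alpha_runs:
    print(params)
```

    Each generated dictionary would then be passed to whatever simulation routine implements the model, so that differences between runs can be attributed to the single parameter that was varied.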

    Figure 8 shows that as the contact rate α increases, the C(t) and I(t) curves rise faster: public opinion breaks out more quickly, the peak is reached earlier, and the peak value is higher. In other words, more users convert to C(t) after browsing the public opinion in the converged media, the transmission speeds up, and its scope of influence widens. In actual public opinion transmission, whether users are willing to learn about a piece of public opinion depends most on the appeal and credibility of the content publisher. Therefore, to promote the transmission of public opinion, factors such as the publisher's reputation and the attributes of users deserve attention.

    Figure 8.  The change of U, C, I, R with time at different propagation contact rates α.
    Figure 9.  The propagation state I changes with time at different propagation contact rates α.

    Figure 10 shows that as the transmission rate β increases, the peak of I(t) gradually exceeds that of E(t), and the time needed to reach the peak of I(t) keeps shortening. The greater the transmission rate, the more individuals become interested in the public opinion and willing to spread it after being exposed to it, and the greater the amount of transmission. It also shows that the faster public opinion is transmitted, the wider its coverage. Therefore, if the transmission of public opinion is to be suppressed, for example when the government needs to control the spread of rumors, important nodes can be monitored to strictly control the transmission rate. The government can also suppress rumors by publishing clarifying information. Conversely, if the transmission of public opinion is to be promoted, the transmission rate can be increased through online media, and the public opinion can be forwarded as widely as possible to expand its scope of influence.

    Figure 10.  The change of U, C, I, R with time at different propagation rates β.
    Figure 11.  The propagation state I changes with time at different propagation rates β.

    Figure 12 shows that as the transmission immunity rate θ increases, the peak of I(t) gradually decreases and the time needed to reach it keeps lengthening. The greater θ is, the more individuals in I(t) lose their vitality and can no longer induce individuals in other states to spread the information. The amount of transmission and the coverage of public opinion both shrink. Therefore, the transmission of public opinion can be suppressed by raising the transmission immunity rate, for example by having the government quickly release clarifying information to accelerate the transformation of I(t) into R(t), which brings the transmission of the entire public opinion under control.

    Figure 12.  Changes of U, C, I, R with time at different propagation immunity rates θ.
    Figure 13.  The variation of propagation state I with time at different propagation immunity rates θ.

    Figure 14 shows the effect of the direct immunity rate ε. An increase in ε indicates that the public opinion lacks appeal or that users are not interested in the content, so more individuals move directly from C(t) to R(t). As a result, the peak of the I(t) curve is lower and the final spread of information is smaller. When users are not interested in the content, the number of users in I(t) is greatly reduced, which has a significant impact on the transmission. Therefore, to promote the transmission of public opinion, the direct immunity rate should be reduced by paying more attention to users' points of interest and editing the content in the direction users are interested in, so as to increase the amount of information transmitted.

    Figure 14.  Changes of U, C, I, R with time at different direct immunization rates ε.
    Figure 15.  The propagation state I varies with time at different direct immunization rates ε.

    Figure 16 shows that as p increases, the R(t) curve rises faster. As the direct shielding rate p increases, the proportion of U(t) individuals in the network gradually decreases and the proportion of R(t) individuals gradually increases. The spread of public opinion is reduced and its effective spread time is shortened. Therefore, to strengthen the transmission of public opinion, the directly visible content published by the platform, such as the title, form, and category, must attract users, increase their attention to the public opinion, and reduce the likelihood that they reject it outright.

    Figure 16.  Changes of U, C, I, R with time at different direct shielding rates p.
    Figure 17.  The propagation state I changes with time at different direct shielding rates p.
    Figure 18.  The propagation state I changes with time at different derivative propagation rates q.

    Figure 19 shows that as the derivative transmission rate q increases, the peak of I(t) rises, the curve stays at the peak longer, and I(t) returns to its initial state later. When q is large enough, C(t) and I(t) persist in the system and reach a certain balance. This shows that as q increases, the effective transmission time of public opinion becomes longer and its influence grows. The stronger the derivation of public opinion, the greater the scope of influence of its transmission. Therefore, if the transmission of public opinion is to be promoted, topical information can be used to increase the derivative transmission rate and lengthen the effective transmission time, which strengthens the spread. Conversely, if the transmission of public opinion is to be suppressed, the government can stop it in time through external intervention and other means, so as to reduce the topic's derivative transmission rate, the amount of transmission, and the scope of influence.

    Figure 19.  Changes of U, C, I, R with time at different derivative propagation rates q.

    Based on the experimental results, the different stages of public opinion transmission can be adjusted through the characteristics of each parameter, so that transmission can be guided and better transmission strategies provided. In the initial stage, the basic reproduction number R0 of public opinion transmission is calculated to determine whether the public opinion is likely to spread, which makes it easier to guide its transmission. Suggestions for public opinion guidance are given next.

    Positive public opinion information should be transmitted as widely as possible. A platform with a higher network density can be selected as the source of transmission to increase the contact rate α, which lays the basis for a stronger transmission effect. The party promoting the public opinion can increase the transmission rate β and reduce the direct shielding rate p, the direct immunity rate ε, and the transmission immunity rate θ by making the content more sophisticated and more eye-catching. Opinion leaders can release a large amount of information related to the topic to stimulate users' discussions and increase the derivative transmission rate q. Keeping the topic high-profile lets more users see the information and actively take part in its transmission. In the end, the scope of transmission is expanded.

    When negative public opinion has been transmitted to a certain scale, the aim is to keep its transmission within a certain range and restrain its ability to spread. The contact rate α can be reduced by preventing key users from acquiring the information. External intervention, such as the government clarifying rumors and releasing accurate content, can reduce the transmission rate β and increase the direct shielding rate p, the direct immunity rate ε, and the transmission immunity rate θ. At the same time, correcting the false content lets users who come into contact with the information read the correct content directly, which reduces the derivative transmission rate q of the wrong content and narrows the scope of its transmission.

    In this paper, the proportion of each medium in the converged media, the existence of the contact state in the converged media, and the derivative characteristics of topical information are considered together, and the UCIR model of public opinion transmission is constructed on the basis of the SIR model. The feasibility and effectiveness of the model are verified through experiments and comparative analysis, and a law of public opinion transmission that is more in line with the actual situation is studied. On this basis, the parameter values in the UCIR model are adjusted and the influence of these changes on the node density is analyzed to find the law of public opinion transmission. This provides a reference for increasing the spread of positive information and for formulating strategies that effectively control the spread of rumors. Solving problems in the humanities and social sciences with the methods of natural science is interesting and worth investigating. This paper establishes a data-driven public opinion guidance method for converged media and provides scientific, data-based support for their public opinion management.

    Although this paper has made some progress in the study of the communication of public opinion information, many shortcomings and open problems remain. The outlook is as follows. The parameter estimation in this paper starts from a single initial value and finds a locally optimal parameter set that minimizes the error between the model estimate and the actual data. A better approach is to repeat the selection of the initial value until the objective function reaches its smallest stable value, at which point the parameter estimate best matches the actual data. In the future, this repeated selection can be developed into an algorithm for better parameter optimization.
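    One way to read the repeated-selection idea above is as a multi-start search: a local optimizer is run from many initial values and the run with the smallest objective is kept. The sketch below illustrates that reading under stated assumptions. The coordinate pattern search stands in for whatever local optimizer is actually used, `toy_error` is a hypothetical objective with two local minima rather than the model's fitting error, and all function names are invented for this example.

```python
import random

def local_search(objective, start, step=0.5, tol=1e-6, max_iter=10_000):
    """Coordinate pattern search: a simple stand-in for any local optimizer."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # shrink the step when no neighbor is better
    return point, best

def multi_start(objective, bounds, n_starts=20, seed=0):
    """Run the local search from random initial values and keep the best run."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(local_search(objective, start))
    return min(results, key=lambda r: r[1])

def toy_error(x):
    """Hypothetical objective: local minimum near x0 = 2, global one near x0 = -2."""
    return (x[0] ** 2 - 4.0) ** 2 + 0.5 * x[0] + (x[1] - 1.0) ** 2

best_point, best_value = multi_start(toy_error, bounds=[(-3.0, 3.0), (-3.0, 3.0)])
print(best_point, best_value)
```

    A single run started near x0 = 2 would stop at the poorer local minimum, which is the weakness of a single initial value that the paragraph describes. Restarting from several initial values makes it far more likely that the lower minimum is found, at the cost of more objective evaluations.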

    This research was supported by the Research Project of the Beijing Chaoyang District Bureau of Science, Technology and Information Technology (No. CYXC2012) and the Fundamental Research Fund of the Central University (No. CUC2019T008).

    We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature or kind in any product, service or company.



    [1] Abarbanell J, Bushee B (1997) Fundamental analysis, future EPS, and stock prices. J Account Res 35: 1–24. https://doi.org/10.2307/2491464 doi: 10.2307/2491464
    [2] Armon A, Shwartz-Ziv R (2022) Tabular data: Deep learning is not all you need. Inform Fusion 81: 84–90. https://doi.org/10.1016/j.inffus.2021.11.011 doi: 10.1016/j.inffus.2021.11.011
    [3] Ball R, Ghysels E (2017) Automated earnings forecasts: Beat analysts or combine and conquer? Manage Sci 64: 4936–4952. https://doi.org/10.1287/mnsc.2017.2864 doi: 10.1287/mnsc.2017.2864
    [4] Ball R, Watts R (1972) Some Time Series Properties of Accounting Income. J Financ 27: 663–681. http://dx.doi.org/10.1111/j.1540-6261.1972.tb00991.x doi: 10.1111/j.1540-6261.1972.tb00991.x
    [5] Banerjee P (2020) A Guide on XGBoost hyperparameters tuning, Accessed June 14, 2024. Available from: https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning.
    [6] Bathke Jr AW, Lorek KS (1984) The Relationship between Time-Series Models and the Security Market's Expectation of Quarterly Earnings. Account Rev 59: 163–176.
    [7] Blomme S, Dedeyne J (2020) Predicting the effect of 10-K, 10-Q and 8-K company reports on abnormal stock returns using FinBERT NLP methods. Master thesis in Business Engineering: Data Analytics, Faculteti Economie en Bedrufskunde. University of Gent.
    [8] Bv N, Simha JB, Abhi S (2023) Deploying NLP Techniques for Earnings Call Transcripts for Financial Analysis: A Reverse Phenomenon Paradigm. 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), I-SMAC 2023 – Proceedings: 368–375. https://doi.org/10.1109/I-SMAC58438.2023.10290494 doi: 10.1109/I-SMAC58438.2023.10290494
    [9] Borisov V, Haug J, Kasneci G, et al. (2024) Deep Neural Networks and Tabular Data: A Survey. Ieee T Neur Net Lear 35: 7499–7519. https://doi.org/10.1109/tnnls.2022.3229161 doi: 10.1109/tnnls.2022.3229161
    [10] Bradshaw M, Drake M, Myers J, et al. (2012). A re-examination of analysts' superiority over time-series forecasts of annual earnings. Rev Account Stud 17: 944–968. http://dx.doi.org/10.1007/s11142-012-9185-8 doi: 10.1007/s11142-012-9185-8
    [11] Brandon Ch, Jarrett JE, Khumawala SB, et al. (1987) A Comparative Study of the Forecasting Accuracy of Holt‐Winters and Economic Indicator Models of Earnings Per Share for Financial Decision Making. Manage Financ 13: 10–15. http://dx.doi.org/10.1108/eb013581 doi: 10.1108/eb013581
    [12] Brooks LD, Buckmaster DA (1976) Further Evidence of The Time Series Properties of Accounting Income. J Financ 31: 1359–1373. http://dx.doi.org/10.1111/j.1540-6261.1976.tb03218.x doi: 10.1111/j.1540-6261.1976.tb03218.x
    [13] Brown LD, Griffin PA, Hagerman RL, et al. (1987) Security analyst superiority relative to univariate time-series models in forecasting quarterly earnings. J Account Econ 9: 61–87. http://dx.doi.org/10.1016/0165-4101(87)90017-6 doi: 10.1016/0165-4101(87)90017-6
    [14] Brown LD, Rozeff MS (1979) Univariate Time-Series Models of Quarterly Accounting Earnings per Share: A Proposed Model. J Account Res 17: 179–189. http://dx.doi.org/10.2307/2490312 doi: 10.2307/2490312
    [15] Cao Q, Gan Q (2009) Forecasting EPS of Chinese listed companies using a neural network with genetic algorithm. 15th Americas Conference on Information Systems 2009, AMCIS 2009: 2791–2981.
    [16] Cao Q, Parry M (2009) Neural network earnings per share forecasting models: A comparison of backward propagation and the genetic algorithm. Decis Support Syst 47: 32–41. http://dx.doi.org/10.1016/j.dss.2008.12.011 doi: 10.1016/j.dss.2008.12.011
    [17] Chang MW, Devlin J, Lee K, et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019). arXiv, arXiv: 1810.04805.
    [18] Chen Y, Chen S, Huang H, et al. (2020) Applied identification of industry data science using an advanced multi-componential discretization model. Symmetry 12: 1–28. https://doi.org/10.3390/sym12101620 doi: 10.3390/sym12101620
    [19] Chen T, Guestrin C (2016) XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785 doi: 10.1145/2939672.2939785
    [20] Conroy R, Harris R (1987) Consensus Forecasts of Corporate Earnings: Analysts' Forecasts and Time Series Methods. Manage Sci 33: 725–738. http://dx.doi.org/10.1287/mnsc.33.6.725 doi: 10.1287/mnsc.33.6.725
    [21] Delen D, Kuzey C, Uyar A, et al. (2013) Measuring firm performance using financial ratios: A decision tree approach. Expert Syst Appl 40: 3970–3983. https://doi.org/10.1016/j.eswa.2013.01.012 doi: 10.1016/j.eswa.2013.01.012
    [22] Dragan Ł, Wróblewska A (2019) Content-Based Recommendations in an E-Commerce Platform. Information Technology, Systems Research, and Computational Physics, Springer International Publishing, 252–263. https://doi.org/10.1007/978-3-030-18058-4_20
    [23] Dreher S, Eichfelder S, Noth F, et al. (2024) Does IFRS information on tax loss carryforwards and negative performance improve predictions of earnings and cash flows? J Bus Econ 94: 1–39. http://dx.doi.org/10.1007/s11573-023-01147-7 doi: 10.1007/s11573-023-01147-7
    [24] Elamir E (2020) Modeling and predicting earnings per share via regression tree approaches in banking sector: Middle East and North African countries case. Invest Manag Financ Innov 17: 51–68. https://doi.org/10.21511/imfi.17(2).2020.05 doi: 10.21511/imfi.17(2).2020.05
    [25] Elton EJ, Gruber MJ (1972) Earnings Estimates and the Accuracy of Expectational Data. Manage Sci 18: B409–B424. http://dx.doi.org/10.1287/mnsc.18.8.B409 doi: 10.1287/mnsc.18.8.B409
    [26] Etemadi H, Ahmadpour A, Moshashaei S, et al. (2015) Earnings Per Share Forecast Using Extracted Rules from Trained Neural Network by Genetic Algorithm. Computational Econ 46: 55–63. https://doi.org/10.1007/s10614-014-9455-6 doi: 10.1007/s10614-014-9455-6
    [27] Fisher IE, Garnsey MR, Hughes ME, et. al (2016) Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research. Intell Syst Account 23: 157–214. Portico. https://doi.org/10.1002/isaf.1386 doi: 10.1002/isaf.1386
    [28] Foster G (1977) Quarterly Accounting Data: Time-Series Properties and Predictive-Ability Results. Account Rev 52: 1–21.
    [29] Frankel RM, Jennings JN, Lee JA, et al. (2017) Using Natural Language Processing to Assess Text Usefulness to Readers: The Case of Conference Calls and Earnings Prediction. SSRN Electronic J. https://doi.org/10.2139/ssrn.3095754 doi: 10.2139/ssrn.3095754
    [30] Gatsios RC, Lima FG, Gaio LE, et al. (2021) Re-examining analyst superiority in forecasting results of publicly-traded Brazilian companies. Revista de Administracao Mackenzie 22: eRAMF210164. https://doi.org/10.1590/1678-6971/eramf210164 doi: 10.1590/1678-6971/eramf210164
    [31] Gerakos J, Gramacy R (2013) Regression-Based Earnings Forecasts. Chicago Booth Res Paper, 12–26. https://doi.org/10.2139/ssrn.2112137 doi: 10.2139/ssrn.2112137
    [32] Griffin P (1977) The Time-Series Behavior of Quarterly Earnings: Preliminary Evidence. J Accounting Res 15: 71–83. http://dx.doi.org/10.2307/2490556 doi: 10.2307/2490556
    [33] Harris RDF, Wang P (2019) Model-based earnings forecasts vs. financial analysts' earnings forecasts. British Account Rev 51: 424–437. https://doi.org/10.1016/j.bar.2018.10.002 doi: 10.1016/j.bar.2018.10.002
    [34] Hou K, van Dijk M, Zhang Y, et al. (2012) The implied cost of capital: A new approach. J Account Econ 53: 504–526. https://doi.org/10.1016/j.jacceco.2011.12.001 doi: 10.1016/j.jacceco.2011.12.001
    [35] Huang AH, Wang H., Yang Y, et al. (2023) FinBERT: A Large Language Model for Extracting Information from Financial Text. Contemp Account Res 40: 806–841. Portico. https://doi.org/10.1111/1911-3846.12832 doi: 10.1111/1911-3846.12832
    [36] Huang D, Huang K, Liu Z, et al. (2020) FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/622 doi: 10.24963/ijcai.2020/622
    [37] Ishikawa Y, Izumi K, Matsushima H, et al. (2020) Forecasting Net Income Estimate and Stock Price Using Text Mining from Economic Reports. Information 11: 1–21. https://doi.org/10.3390/info11060292 doi: 10.3390/info11060292
    [38] Jarrett JE (2008) Evaluating Methods for Forecasting Earnings Per Share. Manage Financ 16: 30–35. http://dx.doi.org/10.1108/eb013647 doi: 10.1108/eb013647
    [39] Johnson TE, Schmitt TG (1974) Effectiveness of Earnings Per Share Forecasts. Financ Manage 3: 64–72. http://dx.doi.org/10.2307/3665292 doi: 10.2307/3665292
    [40] Joulin A, Grave E, Bojanowski P, et al. (2016) Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2, Short Papers. https://doi.org/10.18653/v1/e17-2068 doi: 10.18653/v1/e17-2068
    [41] Kambadura P, Manna G, Stentb A, et al. (2023) NLP in Finance, In: Capponi A and Lehalle Ch A, Machine Learning and Data Sciences for Financial Markets, Cambridge University Press, Cambridge. https://doi.org/10.1080/14697688.2023.2280101
    [42] Kim S, Kim H (2016) A new metric of absolute percentage error for intermittent demand forecasts. Int J Forecast 32: 669–679. http://dx.doi.org/10.1016/j.ijforecast.2015.12.003 doi: 10.1016/j.ijforecast.2015.12.003
    [43] Kiran JS, Jonnalagadda S, Naga Veera Tarun D, et al. (2023) Stock Market Prediction Using Sentiment Analysis and Incremental Clustering Approaches. 2023 9th International Conference on Advanced Computing and Communication Systems, ICACCS 2023: 888–893. https://doi.org/10.1109/ICACCS57279.2023.10112768 doi: 10.1109/ICACCS57279.2023.10112768
    [44] Klimczak KM (2020) Text analysis in finance: The challenges for efficient application. In: Gąsiorkiewicz, L., & Monkiewicz, J. (Eds.) Innovation in Financial Services, 199–216. https://doi.org/10.4324/9781003051664-4
    [45] Kropiński P (2023). Investigating Whether Economic Policy Uncertainty Affects Central and Eastern European Markets. Evidence from Twitter-Based Uncertainty Measures. Available at SSRN 4359895. https://doi.org/10.2139/ssrn.4359895 doi: 10.2139/ssrn.4359895
    [46] Kuryłek W (2023a) The modeling of earnings per share of Polish companies for the post-financial crisis period using random walk and ARIMA models. J Bank Financ Econ 1: 26–43. http://dx.doi.org/10.7172/2353-6845.jbfe.2023.1.2 doi: 10.7172/2353-6845.jbfe.2023.1.2
    [47] Kuryłek W (2023b) Can exponential smoothing do better than seasonal random walk for earnings per share forecasting in Poland? Bank Credit 54: 651–672.
    [48] Kurylek W (2024) Can we profit from BigTechs' time series models in predicting earnings per share? Evidence from Poland. Data Sci Financ Econ 4: 218–235. http://dx.doi.org/10.3934/DSFE.2024008 doi: 10.3934/DSFE.2024008
    [49] Lacina M, Lee B, Xu R, et al. (2011) An evaluation of financial analysts and naïve methods in forecasting long-term earnings. In: Lawrence K D, and Klimberg R K (Eds.), Advances in business and management forecasting, Bingley, UK, Emerald, 77–101. http://dx.doi.org/10.1108/S1477-4070(2011)0000008009
    [50] Lev B, Souginannis T (2010) The usefulness of accounting estimates for predicting cash flows and earnings. Rev Account Stud 15: 779–807. http://dx.doi.org/10.1007/s11142-009-9107-6 doi: 10.1007/s11142-009-9107-6
    [51] Lev B, Thiagarajan S (1993) Fundamental information analysis. J Account Res 31: 190–215. http://doi.org/10.2307/2491270 doi: 10.2307/2491270
    [52] Li KK (2011) How well do investors understand loss persistence? Rev Account Stud 16: 630–667. https://doi.org/10.1007/s11142-011-9157-4 doi: 10.1007/s11142-011-9157-4
    [53] Li KK, Mohanram P (2014) Evaluating cross-sectional forecasting models for the implied cost of capital. Rev Account Stud 19: 1152–1185. https://doi.org/10.1007/s11142-014-9282-y doi: 10.1007/s11142-014-9282-y
    [54] Lorek KS (1979) Predicting Annual Net Earnings with Quarterly Earnings Time-Series Models. J Account Res 17: 190–204. http://dx.doi.org/10.2307/2490313 doi: 10.2307/2490313
    [55] Lorek KS, Willinger GL (1996) A multivariate time-series model for cash-flow data. Accoun Rev 71: 81–101.
    [56] Łaniewski S, Ślepaczuk R (2024). Enhancing literature review with NLP methods Algorithmic investment strategies case. Faculty of Economic Studies, University of Warsaw Working Papers. https://doi.org/10.33138/2957-0506.2024.16.452 doi: 10.33138/2957-0506.2024.16.452
    [57] Medya S, Rasoolinejad M, Uzzi B, et. al (2022) An Exploratory Study of Stock Price Movements from Earnings Calls. WWW 2022 - Companion Proceedings of the Web Conference 2022: 20–31. https://doi.org/10.1145/3487553.3524205 doi: 10.1145/3487553.3524205
    [58] Nabiee S (2020) Prediction of Firms' Annual and Quarterly Return Using NLP Techniques. Master thesis in Electrical Engineering, University of California, Irvine.
    [59] Ohlson JA (1995) Earnings, Book Values, and Dividends in Equity Valuation. Contemp Account Res 11: 661–687. https://doi.org/10.1092/7tpj-rxqn-tqc7-ffae doi: 10.1092/7tpj-rxqn-tqc7-ffae
    [60] Ohlson JA (2001) Earnings, Book Values, and Dividends in Equity Valuation: An Empirical Perspective. Contemp Account Res 18: 107–120. https://doi.org/10.1092/7tpj-rxqn-tqc7-ffae doi: 10.1092/7tpj-rxqn-tqc7-ffae
    [61] Pagach DP, Warr RS (2020) Analysts versus time-series forecasts of quarterly earnings: A maintained hypothesis revisited. Adv Account 51: 1–15. http://dx.doi.org/10.1016/j.adiac.2020.100497 doi: 10.1016/j.adiac.2020.100497
    [62] Polak K (2021) The Impact of Investor Sentiment on Direction of Stock Price Changes: Evidence from the Polish Stock Market. J Bank Financ Econ 2: 72–90. https://doi.org/10.7172/2353-6845.jbfe.2021.2.4 doi: 10.7172/2353-6845.jbfe.2021.2.4
    [63] Pope PF, Wang P (2005) Earnings Components, Accounting Bias and Equity Valuation. Rev Account Stud 10: 387–407. https://doi.org/10.1007/s11142-005-4207-4 doi: 10.1007/s11142-005-4207-4
    [64] Pope P, Wang P (2014) On the relevance of earnings components: Valuation and forecasting links. Rev Quant Financ Account 42: 399–413. https://doi.org/10.1007/s11156-013-0347-y doi: 10.1007/s11156-013-0347-y
    [65] Rao J, Ramaraju V, Smith J, et al. (2022) A Sentiment Analysis Based Stock Recommendation System. Proceedings - 2022 IEEE 5th International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2022: 82–89. https://doi.org/10.1109/AIKE55402.2022.00020 doi: 10.1109/AIKE55402.2022.00020
    [66] Rawte V, Gupta A, Zaki MJ, et al. (2021) A Comparative Analysis of Temporal Long Text Similarity: Application to Financial Documents. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 77–91. https://doi.org/10.1007/978-3-030-66981-2_7 doi: 10.1007/978-3-030-66981-2_7
    [67] Ruland W (1980) On the Choice of Simple Extrapolative Model Forecasts of Annual Earnings. Financ Manage 9: 30–37. http://dx.doi.org/10.2307/3665165 doi: 10.2307/3665165
    [68] Rybinski K (2020) Should asset managers pay for economic research? A machine learning evaluation. J Financ Data Sci 6: 31–48. https://doi.org/10.1016/j.jfds.2020.08.001 doi: 10.1016/j.jfds.2020.08.001
    [69] Rybinski K (2021) Ranking professional forecasters by the predictive power of their narratives. Int J Forecast 37: 186–204. https://doi.org/10.1016/j.ijforecast.2020.04.003 doi: 10.1016/j.ijforecast.2020.04.003
    [70] Rybinski K (2023) Content still matters. A machine learning model for predicting news longevity from textual and context features. Inf Process Manage 60: 103398. https://doi.org/10.1016/j.ipm.2023.103398 doi: 10.1016/j.ipm.2023.103398
    [71] Santana García F (2023) The effect of financial news on stock prices: insights from NLP techniques. Comillas Pontifical University, Faculty of Economics and Business Administration, ICADE Working Paper.
    [72] Simon J (2020) Learn Amazon SageMaker: A guide to building, training, and deploying machine learning models for developers and data scientists, Packt > Birmingham – Mumbai.
    [73] Wang X, Han R, Zheng M et al. (2024) Competitive strategy and stock market liquidity: a natural language processing approach. Inf Technol Manage 25: 99–112. https://doi.org/10.1007/s10799-023-00401-2 doi: 10.1007/s10799-023-00401-2
    [74] Wawer A, Sobiczewska J (2019) Predicting Sentiment of Polish Language Short Texts. Proceedings - Natural Language Processing in a Deep Learning World, 1321–1327. https://doi.org/10.26615/978-954-452-056-4_151
    [75] Watts RL (1975) The Time Series Behavior of Quarterly Earnings. Working paper, Department of Commerce, University of New Castle, April 1975.
    [76] Wierzba M, Riegel M, Kocoń J, et al. (2021) Emotion norms for 6000 Polish word meanings with a direct mapping to the Polish wordnet. Behav Res Methods 54: 2146–2161. https://doi.org/10.3758/s13428-021-01697-0 doi: 10.3758/s13428-021-01697-0
    [77] Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1: 80–83. http://dx.doi.org/10.2307/3001968 doi: 10.2307/3001968
    [78] Wujec M (2021) Analysis of the Financial Information Contained in the Texts of Current Reports: A Deep Learning Approach. J Risk Financ Manage 14: 582. https://doi.org/10.3390/jrfm14120582 doi: 10.3390/jrfm14120582
    [79] Wang XQ (2022) Research on enterprise financial performance evaluation method based on data mining. In: 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). https://doi.org/10.1109/icetci55101.2022.9832404
    [80] Xu Z (2019) NLP driven large scale financial data analysis. Doctoral dissertation, University of Illinois at Urbana-Champaign.
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)