Review

Online public opinion and asset prices: a literature review

  • We review the research work undertaken to explore how online public opinion information through social media and news media affects asset prices. First, it summarizes the measurement of online public opinion from three aspects: data source of online public opinion, textual sentiment analysis, and measurement of online public opinion index. Second, it puts forward the related theoretical basis of the research on online public opinion and asset price such as the noise trading theory, arbitrage limitation demonstrations, limited attention assumption and divergence models, etc. Third, it summarizes the three transmission channels through which online public opinion affects asset prices: investor attention, investor perceptions, and investor sentiment. Last, it looks deeply into the area and classifies the empirical literature according to various sources of online public opinion chosen by the researcher. Therefore, this exploratory work contributes to the existing literature by introducing the first systematic review.

    Citation: Yaya Su, Yi Qu, Yuxuan Kang. Online public opinion and asset prices: a literature review[J]. Data Science in Finance and Economics, 2021, 1(1): 60-76. doi: 10.3934/DSFE.2021004




    Parameter sensitivity analysis is a useful tool for elucidating the dynamics of biological processes, optimally designing biological experiments, and investigating the identifiability of model parameters. It is particularly important in the context of reaction networks describing the time-evolution of populations of molecular species that interact with each other through a set of reactions. These biochemical reaction networks, which are used to describe signalling, regulation and development processes in molecular biology, often involve a large number of non-linear interactions parameterised with an even larger number of parameters (e.g. reaction rate constants and reaction thresholds). Sensitivity analysis in this context can be used to unravel the network complexities by identifying the key parameters and the corresponding reactions that drive the most fundamental aspects of the network. Sensitivity analysis can also be used to optimally design experiments, for example by selecting the variables, time-points and number of replicates to be observed in order to achieve maximum parameter sensitivity. Similarly, it is a useful tool for examining the identifiability of model parameters for a given set of observations.

    Early results in sensitivity analysis of biochemical reaction networks were derived using deterministic (i.e. non-stochastic) models [1,2,3]. However, the technological innovations that allowed for observing molecular species at the single cell level and over time emphasised the need to account for the intrinsic stochasticity of reaction networks in molecular biology [4,5,6]. Progress has been made in estimating sensitivities of summary measures of the probability distribution of reaction networks (e.g. expectation of a key molecular population at its expected peak time) using finite difference and other methods (e.g. [7,8,9,10,11,12]).

    These methods use the Stochastic Simulation Algorithm (SSA) [13] to simulate sample paths of the so-called master equation describing the evolution of the probability distribution P of the interacting molecular populations over time. However, the SSA, which simulates every single reaction occurrence, is computationally expensive for large and complex networks and especially those networks where reactions occur across well-separated time-scales.

    Sensitivity analysis that uses the full probability distributions, rather than their summary measures, is computationally infeasible unless a suitable approximation of the master equation is used. The Linear Noise Approximation (LNA) is a systematic stochastic approximation of the master equation in terms of the system size. The system size, Ω, is a scale parameter that is inversely proportional to the levels of stochasticity of the evolving molecular populations, i.e. the size of fluctuations is small for large Ω [14]. The key advantage of the LNA, over approximations such as tau-leaping [15] and Langevin or diffusion approximations [16], is that LNA provides analytical expressions for the probability distribution, P, of the interacting molecular populations, which are Multivariate Normal. This implies that the LNA can be much faster than other approximations in terms of simulation and parameter estimation but also that quantities such as the Kullback-Leibler (KL) divergence and the Fisher Information can be computed. The LNA has been used for simulation [17,18], parameter estimation [19,20,21,22] and sensitivity analysis [23].

    The LNA has been shown to be inaccurate for simulating noisy oscillations [17,24,25,26,27]. Oscillatory dynamics commonly arise in biology, epidemiology, engineering and beyond with numerous examples including the circadian clock, NF-κB signalling, cardiac rhythms, and predator-prey systems. We have recently developed (see [27]) an approximation, called phase corrected LNA (pcLNA), that corrects the standard LNA to give fast and accurate long-time stochastic simulations for oscillations. The probability distributions derived using pcLNA are Multivariate Normal with mean vector and covariance matrix that have similar expressions to the standard LNA and therefore computation of the KL divergence and the Fisher Information Matrix, which enables a parameter sensitivity analysis, is computationally feasible.

    This article develops a general theory of parameter sensitivity analysis. It uses the Kullback-Leibler divergence and the Fisher Information Matrix to study the sensitivity of the probability distribution, $P_\theta$, of a stochastic process, $Y=\{Y(t),t\ge 0\}$, to changes of the parameter vector $\theta$. It is a local analysis in the sense that we study changes of the parameter vector $\theta\to\theta+\delta\theta$ for which terms of order $O(\|\delta\theta\|^3)$ are negligible. It is, on the other hand, complete in the sense that the sensitivity of the probability distribution itself, rather than of summaries of the probability distribution, is analysed. It extends the theory of sensitivity analysis developed earlier in [23,27,28] by (ⅰ) deriving a matrix for studying the parameter sensitivity of any stochastic process for which the Fisher information matrix can be computed, possibly even by rudimentary approximations, (ⅱ) deriving a complete method for studying the sensitivity of the joint probability distribution of any sample path of a multivariate Gaussian stochastic process to changes in parameter values, and (ⅲ) describing the application of this theory to oscillatory networks approximated by pcLNA and showing how to use it for experimental design, for studying parameter identifiability and for comparing deterministic to stochastic models.

    The paper proceeds as follows. In section 2 we develop a general theory of parameter sensitivity analysis of the probability distributions of sample paths of stochastic processes. We also consider the case of multivariate Gaussian stochastic processes. In section 3 we describe reaction networks and the master equation and in section 4 the LNA and pcLNA approximations. In section 5 we provide an illustrative example of our methods using the Brusselator system [29]. Our approach is then applied in section 6 to the sensitivity analysis of the Drosophila circadian clock developed in [30]. Section 7 provides a discussion of the results, while details of the Drosophila circadian clock model [30] are provided in Supplement A.

    Let $Y=\{Y(t),t\ge 0\}$ be a stochastic process defined on a probability space $(\Omega,\mathcal{F},P_\theta)$ of which the probability distribution $P_\theta$ depends on a parameter $\theta=(\theta_1,\dots,\theta_k)^T\in\Theta\subseteq\mathbb{R}^k$. We wish to study how the probability distribution of sample paths, $Y(t_1),\dots,Y(t_n)$, for $0\le t_1<\dots<t_n<\infty$, is affected by changes in the value of $\theta$. For this purpose, we introduce the Fisher Information Matrix (FIM).

    We first define the log-likelihood function $\ell(\theta;y)=\log p_\theta(y)$ where $p_\theta(y)$ is the probability density (or mass) function of a sample path $y$ of $Y$. The FIM, $I(\theta)$, is a symmetric positive-(semi)definite $k\times k$ matrix with entries

    $$I_{ij}(\theta)=E_{P_\theta}\left[\partial_i\ell\,\partial_j\ell\right]=-E_{P_\theta}\left[\partial^2_{ij}\ell\right].$$

    Here $E_{P_\theta}$ denotes the expectation under the probability distribution $P_\theta$, $\partial_i$ the partial derivative with respect to $\theta_i$ evaluated at $\theta$ and $\partial^2_{ij}$ the corresponding second order derivative. The FIM is therefore the negative of the expected curvature of $\ell(\theta;y)\in C^2(\Theta)$. For $k=1$ and concave $\ell(\theta;y)$, the FIM, $I(\hat\theta)$, measures the expected "peakedness" of the likelihood at its maximum value $\ell(\hat\theta;y)$.

    The Fisher information matrix is related to the Kullback-Leibler (KL) divergence between two probability distributions $P_{\theta+\delta\theta}$ and $P_\theta$. For two probability distributions $P$ and $Q$ with density functions $p(y)$ and $q(y)$, $y\in\mathcal{Y}$, the KL divergence is

    $$D_{KL}(P\,\|\,Q)=\int_{\mathcal{Y}}p(y)\log\frac{p(y)}{q(y)}\,dy.$$

    That is, the KL divergence $D_{KL}(P\,\|\,Q)$ is the expected value of the logarithm of the likelihood ratio, $\log(p(y)/q(y))$, with the expectation taken with respect to $P$ and the usual conventions when $p(y)=0$ or $q(y)=0$. An analogous definition of KL divergence applies for discrete probability distributions.

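    If $P=P_{\theta+\delta\theta}$ and $Q=P_\theta$, then (see [31]),

    $$D_{KL}(P_{\theta+\delta\theta}\,\|\,P_\theta)=\tfrac{1}{2}\,\delta\theta^TI(\theta)\,\delta\theta+O(\|\delta\theta\|^3).$$

    That is, the FIM is the Hessian matrix of the above KL divergence at $\theta\in\Theta$ (the gradient of $D_{KL}$ at $\delta\theta=0_k$ is 0).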
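
The quadratic approximation above can be checked numerically on a distribution for which both sides are available in closed form. The sketch below uses a Poisson distribution with mean $\theta$, which is an illustrative choice and not a model from the paper; the function names are hypothetical. For this family the KL divergence is $(\theta+\delta)\log((\theta+\delta)/\theta)-\delta$ and the Fisher information is $1/\theta$.

```python
import math

def kl_poisson(lam_p, lam_q):
    """Closed-form D_KL(Poisson(lam_p) || Poisson(lam_q))."""
    return lam_p * math.log(lam_p / lam_q) - (lam_p - lam_q)

def fisher_poisson(lam):
    """Fisher information of Poisson(lam) with respect to its mean lam."""
    return 1.0 / lam

theta = 4.0  # illustrative parameter value (assumption)
for delta in (0.4, 0.04, 0.004):
    exact = kl_poisson(theta + delta, theta)
    quadratic = 0.5 * delta * fisher_poisson(theta) * delta
    # The gap between the two is the O(|delta|^3) remainder.
    print(f"delta={delta:<6} exact={exact:.3e} quadratic={quadratic:.3e} "
          f"gap={abs(exact - quadratic):.3e}")
```

Shrinking `delta` by a factor of 10 should shrink the gap by roughly a factor of 1000, which is the cubic remainder in the expansion.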

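    If the Fisher information matrix $I=I(\theta)$, $\theta\in\Theta$, is positive definite, it defines a Riemannian metric over the statistical manifold of probability distributions $\{P_\theta,\theta\in\Theta\}$ by the inner product of two probability distributions $P_{\theta+\delta\theta}$ and $P_{\theta+\delta\theta'}$ in the tangent space of the manifold at $\theta$

    $$\langle\delta\theta,\delta\theta'\rangle_\theta=\sum_{i,j}\delta\theta_i\,\delta\theta'_j\,I_{ij}(\theta)=\delta\theta^TI\,\delta\theta',$$

    with the FI metric

    $$\|\delta\theta\|^2_\theta=\langle\delta\theta,\delta\theta\rangle_\theta=\sum_{i,j}\delta\theta_i\,\delta\theta_j\,I_{ij}(\theta)=\delta\theta^TI\,\delta\theta.$$

    This FI metric is related to the KL divergence by

    $$D_{KL}(P_{\theta+\delta\theta}\,\|\,P_\theta)=\tfrac{1}{2}\langle\delta\theta,\delta\theta\rangle_\theta+O(\|\delta\theta\|^3).$$

    The FIM can therefore be used to locally (i.e. when $O(\|\delta\theta\|^3)\approx 0$) measure the change in probability distribution $P_\theta\to P_{\theta+\delta\theta}$ for a change in parameter values $\theta\to\theta+\delta\theta$.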

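    Because the FIM $I=I(\theta)$ is symmetric and positive semi-definite, its Singular Value Decomposition (SVD) is of the form $VD^2V^T$ where $V$ is orthogonal and $D$ is diagonal with entries $\sigma_1\ge\dots\ge\sigma_k\ge 0$. It can therefore be decomposed as $I=\underline{s}^T\underline{s}$ with the matrix $\underline{s}=\underline{s}(\theta)=DV^T$, and the KL divergence is

    $$D_{KL}(P_{\theta+\delta\theta}\,\|\,P_\theta)=\tfrac{1}{2}\|\underline{s}\,\delta\theta\|^2+O(\|\delta\theta\|^3)=\tfrac{1}{2}\sum_{i,j,l}\delta\theta_j\,\delta\theta_l\,\underline{s}_{ij}\,\underline{s}_{il}+O(\|\delta\theta\|^3).$$

    The length, $\|\underline{s}_j\|_2$, of the column $\underline{s}_j=(\underline{s}_{1j},\dots,\underline{s}_{kj})^T$ of $\underline{s}$ measures the effect of a single unit change of the $j$-th parameter $\theta_j$ on the distribution $P_\theta$, $j=1,\dots,k$. It can therefore be used to study the sensitivity of $P_\theta$ to changes in the parameter values.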
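
As a minimal sketch of this decomposition, assuming a FIM is already available as a NumPy array, the matrix $\underline{s}=DV^T$ can be obtained from a symmetric eigendecomposition. The helper name `sensitivity_matrix` and the 2×2 example matrix are hypothetical and serve only to illustrate the construction.

```python
import numpy as np

def sensitivity_matrix(fim):
    """Return (s, sigma, V) with fim = s.T @ s and s = diag(sigma) @ V.T."""
    fim = np.asarray(fim, dtype=float)
    evals, evecs = np.linalg.eigh(fim)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # reorder so sigma_1 >= ... >= sigma_k
    evals = np.clip(evals[order], 0.0, None)  # guard against tiny negative round-off
    V = evecs[:, order]
    sigma = np.sqrt(evals)                    # singular values of s
    s = sigma[:, None] * V.T                  # s = D V^T
    return s, sigma, V

fim = np.array([[4.0, 1.0],
                [1.0, 2.0]])                  # hypothetical FIM (assumption)
s, sigma, V = sensitivity_matrix(fim)
col_lengths = np.linalg.norm(s, axis=0)       # ||s_j||_2 for each parameter j
print(sigma, col_lengths)
```

Since $\underline{s}^T\underline{s}=I$, the squared column length $\|\underline{s}_j\|_2^2$ equals the diagonal entry $I_{jj}$, which gives a quick consistency check on any implementation.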

    Note that no assumptions about the probability distribution, $P_\theta$, have been made so far. We next explain the role of the matrix $\underline{s}$ as a sensitivity matrix in the important case of multivariate Gaussian stochastic processes, where analytical expressions for the FIM of probability distributions of sample paths are available.

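    We consider the case where $Y=\{Y(t),t\ge 0\}$ is an $m$-dimensional Gaussian stochastic process. That is, for $t\ge 0$, $Y(t)=(Y_1(t),\dots,Y_m(t))^T\in\mathbb{R}^m$ with the joint probability distribution of sample paths $Y(t_1),Y(t_2),\dots,Y(t_n)$ being the multivariate normal $MVN(\mu(\theta),\Sigma(\theta))$, with mean vector $\mu=\mu(t_1,\dots,t_n;\theta)=\mu(\theta)$ and covariance matrix $\Sigma=\Sigma(t_1,\dots,t_n;\theta)=\Sigma(\theta)$ depending on $\theta$. In this case, the entries of the FIM are

    $$I_{ij}(\theta)=(\partial_i\mu)^T\Sigma^{-1}(\partial_j\mu)+\tfrac{1}{2}\,\mathrm{tr}\!\left(\Sigma^{-1}(\partial_i\Sigma)\Sigma^{-1}(\partial_j\Sigma)\right) \tag{2.1}$$

    where all derivatives are taken at $\theta$. This can also be written using the vec notation [32] as

    $$I_{ij}(\theta)=(\partial_i\mu)^T\Sigma^{-1}(\partial_j\mu)+\tfrac{1}{2}\,\mathrm{vec}(\partial_i\Sigma)^T(\Sigma^{-1}\otimes I)(I\otimes\Sigma^{-1})\,\mathrm{vec}(\partial_j\Sigma).$$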
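
Equation (2.1) translates directly into a few lines of NumPy. The sketch below assumes that the derivatives of the mean and covariance are supplied as arrays; the function name `gaussian_fim` and the argument layout are hypothetical. It is checked on the univariate normal with parameters (mean, variance), whose FIM is the well-known $\mathrm{diag}(1/\sigma^2,\,1/(2\sigma^4))$.

```python
import numpy as np

def gaussian_fim(dmu, dSigma, Sigma):
    """FIM of MVN(mu(theta), Sigma(theta)) from equation (2.1).

    dmu    : (k, d) array, row i is d mu / d theta_i
    dSigma : (k, d, d) array, slice i is d Sigma / d theta_i
    Sigma  : (d, d) covariance matrix at theta
    """
    Sinv = np.linalg.inv(Sigma)
    k = dmu.shape[0]
    fim = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            mean_term = dmu[i] @ Sinv @ dmu[j]
            cov_term = 0.5 * np.trace(Sinv @ dSigma[i] @ Sinv @ dSigma[j])
            fim[i, j] = mean_term + cov_term
    return fim

# Univariate normal N(theta_1, theta_2) with variance theta_2 = 2 (illustrative).
dmu = np.array([[1.0], [0.0]])
dSigma = np.array([[[0.0]], [[1.0]]])
Sigma = np.array([[2.0]])
fim = gaussian_fim(dmu, dSigma, Sigma)
print(fim)
```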

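    Now consider the $N\times k$ matrix ($N=mn+(mn)^2$)

    $$L=\begin{pmatrix}\partial\mu\\ \partial\,\mathrm{vec}(\Sigma)\end{pmatrix}=\begin{pmatrix}\partial_1\mu&\cdots&\partial_k\mu\\ \partial_1\mathrm{vec}(\Sigma)&\cdots&\partial_k\mathrm{vec}(\Sigma)\end{pmatrix}, \tag{2.2}$$

    which is the linearisation matrix of the mapping $\theta\mapsto(\mu(\theta),\mathrm{vec}\,\Sigma(\theta))$ at $\theta$. That is, if we let $\delta\mu=(\delta\mu_i)$ and $\delta\Sigma=(\delta\Sigma_{ij})$, with $\delta\mu_i=\mu_i(\theta+\delta\theta)-\mu_i(\theta)$ and $\delta\Sigma_{ij}=\Sigma_{ij}(\theta+\delta\theta)-\Sigma_{ij}(\theta)$, then

    $$(\delta\mu,\delta\,\mathrm{vec}(\Sigma))^T=L\,\delta\theta+O(\|\delta\theta\|^2). \tag{2.3}$$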

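    If we also define the matrix $F$ as the Cholesky factor, satisfying $F^TF=B$, of the block diagonal positive-definite matrix

    $$B=\begin{pmatrix}\Sigma^{-1}&0\\ 0&(\Sigma^{-1}\otimes I)(I\otimes\Sigma^{-1})/2\end{pmatrix}$$

    then we can write the Fisher information in (2.1) as

    $$I=(FL)^T(FL).$$

    Therefore, $FL$ is a linear map from $\Theta$ to $\mathbb{R}^N$ which sends the $\langle\cdot,\cdot\rangle_\theta$ metric to the standard one in $\mathbb{R}^N$,

    $$\langle\delta\theta,\delta\theta'\rangle_\theta=\delta\theta^TI(\theta)\,\delta\theta'=\delta\theta^T(FL)^T(FL)\,\delta\theta'=(FL\,\delta\theta)^T(FL\,\delta\theta')$$

    and relates the FI metric in $\Theta$ to the standard one in $\mathbb{R}^N$,

    $$\|\delta\theta\|^2_\theta=\|FL\,\delta\theta\|^2.$$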
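
The factorisation $I=(FL)^T(FL)$ can be verified numerically. The following sketch uses a hypothetical two-parameter toy model with a two-dimensional Gaussian (the mean, covariance and parameter values are assumptions chosen only so that the derivatives are easy to write down). Note that `numpy.linalg.cholesky` returns the lower-triangular factor, so its transpose is used as $F$ to obtain $F^TF$ equal to the block diagonal matrix.

```python
import numpy as np

theta = np.array([1.0, 2.0])                      # illustrative parameter values
# Toy model (assumption): mu = (th1, th1*th2), Sigma = [[th2, 0.1], [0.1, 1 + th1^2]]
dmu = np.array([[1.0, theta[1]],                  # d mu / d theta_1
                [0.0, theta[0]]])                 # d mu / d theta_2
Sigma = np.array([[theta[1], 0.1],
                  [0.1, 1.0 + theta[0] ** 2]])
dSigma = np.array([[[0.0, 0.0], [0.0, 2.0 * theta[0]]],   # d Sigma / d theta_1
                   [[1.0, 0.0], [0.0, 0.0]]])              # d Sigma / d theta_2

k, d = dmu.shape
Sinv = np.linalg.inv(Sigma)

# Linearisation matrix L of (2.2): derivatives of mu stacked on derivatives of vec(Sigma).
L = np.vstack([dmu.T,
               np.column_stack([dS.reshape(-1, order="F") for dS in dSigma])])

# Block diagonal matrix whose Cholesky factor defines F.
B = np.zeros((d + d * d, d + d * d))
B[:d, :d] = Sinv
B[d:, d:] = 0.5 * np.kron(Sinv, np.eye(d)) @ np.kron(np.eye(d), Sinv)
F = np.linalg.cholesky(B).T                       # upper triangular, F.T @ F == B

FL = F @ L
fim_factored = FL.T @ FL

# Direct evaluation of equation (2.1) for comparison.
fim_direct = np.array([[dmu[i] @ Sinv @ dmu[j]
                        + 0.5 * np.trace(Sinv @ dSigma[i] @ Sinv @ dSigma[j])
                        for j in range(k)] for i in range(k)])
print(np.max(np.abs(fim_factored - fim_direct)))
```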

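    The matrix $\underline{s}$ characterises sensitivity

    The sensitivity of the probability distribution $MVN(\mu(\theta),\Sigma(\theta))$ to changes $\delta\theta$ in $\theta$ can therefore be studied using the vector $FL\,\delta\theta$. Equation (2.3) shows that

    $$F(\delta\mu,\delta\,\mathrm{vec}(\Sigma))^T=FL\,\delta\theta+O(\|\delta\theta\|^2).$$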

    We now consider the (thin) SVD of the $N\times k$ matrix $FL$, $FL=WDV^T$, where $W=[W_1\cdots W_k]$ is an $N\times k$ column-orthogonal matrix, $D$ a $k\times k$ diagonal matrix whose main diagonal holds the singular values $\sigma_1\ge\dots\ge\sigma_k$, and $V=[V_1\cdots V_k]$ a $k\times k$ orthogonal matrix. Because $I=(FL)^TFL$, the eigenvalues of $I$ are $\sigma_1^2\ge\dots\ge\sigma_k^2\ge 0$, and $V_i$, $i=1,\dots,k$, are the corresponding eigenvectors.

    The $N\times 1$ orthonormal column vectors $W_i$ of $W$ and the $k\times 1$ eigenvectors $V_i$ of $I$ satisfy the equation $FLV_i=\sigma_iW_i$, and if we define $U_i=F^{-1}W_i$, $i=1,2,\dots,k$, then

    $$LV_i=\sigma_iU_i, \tag{2.4}$$

    where the $N$-dimensional vectors $U_i$ can be written as $U_i=(U_i^\mu,U_i^\Sigma)$ to reflect the correspondence of its first $mn$ entries to the $(mn)$-dimensional mean vector $\mu$ and of its last $(mn)^2$ entries to the covariance matrix $\Sigma$.

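    By (2.4), and using that $\delta\theta=\sum_i(V_i^T\delta\theta)V_i$, up to terms that are $O(\|\delta\theta\|^2)$,

    $$L\,\delta\theta=L\sum_i(V_i^T\delta\theta)V_i=\sum_i(\sigma_iV_i^T\delta\theta)\,U_i=\sum_i\Big(\sum_j\underline{s}_{ij}\,\delta\theta_j\Big)U_i=U\underline{s}\,\delta\theta.$$

    Therefore,

    $$L=U\underline{s}, \tag{2.5}$$

    and by (2.3),

    $$\|(\delta\mu,\delta\,\mathrm{vec}(\Sigma))\|_\theta=\|\underline{s}\,\delta\theta\|+O(\|\delta\theta\|^2)$$

    since the $U_i$ are orthonormal in the $\langle\cdot,\cdot\rangle_\theta$ metric and the coefficient of $U_i$ in $U\underline{s}\,\delta\theta$ is the $i$th coordinate of $\underline{s}\,\delta\theta$. These equations are the reason we call $\underline{s}=DV^T$ the sensitivity matrix.

    Similarly, up to terms that are $O(\|\delta\theta\|^2)$,

    $$F(\delta\mu,\delta\,\mathrm{vec}(\Sigma))^T=\sum_{i=1}^k\sum_{j=1}^kW_i\,\underline{s}_{ij}\,\delta\theta_j=W\underline{s}\,\delta\theta \tag{2.6}$$

    i.e. $FL=W\underline{s}$. We can now make a few useful observations: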

    1. Equation (2.6) shows that the change in the probability distribution $MVN(\mu(\theta),\Sigma(\theta))$ produced by a change of the parameter value $\theta\to\theta+\delta\theta$ is, according to the FI metric, a weighted sum of the vectors $W_i=F\,((U_i^\mu)^T,(U_i^\Sigma)^T)^T$ with weights $\sigma_iV_i^T\delta\theta=\sum_j\underline{s}_{ij}\,\delta\theta_j$. The change in the probability distribution is reflected in the mean through the $U_i^\mu$ directions and in the covariance matrix through the $U_i^\Sigma$ directions.

    2. The coefficients $\sigma_jV_j^T\delta\theta$ are proportional to the singular values $\sigma_j$ and the inner products $V_j^T\delta\theta$. The latter are the coordinates of $\delta\theta$ in the orthonormal basis of $\Theta$ defined by the columns of the matrix $V$. Therefore, because the singular values are in non-increasing order, i.e. $\sigma_1\ge\dots\ge\sigma_k$, the largest change in the probability distribution, subject to fixed $\|\delta\theta\|$, occurs when the change $\delta\theta$ is parallel to $V_1$, which corresponds to the largest singular value $\sigma_1$. If the singular values decay fast, there are only a few directions of the parameter space that can produce a relatively large change in the $MVN(\mu(\theta),\Sigma(\theta))$ distribution (subject to fixed $\|\delta\theta\|$).

    3. Furthermore, the overall contribution of each coordinate $\delta\theta_i$ of $\delta\theta$ to the change of the probability distribution is measured, according to the FI metric, by $\sum_{j=1}^kW_j\,\underline{s}_{ji}=W\underline{s}_i$, where $\underline{s}_i$ is the $i$th column of $\underline{s}$. That is, if $\delta\theta=\epsilon e_i$ with $e_i\in\mathbb{R}^k$ the usual unit vector with only non-zero entry $e_{ii}=1$ and constant $\epsilon\in\mathbb{R}$, then the corresponding change in the probability distribution $MVN(\mu(\theta),\Sigma(\theta))$ according to the FI metric is

    $$\epsilon FLe_i=\epsilon W\underline{s}_i. \tag{2.7}$$
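
The relations (2.4) to (2.7) can be checked with a thin SVD. The sketch below does not use a model from the paper: `L` and the positive-definite weight matrix `B` (playing the role of $F^TF$) are random placeholders, which is enough to confirm the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 6, 2
L = rng.normal(size=(N, k))                  # placeholder linearisation matrix (assumption)
A = rng.normal(size=(N, N))
B = A @ A.T + N * np.eye(N)                  # placeholder positive-definite matrix (assumption)
F = np.linalg.cholesky(B).T                  # upper triangular, F.T @ F == B

# Thin SVD: FL = W D V^T with singular values in non-increasing order.
W, sigma, Vt = np.linalg.svd(F @ L, full_matrices=False)
s = sigma[:, None] * Vt                      # sensitivity matrix s = D V^T
U = np.linalg.solve(F, W)                    # columns U_i = F^{-1} W_i

# Change in the distribution (FI metric) caused by a unit change in each parameter, eq. (2.7).
unit_effects = [W @ s[:, i] for i in range(k)]
effect_sizes = [np.linalg.norm(v) for v in unit_effects]
print(sigma, effect_sizes)
```

Because $W$ has orthonormal columns, the size of each unit effect equals the column length $\|\underline{s}_i\|_2$, so the parameters can be ranked without forming $W$ explicitly.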

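    Because of the definition of $\underline{s}$ through the SVD, the sensitivity matrix $\underline{s}$ is optimal for capturing as much sensitivity as possible in the low order principal components of $(\delta\mu,\delta\,\mathrm{vec}(\Sigma))^T$. That is, for any (sensitivity) matrix $\underline{s}'$ which for some orthogonal matrix $U'$ satisfies

    $$(\delta\mu,\delta\,\mathrm{vec}(\Sigma))^T=U'(\underline{s}')^T\delta\theta+O(\|\delta\theta\|^2),$$

    the sensitivity matrix $\underline{s}$ satisfies the following inequalities for all $\ell<k$,

    $$\sum_{i\le\ell}\sum_j\underline{s}_{ij}^2\ge\sum_{i\le\ell}\sum_j(\underline{s}'_{ij})^2\quad\text{and}\quad\sum_{i>\ell}\sum_j\underline{s}_{ij}^2\le\sum_{i>\ell}\sum_j(\underline{s}'_{ij})^2, \tag{2.8}$$

    i.e. among all such sensitivity matrices, $\underline{s}$ squeezes as much of the sensitivity effect as possible into the lower $i$ components.

    For the above reasons, we call $\underline{s}_{ij}$, for $j=1,\dots,k$, principal coefficients of sensitivity of $MVN(\mu(\theta),\Sigma(\theta))$ to changes in the $i$th component of the signal $S_i$, $i=1,\dots,k$.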

    In the following section, we examine a particular case in which such sensitivity analysis is relevant. These are the reaction networks used to describe cellular processes such as signalling, regulation and development in molecular biology.

    A system of multiple different molecular populations, $M_1,M_2,\dots,M_m$, has state vector $Y(t)=(Y_1(t),\dots,Y_m(t))^T$ where $Y_i(t)$, $i=1,\dots,m$, denotes the number of $M_i$ molecules at time $t$. These molecules undergo reactions $R_j$, $j=1,\dots,r$, where $Y(t)$ jumps to a new state $Y(t)+\nu_j$, with $\nu_j=(\nu_{1j},\dots,\nu_{mj})^T\in\mathbb{Z}^m$ the stoichiometric vectors of the reactions. Each reaction occurs with an intensity that depends on the current state of the network. If the current state is $Y(t)=y$, the probability of a single $R_j$ reaction occurring in $[t,t+dt)$ is $w_j(y)\,dt+o(dt)$, while the probability of no $R_j$ reaction in $[t,t+dt)$ is $1-w_j(y)\,dt+o(dt)$. Here $\lim_{dt\to 0}o(dt)/dt=0$.

    The Kolmogorov forward equation that describes the time-evolution of the probability distribution, $P(y,t)=P(Y(t)=y)$, of the stochastic process $Y=\{Y(t),t\ge 0\}$ is then

    $$\frac{\partial P(y,t)}{\partial t}=\sum_{j=1}^rw_j(y-\nu_j)P(y-\nu_j,t)-\sum_{j=1}^rw_j(y)P(y,t). \tag{3.1}$$

    The Kolmogorov equation is often referred to as the (chemical) master equation.

    The master equation can rarely be solved analytically and therefore the focus has been on simulation of sample paths of $Y$. The so-called Stochastic Simulation Algorithm (SSA) [13] exactly simulates the sample path $Y(t)$, $t\in[0,T]$, for a given initial state $Y(0)$, by generating all reactions that occur in $[0,T]$. The SSA quickly becomes slow as the complexity of the network rises.
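
To make the cost of simulating every reaction concrete, here is a minimal sketch of the direct-method SSA using only the Python standard library. The function signature and the birth-death example (constant production, linear degradation, with made-up rate constants) are assumptions for illustration and are not one of the networks studied in the paper.

```python
import random

def ssa(y0, stoich, intensities, t_end, rng):
    """Direct-method SSA.

    stoich[j] is the jump vector nu_j; intensities(y) returns [w_1(y), ..., w_r(y)].
    Returns the reaction times and the states visited up to t_end.
    """
    t, y = 0.0, list(y0)
    times, states = [t], [tuple(y)]
    while True:
        w = intensities(y)
        w_total = sum(w)
        if w_total <= 0.0:
            break                              # no reaction can fire
        t += rng.expovariate(w_total)          # waiting time to the next reaction
        if t > t_end:
            break
        # Choose reaction j with probability w_j / w_total.
        u, j, acc = rng.random() * w_total, 0, w[0]
        while acc <= u and j < len(w) - 1:
            j += 1
            acc += w[j]
        y = [yi + nu for yi, nu in zip(y, stoich[j])]
        times.append(t)
        states.append(tuple(y))
    return times, states

# Hypothetical birth-death network: 0 -> M at rate 10, M -> 0 at rate 0.1 * y.
rng = random.Random(1)
times, states = ssa([0], [(1,), (-1,)], lambda y: [10.0, 0.1 * y[0]], 50.0, rng)
print(len(times) - 1, "reactions simulated; final state", states[-1])
```

The loop executes once per reaction event, so the run time grows with the total intensity, which is what makes the SSA expensive for large or fast networks.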

    Furthermore, computation of the likelihood of sample paths of $Y$, as well as of quantities such as the KL divergence and the FIM, is extremely expensive and requires some form of approximation.

    In this section, we focus on the Linear Noise Approximation (LNA), using which we can compute the likelihood of sample paths, the KL divergence and FIM and therefore perform the sensitivity analysis described in section 2.

    It is common in studying stochastic systems to introduce a system size parameter $\Omega$, which is a parameter that occurs in the intensities of the reactions $w_j(Y(t))$. The precise description of this parameter depends on the system. In population models it might be considered to be of the same order of magnitude as the total population size, while in molecular biology a natural choice is to use molar concentrations and therefore regard $\Omega$ as Avogadro's number in the appropriate molar units (e.g. nM$^{-1}$) multiplied by the volume of the reacting solution (e.g. the cell) in appropriate units (e.g. in litres (L)). In the circadian clock system that we consider in section 6, it has units L/μM.

    The system size governs the size of the state fluctuations and therefore the size of the jumps. Larger system sizes generally imply relatively smaller fluctuations and vice versa. In a certain sense the system size parameter is just a mathematical convenience to control the overall levels of stochasticity and to enable the study of the dependence of stochastic fluctuations upon system size.

    While having a system size parameter is not necessary to apply our methods, it allows one to study the dependence of stochastic fluctuations upon system size and to calculate the deterministic equations that describe the evolution of the concentration vector $X(t)=Y(t)/\Omega$ in the limit $\Omega\to\infty$ (see next section). A sufficient condition to derive this limit is that the intensities $w_j(Y(t))$ depend upon $\Omega$ (cf. [14,36,37]) as

    $$w_j(Y)=\Omega\,u_j(Y/\Omega), \tag{4.1}$$

    where $u_j(x)$ are the macroscopic ($\Omega\to\infty$) rates, derived next, that generally depend on the concentration vector $x=x(t)$. The condition is very general and applies to all common reaction types encountered in the biochemical context.

    The time-evolution of the stochastic process $Y$ can be described using the random time change representation (RTC) [35]

    $$Y(t)=Y(0)+\sum_{j=1}^r\nu_jZ_j\!\left(\int_0^tw_j(Y(s))\,ds\right), \tag{4.1}$$

    with $Z_j$ being independent unit Poisson processes corresponding to reaction $R_j$*. The term $Z_j\!\left(\int_0^tw_j(Y(s))\,ds\right)$ in the RTC representation counts the number of $R_j$ reactions that happened in $[0,t)$.

    *If Z(t) is a unit Poisson process then it is a Poisson process with rate 1 (see properties of Poisson process) and Z(λt) is a Poisson process with rate λ.

    Using the scaling condition $w_j(Y)=\Omega\,u_j(Y/\Omega)$ in (4.1) we can re-write the RTC representation, in infinitesimal form and in terms of $X(t)=Y(t)/\Omega$, as

    $$X(t+dt)-X(t)=\sum_{j=1}^r\nu_j\,\Omega^{-1}Z_j\big(\Omega\,u_j(X(t))\,dt\big). \tag{4.2}$$

    If we also define $x(t)$ as the limit in probability of $X(t)$, i.e. $X(t)\xrightarrow{P}x(t)$ as $\Omega\to\infty$, we can use the law of large numbers (LLN) to derive the limit of equation (4.2), as $\Omega\to\infty$,

    $$x(t+dt)-x(t)=\sum_{j=1}^r\nu_j\,u_j(x(t))\,dt.$$

    Equivalently, this can be written as the macroscopic rate equation

    $$\dot{x}=\frac{dx}{dt}=F(x),\qquad F(x)=\sum_{j=1}^r\nu_j\,u_j(x(t)). \tag{4.3}$$

    We now define the LNA ansatz [14,36,37] that describes the relation between the stochastic process $X(t)$ and the deterministic solution $x(t)$ of the system, with their difference, scaled by $\Omega^{1/2}$, being a stochastic process, $\{\xi(t),t\ge 0\}$, describing the noise around $x(t)$. That is,

    $$X(t)=x(t)+\Omega^{-1/2}\xi(t).$$

    The LNA ansatz implies that

    $$\xi(t+dt)-\xi(t)=\sum_j\nu_j\left(Z_j^{(1)}+Z_j^{(2)}\right)$$

    where

    $$Z_j^{(1)}=\Omega^{1/2}\Big(\Omega^{-1}Z_j\big(\Omega\,u_j(X(t))\,dt\big)-u_j(X(t))\,dt\Big)\xrightarrow{D}N\big(0,u_j(x(t))\,dt\big),\quad\text{as }\Omega\to\infty,$$

    and

    $$Z_j^{(2)}=\Omega^{1/2}\big(u_j(X(t))-u_j(x(t))\big)\,dt\xrightarrow{\Omega\to\infty}\big(\nabla_xu_j(x(t))\big)^T\xi(t)\,dt.$$

    Therefore, for sufficiently large values of $\Omega$, the time-evolution of $\{\xi(t),t\ge 0\}$ can be described by the linear stochastic differential equation (in the Itô sense)

    $$d\xi=\sum_{j=1}^r\nu_j\big(\nabla_xu_j(x)\big)^T\xi\,dt+\sum_{j=1}^r\nu_j\sqrt{u_j(x)}\,N(0,dt),$$

    or, in matrix form,

    $$d\xi=J\xi\,dt+E\,dW_t,$$

    where $J=J(x)$ is the Jacobian matrix of (4.3), $E=E(x)=S\,\mathrm{diag}\big(\sqrt{u_1(x)},\dots,\sqrt{u_r(x)}\big)$ is the product of the stoichiometry matrix $S=[\nu_1\cdots\nu_r]$ and the square root of the diagonal matrix $\mathrm{diag}(u_1(x),\dots,u_r(x))$, and $W_t$ is a Wiener process.

    This linear SDE has a solution that can be written as

    $$\xi(t)=C(0,t)\,\xi(0)+\eta(0,t),\qquad\eta(0,t)\sim MVN(0,V(0,t)),$$

    where $C(0,t)$ is the fundamental matrix of (4.3), which is the solution of the initial value problem

    $$\dot{C}=JC,\qquad C(0,0)=I_m, \tag{4.4}$$

    with $I_m$ the $m\times m$ identity matrix, and the symmetric positive-definite matrix $V(0,t)$ is the solution of the initial value problem

    $$\dot{V}=JV+VJ^T+EE^T,\qquad V(0,0)=0. \tag{4.5}$$

    The above representation implies that by solving the initial value problems in (4.3), (4.4) and (4.5) one can easily derive the probability distribution of $\xi(t)$ for any given initial state $\xi(0)=\xi_0$. In particular, the probability distribution

    $$\big(\xi(t)\mid\xi(0)\sim MVN(m_0,S_0)\big)\sim MVN(m(t),S(t)),\qquad m(t)=C(0,t)\,m_0,\qquad S(t)=C(0,t)\,S_0\,C(0,t)^T+V(0,t)$$

    and therefore

    $$\big(X(t)\mid X(0)\sim MVN(\mu_0,\Sigma_0)\big)\sim MVN(\mu(t),\Sigma(t)),\qquad\mu(t)=x(t)+\Omega^{-1/2}m(t),\qquad\Sigma(t)=S(t)/\Omega,$$

    where here $X(0)=x(0)+\Omega^{-1/2}\xi(0)$.
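
The three initial value problems (4.3), (4.4) and (4.5) can be integrated together. The sketch below uses a hand-written fixed-step Runge-Kutta scheme rather than any particular solver from the paper, and the function names, step count and the one-species birth-death example are assumptions. For that example, started at its fixed point, the exact solutions are $C(0,t)=e^{-k_2t}$ and $V(0,t)=(k_1/k_2)(1-e^{-2k_2t})$, which gives a check on the code.

```python
import numpy as np

def lna_rhs(state, S, rates, jac):
    """Right-hand side of (4.3), (4.4), (4.5) stacked as [x, vec(C), vec(V)]."""
    m = S.shape[0]
    x = state[:m]
    C = state[m:m + m * m].reshape(m, m)
    V = state[m + m * m:].reshape(m, m)
    u = np.asarray(rates(x), dtype=float)     # macroscopic rates u_j(x)
    J = np.asarray(jac(x), dtype=float)       # Jacobian of F(x) = S u(x)
    E = S * np.sqrt(u)                        # S @ diag(sqrt(u))
    return np.concatenate([S @ u, (J @ C).ravel(),
                           (J @ V + V @ J.T + E @ E.T).ravel()])

def integrate_lna(x0, S, rates, jac, t_end, n_steps=2000):
    """Classical RK4 integration from t = 0 with C(0,0) = I and V(0,0) = 0."""
    m = S.shape[0]
    state = np.concatenate([np.asarray(x0, dtype=float),
                            np.eye(m).ravel(), np.zeros(m * m)])
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = lna_rhs(state, S, rates, jac)
        k2 = lna_rhs(state + 0.5 * h * k1, S, rates, jac)
        k3 = lna_rhs(state + 0.5 * h * k2, S, rates, jac)
        k4 = lna_rhs(state + h * k3, S, rates, jac)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    x = state[:m]
    C = state[m:m + m * m].reshape(m, m)
    V = state[m + m * m:].reshape(m, m)
    return x, C, V

# Hypothetical birth-death example: u_1 = k1 (production), u_2 = k2 * x (degradation).
k_birth, k_death = 2.0, 1.0
S = np.array([[1.0, -1.0]])
x, C, V = integrate_lna([k_birth / k_death], S,
                        rates=lambda x: [k_birth, k_death * x[0]],
                        jac=lambda x: [[-k_death]], t_end=5.0)
print(x, C, V)
```

Given `C` and `V`, the mean and covariance of $X(t)$ follow from the formulas above as $\mu(t)=x(t)+\Omega^{-1/2}C\,m_0$ and $\Sigma(t)=(C\,S_0\,C^T+V)/\Omega$.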

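    It can also be shown that the joint probability distribution of sample paths $\big(X(t_1),X(t_2),\dots,X(t_n)\mid X(0)\sim MVN(\mu_0,\Sigma_0)\big)$, under the LNA, is also MVN with mean

    $$\mu_{1:n}=\big(\mu(t_1)^T,\dots,\mu(t_n)^T\big)^T \tag{4.6}$$

    and precision matrix (inverse of the covariance matrix) $\Omega A_{1:n}$, where $A_{1:n}$ is the block tridiagonal matrix

    $$\begin{bmatrix}
    V_1^{-1}+C_{1,2}^TV_{1,2}^{-1}C_{1,2} & -C_{1,2}^TV_{1,2}^{-1} & 0 & \cdots & 0\\
    -V_{1,2}^{-1}C_{1,2} & V_{1,2}^{-1}+C_{2,3}^TV_{2,3}^{-1}C_{2,3} & -C_{2,3}^TV_{2,3}^{-1} & \cdots & 0\\
    0 & -V_{2,3}^{-1}C_{2,3} & \ddots & \ddots & \vdots\\
    \vdots & & \ddots & V_{n-2,n-1}^{-1}+C_{n-1,n}^TV_{n-1,n}^{-1}C_{n-1,n} & -C_{n-1,n}^TV_{n-1,n}^{-1}\\
    0 & \cdots & 0 & -V_{n-1,n}^{-1}C_{n-1,n} & V_{n-1,n}^{-1}
    \end{bmatrix}. \tag{4.7}$$

    Here we used the notation $C_{i,i+1}=C(t_i,t_{i+1})$, $V_{i,i+1}=V(t_i,t_{i+1})$, and $V_1=V_{0,1}+C_{0,1}S_0C_{0,1}^T$.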
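
The block tridiagonal structure in (4.7) is the standard precision matrix of a linear Gaussian Markov chain $\xi(t_{i+1})=C_{i,i+1}\xi(t_i)+\eta_i$ with $\eta_i\sim MVN(0,V_{i,i+1})$. The sketch below assembles it and checks it against the covariance obtained by propagating the recursion directly. The helper names and the random test matrices are hypothetical; in an application `Cs` and `Vs` would come from solving (4.4) and (4.5) between consecutive time points.

```python
import numpy as np

def lna_precision(V1, Cs, Vs):
    """Block tridiagonal matrix of (4.7); Cs[i], Vs[i] describe the step t_{i+1} -> t_{i+2}."""
    m, n = V1.shape[0], len(Cs) + 1
    A = np.zeros((n * m, n * m))
    A[:m, :m] = np.linalg.inv(V1)
    for i, (C, V) in enumerate(zip(Cs, Vs)):
        Vinv = np.linalg.inv(V)
        a, b = slice(i * m, (i + 1) * m), slice((i + 1) * m, (i + 2) * m)
        A[a, a] += C.T @ Vinv @ C
        A[a, b] = -C.T @ Vinv
        A[b, a] = -Vinv @ C
        A[b, b] += Vinv
    return A

def lna_covariance(V1, Cs, Vs):
    """Joint covariance of (xi(t_1), ..., xi(t_n)) by direct propagation."""
    n = len(Cs) + 1
    blocks = [[None] * n for _ in range(n)]
    blocks[0][0] = V1
    for i in range(1, n):
        C, V = Cs[i - 1], Vs[i - 1]
        blocks[i][i] = C @ blocks[i - 1][i - 1] @ C.T + V
        for j in range(i):
            blocks[i][j] = C @ blocks[i - 1][j]   # Cov(xi_i, xi_j) for j < i
            blocks[j][i] = blocks[i][j].T
    return np.block(blocks)

# Random placeholder inputs (assumption): m = 2 species, n = 3 time points.
rng = np.random.default_rng(0)
m = 2
def random_pd():
    a = rng.normal(size=(m, m))
    return a @ a.T + m * np.eye(m)
V1 = random_pd()
Cs = [rng.normal(size=(m, m)) for _ in range(2)]
Vs = [random_pd() for _ in range(2)]

A = lna_precision(V1, Cs, Vs)
cov = lna_covariance(V1, Cs, Vs)
print(np.max(np.abs(A @ cov - np.eye(3 * m))))
```

The precision matrix has only $O(n)$ non-zero blocks, which is what keeps likelihood and FIM computations tractable for long sample paths.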

    The above results make LNA hugely faster in simulation compared to SSA, but perhaps more importantly make feasible the computation of the likelihood function of sample paths and associated quantities such as the FIM.

    An important question, however, is whether the LNA is an accurate approximation of the master equation for finite $\Omega$. The answer relates to the structure of the reaction network [18,39]. For example, the LNA has been found to be accurate for networks involving intensity functions that are up to first order polynomials of the reactant concentrations [40]. The LNA is also very accurate for long-time approximation of reaction networks with a single stable fixed point, while it is inaccurate for long-time approximation of multi-stable networks (see e.g. [39]).

    Furthermore, the LNA is inaccurate for long-time approximation of reaction networks with oscillatory dynamics. This failure of the LNA was extensively studied for those oscillatory networks where, in the $\Omega\to\infty$ limit, the ODE in (4.3) has a periodic solution, $\gamma$, given by $x=g(t)$ with $g(t)=g(t+T)$, for some $T>0$ [17,24,25,26]. In a nutshell, this failure of the LNA arises because, for finite $\Omega$, the stochastic sample paths, $Y(t)$, $t\ge 0$, of the master equation increasingly spread in the tangential direction of $g(t)$ (i.e. parallel to $F(g(t))$) as time grows. This is in contrast to the variance of $Y(t)$, $t\ge 0$, in the direction transversal to $\gamma$, which quickly converges to a fixed value. The increasing tangential variability results in the phase of $Y(t)$, $t\ge 0$, increasingly drifting from the phase of the deterministic solution $g(t)$. Therefore, the LNA predictions, which have the same phase as $g(t)$, are increasingly out of phase with $Y(t)$, $t\ge 0$ [17,24,25,26,27].

    We have recently developed (see [27]) a modification of the standard LNA for oscillatory networks, called phase corrected LNA or pcLNA, that corrects for the phase drifts. We first define the section, $S_x$, for $x\in\gamma$, which is an $(m-1)$-dimensional linear hyperplane with $x\in S_x$ and transversal to the tangent vector, $F(x)$. A particular example is the hyperplane normal to $\gamma$ at $x$, i.e. for any $u\in S_x$, $(u-x)\perp F(x)$. Then the mapping $G$ of a neighbourhood of $\gamma$ onto $\gamma$ is such that if $u\in S_x$ then $G(u)=x\in\gamma$. We use $G$ to map the stochastic sample path $X(t)$, $t\ge 0$, to the periodic solution $g(t)$. The pcLNA ansatz is

    $$X(t)=G(X(t))+\Omega^{-1/2}\kappa(t).$$

    Here $\kappa(t)$ lies on the transversal section $S_{G(X(t))}$ and therefore, unlike $\xi(t)$ in the standard LNA ansatz, is unaffected by the increasing tangential variance.

    The pcLNA can be used for fast and accurate long-time simulation of sample paths of oscillatory networks. The simulation algorithm (see Figure 1) proceeds as the standard LNA except that, after deriving $X(t)$, an extra step is added to find $G(X(t))$ and subsequently $\kappa(t)=\Omega^{1/2}\big(X(t)-G(X(t))\big)$ to replace $x(t)$ and $\xi(t)$, respectively, before progressing with another LNA step. The key point here is that $G(X(t))=g(s)\in\gamma$ for some $s\in[0,T]$ and therefore the same solutions of the ODEs in (4.3), (4.4), (4.5) are used in all simulations. The same principle can be used for parameter estimation using the corresponding Kalman filter (see [27]).

    Figure 1.  Schematic representation of pcLNA. (A) The mapping $G_N$ from transversal sections normal to $\gamma$ into $\gamma$. (B) The main steps of the pcLNA simulation algorithm. For a given $X(t_i)=x(t_i)+\Omega^{-1/2}\xi_i$ (here for simplicity we assume that $G(X(t_0))=g(t_0)$ and thus $\kappa_0=\xi_0$), the algorithm computes $X(t_{i+1})$, $t_{i+1}=t_i+\Delta t$, using the standard LNA distribution. Then, the mapping $G(X(t_{i+1}))=g(s_{i+1})$ and $\kappa(t_{i+1})=\Omega^{1/2}\big(X(t_{i+1})-g(s_{i+1})\big)$ are computed and replace $x(t_{i+1})$ and $\xi(t_{i+1})$ in computing the next steps using the standard LNA. Note that the periodic solution $g(t)$ is only computed once as only the time/phase changes after the phase correction step.
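
The phase correction step can be sketched as follows, under two loudly stated assumptions that are not taken from the paper: the limit cycle is available as a fine grid of precomputed points $g(s_i)$, and the normal-section mapping $G$ is approximated by the nearest grid point (for a point close to the cycle, the nearest point on the cycle has $X-g(s)$ orthogonal to the tangent, which is the normal-section condition). The function name `phase_correct` and the unit-circle test cycle are hypothetical.

```python
import numpy as np

def phase_correct(X, cycle, omega):
    """Approximate G(X) by the nearest precomputed limit-cycle point.

    cycle : (n_grid, m) array whose rows are g(s_i) on a fine phase grid
    Returns the grid index i, the point g(s_i) and kappa = sqrt(omega) * (X - g(s_i)).
    """
    X = np.asarray(X, dtype=float)
    dist2 = np.sum((cycle - X) ** 2, axis=1)
    i = int(np.argmin(dist2))
    g_s = cycle[i]
    kappa = np.sqrt(omega) * (X - g_s)
    return i, g_s, kappa

# Illustrative limit cycle: the unit circle sampled at 3600 phases (assumption).
phases = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
cycle = np.column_stack([np.cos(phases), np.sin(phases)])

X = 1.2 * np.array([np.cos(1.0), np.sin(1.0)])     # a state displaced off the cycle
i, g_s, kappa = phase_correct(X, cycle, omega=1000.0)
tangent = np.array([-np.sin(phases[i]), np.cos(phases[i])])
print(phases[i], g_s, (X - g_s) @ tangent)
```

After this step a simulation would restart the LNA from phase `phases[i]` with noise `kappa`, reusing the stored solutions of (4.3), (4.4) and (4.5). A production implementation would refine the phase by a root-finding step rather than rely on the grid resolution.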

    The probability distributions on the transversal section derived under the LNA converge to a fixed point probability distribution and are shown to be almost indistinguishable from SSA simulations even for relatively low $\Omega$ (for the Drosophila circadian clock [30] for $\Omega\ge 300$, see also [26,27]). They can be used to analyse the network at specific important phases of the network, e.g. peaks of the key protein in the network, but also the overall dynamics if joint probability distributions of a large number of phases are considered.

    We can derive the joint probability distribution of a sample path $Q_{x_1},\dots,Q_{x_n}$ of points on the transversal sections $S_{x_i}$, $i=1,\dots,n$, respectively, where $x_i=g(t_i)$, $0<t_1<\dots<t_n$, for initial condition $X(0)\sim MVN(\mu_0,\Sigma_0)$. This is

    $$\big(Q_{x_1},\dots,Q_{x_n}\mid X(0)\sim MVN(\mu_0,\Sigma_0)\big)\overset{LNA}{\sim}MVN\big(\mu_{x_{1:n}},(\Omega A_{x_{1:n}})^{-1}\big), \tag{4.8}$$

    where $\mu_{x_{1:n}}$ and $A_{x_{1:n}}$ are of the same form as $\mu_{1:n}$ in (4.6) and $A_{1:n}$ in (4.7) respectively, so that $\Omega A_{x_{1:n}}$ is the precision matrix. However, $\mu(t_i)$, $C_{i-1,i}$, and $V_{i-1,i}$ are replaced by their projections on the transversal section $S_{x_i}$. For normal transversal sections, this can be easily done by first deriving (e.g. using the Gram-Schmidt process) an orthogonal matrix $R=[R_1\,R_2]$ whose first column, $R_1$, is parallel to the tangent vector $F(x_i)$, for $i=1,\dots,n$, and replacing $\mu(t_i)$ with $R_2^T\mu(t_i)$, $C_{i-1,i}$ with $R_2^TC_{i-1,i}R_2$ and $V_{i-1,i}$ with $R_2^TV_{i-1,i}R_2$. For convenience, we henceforth refer to the probability distributions under the LNA on the transversal sections of given phases in (4.8) as pcLNA distributions.
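
A minimal sketch of the projection onto a normal transversal section follows. Instead of an explicit Gram-Schmidt loop it uses a QR factorisation, which produces the same kind of orthogonal matrix $R=[R_1\,R_2]$; this swap, the function name `normal_section_basis`, and the example tangent vector, mean and covariance are all assumptions made for illustration.

```python
import numpy as np

def normal_section_basis(tangent):
    """Return R2, an m x (m-1) matrix with orthonormal columns orthogonal to `tangent`."""
    f = np.asarray(tangent, dtype=float)
    m = f.size
    # QR of [f | I_m]: the first column of Q is parallel to f and the remaining
    # columns complete it to an orthonormal basis of R^m.
    Q, _ = np.linalg.qr(np.column_stack([f, np.eye(m)]))
    return Q[:, 1:m]

# Illustrative inputs (assumptions): tangent F(x_i), a mean vector and a covariance block.
tangent = np.array([3.0, 4.0, 0.0])
mu = np.array([1.0, 2.0, 3.0])
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 0.5]])

R2 = normal_section_basis(tangent)
mu_proj = R2.T @ mu            # R_2^T mu(t_i)
V_proj = R2.T @ V @ R2         # R_2^T V_{i-1,i} R_2
print(mu_proj.shape, V_proj.shape)
```

The projected quantities live in $m-1$ dimensions, so the pcLNA distribution at each phase has one dimension fewer than the state space.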

    In this section, we provide an illustrative example of our approach using the two-dimensional Brusselator model described by the ODE system

    ˙x1=1x1bx1+cx21x2,˙x2=bx1cx21x2.

    The system has a single fixed point, $x^* = (1, b/c)$, that is stable for $b < 1 + c$, while a unique stable periodic solution, $x(t) = (x_1(t), x_2(t))$, exists for $b > 1 + c$ (see Figure 2). We have previously shown that pcLNA probability distributions are almost indistinguishable from SSA empirical probability distributions of this network for $\Omega \geq 1000$ [27]. We now use the pcLNA joint probability distributions for $b = 2.2$, $c = 1$, at phases/times $t = 0.25, 0.5, \dots, 6$, for $\Omega = 1000$ to analyse the parameter sensitivities of the model.
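    The fixed point and the stability threshold stated above can be checked numerically. The sketch below is an illustration only: the hand-written Runge-Kutta integrator, the step size and the initial perturbation are assumptions, and the Jacobian is the one obtained by differentiating the two rate equations at the fixed point $(1, b/c)$.

```python
import numpy as np

def brusselator(x, b, c):
    """Right-hand side of the Brusselator ODE system."""
    x1, x2 = x
    return np.array([1.0 - x1 - b * x1 + c * x1**2 * x2,
                     b * x1 - c * x1**2 * x2])

def jacobian_at_fixed_point(b, c):
    """Jacobian of the right-hand side evaluated at the fixed point (1, b/c)."""
    return np.array([[b - 1.0, c],
                     [-b, -c]])

def rk4_path(f, x0, dt, n_steps, *args):
    """Integrate dx/dt = f(x, *args) with the classical fourth-order Runge-Kutta scheme."""
    path = np.empty((n_steps + 1, len(x0)))
    path[0] = x0
    for i in range(n_steps):
        x = path[i]
        k1 = f(x, *args)
        k2 = f(x + 0.5 * dt * k1, *args)
        k3 = f(x + 0.5 * dt * k2, *args)
        k4 = f(x + dt * k3, *args)
        path[i + 1] = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return path

b, c = 2.2, 1.0
x_star = np.array([1.0, b / c])
eig_unstable = np.linalg.eigvals(jacobian_at_fixed_point(b, c))    # case b > 1 + c
eig_stable = np.linalg.eigvals(jacobian_at_fixed_point(1.5, 1.0))  # case b < 1 + c

# Start close to the fixed point and integrate long enough to leave its neighbourhood.
path = rk4_path(brusselator, x_star + 0.01, 0.02, 10000, b, c)
late_distance = np.linalg.norm(path[-2000:] - x_star, axis=1).max()
```

    The trace of the Jacobian is $b - 1 - c$ and its determinant is $c > 0$, so the eigenvalues change the sign of their real part exactly at $b = 1 + c$, in agreement with the stated threshold.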

    Figure 2.  The Brusselator model and its parameter sensitivities. (A) The deterministic periodic solution derived for parameter values $b = 2.2$, $c = 1$ plotted against time. (B) The singular values of the FIM of pcLNA probability distributions computed at $t = 0.25, 0.5, \dots, 6$. (C) The principal sensitivity coefficients of the two parameters of the model. (D) The deterministic periodic solution and the pcLNA confidence intervals $\mu_{x_i} \pm \sigma_{x_i}$ for parameter value $\theta_0 = (2.2, 1)$ (black), $\theta_1 = \theta_0 + 0.05 V_1 = (2.17, 1.04)$ (blue) and $\theta_2 = \theta_0 + 0.05 V_2 = (2.16, 0.97)$ (red).

    As we can see in Figure 2(B), the first singular value of the FIM is substantially larger (about $2^{3.25} \approx 10$ times) than the second singular value. The large first principal sensitivity coefficients for both parameters reveal that the pcLNA probability distributions are sensitive to changes in both parameter values (see Figure 2(C)). Furthermore, the first singular value corresponds to changes that move the two parameters in opposite directions. That is, the principal sensitivity coefficients $s_{11}, s_{12}$, and similarly the eigenvector $V_1 = (V_{11}, V_{21})$ of the FIM corresponding to the first singular value $\sigma_1$, are such that $s_{11}s_{12} < 0$ and $V_{11}V_{21} < 0$. Therefore, changes of the parameter value $\theta_0 = (2.2, 1)$ in the $V_1$ direction, i.e. $\theta_1 = \theta_0 + \delta V_1$ (with $\delta = 0.05$ in Figure 2(D)), result in a relatively large fixed-point translocation and, as shown in Figure 2(D), a large change in the deterministic solution and pcLNA probability distributions. In contrast, the principal sensitivity coefficients $s_{21}, s_{22}$, and therefore the eigenvector $V_2 = (V_{12}, V_{22})$ corresponding to the second singular value $\sigma_2$, are such that $s_{21}s_{22} > 0$ and $V_{12}V_{22} > 0$. Therefore, a change in parameter values $\theta_2 = \theta_0 + \delta V_2$ results in much smaller changes in the deterministic solution and pcLNA probability distributions (see Figure 2(D)).
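    As a quick consistency check, the two eigenvector directions can be recovered from the perturbed parameter values reported in the caption of Figure 2(D). The snippet below is only an illustration; the caption values are rounded to two decimals, so the recovered vectors are approximate.

```python
import numpy as np

theta0 = np.array([2.2, 1.0])
theta1 = np.array([2.17, 1.04])   # theta0 + 0.05 * V1, from the caption of Figure 2(D)
theta2 = np.array([2.16, 0.97])   # theta0 + 0.05 * V2, from the caption of Figure 2(D)
delta = 0.05

V1 = (theta1 - theta0) / delta    # approximately (-0.6, 0.8)
V2 = (theta2 - theta0) / delta    # approximately (-0.8, -0.6)

opposite_signs_V1 = V1[0] * V1[1] < 0   # parameters move in opposite directions
same_signs_V2 = V2[0] * V2[1] > 0       # parameters move in the same direction
```

    The recovered vectors have unit length and are orthogonal to each other, as expected for eigenvectors of a symmetric matrix such as the FIM, and their sign patterns match the description in the text.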

    In this section, we perform a sensitivity analysis of the reaction network of the Drosophila circadian clock in [30]. The network involves two proteins, PER(iod) and TIM(eless), that can be reversibly phosphorylated twice, to P1, T1 and P2, T2, respectively. The twice-phosphorylated forms can form a dimer complex, C, that can translocate to the nucleus, CN, and repress the transcription of PER and TIM mRNA, MP and MT, respectively (see Figure 3 and Supplementary A).

    Figure 3.  The Drosophila circadian clock [30]. (A) Schematic representation of the main reactions in the network. (B) The deterministic ($\Omega \to \infty$) periodic solution of PER mRNA (MP) and PER-TIM dimer complex (CN, nuclear) concentrations over the time interval of one cycle. (C) The deterministic and stochastic sample paths of the network derived using SSA ($\Omega = 300$).

    The network involves $r = 30$ reactions parameterised by $k = 38$ parameters. These include (see also Table 2) the constants of each reaction and the half-max constants, say $c_1$, for enzymatic reactions with macroscopic rates either of Michaelis-Menten form, i.e. $c_2 x_i/(c_1 + x_i)$, or Hill form, i.e. $c_2 x_i^h/(c_1^h + x_i^h)$. Parameter sensitivity analysis attempts to unravel the complexities of the network dynamics.
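    The two macroscopic rate forms can be written as short functions. This is a minimal sketch; the function names are illustrative, and the example values `c1 = 0.2`, `c2 = 0.7` are borrowed from the kmp and vmp entries of Table 2.

```python
def michaelis_menten(x, c1, c2):
    """Michaelis-Menten rate c2 * x / (c1 + x), with half-max constant c1."""
    return c2 * x / (c1 + x)

def hill(x, c1, c2, h):
    """Hill rate c2 * x**h / (c1**h + x**h), with half-max constant c1 and power h."""
    return c2 * x**h / (c1**h + x**h)

c1, c2 = 0.2, 0.7
rate_at_half_max_mm = michaelis_menten(c1, c1, c2)   # equals c2 / 2
rate_at_half_max_hill = hill(c1, c1, c2, 4.0)        # also equals c2 / 2
```

    Both forms reach half of their maximum rate $c_2$ when the concentration equals $c_1$, which is why $c_1$ is called the half-max constant; the Hill form reduces to the Michaelis-Menten form for $h = 1$.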

    The macroscopic rate equations have periodic solutions. Gonze et al. [30] studied the stochastic version of the network using SSA simulations in various system sizes. We have previously shown (see [27]) that pcLNA probability distributions accurately approximate empirical distributions derived using SSA simulations for $\Omega \geq 300$. We now use the pcLNA distributions for parameter sensitivity analysis.

    We first compute the FIM and the corresponding singular values $\sigma_i$ and principal sensitivity coefficients $s_{ij}$ of the pcLNA distributions at phases/time-points $t = 1, 3, \dots, 23$ (period $T \approx 24$). The system size is set to $\Omega = 300$. For comparison, we also compute the FIM under:

    (a) the ordinary least squares (OLS) approximation with mean the deterministic model, i.e. $X(t) \sim \mathrm{MVN}(x(t), e I_{mn})$. Then, the $(ij)$-th entry of the FIM is $e^{-1}(\partial_i\mu)^T(\partial_j\mu)$,

    (b) the pcLNA but assuming that $\partial\Sigma = 0$. This is a weighted least squares (WLS) approximation with the weights arising from the pcLNA method and assumed to be constant in $\theta$. Then, the $(ij)$-th entry of the FIM is $(\partial_i\mu)^T\Sigma^{-1}(\partial_j\mu)$,

    (c) the pcLNA but assuming that $\partial\mu = 0$. Then, the $(ij)$-th entry of the FIM is $\frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}(\partial_i\Sigma)\Sigma^{-1}(\partial_j\Sigma)\right)$.

    Here, $\mu$ and $\Sigma$ are the mean $\mu_{x_{1:n}}$ and the covariance matrix of the pcLNA distribution in (4.8). The choice of the constant $e$ in the OLS approximation is arbitrary and therefore, for the purpose of comparison, it is chosen so that the first singular value, $\sigma_1$, of the corresponding FIM is equal to $\sigma_1$ derived under the pcLNA. The model (c) is used simply to allow an investigation of the sensitivities of the covariance matrix of pcLNA.
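    The FIM entries in (a)-(c) can be illustrated with a short sketch based on the standard Fisher information of a multivariate normal distribution, whose $(ij)$-th entry is the sum of the mean term in (b) and the covariance term in (c). The function name `gaussian_fim_parts` and the toy one-dimensional model (a normal distribution with mean $\theta_1$ and variance $\theta_2 = 2$) are assumptions for illustration; in the analysis described here the derivatives would come from the pcLNA mean and covariance.

```python
import numpy as np

def gaussian_fim_parts(dmu, Sigma, dSigma):
    """Mean and covariance contributions to the FIM of MVN(mu(theta), Sigma(theta)).

    dmu    : array (k, m), row i holds the derivative of mu w.r.t. parameter i
    Sigma  : array (m, m), covariance matrix
    dSigma : array (k, m, m), slice i holds the derivative of Sigma w.r.t. parameter i
    """
    Sinv = np.linalg.inv(Sigma)
    mean_part = dmu @ Sinv @ dmu.T                           # (d_i mu)^T Sigma^-1 (d_j mu)
    SinvdS = np.einsum("ab,kbc->kac", Sinv, dSigma)          # Sigma^-1 (d_i Sigma)
    cov_part = 0.5 * np.einsum("iab,jba->ij", SinvdS, SinvdS)  # 1/2 tr(...)
    return mean_part, cov_part

# Toy model (assumption): X ~ N(theta_1, theta_2) evaluated at theta_2 = 2.
dmu = np.array([[1.0], [0.0]])
Sigma = np.array([[2.0]])
dSigma = np.array([[[0.0]], [[1.0]]])

wls, cov_only = gaussian_fim_parts(dmu, Sigma, dSigma)   # models (b) and (c)
full = wls + cov_only                                    # full Gaussian FIM
sv_full = np.linalg.svd(full, compute_uv=False)

# Model (a): choose e so that the leading singular value matches that of the full FIM.
e = np.linalg.svd(dmu @ dmu.T, compute_uv=False)[0] / sv_full[0]
ols = (dmu @ dmu.T) / e
```

    For the toy model the full FIM is $\mathrm{diag}(1/\theta_2, 1/(2\theta_2^2)) = \mathrm{diag}(0.5, 0.125)$, and the rescaled OLS matrix shares its leading singular value by construction while carrying no information about the variance parameter.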

    The ten largest singular values, $\sigma_1 \geq \dots \geq \sigma_{10}$ ($k = 34$), for each of these models are displayed in Figure 4(A). As we can see, while $\sigma_1, \sigma_2$ take similar values for the OLS approximation and pcLNA, the values of $\sigma_i$, $i \geq 3$, for OLS drop much faster than those of the pcLNA method. This indicates that pcLNA contains much more information than the deterministic model. Most of this extra information in pcLNA arises from the use of the variance matrix $A_{x_{1:n}}$, which provides more accurate scaling than the identity matrix. This is suggested by the fact that the singular values of the WLS model are very close to those of the pcLNA model.

    Figure 4.  Parameter sensitivity analysis of the Drosophila circadian clock. (A) The largest singular values, $\sigma_i$, $i = 1, \dots, 10$, of the FIM for the pcLNA (green), WLS (red), pcLNA with $\partial\mu = 0$ (blue), and OLS (black). (B) Principal sensitivity coefficients $s_{ij}$, $i = 1, \dots, 5$ (y-axis), for all parameters (x-axis) under the pcLNA (top) and OLS (bottom) models.

    However, the first singular value, $\sigma_1$, of the WLS approximation is substantially lower than $\sigma_1$ in pcLNA, and this is due to the parameter sensitivity of the covariance matrix. The $\sigma_i$ values for the approximation in (c) are much lower than those for pcLNA and WLS, but this largely depends on the system size. For smaller system sizes, the singular values of the WLS become smaller and those of approximation (c) larger. Overall, this result indicates that using pcLNA rather than the WLS and OLS approximations gives substantially higher parameter sensitivities.

    In addition to overall comparisons between parameter sensitivity of different models, we can use the principal sensitivity coefficients to investigate the sensitivities of the model to each parameter. As we can see in Figure 4(B), the change in parameter values that has the greatest effect on the pcLNA probability distributions (see s_1j in first row) is the one that changes a small selection of parameters of PER and TIM in opposite directions. In particular, the model is most sensitive to opposite sign changes of the mRNA transcription parameters vsp and vst, the mRNA degradation parameters vmp and vmt and the half-max constants kip, kit for the repression of transcription, and the translation parameters ksp and kst. Furthermore, while there is some agreement between pcLNA and OLS models on the most influential parameters, OLS fails to capture sensitivity to a large number of parameters (e.g. the translation, ksp and kst, and phosphorylation, v1p and v1t, reaction constants).

    We next look at the parameter sensitivities of the marginal probability distributions of each variable of the network separately. This will be the observed sensitivities if only one variable, say Xi, of the network is observed. For this computation, we eliminate the appropriate entries of μ and Σ and the corresponding partial derivatives and consider only the terms that correspond to Xi. We see in Figure 5 that there are substantial variations in the values of s_1j for the different variables. As expected, the variables of PER (TIM) are most sensitive to parameters related to reactions affecting PER (TIM), but sensitivities to some other parameters are also high. We also see that the variable that gives the largest sensitivities is the dimer complex CN, which is the transcription factor and therefore the regulatory variable of the network.

    Figure 5.  Comparison of parameter sensitivities for each variable being observed. Principal sensitivity coefficients s_1j corresponding to the biggest singular value σ1 for all parameters (x-axis) of the Drosophila circadian clock when a single variable is observed (y-axis).

    We then look at the parameter sensitivities of joint probability distributions of pairs of variables of the network. In particular, we assume that either the mRNA levels, un-phosphorylated proteins, once-phosphorylated proteins, twice-phosphorylated proteins or dimer complexes are observed. This analysis is particularly relevant in experimental design when deciding at which of those levels a process should be observed to give the highest parameter sensitivities. We see in Figure 6(B) that there are considerable differences in sensitivity values, with the biggest values observed at the level of the transcription factor (C, CN), followed by mRNA (MP, MT). The differences are more prominent for the first singular value (see Figure 6(A)), which is at least $2^3$ times larger than the rest of the singular values.

    Figure 6.  Comparison of parameter sensitivities for the network observed at different levels (e.g. mRNA, phosphorylated protein, dimer). (A) The largest singular values, $\sigma_i$, $i = 1, \dots, 10$, of the FIM and (B) principal sensitivity coefficients $s_{1j}$ for all parameters (x-axis) when a pair of variables (y-axis) of the Drosophila circadian clock is observed.

    We also investigate an important question arising in experimental design: the choice of time-points at which to take observations or to apply perturbations to the network. We compute the parameter sensitivities of the joint probability distributions of the variables of the network for the selected time-points. We first see in Figure 7 that there is considerable variation in the values of $s_{ij}$ at each time-point. The pcLNA probability distributions are increasingly sensitive to the transcription (vsp, vst) and degradation (vmp and vmt) parameters for times $t \in [11, 23]$, where PER and TIM mRNA concentrations are expected to increase (see Figure 3). In contrast, the pcLNA probability distributions are less sensitive to those parameters at the times of mRNA decay. Furthermore, there is a sharp increase in sensitivity to the half-max constants for the repression of PER and TIM mRNA transcription (kip, kit) around the time $t \in [9, 11]$, where the transcription factor CN crosses these values. Many of those sensitivities are not captured by the OLS model displayed in Figure 7, which is overall less sensitive to parameter changes. The deterministic nature of the OLS model is reflected here in the sense that the model appears to be sensitive to parameter changes only at specific time-points rather than over time-intervals. For example, the OLS model is very sensitive to the transcription parameters vsp, vst at time $t = 23$ only.

    Figure 7.  Comparison between parameter sensitivities at time-points of the pcLNA (left panel) and the OLS (deterministic, right panel) model. The principal sensitivity coefficients s_ij, i=1 (top), i=2 (middle) and i=3 (bottom) for all parameters (x-axis) and at different time-points (y-axis) of the Drosophila circadian clock.

    Finally, we investigate another important question regarding the number of time-points to be observed. As we can see in Figure 8, there is a great increase in the singular values of the FIM of pcLNA probability distributions as more time-points are observed. The increase in the value of $\sigma_i$, when more time-points are observed, is larger for larger $i$. For example, the value of $\sigma_1$ when 12 time-points are observed is approximately $2^{1.41}$ times larger than $\sigma_1$ when only one time-point is observed, whereas the corresponding increase in the value of $\sigma_{10}$ is approximately twice as large ($2^{2}$).

    Figure 8.  The increase in the singular values of FIM of the probability distributions of sample paths of pcLNA for increasing number of observed time-points.

    As greater understanding of biological processes leads to more complex network structures and more informative models, there is a need to further improve the methodology for analysing various salient aspects of those models. Moreover, as those models involve an increasing number of parameters, there is a need for systematic studies identifying those parameters and reactions that are most important for the given network and can be estimated from data. And as biotechnological innovation provides great opportunities for experimental data of better quality and greater quantity, there is a need for appropriate experimental design tools in order to optimise the collected information. This paper is an effort in this direction.

    The developed theory enables a study of the effects of parameter value changes on the probability distributions of sample paths of stochastic processes. It applies directly to the changes in the relevant probability distribution and does not depend upon the choice of specific observables. It identifies the directions of the parameter space in which these probability distributions are most sensitive to perturbations in their parameter values. When the study considers only marginal, rather than full, probability distributions of subsets of variables or time-points, the outcomes change substantially. This highlights that different observations capture different aspects of the network dynamics, and this outcome has to be carefully considered in designing experiments and estimating parameters.

    This research was funded by the BBSRC Grant BB/K003097/1 (Systems Biology Analysis of Biological Timers and Inflammation) and the EPSRC Grant EP/P019811/1 (Mathematical Foundations of Information and Decisions in Dynamic Cell Signalling). DAR was also supported by funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 305564. BBSRC website: www.bbsrc.ac.uk. EPSRC website: www.epsrc.ac.uk. Seventh Framework Programme (FP7) website: cordis.europa.eu/fp7/home_en.html. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    The authors declare there is no conflict of interest.

    Drosophila circadian clock

    The variables of the network describing the time-evolution of the Drosophila circadian clock in [30] along with the initial conditions (in nanomolar concentrations) used in our implementation are provided in Table 1. The parameter values used to derive the ODE solution of the system are provided in Table 2. The ODE system for the Drosophila circadian clock is given in Table 3. The intensity functions of the master equation are provided in Table 1 of [30].

    Table 1.  The variables of Drosophila circadian clock system in [30] and the initial conditions (in nanomolar concentrations) used to derive their solution.
    | variable | description | initial condition |
    |---|---|---|
    | MP | PER mRNA | 3.0975 |
    | P0 | PER protein 0 | 1.2547 |
    | P1 | phosphorylated PER protein 1 | 1.2302 |
    | P2 | phosphorylated PER protein 2 | 1.7997 |
    | MT | TIM mRNA | 3.0975 |
    | T0 | TIM protein 0 | 1.2346 |
    | T1 | phosphorylated TIM protein 1 | 1.0577 |
    | T2 | phosphorylated TIM protein 2 | 0.3593 |
    | C | PER-TIM cytosolic complex | 0.6230 |
    | CN | PER-TIM nuclear complex | 0.8178 |

    Table 2.  The parameters of Drosophila circadian clock system in [30] and the values used to derive their ODE solution.
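    | parameter | reaction | description | value | measurement unit |
    |---|---|---|---|---|
    | vsp | MP transcription | reaction constant | 1 | nM h$^{-1}$ |
    | vst | MT transcription | reaction constant | 1 | nM h$^{-1}$ |
    | vmp | MP degradation | reaction constant | 0.70 | nM h$^{-1}$ |
    | vmt | MT degradation | reaction constant | 0.70 | nM h$^{-1}$ |
    | vdp | P2 degradation | reaction constant | 2 | nM h$^{-1}$ |
    | ksp | MP translation | reaction constant | 0.90 | h$^{-1}$ |
    | kst | MT translation | reaction constant | 0.90 | h$^{-1}$ |
    | k1 | C → CN | reaction constant | 0.60 | h$^{-1}$ |
    | k2 | CN → C | reaction constant | 0.20 | h$^{-1}$ |
    | k3 | P2 + T2 → C | reaction constant | 1.20 | h$^{-1}$ |
    | k4 | C → P2 + T2 | reaction constant | 0.60 | h$^{-1}$ |
    | kmp | MP enzymatic degradation | half-max constant | 0.20 | h$^{-1}$ |
    | kmt | MT enzymatic degradation | half-max constant | 0.20 | h$^{-1}$ |
    | kip | MP transcription | Hill coefficient | 1.00 | h$^{-1}$ |
    | kit | MT transcription | Hill coefficient | 1.00 | h$^{-1}$ |
    | kdp | P2 enzymatic degradation | half-max constant | 0.20 | h$^{-1}$ |
    | kdt | T2 enzymatic degradation | half-max constant | 0.20 | h$^{-1}$ |
    | kd | linear degradation | reaction constant | 0.01 | h$^{-1}$ |
    | kdc | C degradation | reaction constant | 0.01 | h$^{-1}$ |
    | kdn | CN degradation | reaction constant | 0.01 | h$^{-1}$ |
    | vdt | T2 degradation | reaction constant | 2.00 | nM h$^{-1}$ |
    | k1p | P0 → P1 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k1t | T0 → T1 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k2p | P1 → P0 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k2t | T1 → T0 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k3p | P1 → P2 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k3t | T1 → T2 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k4p | P2 → P1 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | k4t | T2 → T1 enzymatic | half-max constant | 2.00 | h$^{-1}$ |
    | v1p | P0 → P1 | reaction constant | 8.00 | nM h$^{-1}$ |
    | v1t | T0 → T1 | reaction constant | 8.00 | nM h$^{-1}$ |
    | v2p | P1 → P0 | reaction constant | 1.00 | nM h$^{-1}$ |
    | v2t | T1 → T0 | reaction constant | 1.00 | nM h$^{-1}$ |
    | v3p | P1 → P2 | reaction constant | 8.00 | nM h$^{-1}$ |
    | v3t | T1 → T2 | reaction constant | 8.00 | nM h$^{-1}$ |
    | v4p | P2 → P1 | reaction constant | 1.00 | nM h$^{-1}$ |
    | v4t | T2 → T1 | reaction constant | 1.00 | nM h$^{-1}$ |
    | h |  | Hill power | 4.00 | NA |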

    Table 3.  The ODE system for the Drosophila circadian clock network in [30].
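    $\dot{M}_P = v_{sp}\dfrac{k_{ip}^n}{k_{ip}^n + C_N^n} - v_{mp}\dfrac{M_P}{k_{mp} + M_P} - k_d M_P$
    $\dot{P}_0 = k_{sp} M_P - v_{1p}\dfrac{P_0}{k_{1p} + P_0} + v_{2p}\dfrac{P_1}{k_{2p} + P_1} - k_d P_0$
    $\dot{P}_1 = v_{1p}\dfrac{P_0}{k_{1p} + P_0} - v_{2p}\dfrac{P_1}{k_{2p} + P_1} - v_{3p}\dfrac{P_1}{k_{3p} + P_1} + v_{4p}\dfrac{P_2}{k_{4p} + P_2} - k_d P_1$
    $\dot{P}_2 = v_{3p}\dfrac{P_1}{k_{3p} + P_1} - v_{4p}\dfrac{P_2}{k_{4p} + P_2} - k_3 P_2 T_2 + k_4 C - v_{dp}\dfrac{P_2}{k_{dp} + P_2} - k_d P_2$
    $\dot{M}_T = v_{st}\dfrac{k_{it}^n}{k_{it}^n + C_N^n} - v_{mt}\dfrac{M_T}{k_{mt} + M_T} - k_d M_T$
    $\dot{T}_0 = k_{st} M_T - v_{1t}\dfrac{T_0}{k_{1t} + T_0} + v_{2t}\dfrac{T_1}{k_{2t} + T_1} - k_d T_0$
    $\dot{T}_1 = v_{1t}\dfrac{T_0}{k_{1t} + T_0} - v_{2t}\dfrac{T_1}{k_{2t} + T_1} - v_{3t}\dfrac{T_1}{k_{3t} + T_1} + v_{4t}\dfrac{T_2}{k_{4t} + T_2} - k_d T_1$
    $\dot{T}_2 = v_{3t}\dfrac{T_1}{k_{3t} + T_1} - v_{4t}\dfrac{T_2}{k_{4t} + T_2} - k_3 P_2 T_2 + k_4 C - v_{dt}\dfrac{T_2}{k_{dt} + T_2} - k_d T_2$
    $\dot{C} = k_3 P_2 T_2 - k_4 C - k_1 C + k_2 C_N - k_{dc} C$
    $\dot{C}_N = k_1 C - k_2 C_N - k_{dn} C_N.$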
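    For readers who wish to experiment with the model, the system of Table 3 can be coded as a right-hand-side function. The sketch below is an illustration under stated assumptions: production and conversion-in terms enter with a plus sign and degradation and conversion-out terms with a minus sign, the Hill exponent $n$ of Table 3 is taken to be the Hill power h of Table 2, and the names `params`, `x0` and `clock_rhs` are illustrative.

```python
import numpy as np

# Parameter values from Table 2 (h is assumed to be the exponent n of Table 3).
params = dict(
    vsp=1.0, vst=1.0, vmp=0.7, vmt=0.7, vdp=2.0, vdt=2.0,
    ksp=0.9, kst=0.9, k1=0.6, k2=0.2, k3=1.2, k4=0.6,
    kmp=0.2, kmt=0.2, kip=1.0, kit=1.0, kdp=0.2, kdt=0.2,
    kd=0.01, kdc=0.01, kdn=0.01,
    k1p=2.0, k1t=2.0, k2p=2.0, k2t=2.0, k3p=2.0, k3t=2.0, k4p=2.0, k4t=2.0,
    v1p=8.0, v1t=8.0, v2p=1.0, v2t=1.0, v3p=8.0, v3t=8.0, v4p=1.0, v4t=1.0,
    h=4.0,
)

# Initial conditions from Table 1, ordered as MP, P0, P1, P2, MT, T0, T1, T2, C, CN.
x0 = np.array([3.0975, 1.2547, 1.2302, 1.7997,
               3.0975, 1.2346, 1.0577, 0.3593, 0.6230, 0.8178])

def mm(v, k, s):
    """Michaelis-Menten rate v * s / (k + s)."""
    return v * s / (k + s)

def clock_rhs(x, p):
    """Right-hand side of the ODE system of Table 3."""
    MP, P0, P1, P2, MT, T0, T1, T2, C, CN = x
    n = p["h"]
    dimer = p["k3"] * P2 * T2 - p["k4"] * C   # net formation of the complex C
    dMP = p["vsp"] * p["kip"]**n / (p["kip"]**n + CN**n) - mm(p["vmp"], p["kmp"], MP) - p["kd"] * MP
    dP0 = p["ksp"] * MP - mm(p["v1p"], p["k1p"], P0) + mm(p["v2p"], p["k2p"], P1) - p["kd"] * P0
    dP1 = (mm(p["v1p"], p["k1p"], P0) - mm(p["v2p"], p["k2p"], P1)
           - mm(p["v3p"], p["k3p"], P1) + mm(p["v4p"], p["k4p"], P2) - p["kd"] * P1)
    dP2 = (mm(p["v3p"], p["k3p"], P1) - mm(p["v4p"], p["k4p"], P2)
           - dimer - mm(p["vdp"], p["kdp"], P2) - p["kd"] * P2)
    dMT = p["vst"] * p["kit"]**n / (p["kit"]**n + CN**n) - mm(p["vmt"], p["kmt"], MT) - p["kd"] * MT
    dT0 = p["kst"] * MT - mm(p["v1t"], p["k1t"], T0) + mm(p["v2t"], p["k2t"], T1) - p["kd"] * T0
    dT1 = (mm(p["v1t"], p["k1t"], T0) - mm(p["v2t"], p["k2t"], T1)
           - mm(p["v3t"], p["k3t"], T1) + mm(p["v4t"], p["k4t"], T2) - p["kd"] * T1)
    dT2 = (mm(p["v3t"], p["k3t"], T1) - mm(p["v4t"], p["k4t"], T2)
           - dimer - mm(p["vdt"], p["kdt"], T2) - p["kd"] * T2)
    dC = dimer - p["k1"] * C + p["k2"] * CN - p["kdc"] * C
    dCN = p["k1"] * C - p["k2"] * CN - p["kdn"] * CN
    return np.array([dMP, dP0, dP1, dP2, dMT, dT0, dT1, dT2, dC, dCN])

dx0 = clock_rhs(x0, params)
```

    The function can be passed to any standard ODE integrator. Because the PER and TIM parameter values in Table 2 are identical and the two mRNA initial conditions coincide, the derivatives of MP and MT agree at the initial state.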



    [1] Ackert LF, Jiang L, Lee HS, et al. (2016) Influential investors in online stock forums. Int Rev Financ Anal 45: 39-46. doi: 10.1016/j.irfa.2016.02.001
    [2] Alanyali M, Moat HS, Preis T (2013) Quantifying the relationship between financial news and the stock market. Sci Reports 3: 1-6.
    [3] Al-Nasseri A, Ali FM (2018) What does investors' online divergence of opinion tell us about stock returns and trading volume? J Bus Res 86: 166-178.
    [4] Aman H, Moriyasu H (2017) Volatility and public information flows: Evidence from disclosure and media coverage in the Japanese stock market. Int Rev Econ Financ 51: 660-676. doi: 10.1016/j.iref.2017.07.029
    [5] Andrei D, Hasler M (2015) Investor attention and stock market volatility. Rev Financ Stud 28: 33-72. doi: 10.1093/rfs/hhu059
    [6] Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards. J Financ 59: 1259-1294. doi: 10.1111/j.1540-6261.2004.00662.x
    [7] Ap Gwilym O, Kita A, Wang Q (2014) Speculate against speculative demand. Int Rev Financ Anal 34: 212-221. doi: 10.1016/j.irfa.2014.03.001
    [8] Audrino F, Sigrist F, Ballinari D (2020) The impact of sentiment and attention measures on stock market volatility. Int J Forecast 36: 334-357. doi: 10.1016/j.ijforecast.2019.05.010
    [9] Awan TM, Khan MS, Haq IU, et al. (2021) Oil and stock markets volatility during pandemic times: a review of G7 countries. Green Financ 3: 15-27. doi: 10.3934/GF.2021002
    [10] Bandhakavi A, Wiratunga N, Massie S, et al. (2016) Emotion-corpus guided lexicons for sentiment analysis on Twitter, International Conference on Innovative Techniques and Applications of Artificial Intelligence, Springer, Cham, 71-85.
    [11] Bank M, Larch M, Peter G (2011) Google search volume and its influence on liquidity and returns of German stocks. Financ Mark Portf Manage 25: 239-264. doi: 10.1007/s11408-011-0165-y
    [12] Barber BM, Odean T (2001) The Internet and the investor. J Econ Perspect 15: 41-54. doi: 10.1257/jep.15.1.41
    [13] Barberis N, Shleifer A, Wurgler J (2005) Comovement. J Financ Econ 75: 283-317. doi: 10.1016/j.jfineco.2004.04.003
    [14] Behrendt S, Schmidt A (2018) The Twitter myth revisited: Intraday investor sentiment, Twitter activity and individual-level stock return volatility. J Bank Financ 96: 355-367. doi: 10.1016/j.jbankfin.2018.09.016
    [15] Berry TD, Howe KM (1994) Public information arrival. J Financ 49: 1331-1346. doi: 10.1111/j.1540-6261.1994.tb02456.x
    [16] Campbell MC (1999) Pricing strategy & practice "Why did you do that?" The important role of inferred motive in perceptions of price fairness. J Product Brand Manage.
    [17] Chan WS (2003) Stock price reaction to news and no-news: drift and reversal after headlines. J Financ Econ 70: 223-260. doi: 10.1016/S0304-405X(03)00146-6
    [18] Chang EC, Mcqueen GR, Pinegar JM (1999) Cross-autocorrelation in Asian stock markets. 7: 471-493.
    [19] Chatrath A, Miao H, Ramchander S, et al. (2014) Currency jumps, cojumps and the role of macro news. J Int Money Financ 40: 42-62. doi: 10.1016/j.jimonfin.2013.08.018
    [20] Chatterjee S, John K, Yan A (2012) Takeovers and divergence of investor opinion. Rev Financ Stud 25: 227-277. doi: 10.1093/rfs/hhr109
    [21] Chen W, Lai K, Cai Y (2018) Topic generation for Chinese stocks: a cognitively motivated topic modeling method using social media data. Quant Financ Econ 2: 279-293. doi: 10.3934/QFE.2018.2.279
    [22] Chen WH, Xu GX (2018) Prediction accuracy of stock market volatility based on deep learning and stock forum data. Manage World 34: 180-181.
    [23] Chen XG, Duan S, Wang L (2017) Research on trend prediction and evaluation of network public opinion. Concurrency Comput Pract Experi 29: e4212.
    [24] Converse PE (1987) Changing conceptions of public opinion in the political process. Public Opinion Q 51: S12-S24.
    [25] Coqueret G (2020) Stock-specific sentiment and return predictability. Quant Financ 20: 1531-1551. doi: 10.1080/14697688.2020.1736314
    [26] Corwin SA, Coughenour JF (2008) Limited attention and the allocation of effort in securities trading. J Financ 63: 3031-3067. doi: 10.1111/j.1540-6261.2008.01420.x
    [27] Da Z, Engelberg J, Gao P (2011) In search of attention. J Financ 66: 1461-1499. doi: 10.1111/j.1540-6261.2011.01679.x
    [28] Danbolt J, Siganos A, Vagenas-Nanos E (2015) Investor sentiment and bidder announcement abnormal returns. J Corp Financ 33: 164-179. doi: 10.1016/j.jcorpfin.2015.06.003
    [29] Dastgir S, Demir E, Downing G, et al. (2019) The causal relationship between Bitcoin attention and Bitcoin returns: Evidence from the Copula-based Granger causality test. Financ Res Lett 28: 160-164. doi: 10.1016/j.frl.2018.04.019
    [30] De Long JB, Shleifer A, Summers LH, et al. (1989) The size and incidence of the losses from noise trading. J Financ 44: 681-696. doi: 10.1111/j.1540-6261.1989.tb04385.x
    [31] Delort JY, Arunasalam B, Milosavljevic M, et al. (2009) The impact of manipulation in Internet stock message boards.
    [32] Malkiel BG, Fama EF (1970) Efficient capital markets: A review of theory and empirical work. J Financ 25: 383-417. doi: 10.1111/j.1540-6261.1970.tb00518.x
    [33] Fang J, Gozgor G, Lau CKM, et al. (2020) The impact of Baidu Index sentiment on the volatility of China's stock markets. Financ Res Lett 32: 101099.
    [34] Fang L, Peress J (2009) Media coverage and the cross‐section of stock returns. J Financ 64: 2023-2052. doi: 10.1111/j.1540-6261.2009.01493.x
    [35] Feng L, Seasholes MS (2004) Correlated trading and location. J Financ 59: 2117-2144. doi: 10.1111/j.1540-6261.2004.00694.x
    [36] Fisk RP, Patrício L, Ordanini A, et al. (2011) Crowd‐funding: transforming customers into investors through innovative service platforms. J Serv Manage.
    [37] Fung GPC, Yu JX, Lam W (2003) Stock prediction: Integrating text mining approach using real-time news, IEEE International Conference on Computational Intelligence for Financial Engineering, Proceedings, IEEE, 395-402.
    [38] Füss R, Guidolin M, Koeppel C (2020) Sentiment Risk Premia in the Cross-Section of Global Equity. University of St. Gallen, School of Finance Research Paper, (2019/13).
    [39] Gao C, Rong X, Chen Y (2011) Research on Public Opinion Monitoring Index-system in Micro-blogging. J Intell 9.
    [40] Hamid A, Heiden M (2015) Forecasting volatility with empirical similarity and Google Trends. J Econ Behav Organ 117: 62-81. doi: 10.1016/j.jebo.2015.06.005
    [41] Hart RP (2001) Redeveloping DICTION: theoretical considerations. Progress Commun Sci 2001: 43-60.
    [42] Hoffmann AOI, Post T, Pennings JME (2015) How investor perceptions drive actual trading and risk-taking behavior. J Behav Financ 16: 94-103. doi: 10.1080/15427560.2015.1000332
    [43] Hoffmann AOI, Post T, Pennings JME (2013) Individual investor perceptions and behavior during the financial crisis. J Bank Financ 37: 60-74. doi: 10.1016/j.jbankfin.2012.08.007
    [44] Hong H, Stein JC (1999) A unified theory of underreaction, momentum trading, and overreaction in asset markets. J Financ 54: 2143-2184. doi: 10.1111/0022-1082.00184
    [45] Hubalek F, Schachermayer W (2001) The limitations of no-arbitrage arguments for real options. Int J Theor Appl Financ 4: 361-373. doi: 10.1142/S0219024901001024
    [46] Ivković Z, Weisbenner S (2007) Information diffusion effects in individual investors' common stock purchases: Covet thy neighbors' investment choices. Rev Financ Stud 20: 1327-1357. doi: 10.1093/revfin/hhm009
    [47] Jegadeesh N, Wu D (2013) Word power: A new approach for content analysis. J Financ Econ 110: 712-729. doi: 10.1016/j.jfineco.2013.08.018
    [48] Jiao P, Veiga A, Walther A (2020) Social media, news media and the stock market. J Econ Behav Organ 176: 63-90. doi: 10.1016/j.jebo.2020.03.002
    [49] Jin F, Self N, Saraf P, et al. (2013) Forex-foreteller: Currency trend modeling using news articles, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 1470-1473.
    [50] Jones AL (2006) Have internet message boards changed market behavior? Info 8: 67-76.
    [51] Joseph K, Wintoki MB, Zhang Z (2011) Forecasting abnormal stock returns and trading volume using investor sentiment: Evidence from online search. Int J Forecast 27: 1116-1127. doi: 10.1016/j.ijforecast.2010.11.001
    [52] Kaustia M, Knüpfer S (2012) Peer performance and stock market entry. J Financ Econ 104: 321-338. doi: 10.1016/j.jfineco.2011.01.010
    [53] Kemp S (2020) Digital 2020: October Global Statshot. Datareportal. Hootsuite.
    [54] Kim HY, Mei JP (2001) What makes the stock market jump? An analysis of political risk on Hong Kong stock returns. J Int Money Financ 20: 1003-1016. doi: 10.1016/S0261-5606(01)00035-3
    [55] Kim N, Lučivjanská K, Molnár P, et al. (2019) Google searches and stock market activity: Evidence from Norway. Financ Res Lett 28: 208-220. doi: 10.1016/j.frl.2018.05.003
    [56] Klemola A, Nikkinen J, Peltomäki J (2016) Changes in Investors' Market Attention and Near-Term Stock Market Returns. J Behav Financ 17: 18-30. doi: 10.1080/15427560.2016.1133620
    [57] Kruse P (2020) Spreading entrepreneurial news—investigating media influence on social entrepreneurial antecedents. Green Financ 2: 284-301. doi: 10.3934/GF.2020016
    [58] Kudryavtsev A (2017) Absolute Stock Returns and Trading Volumes: Psychological Insights. Quant Financ Econ 1: 186-204. doi: 10.3934/QFE.2017.2.186
    [59] Kumar A, Lee CMC (2006) Retail investor sentiment and return comovements. J Financ 61: 2451-2486. doi: 10.1111/j.1540-6261.2006.01063.x
    [60] Lee R, Kim J (2021) Developing a Social Index for Measuring the Public Opinion Regarding the Attainment of Sustainable Development Goals. Social Indicators Res, 1-21.
    [61] Leitch D, Sherif M (2017) Twitter mood, CEO succession announcements and stock returns. J Comput Sci 21: 1-10. doi: 10.1016/j.jocs.2017.04.002
    [62] LeRoy SF, Porter RD (1981) The present-value relation: Tests based on implied variance bounds. Econometrica 49: 555-574. doi: 10.2307/1911512
    [63] Leung H, Ton T (2015) The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks. J Bank Financ 55: 37-55. doi: 10.1016/j.jbankfin.2015.01.009
    [64] Li F (2010) The information content of forward‐looking statements in corporate filings—A naïve Bayesian machine learning approach. J Account Res 48: 1049-1102. doi: 10.1111/j.1475-679X.2010.00382.x
    [65] Li J, Chen Y, Shen Y, et al. (2019) Measuring China's Stock Market Sentiment. Available at SSRN 3377684.
    [66] Li X, Shen D, Zhang W (2018) Do Chinese internet stock message boards convey firm-specific information? Pacific-Basin Financ J 49: 1-14.
    [67] Li ZH, Hu ZH (2018) The impact of Internet public opinion on financial asset prices: a literature review. Financ Rev 10: 110-117+122.
    [68] Liew JKS, Budavári T (2016) Do tweet sentiments still predict the stock market? Available at SSRN 2820269.
    [69] Liu B, McConnell JJ (2013) The role of the media in corporate governance: Do the media influence managers' capital allocation decisions? J Financ Econ 110: 1-17.
    [70] Liu F, Ye Q, Li YJ (2014) Impacts of interactions between news attention and investor attention on stock returns: Empirical investigation on financial shares in China. J Manage Sci China 17: 72-85.
    [71] Liu F, Ye Q, Li YJ (2014) The interaction of media attention and investor attention on Stock Returns: An Empirical Study Based on Chinese financial stocks. J Manage Sci 17: 72-85.
    [72] Liu L, Wu J, Li P, et al. (2015) A social-media-based approach to predicting stock comovement. Expert Syst Appl 42: 3893-3901. doi: 10.1016/j.eswa.2014.12.049
    [73] Liu R, Xie Y, Xie Y (2017) A Study of Online Public Opinion in New Media Environment, New Media and China's Social Development, Springer, Singapore, 73-103.
    [74] Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. J Financ 66: 35-65. doi: 10.1111/j.1540-6261.2010.01625.x
    [75] Lugmayr A, Gossen G (2013) Evaluation of Methods and Techniques for Language Based Sentiment Analysis for DAX 30 Stock Exchange A First Concept of a “LUGO” Sentiment Indicator. International SERIES on Information Systems and Management in Creative eMedia (CreMedia), 69-76.
    [76] Luo X, Zhang J (2013) How do consumer buzz and traffic in social media marketing predict the value of the firm? J Manage Infor Syst 30: 213-238.
    [77] Mao Y, Wei W, Wang B, et al. (2012) Correlating S & P 500 stocks with twitter data. Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, 69-72.
    [78] Martin R (2019) Examination and implications of experimental research on investor perceptions. J Account Liter 43: 145-169. doi: 10.1016/j.acclit.2019.11.001
    [79] McQueen G, Pinegar M, Thorley S (1996) Delayed reaction to good news and the cross‐autocorrelation of portfolio returns. J Financ 51: 889-919. doi: 10.1111/j.1540-6261.1996.tb02711.x
    [80] Meng XJ, Meng XL, Hu YY (2016) Research on investor sentiment index based on text mining and Baidu Index. Macroecon Res, 144-153.
    [81] Meng Y, Chang J (2019) The influence of investor sentiment on scale effect. Stat Inf Forum 34: 98-104.
    [82] Merton RC (1973) An intertemporal capital asset pricing model. Econometrica, 867-887.
    [83] Mitchell ML, Mulherin JH (1994) The impact of public information on the stock market. J Financ 49: 923-950. doi: 10.1111/j.1540-6261.1994.tb00083.x
    [84] Mittermayer MA (2004) Forecasting intraday stock price trends with text mining techniques, 37th Annual Hawaii International Conference on System Sciences, 2004. Proceedings of the IEEE, 10.
    [85] Mukwazvure A, Supreethi KP (2015) A hybrid approach to sentiment analysis of news comments, 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), (Trends and Future Directions), IEEE, 1-6.
    [86] Nofer M, Hinz O (2015) Using twitter to predict the stock market. Bus Infor Syst Eng 57: 229-242. doi: 10.1007/s12599-015-0390-4
    [87] Peramunetilleke D, Wong RK (2002) Currency exchange rate forecasting from news headlines. Aust Comput Sci Commun 24: 131-139.
    [88] Qiao H, Su Y (2020) Media coverage and decomposition of stock market volatility: Based on the generalized dynamic factor model. Emerging Mark Financ Trade 56: 613-625. doi: 10.1080/1540496X.2019.1686974
    [89] Rao T, Srivastava S (2012) Twitter Sentiment Analysis: How To Hedge Your Bets In The Stock Markets. Computence, 227-247.
    [90] Ruan X, Zhang JE (2016) Investor attention and market microstructure. Econ Lett 149: 125-130. doi: 10.1016/j.econlet.2016.10.032
    [91] Sabherwal S, Sarkar SK, Zhang Y (2011) Do Internet stock message boards influence trading? Evidence from heavily discussed stocks with no fundamental news. J Bus Financ Account 38: 1209-1237. doi: 10.1111/j.1468-5957.2011.02258.x
    [92] Savor P, Wilson M (2013) How much do investors care about macroeconomic risk? Evidence from scheduled economic announcements. J Financ Quant Anal, 343-375.
    [93] Schneider G, Troeger VE (2006) War and the world economy: Stock market reactions to international conflicts. J Conflict Resolut 50: 623-645. doi: 10.1177/0022002706290430
    [94] Schumaker RP, Zhang Y, Huang CN, et al. (2012) Evaluating sentiment in financial news articles. Decis Support Syst 53: 458-464. doi: 10.1016/j.dss.2012.03.001
    [95] Shen D, Li X, Zhang W (2017) Baidu news coverage and its impacts on order imbalance and large-size trade of Chinese stocks. Financ Res Lett 23: 210-216. doi: 10.1016/j.frl.2017.06.008
    [96] Shen D, Li X, Zhang W (2018) Baidu news information flow and return volatility: Evidence for the Sequential Information Arrival Hypothesis. Econ Model 69: 127-133. doi: 10.1016/j.econmod.2017.09.012
    [97] Shiller RJ (1981) The use of volatility measures in assessing market efficiency. J Financ 36: 291-304.
    [98] Siganos A, Vagenas-Nanos E, Verwijmeren P (2017) Divergence of sentiment and stock market trading. J Bank Financ 78: 130-141. doi: 10.1016/j.jbankfin.2017.02.005
    [99] Smailović J, Grčar M, Lavrač N, et al. (2014) Stream-based active learning for sentiment analysis in the financial domain. Infor Sci 285: 181-203. doi: 10.1016/j.ins.2014.04.034
    [100] Soni A, van Eck NJ, Kaymak U (2007) Prediction of stock price movements based on concept map information, 2007 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making, IEEE, 205-211.
    [101] Stone PJ, Dunphy DC, Smith MS (1966) The general inquirer: A computer approach to content analysis.
    [102] Su Y, Liao G (2019) The impact of macroeconomic news on stock returns of energy firms—evidence from China. Green Financ 1: 297-311. doi: 10.3934/GF.2019.3.297
    [103] Tauni MZ, Fang H, Mirza SS, et al. (2017) Do investor's Big Five personality traits influence the association between information acquisition and stock trading behavior? China Financ Rev Int.
    [104] Tetlock PC, Saar‐Tsechansky M, Macskassy S (2008) More than words: Quantifying language to measure firms' fundamentals. J Financ 63: 1437-1467. doi: 10.1111/j.1540-6261.2008.01362.x
    [105] Tetlock PC (2007) Giving content to investor sentiment: The role of media in the stock market. J Financ 62: 1139-1168. doi: 10.1111/j.1540-6261.2007.01232.x
    [106] Uddin MA, Hoque ME, Ali MH (2020) International economic policy uncertainty and stock market returns of Bangladesh: evidence from linear and nonlinear model. Quant Financ Econ 4: 236-251. doi: 10.3934/QFE.2020011
    [107] Wang CY, Wu JW (2015) Media tone, investor sentiment and IPO pricing. Financ Res 423: 174-189.
    [108] Wang FL, Wang XY (2017) Does social emotion affect stock market returns? Evidence from Sina Weibo. J Shanxi Univ Financ Econ 39: 35-46.
    [109] Wuthrich B, Cho V, Leung S, et al. (1998) Daily stock market forecast from textual web data, SMC'98 Conference Proceedings, 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No. 98CH36218), IEEE, 3: 2720-2725.
    [110] Wysocki PD (1998) Cheap talk on the web: The determinants of postings on stock message boards. University of Michigan Business School Working Paper, 1998 (98025).
    [111] Xu YM, Gao YM (2017) Construction and application of CPI public opinion index based on Internet big data: Taking Baidu Index as an example. Res Quant Econ Technol Econ 34: 94-112.
    [112] Yang C, Zhang R (2013) Sentiment asset pricing model with consumption. Econ Model 30: 462-467. doi: 10.1016/j.econmod.2012.11.004
    [113] Yang SY, Mo SYK, Liu A (2015) Twitter financial community sentiment and its predictive relationship to stock market movement. Quant Financ 15: 1637-1656. doi: 10.1080/14697688.2015.1071078
    [114] Yang X, Zhu Y, Cheng TY (2020) How the individual investors took on big data: The effect of panic from the internet stock message boards on stock price crash. Pacific-Basin Financ J 59: 101245.
    [115] Yu GM (2013) The big data method of constructing the overall judgment of social public opinion—Taking the processing of Baidu's massive search data as an example. News Writing, 67-69.
    [116] Zhai Y, Hsu A, Halgamuge SK (2007) Combining news and technical indicators in daily stock price trends prediction, International symposium on neural networks, Springer, Berlin, Heidelberg, 1087-1096.
    [117] Zhang W, Yan K, Shen D (2021) Can the Baidu Index predict realized volatility in the Chinese stock market? Financ Innovation 7: 1-31.
    [118] Zhang Y, Qi J, Fang B, et al. (2011) Research on the Index System of Public Opinion on Internet for Unexpected Emergency. Inf Sci 9.
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)