Review Special Issues

A review of data mining methods in financial markets

  • Received: 02 December 2021 Accepted: 20 December 2021 Published: 29 December 2021
  • JEL Codes: G15, C22

  • Financial activities are closely related to human social life. Data mining plays an important role in the analysis and prediction of financial markets, especially in the context of the current era of big data. However, it is not simple to use data mining methods in the process of analyzing financial data, due to the differences in the background of researchers in different disciplines. This review summarizes several commonly used data mining methods in financial data analysis. The purpose is to make it easier for researchers in the financial field to use data mining methods and to expand the application scenarios of it used by researchers in the computer field. This review introduces the principles and steps of decision trees, support vector machines, Bayesian, K-nearest neighbors, k-means, Expectation-maximization algorithm, and ensemble learning, and points out their advantages, disadvantages and applicable scenarios. After introducing the algorithms, it summarizes the use of the algorithm in the process of financial data analysis, hoping that readers can get specific examples of using the algorithm. In this review, the difficulties and countermeasures of using data mining methods are summarized, and the development trend of using data mining methods to analyze financial data is predicted.

    Citation: Haihua Liu, Shan Huang, Peng Wang, Zejun Li. A review of data mining methods in financial markets[J]. Data Science in Finance and Economics, 2021, 1(4): 362-392. doi: 10.3934/DSFE.2021020

    Related Papers:

    [1] Sebastian Aniţa, Bedreddine Ainseba . Internal eradicability for an epidemiological model with diffusion. Mathematical Biosciences and Engineering, 2005, 2(3): 437-443. doi: 10.3934/mbe.2005.2.437
    [2] Rongjian Lv, Hua Li, Qiubai Sun, Bowen Li . Model of strategy control for delayed panic spread in emergencies. Mathematical Biosciences and Engineering, 2024, 21(1): 75-95. doi: 10.3934/mbe.2024004
    [3] Glenn Ledder . Using asymptotics for efficient stability determination in epidemiological models. Mathematical Biosciences and Engineering, 2025, 22(2): 290-323. doi: 10.3934/mbe.2025012
    [4] Holly Gaff, Elsa Schaefer . Optimal control applied to vaccination and treatment strategies for various epidemiological models. Mathematical Biosciences and Engineering, 2009, 6(3): 469-492. doi: 10.3934/mbe.2009.6.469
    [5] Guangming Qiu, Zhizhong Yang, Bo Deng . Backward bifurcation of a plant virus dynamics model with nonlinear continuous and impulsive control. Mathematical Biosciences and Engineering, 2024, 21(3): 4056-4084. doi: 10.3934/mbe.2024179
    [6] Yoichi Enatsu, Yukihiko Nakata . Stability and bifurcation analysis of epidemic models with saturated incidence rates: An application to a nonmonotone incidence rate. Mathematical Biosciences and Engineering, 2014, 11(4): 785-805. doi: 10.3934/mbe.2014.11.785
    [7] Jinliang Wang, Gang Huang, Yasuhiro Takeuchi, Shengqiang Liu . Sveir epidemiological model with varying infectivity and distributed delays. Mathematical Biosciences and Engineering, 2011, 8(3): 875-888. doi: 10.3934/mbe.2011.8.875
    [8] Salihu S. Musa, Shi Zhao, Winnie Mkandawire, Andrés Colubri, Daihai He . An epidemiological modeling investigation of the long-term changing dynamics of the plague epidemics in Hong Kong. Mathematical Biosciences and Engineering, 2024, 21(10): 7435-7453. doi: 10.3934/mbe.2024327
    [9] Haifeng Huo, Fanhong Zhang, Hong Xiang . Spatiotemporal dynamics for impulsive eco-epidemiological model with Crowley-Martin type functional response. Mathematical Biosciences and Engineering, 2022, 19(12): 12180-12211. doi: 10.3934/mbe.2022567
    [10] Pannathon Kreabkhontho, Watchara Teparos, Thitiya Theparod . Potential for eliminating COVID-19 in Thailand through third-dose vaccination: A modeling approach. Mathematical Biosciences and Engineering, 2024, 21(8): 6807-6828. doi: 10.3934/mbe.2024298



    1. Introduction

    Onchocerciasis (known colloquially as "River Blindness") is a vector-borne disease affecting the skin and eyes of humans. It is endemic in parts of Africa, Central America, and Yemen, with greater than 99% of the burden of onchocerciasis found in sub-Saharan Africa [3]. There are an estimated 125 million people worldwide who are at risk for onchocerciasis [20]. In Central America, Guatemala accounts for the largest at-risk population for onchocerciasis, but the disease has been designated as eradicated there [17]. It is caused by the filarial nematode Onchocerca volvulus, a parasitic worm with a complicated life cycle that includes five larval stages, labeled L1-L5 in Figure 1 [15], some in a human host and some in a black fly host of the genus Simulium. The disease is listed by the World Health Organization as a neglected tropical disease, but it has been targeted by the Carter Center's River Blindness Elimination Program.

    Figure 1. The O. volvulus life cycle. Microfilariae produced in the human host are transmitted to the black fly of the genus Simulium via a bite. Within the black fly, the larvae pass through larval stages L1-L3. At larval stage L3, they are transmitted to a human host via a bite, where they pass through the final larval stages L3-L5 and become adults [8].

    Microfilariae are produced by mature adults in the human host, where they can live from 6 to 24 months. Black flies of genus Simulium ingest the microfilariae while biting a human host. It takes 6-12 days for the microfilariae to mature through larval stages L1 through L3. Stage L3 larvae migrate to the fly's mouth, where they are transmitted to a new human host through a bite. These larvae take approximately a week to develop to the final L5 larval stage and an additional 7-15 months to mature into reproductive adults, which have a life span of 10-14 years. At maturity, female worms produce 700-1500 microfilariae per day [8].

    The black fly has peak biting times during the daylight hours and largely stays within 5 km of its breeding sites on well-oxygenated water. Communities living near a river are more at risk than those further away, and the peak biting times of the vector coincide with the times when exposed people are most likely to be at the river for activities such as gathering water or washing [8].

    There are several medications that can help treat onchocerciasis, including diethylcarbamazine and ivermectin (which are both microfilaricides) and flubendazole (which is a macrofilaricide). Due to adverse side effects of some of the medications, ivermectin is considered to be the standard in effective treatment of onchocerciasis [8]. Oral administration of ivermectin rapidly kills microfilariae that are present in the human host; it does not kill the adult worms, but it does reduce their reproductive rate for several months [3,13]. In a study spanning 1987-1991, analysis of data from five consecutive annual treatments with ivermectin showed reduced microfilariae production after each treatment. The microfilariae production did gradually increase over a 10 month period, reaching a plateau that was around 32% lower than pre-treatment values [13].

    The distribution of ivermectin in sub-Saharan Africa remains a challenge due to many factors, including the more pronounced itching caused by the increase in the number of microfilariae deaths brought on by treatment with ivermectin and the restriction that ivermectin is only approved for adults and children over the age of 5 who are neither pregnant nor chronically ill [15]. In the African Programme for Onchocerciasis Control's (APOC) progress report for 2014-2015, 22 of 26 endemic countries (from a total of 28 endemic countries) reported treatment data showing an average of 65.3% global coverage [1].

    A variety of models have been developed to study onchocerciasis. Most recent work has been done using complex simulation software packages ONCHOSIM [14] or SIMON [5] that include all immunological and epidemiological processes believed to be relevant. These can be used to make predictions for specific locations, but the value of the predictions is limited by the difficulty of estimating parameter values. Although predictions from ONCHOSIM closely fit the data for the first five years of ivermectin treatment, the results from the subsequent twelve years of treatment showed the predictions from ONCHOSIM to be overly optimistic with regard to the feasibility of eradication [4,10]. Even if good data is available, simulation models cannot easily be used to characterize the overall effect of each parameter on model behavior, so it is difficult to use them to obtain conclusions of broad applicability.

    Another model in common use is EpiOncho [19], a population-based deterministic epidemiological model that incorporates immunological elements, such as the mean number of female adult worms per host, the mean number of microfilaria per milligram of skin, and the mean number of larvae per vector that are at the stage of development to be transmitted to a human host. The complexity of the model again makes it difficult to draw broad conclusions that do not depend strongly on estimated parameter values.

    As an alternative to complex models, one can construct simpler models that incorporate only the most important biological features of a setting or focus on some aspects of a setting while oversimplifying others. For example, Basáñez and Boussinesq [2] developed an immunological model that focuses on the population dynamics of O. volvulus within the human hosts, while Remme et al. [16] developed a model that focuses on the force of infection.

    Because disease eradication depends on epidemiological factors rather than immunological factors, there should be some value to developing a purely epidemiological deterministic model for onchocerciasis; however, there does not seem to be any such model more recent than 1982 [6]. One reason for this lack may be that the infectivity of human hosts to uninfected flies is not a simple parameter, but is dependent on the microfilarial load, which varies over time from almost nil immediately after ivermectin treatment to about 65% of that in untreated humans by a year after treatment [13]. Nevertheless, it is reasonable to decouple the within-host dynamics from the epidemiological dynamics by assuming an "effective" infectivity factor computed as a weighted average of the infectivities of the individual infected human hosts.

    Epidemiology models are classified according to the specific classes of individuals that need to be tracked (see [7] for an overview) and whether the diseases are infectious or vector-borne. In Section 2, we extend the SEIS (susceptible, exposed, infective, susceptible) model for a vector-borne disease to a nonstandard SEIPMS model that distinguishes three classes of infectives: standard infectives who do not participate in a health care system or are ineligible for ivermectin treatment, premedicated infectives who are participants in a health care system but not yet treated, and medicated infectives who have received ivermectin treatment. We show how this model can be approximated as an infectious disease model with nonlinear incidence. A linearized stability analysis provides a complete characterization of the equilibria as functions of a small number of model parameters. In Section 3 we present a more realistic model that assumes health care delivery occurs at discrete times rather than continuously and analyze that model by characterizing endemic periodic solutions, finding a uniqueness condition for the disease-free solution, and showing that the disease-free solution is stable whenever it is unique. Section 4 presents simulations of treatment scenarios and compares the outcomes of annual and continuous delivery of health care.


    2. A continuous model

    An onchocerciasis model needs to have at least two classes for the black fly host: one for the uninfected flies (U) and one for the infected flies (V). In the absence of treatment, the human population needs to have at least three classes: susceptible (S), exposed (E), and infective (I). The exposed class for the human population is necessary because of the long incubation period for the disease. We omit an exposed class of flies because their incubation period is less than a week.

    With treatment, it is necessary to use three infective classes for the humans: standard infectives who do not participate in a health care system or are ineligible for ivermectin treatment (I), premedicated infectives who are participants in a health care system but not yet treated (P), and medicated infectives who have received ivermectin treatment (M) (see Figure 2).

    Figure 2. The SEIPMS-UV epidemiological model. Fly bites transfer the disease from infected flies (V) to susceptible humans (S) and from infective humans (I, P, and M) to uninfected flies (U), with the transfer from medicated humans (M) decreased by a factor ν. Exposed humans (E) become infective, with the fraction p counting as premedicated (P) while waiting for treatment and the remaining fraction q=1−p becoming unmedicated infectives (I). Premedicated humans become medicated when they receive treatment. All three infective classes can become susceptible by clearing all the adult worms. Birth and death rates for humans are equal, with all births into the susceptible class; similarly, birth and death rates for flies are equal with all births into the uninfected class.

    The model includes the following specific assumptions:

    1. The human population N and fly population F are constant, as onchocerciasis is not fatal to either and does not inhibit reproduction. These parameters vary widely by region.

    2. Human and black fly birth and death rates are proportional to the population numbers, with rate constants μ for humans and d for the flies, respectively. We take typical values to be μ=0.02 and d=12 from lifespan estimates of 50 years for humans (appropriate for the regions where the disease is most prevalent) and 1 month for the flies.

    3. The infection rate of humans is proportional to the susceptible population S and the (infected) vector population V, with proportionality constant β. The rate constant β depends on the rate at which humans are bitten by flies and the probability that a given bite transmits the larvae. This value is hard to measure directly.

    4. Exposed individuals become infectious at a rate proportional to their number, with rate constant σ that is independent of participation in the health care system. The incubation period for onchocerciasis is about one year, because the larvae that infect humans must develop into a second larval stage and mature into adult worms before the mature adults begin to produce the microfilaria that infect the flies. Hence, a typical value is σ=1.

    5. Ivermectin treatment is available to a fraction p of the population, limited by restrictions on who can receive the medication and limited health care coverage and participation. Thus, the rates of progress from class E to classes P and I are pσ and qσ, respectively, where q=1−p. A typical value is p=0.65 [1]; however, this quantity could perhaps be increased through interventions.

    6. Individuals move from the premedicated class P to the medicated class M at a rate proportional to the population, with rate constant ϕ taken as the reciprocal of the mean time between the first production of microfilaria and the onset of treatment. The typical health care delivery rate of once per year corresponds to ϕ=2.

    7. The effective population of infective humans is W=I+P+(1−ν)M, where ν is the relative decrease in infectivity of a medicated host compared to an untreated host. In reality, ν is 1 shortly after treatment and gradually falls to about 0.35 [13]. For our model that neglects population dynamics within the host, we take ν to be constant, with a typical value of 0.6 to 0.8.

    8. The infection rate of the fly vector is proportional to the product of the uninfected fly population F−V and the effective population of infective humans (W), with proportionality constant α. Like β, the factor α is hard to measure directly.

    9. Given that ivermectin does not kill the adult worms, patients can only be cured of the disease upon the natural deaths of all the adult worms. The average life span is approximately 12 years, independent of the treatment, so we use a common rate constant γ for the progress of all three infective classes to the S class, with typical value γ=0.08. This assumption ignores the possibility of having the infection reintroduced into human hosts who are already infected, which would restart the clock for clearance of the disease; hence, our model will overestimate the clearance rate.

    These assumptions yield a model consisting of differential equations for E, I, H=P+M (the population of infected individuals who are currently or will eventually be treated), M, and V, along with an algebraic equation for S:

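    \[ \frac{dE}{dT}=\beta SV-(\sigma+\mu)E, \tag{1} \]
    \[ \frac{dI}{dT}=q\sigma E-(\gamma+\mu)I, \tag{2} \]
    \[ \frac{dH}{dT}=p\sigma E-(\gamma+\mu)H, \tag{3} \]
    \[ \frac{dM}{dT}=\phi H-(\phi+\gamma+\mu)M, \tag{4} \]
    \[ \frac{dV}{dT}=\alpha(F-V)W-dV, \tag{5} \]
    \[ S+E+I+H=N,\qquad W=I+H-\nu M, \tag{6} \]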

    where we have used T for time in order to reserve t for dimensionless time. Note that differential equations are not needed for S and U because the populations N and F are constant.


    2.1. Nondimensionalization and simplification

    We define dimensionless parameters

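    \[ \delta=\frac{\gamma+\mu}{d},\quad \epsilon=\frac{\mu}{\sigma},\quad \theta=\frac{\gamma+\mu}{\phi},\quad \eta=\frac{\gamma+\mu}{\sigma+\mu},\quad a=\frac{\alpha N}{d},\quad b=\frac{\beta F\sigma}{(\gamma+\mu)(\sigma+\mu)}. \tag{7} \]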

    The parameters δ, ϵ, θ, and η are chosen for convenience from among many possible parameters representing ratios of time scales; specifically, δ, θ, and η represent (approximately) the expected times for fly lifespan, treatment, and larval development relative to the expected duration of the infective stage, respectively, while ϵ represents the ratio of larva development time to human lifespan. The parameters a and b represent the expected number of transmissions from a fully-infective human to a susceptible fly and from an infective fly to a susceptible human, respectively.

    The natural scale for all human population groups is the total population N; however, the analysis benefits from scaling groups according to the sizes needed for equilibrium. In particular, the I and H equations establish that E/N=O((γ+μ)/σ), so [(γ+μ)/σ]N is a better scale for E than is N. We choose the expected time in the infective class for the time scale, so t=(γ+μ)T. With our typical parameter values, one unit of dimensionless time represents about 10 years.

    With the substitutions

    \[ S=Ns,\quad E=\frac{\gamma+\mu}{\sigma}Nx,\quad I=Ni,\quad H=Nh,\quad M=Nm,\quad V=Fv,\quad t=(\gamma+\mu)T, \tag{8} \]

    the model becomes

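    \[ \eta\frac{dx}{dt}=bsv-x, \tag{9} \]
    \[ \frac{di}{dt}=qx-i, \tag{10} \]
    \[ \frac{dh}{dt}=px-h, \tag{11} \]
    \[ \theta\frac{dm}{dt}=h-(1+\theta)m, \tag{12} \]
    \[ \delta\frac{dv}{dt}=aw(1-v)-v, \tag{13} \]
    \[ s+i+h+\zeta x=1,\qquad w=i+h-\nu m, \tag{14} \]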

    where

    ζ=(1+ϵ)η. (15)

    2.2. Parameter estimation

    Using the estimates μ=0.02, d=12, γ=0.08, σ=1, and ϕ=2, all in inverse years, we can estimate the dimensionless time scale parameters as

    \[ \delta\approx 0.008,\quad \epsilon\approx 0.02,\quad \theta\approx 0.05,\quad \eta\approx 0.1,\quad \zeta\approx 0.1. \tag{16} \]
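As a quick check of these estimates, the following minimal Python sketch evaluates the definitions in (7) and (15) for the typical rate constants quoted above. The function name `time_scale_parameters` is a hypothetical helper introduced only for this illustration.

```python
# Typical rate constants quoted in the text, all in inverse years.
mu = 0.02      # human birth/death rate
d = 12.0       # fly birth/death rate
gamma = 0.08   # clearance rate (natural death of adult worms)
sigma = 1.0    # progression rate from exposed to infective
phi = 2.0      # treatment rate for once-per-year delivery

def time_scale_parameters(mu, d, gamma, sigma, phi):
    """Return (delta, epsilon, theta, eta, zeta) as defined in (7) and (15)."""
    delta = (gamma + mu) / d
    epsilon = mu / sigma
    theta = (gamma + mu) / phi
    eta = (gamma + mu) / (sigma + mu)
    zeta = (1.0 + epsilon) * eta
    return delta, epsilon, theta, eta, zeta

delta, epsilon, theta, eta, zeta = time_scale_parameters(mu, d, gamma, sigma, phi)
print(f"delta={delta:.4f} epsilon={epsilon:.3f} theta={theta:.3f} "
      f"eta={eta:.3f} zeta={zeta:.3f}")
```

The printed values round to the estimates in (16).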

    The infectivity parameters a and b are order 1 and can be estimated from known endemic fractions of infected humans and flies in the absence of ivermectin treatment. To do this, we first compute the equilibrium solutions for the SEIS model obtained by taking p=0:

    \[ i_0=\frac{1-(ab)^{-1}}{1+b^{-1}+\zeta},\qquad v_0=\frac{ai_0}{1+ai_0}. \]

    Thus we can calculate the infectivity parameters from values for the endemic equilibrium fractions i0=I/N and v0=V/F as

    \[ a=\frac{i_0^{-1}}{v_0^{-1}-1},\qquad b=\frac{v_0^{-1}}{i_0^{-1}-(1+\zeta)}. \tag{17} \]

    Using field estimates from Cameroon, i0=0.46 and v0=0.30 [10], we obtain parameter estimates a=0.93 and b=3.1. Subsequent examples will use a=0.9 and b=3.0 unless otherwise noted. We will also see that the basic reproductive number is the product of the infectivity parameters; hence, the study data suggests a value of R0=2.7. Eradication of the disease from this population will therefore require that the human-to-fly infectivity rate be reduced by almost three-fold.
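The inversion in (17) can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and ζ=0.1 is taken from (16).

```python
def infectivity_from_prevalence(i0, v0, zeta):
    """Invert the untreated (p = 0) equilibrium to obtain a and b, as in (17)."""
    a = (1.0 / i0) / (1.0 / v0 - 1.0)
    b = (1.0 / v0) / (1.0 / i0 - (1.0 + zeta))
    return a, b

def untreated_equilibrium(a, b, zeta):
    """Endemic fractions (i0, v0) of the untreated model for given a and b."""
    i0 = (1.0 - 1.0 / (a * b)) / (1.0 + 1.0 / b + zeta)
    v0 = a * i0 / (1.0 + a * i0)
    return i0, v0

# Field estimates from Cameroon quoted in the text.
a, b = infectivity_from_prevalence(i0=0.46, v0=0.30, zeta=0.1)
print(f"a = {a:.2f}, b = {b:.2f}, ab = {a * b:.2f}")

# Round trip: the equilibrium formulas should return the field estimates.
print(untreated_equilibrium(a, b, zeta=0.1))
```

The round trip through `untreated_equilibrium` is a consistency check on the algebra, not an independent validation of the field data.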


    2.3. Asymptotic simplification

    We will examine asymptotic limits as ϵ, θ, and η go to 0 as needed. For now, we assume

    \[ \delta\to 0, \tag{18} \]

    which makes the vector equation (13) quasi-steady, yielding the algebraic equation

    \[ v=\frac{aw}{1+aw}. \tag{19} \]

    The final model, consisting of differential equations (9)-(12) along with algebraic equations (14) and (19), is an SEIPMS infectious disease model with nonlinear incidence.

    The error in taking δ→0 is only noticeable on a very short time scale. Simulation results in Figure 3 verify this claim for a scenario where a small number of infective humans seed a region where the disease was previously absent. The plot of Figure 3a includes both the solution of the full model ((9)-(14), solid curve) and the simplified model with the quasi-steady approximation for the vector equation ((19) instead of (13), dash-dot); there is no visible difference. In both cases, the numbers of infective humans and flies gradually increase over a six-year period to their stable no-treatment values, with the greatest increase coming between 1 and 3 years. To highlight the actual difference, Figure 3b shows the solution on a much shorter time scale. The infective vector population rises from 0 to 0.018 (1.8% of the total fly population) in the first 20 days, while the quasi-steady approximation jumps to that level instantly. This is the extent of the error caused by the approximation.

    Figure 3. Simulation of the introduction of a small population of human infectives into a previously unexposed population, showing E/N, V/F, I/N, and S/N in Figure 3a, from bottom up, and V/F, I/N in Figure 3b, from bottom up, using a=0.9, b=3.0, η=0.1, ϵ=0, p=0, with solid for δ=0.01 and dash-dot for the asymptotic simplification.
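A comparison of this kind can be sketched in plain Python. The code below is not the authors' simulation; it is a minimal illustration that integrates the untreated model (p=0, ϵ=0) with a hand-written classical Runge-Kutta step, once with the vector equation (13) at δ=0.01 and once with the quasi-steady relation (19). The seeding level `I_SEED`, the step size, and the time horizon are assumptions chosen for the illustration.

```python
def rk4(f, y, t_end, dt):
    """Integrate y' = f(y) from t = 0 to t_end with classical RK4; return all states."""
    states = [list(y)]
    for _ in range(int(round(t_end / dt))):
        k1 = f(y)
        k2 = f([yj + 0.5 * dt * kj for yj, kj in zip(y, k1)])
        k3 = f([yj + 0.5 * dt * kj for yj, kj in zip(y, k2)])
        k4 = f([yj + dt * kj for yj, kj in zip(y, k3)])
        y = [yj + dt * (c1 + 2 * c2 + 2 * c3 + c4) / 6.0
             for yj, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)]
        states.append(y)
    return states

A, B, ETA, DELTA = 0.9, 3.0, 0.1, 0.01   # values from the Figure 3 caption
ZETA = ETA                                # epsilon = 0, so zeta = eta by (15)

def full_rhs(state):
    """Equations (9), (10), (13) with p = 0, so q = 1, h = m = 0 and w = i."""
    x, i, v = state
    s = 1.0 - i - ZETA * x
    return [(B * s * v - x) / ETA, x - i, (A * i * (1.0 - v) - v) / DELTA]

def reduced_rhs(state):
    """Same model with the quasi-steady vector relation (19)."""
    x, i = state
    s = 1.0 - i - ZETA * x
    v = A * i / (1.0 + A * i)
    return [(B * s * v - x) / ETA, x - i]

I_SEED = 0.001            # assumed initial infective fraction (illustrative)
T_END, DT = 20.0, 0.002   # dimensionless time; assumed horizon and step size

full = rk4(full_rhs, [0.0, I_SEED, 0.0], T_END, DT)
reduced = rk4(reduced_rhs, [0.0, I_SEED], T_END, DT)

max_gap = max(abs(f[1] - r[1]) for f, r in zip(full, reduced))
i_endemic = (1.0 - 1.0 / (A * B)) / (1.0 + 1.0 / B + ZETA)
print("largest gap in i between the two models:", max_gap)
print("final i (full, reduced, formula):", full[-1][1], reduced[-1][1], i_endemic)
```

Both runs should settle at the same endemic level, since the quasi-steady relation is exact at equilibrium; any gap between them arises only during the transient.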

    2.4. Equilibria and stability

    The system (9)-(12), (14), and (19) always has a disease-free equilibrium in which s=1 and all other variables are 0. Any endemic disease equilibria must satisfy the equations

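    \[ i=qx,\quad h=px,\quad m=\frac{p}{1+\theta}x,\quad s=1-(1+\zeta)x,\quad w=\rho x, \tag{20} \]

    where

    \[ \rho\equiv 1-\frac{\nu p}{1+\theta}. \tag{21} \]

    This leaves the algebraic system

    \[ bsv=x,\qquad v=\frac{\rho ax}{1+\rho ax} \tag{22} \]

    for v and x. Assuming v,x>0, these equations can be rewritten as

    \[ v^{-1}=bx^{-1}-(1+\zeta)b, \tag{23} \]
    \[ v^{-1}=(\rho a)^{-1}x^{-1}+1, \tag{24} \]

    and then elimination of v yields the result

    \[ x=\frac{1-R_0^{-1}}{1+b^{-1}+\zeta},\qquad \text{provided } R_0>1, \tag{25} \]

    where

    \[ R_0=\rho ab \tag{26} \]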

    is the basic reproductive number.

    We can interpret R0 as the product of the number of secondary human infections per infective fly (b) and the number of secondary fly infections per infected human (ρa), where ρ<1 is a weighted average of the relative infectivity for the classes I, P, and M as compared to the infectivity of class I.
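The formulas (20)-(26) translate directly into code. The sketch below is illustrative only: the function name and the sample values ν=0.7, p=0.65, θ=0.05 are assumptions drawn from the typical ranges quoted in Section 2.

```python
def endemic_equilibrium(a, b, nu, p, theta, zeta):
    """Return (R0, state); state is None when R0 <= 1 (no endemic equilibrium)."""
    rho = 1.0 - nu * p / (1.0 + theta)                 # (21)
    r0 = rho * a * b                                   # (26)
    if r0 <= 1.0:
        return r0, None
    x = (1.0 - 1.0 / r0) / (1.0 + 1.0 / b + zeta)      # (25)
    state = {
        "x": x,
        "i": (1.0 - p) * x,                            # (20), with q = 1 - p
        "h": p * x,
        "m": p * x / (1.0 + theta),
        "s": 1.0 - (1.0 + zeta) * x,
        "w": rho * x,
        "v": rho * a * x / (1.0 + rho * a * x),        # (22)
    }
    return r0, state

# No treatment (p = 0) versus treatment of 65% of infectives.
r0_none, eq_none = endemic_equilibrium(0.9, 3.0, nu=0.7, p=0.0, theta=0.05, zeta=0.1)
r0_treat, eq_treat = endemic_equilibrium(0.9, 3.0, nu=0.7, p=0.65, theta=0.05, zeta=0.1)
print(f"R0 without treatment: {r0_none:.2f}, exposed fraction x = {eq_none['x']:.3f}")
print(f"R0 with treatment:    {r0_treat:.2f}, exposed fraction x = {eq_treat['x']:.3f}")
```

With these sample values the treated R0 stays above 1, so an endemic equilibrium still exists under treatment.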

    The stability of the equilibria is determined by the Jacobian matrix, which is

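    \[ J=\begin{bmatrix} -\eta^{-1}Q_1 & \eta^{-1}Q_3 & \eta^{-1}Q_3 & -\nu\eta^{-1}Q_2 \\ q & -1 & 0 & 0 \\ p & 0 & -1 & 0 \\ 0 & 0 & \theta^{-1} & -\theta^{-1}-1 \end{bmatrix} \]

    where

    \[ Q_1=1+\zeta bv,\qquad Q_2=bs\frac{dv}{dw}=\frac{abs}{(1+aw)^2},\qquad Q_3=Q_2-bv. \tag{27} \]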

    The characteristic polynomial can be written as

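    \[ P_4(\lambda)=(\lambda+1)P_3(\lambda) \tag{28} \]

    where

    \[ P_3(\lambda)=(\lambda+\eta^{-1}Q_1)(\lambda+\theta^{-1}+1)(\lambda+1)-\eta^{-1}Q_3(\lambda+\theta^{-1}+1)+\eta^{-1}\theta^{-1}\nu pQ_2. \tag{29} \]

    The stability of the equilibria can then be determined from P3 by the Routh-Hurwitz conditions c1>0, c3>0, and c1c2>c3, where cj is the coefficient of λ^(3−j) [12]. Here

    \[ c_1=(\theta^{-1}+1)+(\eta^{-1}Q_1+1)>\theta^{-1}+1>0, \]
    \[ \begin{aligned} c_2&=\eta^{-1}Q_1+\eta^{-1}(\theta^{-1}+1)Q_1+(\theta^{-1}+1)-\eta^{-1}Q_3 \\ &=\eta^{-1}(Q_1+bv)+\eta^{-1}(\theta^{-1}+1)Q_1-\eta^{-1}Q_2+(\theta^{-1}+1) \\ &>\eta^{-1}(Q_1+bv)+\eta^{-1}\left[(\theta^{-1}+1)Q_1-Q_2\right], \end{aligned} \]
    \[ c_3=\eta^{-1}(\theta^{-1}+1)Q_1-\eta^{-1}(\theta^{-1}+1)Q_3+\eta^{-1}\theta^{-1}\nu pQ_2=\eta^{-1}(\theta^{-1}+1)(Q_1+bv-\rho Q_2), \]

    and

    \[ c_1c_2>\eta^{-1}(\theta^{-1}+1)(Q_1+bv)+c_1\eta^{-1}\left[(\theta^{-1}+1)Q_1-Q_2\right]. \]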

    The requirement c3>0 yields a necessary condition for stability:

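    \[ \rho Q_2<Q_1+bv. \tag{30} \]

    The stronger requirement

    \[ \rho Q_2<Q_1 \tag{31} \]

    is sufficient for c1c2>c3 as well as c3>0 because it implies

    \[ (\theta^{-1}+1)Q_1-Q_2>(\theta^{-1}+1)\rho Q_2-Q_2=\left[1+\theta^{-1}(1-\nu p)\right]Q_2-Q_2>0, \]

    whence

    \[ c_1c_2>\eta^{-1}(\theta^{-1}+1)(Q_1+bv)>c_3. \]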

    The disease-free equilibrium has

    \[ v=0,\qquad Q_1=1,\qquad Q_2=ab, \]

    so both stability criteria become

    \[ R_0=\rho ab<1, \]

    confirming that the disease-free equilibrium is stable when R0<1 and unstable when R0>1. The endemic disease equilibrium has (from (22))

    \[ bs=\frac{x}{v}=\frac{1+aw}{\rho a}; \]

    then

    \[ \rho Q_2=\frac{\rho a}{(1+aw)^2}\cdot\frac{1+aw}{\rho a}=\frac{1}{1+aw}<1<Q_1. \]

    Both criteria are always satisfied provided that the existence requirement R0>1 is met.

    Proposition 1 summarizes the results.

    Proposition 1. The SEIPMS model given by (9)-(12), (14), (19) has

    1. a disease-free equilibrium that is stable whenever R0<1 (26), and

    2. a stable endemic disease equilibrium given by (25) and (20) whenever R0>1.
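Proposition 1 can be spot-checked numerically by computing the eigenvalues of the Jacobian at each equilibrium. The sketch below assumes NumPy is available. The helper names and sample parameter values are illustrative, and the (1,4) entry of the matrix is taken as −νQ2/η, which is the form consistent with the characteristic polynomial (29).

```python
import numpy as np

def equilibrium_x(a, b, nu, p, theta, zeta):
    """Exposed fraction x at the endemic equilibrium (25), or 0.0 if R0 <= 1."""
    rho = 1.0 - nu * p / (1.0 + theta)
    r0 = rho * a * b
    return (1.0 - 1.0 / r0) / (1.0 + 1.0 / b + zeta) if r0 > 1.0 else 0.0

def jacobian(x, a, b, nu, p, theta, eta, zeta):
    """Jacobian of (9)-(12) with (14) and (19), in the variable order (x, i, h, m)."""
    rho = 1.0 - nu * p / (1.0 + theta)
    s = 1.0 - (1.0 + zeta) * x
    w = rho * x
    v = a * w / (1.0 + a * w)
    q1 = 1.0 + zeta * b * v
    q2 = a * b * s / (1.0 + a * w) ** 2
    q3 = q2 - b * v
    return np.array([
        [-q1 / eta, q3 / eta, q3 / eta, -nu * q2 / eta],
        [1.0 - p, -1.0, 0.0, 0.0],
        [p, 0.0, -1.0, 0.0],
        [0.0, 0.0, 1.0 / theta, -(1.0 / theta + 1.0)],
    ])

def is_stable(x, **params):
    """True when every eigenvalue of the Jacobian has negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian(x, **params))
    return bool(np.max(eigenvalues.real) < 0.0)

params = dict(a=0.9, b=3.0, nu=0.7, p=0.65, theta=0.05, eta=0.1, zeta=0.1)  # R0 = 1.53
x_star = equilibrium_x(0.9, 3.0, 0.7, 0.65, 0.05, 0.1)
print("endemic equilibrium stable:     ", is_stable(x_star, **params))
print("disease-free equilibrium stable:", is_stable(0.0, **params))

low = dict(params, b=1.5)                                                   # R0 < 1
print("disease-free stable when R0 < 1:", is_stable(0.0, **low))
```

A check at a handful of parameter values does not replace the Routh-Hurwitz argument; it only guards against algebra slips in the matrix entries.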


    3. A pulsed model

    The general pulsed model follows from two changes to the SEIPMS model of Section 2:

    1. Set ϕ=0 because delivery of health care occurs only at fixed intervals;

    2. Introduce a jump condition at times nτ, where τ is the scaled treatment interval (typically 0.1 or 0.05, corresponding to treatment intervals of 1 year or 6 months with a time scale of 10 years). At these points in time, all individuals in the premedicated class become medicated, which means m=h.

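    Thus, the model (using the δ→0 and ϵ=0 approximations) is

    \[ \eta\frac{dx}{dt}=bsv-x,\qquad x^+=x^- \text{ at } t=n\tau, \tag{32} \]
    \[ \frac{di}{dt}=qx-i,\qquad i^+=i^- \text{ at } t=n\tau, \tag{33} \]
    \[ \frac{dh}{dt}=px-h,\qquad h^+=h^- \text{ at } t=n\tau, \tag{34} \]
    \[ \frac{dm}{dt}=-m,\qquad m^+=h^- \text{ at } t=n\tau, \tag{35} \]
    \[ s=1-i-h-\eta x,\qquad v=\frac{aw}{1+aw},\qquad w=i+h-\nu m. \tag{36} \]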

    The system can be simplified somewhat by introducing variables y, z, and r to replace h, i, and m and rescaling time to match the treatment interval:

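    \[ y=p^{-1}h,\quad z=i-qy,\quad r=p^{-1}m,\quad t\to\frac{t}{\tau},\quad \xi=\frac{\tau}{\eta}. \tag{37} \]

    The problem for z is then

    \[ \frac{dz}{dt}=-\tau z,\qquad z^+=z^- \text{ at } t=n, \]

    which can be solved immediately, reducing the system to

    \[ \frac{dx}{dt}=\xi(bsv-x),\qquad x^+=x^- \text{ at } t=n, \tag{38} \]
    \[ \frac{dy}{dt}=\tau(x-y),\qquad y^+=y^- \text{ at } t=n, \tag{39} \]
    \[ \frac{dr}{dt}=-\tau r,\qquad r^+=y^- \text{ at } t=n, \tag{40} \]
    \[ s=1-y-z-\eta x,\qquad v=\frac{aw}{1+aw},\qquad w=y+z-\nu pr,\qquad z=z(0)e^{-\tau t}. \tag{41} \]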

    3.1. Periodic solutions

    Periodic solutions are defined by the differential equations of (38)-(40) along with the periodicity conditions

    \[ x(0)=x(1),\qquad y(0)=y(1),\qquad r(0)=y(1) \tag{42} \]

    and the auxiliary equations of (41). Clearly any periodic solution must have z=0, and we can solve the r equation analytically; thus, we can recast the problem as that of finding initial conditions (x_i, y_i) such that the solution of the system defined by

    \[ \frac{dx}{dt}=\xi(bsv-x),\qquad x(0)=x_i, \tag{43} \]
    \[ \frac{dy}{dt}=\tau(x-y),\qquad y(0)=y_i, \tag{44} \]

    where

    \[ s=1-y-\tau\xi^{-1}x,\qquad w=y-y_i\nu pe^{-\tau t},\qquad v=\frac{w}{a^{-1}+w}, \tag{45} \]

    satisfies the periodicity conditions

    \[ x(1)=x(0),\qquad y(1)=y(0). \tag{46} \]

    Obviously there is a disease-free periodic solution with x0=y0=0. Numerical solutions can in principle be found by solving (43)-(46); in practice, this system is difficult to work with because the presence of the small parameter τ in (44) makes y very nearly constant. As an alternative, we can derive another periodicity condition by combining (43) and (44) into a single equation

    \[ \xi^{-1}\frac{dx}{dt}+\tau^{-1}\frac{dy}{dt}=bsv-y; \]

    integration over the interval [0,1] then yields the condition

    \[ \int_0^1 bsv\,dt=\int_0^1 y\,dt. \tag{47} \]

    Our numerical scheme implements this condition by adding initial value problem components

    \[ \frac{dF}{dt}=bsv,\quad F(0)=0;\qquad \frac{dY}{dt}=y,\quad Y(0)=0; \tag{48} \]

    whence we can compute the correct initial conditions (x_i, y_i) from

    \[ x(1)=x(0),\qquad F(1)=Y(1). \tag{49} \]

    Figure 4 shows some periodic solutions using typical parameter values. The variation over the treatment interval is seen primarily in the exposed populations and not the infective population. The principal driver of this behavior is the sudden drop in infectivity from human to fly each time treatment occurs. This creates a noticeable decrease in the infected fly population, which decreases the rate at which susceptible humans become infected; meanwhile, the rate at which exposed humans become infective changes only slightly during the period. The result is a decline in the exposed population in the first portion of the treatment interval. The subsequent rise is due to the fact that individuals who become infective during the interval between treatments are not medicated and hence more infective to the flies.

    Figure 4. Periodic solutions for the exposed (x, dashed) and total infective (y=h+i, solid) classes, with treatment intervals of 2 years (top), 1 year (middle), and 6 months (bottom), using a=0.9, b=3.5, η=0.1, ϵ=0, νp=0.6.
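One simple way to approximate an endemic periodic solution numerically is to iterate the treatment cycle until the state at the start of each cycle stops changing. The sketch below takes that route. It is not the scheme based on (48)-(49) described above; it is a plainer forward iteration that relies on the periodic solution being attracting. The starting guess, step count, and tolerance are assumptions chosen for the illustration.

```python
def pulsed_rhs(state, a, b, nu_p, tau, xi):
    """Right-hand sides of (38)-(40) with z = 0, using the auxiliary relations (41)."""
    x, y, r = state
    eta = tau / xi
    s = 1.0 - y - eta * x
    w = y - nu_p * r
    v = a * w / (1.0 + a * w)
    return [xi * (b * s * v - x), tau * (x - y), -tau * r]

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yj + 0.5 * dt * kj for yj, kj in zip(y, k1)])
    k3 = f([yj + 0.5 * dt * kj for yj, kj in zip(y, k2)])
    k4 = f([yj + dt * kj for yj, kj in zip(y, k3)])
    return [yj + dt * (c1 + 2 * c2 + 2 * c3 + c4) / 6.0
            for yj, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)]

def periodic_solution(a, b, nu_p, tau, eta, steps=50, tol=1e-9, max_periods=5000):
    """Iterate treatment cycles until (x, y) at the start of a cycle repeats."""
    xi = tau / eta
    f = lambda st: pulsed_rhs(st, a, b, nu_p, tau, xi)
    x, y = 0.1, 0.1                      # assumed starting guess
    dt = 1.0 / steps                     # one cycle is one unit of rescaled time
    for period in range(1, max_periods + 1):
        state = [x, y, y]                # jump condition: r+ = y- at treatment
        for _ in range(steps):
            state = rk4_step(f, state, dt)
        change = max(abs(state[0] - x), abs(state[1] - y))
        x, y = state[0], state[1]
        if change < tol:
            return x, y, period
    raise RuntimeError("no convergence within max_periods")

# Parameters from the Figure 4 caption with annual treatment (tau = 0.1).
x_start, y_start, periods = periodic_solution(a=0.9, b=3.5, nu_p=0.6, tau=0.1, eta=0.1)
print(f"x(0) = {x_start:.4f}, y(0) = {y_start:.4f} after {periods} cycles")
```

Forward iteration converges slowly when the parameters are near the eradication threshold, which is one reason a boundary-value formulation such as (49) may be preferable in practice.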

    3.2. Asymptotic approximation

    It is also possible to compute asymptotic approximations for periodic solutions. The leading order approximation is the solution of the reduced problem having τ=0. With

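    \[ x\sim x_0(t),\qquad y\sim y_0, \]

    we have

    \[ s\sim s_0=1-y_0,\qquad w\sim w_0=(1-\nu p)y_0,\qquad v\sim v_0=\frac{w_0}{a^{-1}+w_0}. \]

    The integral condition (47) requires

    \[ bs_0v_0=y_0, \tag{50} \]

    so

    \[ a^{-1}+w_0=w_0v_0^{-1}=bs_0w_0y_0^{-1}=b(1-\nu p-w_0), \]

    from which we obtain the results

    \[ w_0=\frac{1-\nu p-(ab)^{-1}}{1+b^{-1}},\qquad y_0=\frac{w_0}{1-\nu p},\qquad s_0=1-y_0,\qquad v_0=\frac{w_0}{a^{-1}+w_0}. \tag{51} \]

    The result (50) reduces the x0 equation to

    \[ x_0'=\xi(y_0-x_0), \]

    which has the unique periodic solution x0=y0.

    To obtain first order corrections to the constant leading order solution, we assume

    \[ x(t;\tau)=y_0+\tau x_1(t)+O(\tau^2),\qquad y(t;\tau)=y_0+\tau y_1+O(\tau^2), \tag{52} \]

    where y1 is constant because the result x0=y0 means dy/dt=O(τ²); the results are

    \[ y_1=\frac{a^{-1}\nu ps_0-2\xi^{-1}w_0(a^{-1}+w_0)}{2(1-\nu p)(a^{-1}+1-\nu p)}, \tag{53} \]

    and

    \[ x_1(t)=y_1+\frac{ba^{-1}\nu py_0s_0}{(a^{-1}+w_0)^2}\left(-\frac{1}{2}-\xi^{-1}+t+\frac{e^{-\xi t}}{1-e^{-\xi}}\right). \tag{54} \]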

    Details of these calculations appear in the Appendix.
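For readers who want numbers, the closed-form results (51), (53), and (54) can be evaluated as follows. This is an illustrative sketch: the function name is hypothetical, and the sample values are those of the Figure 4 caption with annual treatment, so that ξ=τ/η=1.

```python
import math

def asymptotic_periodic(a, b, nu_p, xi):
    """Leading-order values (51) and first-order corrections (53)-(54)."""
    inv_a = 1.0 / a
    w0 = (1.0 - nu_p - 1.0 / (a * b)) / (1.0 + 1.0 / b)
    y0 = w0 / (1.0 - nu_p)
    s0 = 1.0 - y0
    v0 = w0 / (inv_a + w0)
    y1 = ((inv_a * nu_p * s0 - 2.0 * w0 * (inv_a + w0) / xi)
          / (2.0 * (1.0 - nu_p) * (inv_a + 1.0 - nu_p)))
    amplitude = b * inv_a * nu_p * y0 * s0 / (inv_a + w0) ** 2

    def x1(t):
        """First-order correction to x on the rescaled interval 0 <= t <= 1."""
        return y1 + amplitude * (-0.5 - 1.0 / xi + t
                                 + math.exp(-xi * t) / (1.0 - math.exp(-xi)))

    return {"w0": w0, "y0": y0, "s0": s0, "v0": v0, "y1": y1, "x1": x1}

tau = 0.1
approx = asymptotic_periodic(a=0.9, b=3.5, nu_p=0.6, xi=1.0)
print(f"y0 = {approx['y0']:.4f}, y1 = {approx['y1']:.4f}")
print(f"first-order estimate of y: {approx['y0'] + tau * approx['y1']:.4f}")
```

Two built-in consistency checks are that the leading-order values satisfy (50) and that x1 takes the same value at t=0 and t=1.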


    3.3. A necessary condition for an endemic disease equilibrium

    If we begin with a set of parameters that yields an endemic disease equilibrium and gradually make the parameters less favorable, the solutions of the fixed point equations (46) gradually converge to (x0,y0)=(0,0). Hence, the critical case can be thought of as corresponding to the limit of the periodic solution problem as y0→0 with x0=O(y0). We can identify the bifurcation hypersurface by assuming

    \[ (x,y,x_0,x_1,y_1)=(y_0X,\;y_0Y,\;y_0X_0,\;y_0X_1,\;y_0Y_1) \tag{55} \]

    in (43)-(46) and taking the asymptotic limit y0→0, resulting in the problem

    \[ \frac{dX}{dt}=-\xi X+\xi abY-\xi ab\,\nu pe^{-\tau t},\qquad X(1)=X(0), \tag{56} \]
    \[ \frac{dY}{dt}=\tau X-\tau Y,\qquad Y(0)=Y(1)=1. \tag{57} \]

    With three boundary conditions for a system of two differential equations, this problem is overspecified and will have a solution only for a particular relationship among the parameters.

    Letting UT=[XY], the system can be written in vector form as

    dUdt=[ξξabττ]U+[ξab0]νpeτt. (58)

    The eigenvalues of the matrix in (58) are given by

    λ1,2=(ξ+τ)±ξ2+(4ab2)ξτ+τ22, (59)

    which are real for the case where the disease is endemic in the absence of treatment (ab>1). Given a treatment interval that is much shorter than the lifespan of adult worms, we expect τ to be small (τ0.1 for annual treatment); hence, we write the eigenvectors as

    u1=[u11],u2=[u2τ]whereu1=λ1+ττ,u2=λ2+τ. (60)

    With this notation, the solution of the system of differential equations is

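    $$X=c_1u_1e^{\lambda_1t}+c_2u_2e^{\lambda_2t}, \tag{61}$$
    $$Y=c_1e^{\lambda_1t}+c_2\tau e^{\lambda_2t}+\nu p\,e^{-\tau t}. \tag{62}$$

    The three boundary conditions yield a linear system for unknowns $c_1$, $c_2$, and $\nu p$, leading to the necessary condition (for existence of an endemic periodic solution)

    $$\nu p<\frac{(\tau u_1-u_2)(e^{\lambda_1}-1)(1-e^{\lambda_2})}{\tau u_1(e^{\lambda_1}-1)(e^{-\tau}-e^{\lambda_2})-u_2(e^{\lambda_1}-e^{-\tau})(1-e^{\lambda_2})}. \tag{63}$$

    Asymptotic expansion of this result (included in the Appendix) yields the approximate condition

    $$\nu p<\left(1-\frac{1}{ab}\right)\left(1+\frac{\tau}{2}\right)+O(\tau^2), \tag{64}$$

    which can be rearranged as

    $$R_0=ab\left(1-\frac{\nu p}{1+\tau/2}+O(\tau^2)\right)>1. \tag{65}$$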

    The quantity on the left side of this inequality is the basic reproductive number for the continuous model (τ/2 is the mean dimensionless time a newly infective person must wait for initial treatment, which was defined as θ in Section 2).
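
The exact threshold (63) and its approximation (64) can be compared numerically. The sketch below assumes the illustrative values a = 0.9, b = 3, τ = 0.1 and ξ = 12 (ξ is not specified in this section); the function names are hypothetical.

```python
import math

def eigen_data(a, b, xi, tau):
    """Eigenvalues (59) and eigenvector components (60)."""
    ab = a * b
    root = math.sqrt(xi**2 + (4 * ab - 2) * xi * tau + tau**2)
    lam1 = (-(xi + tau) + root) / 2
    lam2 = (-(xi + tau) - root) / 2
    return lam1, lam2, (lam1 + tau) / tau, lam2 + tau

def nu_p_threshold_exact(a, b, xi, tau):
    """Right side of the necessary condition (63)."""
    lam1, lam2, u1, u2 = eigen_data(a, b, xi, tau)
    e1, e2, et = math.exp(lam1), math.exp(lam2), math.exp(-tau)
    num = (tau * u1 - u2) * (e1 - 1) * (1 - e2)
    den = tau * u1 * (e1 - 1) * (et - e2) - u2 * (e1 - et) * (1 - e2)
    return num / den

def nu_p_threshold_asymptotic(a, b, tau):
    """Right side of the approximate condition (64), without the O(tau^2) term."""
    return (1 - 1 / (a * b)) * (1 + tau / 2)

# Illustrative parameters; xi = 12 is an assumption.
a, b, xi, tau = 0.9, 3.0, 12.0, 0.1
exact = nu_p_threshold_exact(a, b, xi, tau)
approx = nu_p_threshold_asymptotic(a, b, tau)
```

For these values the two thresholds should agree closely, which suggests that the simpler form (64) is adequate when $\tau$ is small.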


    3.4. Stability of the disease-free solution

    Consider the case where there is a small perturbation to the disease-free solution owing to a small non-zero initial value z(0)=z0, corresponding to the introduction of a small number of unmedicated infectives. We assume

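    $$(x,y,r,z)=(z_0X,\,z_0Y,\,z_0R,\,z_0Z)$$

    and define sequences

    $$X_n=X(n\tau),\qquad Y_n=Y(n\tau),\qquad Z_n=Z(n\tau)=e^{-n\tau},\qquad K_n=\nu pY_n-Z_n,$$

    along with a shifted time variable on the interval $n\tau<t<(n+1)\tau$:

    $$t=n\tau+\hat t.$$

    With the solution $R(\hat t)=Y_ne^{-\tau\hat t}$, we obtain a recursive definition of the sequences $X_n$ and $Y_n$ through the system

    $$\frac{dX}{d\hat t}=-\xi X+\xi abY-\xi abK_ne^{-\tau\hat t},\qquad X(0)=X_n,\qquad X(1)=X_{n+1}, \tag{66}$$
    $$\frac{dY}{d\hat t}=\tau X-\tau Y,\qquad Y(0)=Y_n,\qquad Y(1)=Y_{n+1}. \tag{67}$$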

    The eigenvalues and eigenvectors are again given by (59) and (60), leading to the solutions

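    $$X=c_1u_1e^{\lambda_1\hat t}+c_2u_2e^{\lambda_2\hat t}, \tag{68}$$
    $$Y=c_1e^{\lambda_1\hat t}+c_2\tau e^{\lambda_2\hat t}+K_ne^{-\tau\hat t}, \tag{69}$$

    where

    $$c_1=\frac{\tau X_n-(1-\nu p)u_2Y_n-u_2Z_n}{\tau u_1-u_2},\qquad c_2=\frac{-X_n+(1-\nu p)u_1Y_n+u_1Z_n}{\tau u_1-u_2}. \tag{70}$$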

    Evaluating these solutions at ˆt=1 yields a system of difference equations having the form

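    $$\mathbf{U}_{n+1}=A\mathbf{U}_n+Z_n\mathbf{b}.$$

    Since $\lim_{n\to\infty}Z_n=0$, the solution vector for this system decays to $\mathbf{0}$ if the eigenvalues of $A$ have magnitude less than 1, which we can determine directly from the matrix entries

    $$a_{11}=\frac{\tau u_1e^{\lambda_1}-u_2e^{\lambda_2}}{\tau u_1-u_2},\qquad a_{12}=-(1-\nu p)\,u_1u_2\,\frac{e^{\lambda_1}-e^{\lambda_2}}{\tau u_1-u_2}, \tag{71}$$
    $$a_{21}=\tau\,\frac{e^{\lambda_1}-e^{\lambda_2}}{\tau u_1-u_2},\qquad a_{22}=(1-\nu p)\,\frac{\tau u_1e^{\lambda_2}-u_2e^{\lambda_1}}{\tau u_1-u_2}+\nu p\,e^{-\tau} \tag{72}$$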

    using the Jury conditions [9]

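    $$|\operatorname{tr}A|-1<\det A<1. \tag{73}$$

    With $\tau\to 0$, we can quickly see that

    $$A\sim\begin{bmatrix}e^{-\xi} & O(1)\\ O(\tau) & 1\end{bmatrix},$$

    from which it is clear that the trace is positive and the determinant is less than 1; hence, the only condition that needs to be satisfied is

    $$\operatorname{tr}A-1<\det A. \tag{74}$$

    Computing and simplifying the trace and determinant yields the inequality

    $$\left(e^{\lambda_1}+e^{\lambda_2}-1\right)+\nu p\,e^{-\tau}-\nu p\,\frac{\tau u_1e^{\lambda_2}-u_2e^{\lambda_1}}{\tau u_1-u_2}<(1-\nu p)\,e^{\lambda_1+\lambda_2}+\nu p\,\frac{\tau u_1e^{\lambda_1-\tau}-u_2e^{\lambda_2-\tau}}{\tau u_1-u_2}.$$

    Multiplying this inequality by the positive quantity $\tau u_1-u_2$ and rearranging leads ultimately to the condition

    $$\nu p>\frac{(\tau u_1-u_2)(e^{\lambda_1}-1)(1-e^{\lambda_2})}{\tau u_1(e^{\lambda_1}-1)(e^{-\tau}-e^{\lambda_2})-u_2(e^{\lambda_1}-e^{-\tau})(1-e^{\lambda_2})}, \tag{75}$$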

    which is just the reverse of the inequality needed for existence of the endemic periodic solution.

    Proposition 2 summarizes the results of the pulsed model.

    Proposition 2. In the limit $\eta\to 0$, the disease-free periodic solution is unique and stable whenever

    $$\nu p>\frac{(\tau u_1-u_2)(e^{\lambda_1}-1)(1-e^{\lambda_2})}{\tau u_1(e^{\lambda_1}-1)(e^{-\tau}-e^{\lambda_2})-u_2(e^{\lambda_1}-e^{-\tau})(1-e^{\lambda_2})}.$$

    When the inequality is reversed, the disease-free periodic solution is unstable and an endemic disease periodic solution can be found numerically by solving the algebraic equations (49) applied to the system (43), (44), (48). In the limit as $\tau\to 0$, the inequality reduces to

    $$R_0=ab\left(1-\frac{\nu p}{1+\tau/2}+O(\tau^2)\right)<1.$$
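
The stability criterion can also be checked numerically by assembling the matrix $A$ from (71) and (72) and testing the Jury conditions (73). The following sketch assumes illustrative values a = 0.9, b = 3, τ = 0.1 and ξ = 12 (ξ is an assumption), and the function names are hypothetical.

```python
import numpy as np

def stability_matrix(a, b, xi, tau, nu_p):
    """Matrix A with entries (71)-(72), built from (59)-(60)."""
    ab = a * b
    root = np.sqrt(xi**2 + (4 * ab - 2) * xi * tau + tau**2)
    lam1 = (-(xi + tau) + root) / 2
    lam2 = (-(xi + tau) - root) / 2
    u1, u2 = (lam1 + tau) / tau, lam2 + tau
    e1, e2, et = np.exp(lam1), np.exp(lam2), np.exp(-tau)
    d = tau * u1 - u2  # positive when ab > 1
    a11 = (tau * u1 * e1 - u2 * e2) / d
    a12 = -(1 - nu_p) * u1 * u2 * (e1 - e2) / d
    a21 = tau * (e1 - e2) / d
    a22 = (1 - nu_p) * (tau * u1 * e2 - u2 * e1) / d + nu_p * et
    return np.array([[a11, a12], [a21, a22]])

def jury_stable(A):
    """Jury conditions (73) for a 2x2 matrix."""
    tr, det = np.trace(A), np.linalg.det(A)
    return bool(abs(tr) - 1 < det < 1)

def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))

# Illustrative parameters; xi = 12 is an assumption.
a, b, xi, tau = 0.9, 3.0, 12.0, 0.1
A_high = stability_matrix(a, b, xi, tau, nu_p=0.7)  # above the threshold
A_low = stability_matrix(a, b, xi, tau, nu_p=0.6)   # below the threshold
```

For these values the threshold in Proposition 2 is roughly 0.66, so the disease-free solution should be stable for νp = 0.7 and unstable for νp = 0.6.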

    4. Results and discussion


    4.1. Using the continuous model in lieu of the pulsed model

    In practice, the treatment protocol for onchocerciasis leads to the pulsed model. As we would predict, the continuous model slightly underestimates the numbers of infectives compared to the pulsed model (see Figure 5), which increases the extent to which the model results are overly optimistic. However, the intervals between treatment events are short compared to the time required for a patient to be cleared of the disease ($\tau\ll 1$), which means that the results for the continuous model are only slightly better than those of the pulsed model. Certainly the difference between the continuous and pulsed models is small compared to the errors caused by uncertainty in parameter values. Given these considerations, we use the continuous model in the subsequent discussion of our model projections.

    Figure 5. Time average infective populations, with a=0.9, b=3, η=0.1, ϵ=0, νp=0.6. Humans: top 2; Flies: bottom 2; Pulsed: solid; Continuous: dashed.

    4.2. The prognosis for onchocerciasis treatment plans

    In the pre-treatment equilibrium, the product $ab$ is given in terms of $v_0$ and $i_0$ (from (17)) by

    $$ab=\frac{1}{(1-v_0)[1-(1+\zeta)i_0]}.$$

    The condition for a basic reproductive number $R_0<1$ is then

    $$1-\frac{\nu p}{1+\theta}=\rho<\frac{1}{ab}=(1-v_0)[1-(1+\zeta)i_0],$$

    or

    $$\frac{\nu p}{1+\theta}>v_0+(1+\zeta)i_0(1-v_0). \tag{76}$$

    Using estimated parameter values $i_0=0.46$ and $v_0=0.30$ [10], this means that eradication would require $\nu p$ to be at least about 0.69. This is problematic, as the generally accepted treatment fraction is only $p=0.65$ [1]. With an optimistic value of 0.9 for $\nu$, at best achievable if treatment is more frequent than in the current protocol, a compliance rate higher than $p=0.76$ would be required. This might be possible, but it would be difficult to achieve since there are people who cannot be given ivermectin treatment, such as pregnant women and children under the age of 5.

    Even if $R_0$ can be brought below 1, the eradication dynamics are unacceptably slow. Figure 6 shows the results of simulations using $a=0.9$ with optimistic and pessimistic values for $b$ and $\nu p$. The combination of $a=0.9$ and $b=3$ yields a pre-treatment equilibrium of 44% human infectivity and 28% fly infectivity, which is not quite as much as the reported values given above. The lowest curve in each plot is for the optimistic choices $b=2$ and $\nu p=0.8$, corresponding to $R_0=0.43$. It takes about 60 years in this scenario for the infective populations of humans and flies to be decreased to 10% of their initial values. These disappointing results are due to two key factors:

    1. Ivermectin does not kill adult worms, so the expected value of the time needed to eradicate the disease from an individual human host is still half the lifespan of the adult worms, which is about 6 years.

    2. Even with optimistic projections for microfilaria suppression and fraction of humans who get treated, the expected transmission rate from an individual human to the black fly population is still 20% of the untreated value.

    Figure 6. Simulations of various treatment scenarios, with a=0.9, η=0.1, ϵ=0.
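
The reproductive numbers quoted in this subsection can be reproduced from the continuous-model formula $R_0=ab\,(1-\nu p/(1+\theta))$. This is a minimal sketch: θ = 0.05 is an assumption (it corresponds to θ = τ/2 with τ ≈ 0.1), the "pessimistic" pair b = 3, νp = 0.6 is chosen here for illustration, and the function name is hypothetical.

```python
def r0_continuous(a, b, nu_p, theta):
    """Basic reproductive number of the continuous model, R0 = ab(1 - nu*p/(1 + theta))."""
    return a * b * (1 - nu_p / (1 + theta))

theta = 0.05  # assumed: theta = tau/2 with tau ~ 0.1 (annual treatment)

# Optimistic scenario from the text: b = 2, nu*p = 0.8.
r0_optimistic = r0_continuous(0.9, 2.0, 0.8, theta)

# Illustrative pessimistic scenario (assumed values): b = 3, nu*p = 0.6.
r0_pessimistic = r0_continuous(0.9, 3.0, 0.6, theta)

# Compliance needed to reach nu*p = 0.69 when nu = 0.9.
p_required = 0.69 / 0.9
```

Under these assumptions the optimistic scenario gives $R_0\approx 0.43$, matching the value quoted above, while the illustrative pessimistic scenario leaves $R_0$ above 1.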

    While our model is overly simple, it should be able to provide an overestimate of the efficacy of onchocerciasis treatment simply by choosing an optimistic value of νp; thus, the results strongly suggest that the current eradication plan is inadequate. The problem cannot be fixed simply by improving parameter values such as the treatment fraction. The more significant reason for the poor results is that ivermectin targets the parasite at the least critical point in its life cycle. From a mathematical point of view, we should apply a treatment plan that targets the adult worms because this would shorten the expected value of the longest time scale. While this might not have a larger effect on the basic reproductive number, it would speed up the approach to the disease-free equilibrium by changing the time scale of that approach.


    4.3. Using an analytical model in lieu of a complex simulation

    Modeling of onchocerciasis has generally been done using complex simulations such as ONCHOSIM and EpiOncho. One would expect that more detail would provide better results than a simplified model such as ours. This is true in theory, but in practice it is only true if the processes are very well understood, the parameter values are known to a modest degree of accuracy, the models are thoroughly tested against data, and the predictions for hypothetical scenarios are viewed skeptically rather than accepted without reservation. Here there is ample reason for caution.

    ONCHOSIM predicts eradication of onchocerciasis with 5 to 20 years of treatment, depending on the initial mean population microfilarial load [18]. Given that the adult worm lifespan is 10-12 years, 5 years of treatment is only sufficient to reduce the prevalence of adult worms in the human population by half, and then only with complete suppression of microfilarial production. The actual results must be worse because treatment does not result in complete suppression and coverage is not universal. Perhaps the disease can be temporarily eradicated from the flies in 5 years, but microfilarial production rebounds somewhat after treatment is discontinued and there would be a large number of infective humans available to restart the infection in the fly population. This simple example illustrates the danger of taking the detailed predictions of complex simulations too seriously.
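
The arithmetic behind this estimate can be made explicit under a simplifying assumption that is ours rather than the paper's: suppose every adult worm lives exactly $L$ years and that, at the start of treatment, worm ages are uniformly distributed on $[0,L]$. If treatment completely blocks new infections, the fraction of the initial worm population still alive after $t$ years is

$$\frac{N(t)}{N(0)}=1-\frac{t}{L},\qquad 0\le t\le L,$$

so that for $t=5$

$$\frac{N(5)}{N(0)}=\begin{cases}0.50, & L=10,\\ 0.58, & L=12.\end{cases}$$

Under the same assumption the mean remaining lifetime of a worm is $L/2$, or 6 years for $L=12$. Five years of perfect treatment therefore removes only about half of the adult worms, consistent with the statement above.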

    Simpler analytical models can be thought of as sacrificing precision for accuracy, insofar as they can be used to determine a range of reasonable results and the results are robust to changes in parameter values. Of course the accuracy of this range depends on the specific simplifications in the analytical model. Our model has one major omission, which is the assumption that the expected duration of the onchocerciasis infection in humans is the same as the expected lifespan of the adult worm. This is not true if new onchocerca larvae can establish themselves in a human who is already infected from an earlier time. Reintroduction of larvae into an infective human would reset the 12-year timer for clearance of the disease, resulting in a much smaller expected clearance rate. This omission makes our model more optimistic about the conditions needed for local eradication of the disease from any given population. Our results are not so much a projection of what will happen with a particular treatment plan as they are a best case scenario for what could happen with that plan.

    Another simplification in our model of onchocerciasis is our assumption that the effect of treatment is to lower the infectivity of humans to flies by a fixed fraction ν. There is a considerable amount of literature showing that this is not the case; instead, the mean infectivity of humans drops to near 0 when the dose of ivermectin is administered but then rises to a level somewhat less than that of untreated patients but still significant. In theory, there should be a particular mean value of infectivity loss ν for any treatment protocol, but the best value for ν should depend on the frequency with which the medication is administered. This could be determined by the overall treatment interval (as represented by τ in the pulsed model and θ=τ/2 in the continuous model) if doses are only administered during the periodic visits by the medical community, but it could also be independent of the frequency of medical visits if it is possible to have permanent members of the population arrange for doses to be administered at any interval prescribed by the treatment plan. These considerations affect the choice of ν for investigations with the model, but the idea of using a fixed value of ν in lieu of a complex simulation is valid in any case.


    4.4. The value of asymptotics

    Asymptotic approximation can sometimes simplify the analysis of a model without making an appreciable change in the results. This is most clearly seen in Figure 3, where the error caused by the quasi-steady assumption that changes the model from a vector-borne disease with linear incidence to an infectious disease with nonlinear incidence is only visible for the first 15 days of a disease introduction scenario. The duration of the period for which the initial transient is important depends primarily on the time scale of the differential equation being approximated. This time scale is just 30 days in the onchocerciasis model, so we should not expect a transient duration to be significantly longer than that. The extent of the imbalance between the initial conditions of the experiment and the equilibrium solution does not make much difference. For example, a similar experiment with double the initial load of human infectives shows a transient of the same duration.

    Asymptotics also has a clear value in characterizing solutions, as for example in the analytical result for the periodic solutions of the pulsed model in the limit $\tau\to 0$. While the value in this case is modest, there are examples where asymptotic analysis can provide a detailed explanation of complex behavior (see [11] for an example).


    Appendix: Asymptotics for the pulsed model


    A.1. Periodic solutions

    This subsection shows the calculations necessary to obtain the approximations (53) and (54). We begin by defining

    $$f=bsv \tag{A}$$

    and assuming the asymptotic expansions

    $$x\sim y_0+\tau x_1(t)+O(\tau^2),\qquad y\sim y_0+\tau y_1+O(\tau^2), \tag{B}$$

    where the term $y_1$ is necessarily constant because the leading order solution $x\sim y_0$ implies $y'=O(\tau^2)$. We also define subsidiary expansions (suppressing the $O(\tau^2)$ term)

    $$s\sim s_0+\tau s_1$$

    and

    $$\begin{aligned}
    w&\sim w_0+\tau w_1, & w_1&=w_{10}+w_{11}t,\\
    v&\sim v_0+\tau v_1, & v_1&=v_{10}+v_{11}t,\\
    f&\sim f_0+\tau f_1, & f_1&=f_{10}+f_{11}t.
    \end{aligned}$$

    The calculation proceeds by using the equations for $s$, $w$, $v$, and $f$ to determine the corresponding first-order corrections in terms of $y_1$ and then using the periodicity condition (47), which reduces to

    $$y_1=\int_0^1f_1(t)\,dt=f_{10}+\frac12f_{11}, \tag{C}$$

    to obtain an algebraic equation for $y_1$. Once $y_1$ is known, $x_1$ is determined directly from its initial value problem.

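    From $s=1-y-\tau\xi^{-1}x$, we obtain

    $$s_1=-y_1-\xi^{-1}y_0. \tag{D}$$

    The expansion

    $$w=y-\nu p\,y_ie^{-\tau t}\sim(y_0+\tau y_1)[1-\nu p(1-\tau t)]$$

    yields

    $$w_{10}=(1-\nu p)y_1,\qquad w_{11}=\nu p\,y_0. \tag{E}$$

    The equation $v=w/(a^{-1}+w)$ can be written as

    $$(v_0+\tau v_1)(a^{-1}+w_0+\tau w_1)\sim w_0+\tau w_1,$$

    from which we eventually obtain

    $$v_{10}=\frac{a^{-1}(1-\nu p)y_1}{(a^{-1}+w_0)^2},\qquad v_{11}=\frac{a^{-1}\nu p\,y_0}{(a^{-1}+w_0)^2}. \tag{F}$$

    Expansion of (A) then yields

    $$f_{11}=bs_0v_{11}=\frac{ba^{-1}\nu p\,y_0s_0}{(a^{-1}+w_0)^2} \tag{G}$$

    and

    $$f_{10}=bv_0s_1+bs_0v_{10}.$$

    Combining this last result with (C), (D), (F), and (G) yields an algebraic equation for $y_1$, with solution

    $$y_1=\frac{a^{-1}\nu p\,s_0-2\xi^{-1}w_0(a^{-1}+w_0)}{2(1-\nu p)(a^{-1}+1-\nu p)}. \tag{H}$$

    Once this result is known, the differential equation for $x_1$ with the periodicity condition has the solution

    $$x_1=y_1-\left(\frac12+\xi^{-1}\right)f_{11}+f_{11}t+f_{11}\frac{e^{-\xi t}}{1-e^{-\xi}}. \tag{I}$$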
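
The algebra leading to (H) can be verified numerically by solving the linear equation for $y_1$ directly. The sketch below collects the terms of (C), (D), (F), and (G) that multiply $y_1$ and compares the result with the closed form; the parameter values are illustrative (ξ = 12 is an assumption) and the function names are hypothetical.

```python
def leading_order(a, b, nu_p):
    """Leading-order constants (51)."""
    w0 = (1 - nu_p - 1 / (a * b)) / (1 + 1 / b)
    y0 = w0 / (1 - nu_p)
    return w0, y0, 1 - y0, w0 / (1 / a + w0)

def y1_from_linear_equation(a, b, nu_p, xi):
    """Solve y1 = f10 + f11/2 using (C), (D), (F), (G)."""
    w0, y0, s0, v0 = leading_order(a, b, nu_p)
    d = 1 / a + w0
    f11 = b * s0 * nu_p * y0 / (a * d**2)                    # (G)
    # f10 = b*v0*s1 + b*s0*v10 with s1 = -y1 - y0/xi   (D)
    # and v10 = (1 - nu_p)*y1 / (a*d**2)                (F),
    # so f10 = alpha*y1 + (part independent of y1).
    alpha = -b * v0 + b * s0 * (1 - nu_p) / (a * d**2)
    beta = -b * v0 * y0 / xi + 0.5 * f11
    return beta / (1 - alpha)

def y1_closed_form(a, b, nu_p, xi):
    """Closed-form result (H)."""
    w0, y0, s0, v0 = leading_order(a, b, nu_p)
    d = 1 / a + w0
    return ((nu_p * s0 / a - 2 * w0 * d / xi)
            / (2 * (1 - nu_p) * (1 / a + 1 - nu_p)))

# Illustrative parameters; xi = 12 is an assumption.
params = dict(a=0.9, b=3.0, nu_p=0.6, xi=12.0)
y1_linear = y1_from_linear_equation(**params)
y1_formula = y1_closed_form(**params)
```

Agreement of the two values to rounding error indicates that (H) is consistent with the intermediate results (C) through (G).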

    A.2. Existence condition for a nontrivial periodic solution

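    This subsection shows the calculations necessary to obtain the approximation (64)

    $$\nu p<\left(1-\frac{1}{ab}\right)\left(1+\frac{\tau}{2}\right)+O(\tau^2),$$

    from the original result (63)

    $$\nu p<\frac{(\tau u_1-u_2)(e^{\lambda_1}-1)(1-e^{\lambda_2})}{\tau u_1(e^{\lambda_1}-1)(e^{-\tau}-e^{\lambda_2})-u_2(e^{\lambda_1}-e^{-\tau})(1-e^{\lambda_2})}.$$

    1. We begin by writing $P$ for the right side of the original result and rearranging it as

    $$P=\frac{(e^{\lambda_1}-1)\left(1-\dfrac{\tau u_1}{u_2}\right)}{(e^{\lambda_1}-e^{-\tau})-\dfrac{\tau u_1}{u_2}(e^{\lambda_1}-1)\dfrac{e^{-\tau}-e^{\lambda_2}}{1-e^{\lambda_2}}},$$

    or

    $$P=\frac{e^{\lambda_1}-1}{e^{\lambda_1}-e^{-\tau}}\cdot\frac{1-\dfrac{\tau u_1}{u_2}}{1-\dfrac{\tau u_1}{u_2}\,\dfrac{e^{\lambda_1}-1}{e^{\lambda_1}-e^{-\tau}}\,\dfrac{e^{-\tau}-e^{\lambda_2}}{1-e^{\lambda_2}}}. \tag{J}$$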

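    2. The eigenvalue $\lambda_2\sim-\xi=O(1)$ as $\tau\to 0$; hence, the last factor in the denominator of (J) can be expanded as

    $$\frac{e^{-\tau}-e^{\lambda_2}}{1-e^{\lambda_2}}\sim 1+O(\tau)$$

    and $u_2=\lambda_2+\tau\sim-\xi$. We then have

    $$P\sim\frac{e^{\lambda_1}-1}{e^{\lambda_1}-e^{-\tau}}\cdot\frac{1+\xi^{-1}\tau u_1+O(\tau^2)}{1+\xi^{-1}\tau u_1\,\dfrac{e^{\lambda_1}-1}{e^{\lambda_1}-e^{-\tau}}+O(\tau^2)}. \tag{K}$$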

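    3. The eigenvalue $\lambda_1$ is $O(\tau)$ as $\tau\to 0$; hence, the other ratio of exponential functions can be expanded as

    $$\begin{aligned}
    \frac{e^{\lambda_1}-1}{e^{\lambda_1}-e^{-\tau}}
    &\sim\frac{\lambda_1+\frac12\lambda_1^2+O(\tau^3)}{\left(\lambda_1+\frac12\lambda_1^2\right)-\left(-\tau+\frac12\tau^2\right)}
    =\frac{\lambda_1+\frac12\lambda_1^2+O(\tau^3)}{(\lambda_1+\tau)+\frac12(\lambda_1^2-\tau^2)}
    =\frac{\lambda_1}{\lambda_1+\tau}\cdot\frac{1+\frac12\lambda_1+O(\tau^2)}{1+\frac12(\lambda_1-\tau)}\\
    &\sim\frac{\lambda_1}{\lambda_1+\tau}\left(1+\frac12\lambda_1+O(\tau^2)\right)\left(1-\frac12(\lambda_1-\tau)+O(\tau^2)\right)
    \sim\frac{\lambda_1}{\lambda_1+\tau}\left(1+\frac{\tau}{2}+O(\tau^2)\right).
    \end{aligned}$$

    Substituting this result into (K) yields

    $$P\sim\frac{\lambda_1}{\lambda_1+\tau}\left(1+\frac{\tau}{2}+O(\tau^2)\right)\frac{1+\xi^{-1}\tau u_1+O(\tau^2)}{1+\xi^{-1}\tau u_1\,\dfrac{\lambda_1}{\lambda_1+\tau}+O(\tau^2)}. \tag{L}$$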

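    4. The denominator in the last factor of (L) can be expanded as a geometric series, yielding

    $$P\sim\frac{\lambda_1}{\lambda_1+\tau}\left(1+\frac{\tau}{2}+O(\tau^2)\right)\left(1+\xi^{-1}\tau u_1\,\frac{\tau}{\lambda_1+\tau}+O(\tau^2)\right);$$

    since $\tau u_1=\lambda_1+\tau$, we have

    $$P\sim\frac{\lambda_1}{\lambda_1+\tau}\left(1+\frac{\tau}{2}+O(\tau^2)\right)\left(1+\xi^{-1}\tau+O(\tau^2)\right). \tag{M}$$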

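    5. Expansion of the formula for $\lambda_1$ (59) yields the results

    $$\lambda_1\sim(ab-1)\tau\left[1-ab\,\xi^{-1}\tau+O(\tau^2)\right],\qquad \lambda_1+\tau\sim ab\,\tau\left[1-(ab-1)\xi^{-1}\tau+O(\tau^2)\right],$$

    so

    $$\frac{\lambda_1}{\lambda_1+\tau}\sim\frac{ab-1}{ab}\left[1-ab\,\xi^{-1}\tau+O(\tau^2)\right]\left[1+(ab-1)\xi^{-1}\tau+O(\tau^2)\right]\sim\left(1-\frac{1}{ab}\right)\left[1-\xi^{-1}\tau+O(\tau^2)\right].$$

    Substituting this last result into (M) yields the desired final result

    $$P\sim\left(1-\frac{1}{ab}\right)\left(1+\frac{\tau}{2}\right)+O(\tau^2). \tag{N}$$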
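
The expansions in step 5 can be checked numerically by confirming that their errors shrink at the expected rate as $\tau$ decreases. This is a sketch under assumed illustrative values ($ab=2.7$, $\xi=12$); the function names are hypothetical.

```python
import math

def lambda1_exact(ab, xi, tau):
    """Larger eigenvalue from (59)."""
    root = math.sqrt(xi**2 + (4 * ab - 2) * xi * tau + tau**2)
    return (-(xi + tau) + root) / 2

def lambda1_series(ab, xi, tau):
    """Two-term expansion of lambda_1 from step 5."""
    return (ab - 1) * tau * (1 - ab * tau / xi)

def ratio_error(ab, xi, tau):
    """Error of the approximation lambda1/(lambda1 + tau) ~ (1 - 1/ab)(1 - tau/xi)."""
    lam1 = lambda1_exact(ab, xi, tau)
    exact = lam1 / (lam1 + tau)
    approx = (1 - 1 / ab) * (1 - tau / xi)
    return abs(exact - approx)

# Illustrative values; ab = 2.7 and xi = 12 are assumptions.
ab, xi = 2.7, 12.0
lam_error = abs(lambda1_exact(ab, xi, 0.1) - lambda1_series(ab, xi, 0.1))
err_coarse = ratio_error(ab, xi, 0.1)
err_fine = ratio_error(ab, xi, 0.01)
```

Because the neglected term in the ratio is $O(\tau^2)$, reducing $\tau$ by a factor of 10 should reduce `ratio_error` by roughly a factor of 100.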



    [1] Abdalmageed W, Elosery A, Smith CE (2003) Non-parametric expectation maximization: a learning automata approach. In IEEE International Conference on Systems, 2003.
    [2] Agrawal L, Adane D (2021) Improved decision tree model for prediction in equity market using heterogeneous data. IETE J Res, 1–10.
    [3] Ahn JJ, Oh KJ, Kim TY, et al. (2011) Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Syst Appl 38: 2966–2973. doi: 10.1016/j.eswa.2010.08.085
    [4] Alberici A, Querci F (2015) The quality of disclosures on environmental policy: The profile of financial intermediaries. Corp Soc Resp Env Ma 23: 283–296. doi: 10.1002/csr.1375
    [5] Aljawazneh H, Mora AM, Garcia-Sanchez P, et al. (2021) Comparing the performance of deep learning methods to predict companies' financial failure. IEEE Access 9: 97010–97038.
    [6] Atsalakis GS, Valavanis KP (2009) Surveying stock market forecasting techniques - part II: Soft computing methods. Expert Syst Appl 36: 5932–5941. doi: 10.1016/j.eswa.2008.07.006
    [7] Javed Awan M, Mohd Rahim MS, Nobanee H, et al. (2021) Social media and stock market prediction: A big data approach. Comput Mater Con 67: 2569–2583.
    [8] Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction. Expert Syst Appl 83: 405–417. doi: 10.1016/j.eswa.2017.04.006
    [9] Bernardi M, Catania L (2018) Switching generalized autoregressive score copula models with application to systemic risk. J Appl Econometrics 34: 43–65. doi: 10.1002/jae.2650
    [10] Bielza C, Larranaga P (2014) Discrete bayesian network classifiers. ACM Comput Surv 47: 1–43.
    [11] Bishop CM (2006) Pattern Recognition and Machine Learning. Springer New York, 2006.
    [12] Borges TA, Neves RF (2020) Ensemble of machine learning algorithms for cryptocurrency investment with different data resampling methods. Appl Soft Comput 90: 106187. doi: 10.1016/j.asoc.2020.106187
    [13] Braun B (2018) Central banking and the infrastructural power of finance: the case of ECB support for repo and securitization markets. Socio-Econ Rev 18: 395–418.
    [14] Brusco MJ, Cradit JD (2001) A variable-selection heuristic for k-means clustering. Psychometrika 66: 249–270. doi: 10.1007/BF02294838
    [15] Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2: 121–167. doi: 10.1023/A:1009715923555
    [16] Bustos O, Pomares-Quimbaya A (2020) Stock market movement forecast: A systematic review. Expert Syst Appl 156: 113464. doi: 10.1016/j.eswa.2020.113464
    [17] Cagliero L, Garza P, Attanasio G, et al. (2020) Training ensembles of faceted classification models for quantitative stock trading. Computing 102: 1213–1225. doi: 10.1007/s00607-019-00776-7
    [18] Cao LJ, Tay FEH (2003) Support vector machine with adaptive parameters in financial time series forecasting. IEEE T Neural Networ 14: 1506–1518.
    [19] Carpinteiro OA, Leite JP, Pinheiro CA, et al. (2011) Forecasting models for prediction in time series. Artif Intell Rev 38: 163–171. doi: 10.1007/s10462-011-9275-1
    [20] Carta S, Ferreira A, Podda AS, et al. Multi-DQN: An ensemble of deep q-learning agents for stock market forecasting. Expert Syst Appl 164: 113820.
    [21] Cavalcante RC, Brasileiro RC, Souza VL, et al. Computational intelligence and financial markets: A survey and future directions. Expert Syst Appl 55: 194–211.
    [22] Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl 40: 200–210. doi: 10.1016/j.eswa.2012.07.021
    [23] Centanni S, Minozzo M (2006) Estimation and filtering by reversible jump MCMC for a doubly stochastic poisson model for ultra-high-frequency financial data. Stat Model 6: 97–118. doi: 10.1191/1471082X06st112oa
    [24] Chen AS, Leung MT, Pan S (2019) Financial hedging in energy market by cross-learning machines. Neural Comput Appl 32: 10321–10335. doi: 10.1007/s00521-019-04572-4
    [25] Chen HL, Liu DY, Yang B, et al. (2011) An adaptive fuzzy k-nearest neighbor method based on parallel particle swarm optimization for bankruptcy prediction. In Adv Knowl Discovery Data Min, 249–264. Springer Berlin Heidelberg, 2011.
    [26] Chen MY (2011) Predicting corporate financial distress based on integration of decision tree classification and logistic regression. Expert Syst Appl 38: 11261–11272. doi: 10.1016/j.eswa.2011.02.173
    [27] Chen S (2019) An effective going concern prediction model for the sustainability of enterprises and capital market development. Appl Econ 51: 3376–3388. doi: 10.1080/00036846.2019.1578855
    [28] Jin C, De-Lin L, Fen-Xiang M (2014) An improved ID3 decision tree algorithm. Adv Mater Res 962-965: 2842–2847. doi: 10.4028/www.scientific.net/AMR.962-965.2842
    [29] Chen Y, Hao Y (2017) A feature weighted support vector machine and k-nearest neighbor algorithm for stock market indices prediction. Expert Syst Appl 80: 340–355. doi: 10.1016/j.eswa.2017.02.044
    [30] Chen Z, Nazir A, Teoh EN, et al. Exploration of the effectiveness of expectation maximization algorithm for suspicious transaction detection in anti-money laundering. In 2014 IEEE Conference on Open Systems (ICOS). IEEE.
    [31] Cheng SH (2014) Predicting stock returns by decision tree combining neural network. Lect Notes Artif Int 8398: 352–360.
    [32] Cheng CH, Chan CP, Sheu YJ (2019) A novel purity-based k nearest neighbors imputation method and its application in financial distress prediction. Eng Appl Artif Intel 81: 283–299. doi: 10.1016/j.engappai.2019.03.003
    [33] Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20: 273–297.
    [34] Dai W (2021) Development and supervision of robo-advisors under digital financial inclusion in complex systems. Complexity 2021: 1–12.
    [35] Daugaard D Emerging new themes in environmental, social and governance investing: a systematic literature review. Account Financ 60: 1501–1530.
    [36] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc 39: 1–22.
    [37] Deng S, Wang C, Wang M, et al. (2019) A gradient boosting decision tree approach for insider trading identification: An empirical model evaluation of china stock market. Appl Soft Comput 83: 105652. doi: 10.1016/j.asoc.2019.105652
    [38] Desokey EN, Badr A, Hegazy AF Enhancing stock prediction clustering using k-means with genetic algorithm. In 2017 13th International Computer Engineering Conference (ICENCO). IEEE.
    [39] Dong X, Yu Z, Cao W, et al. (2019) A survey on ensemble learning. Front Comput Sci 14: 241–258. doi: 10.1007/s11704-019-8208-z
    [40] Ekinci A, Erdal HI (2016) Forecasting bank failure: Base learners, ensembles and hybrid ensembles. Comput Econ 49: 677–686. doi: 10.1007/s10614-016-9623-y
    [41] Farid S, Tashfeen R, Mohsan T, et al. (2020) Forecasting stock prices using a data mining method: Evidence from emerging market. Int J Financ Econ.
    [42] Ferreira FGDC, Gandomi AH, Cardoso RTN (2020) Financial time-series analysis of brazilian stock market using machine learning. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE.
    [43] Ferreira LEB, Barddal JP, Gomes HM, et al. (2017) Improving credit risk prediction in online peer-to-peer (p2p) lending using imbalanced learning techniques. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE.
    [44] Fields D Constructing a new asset class: Property-led financial accumulation after the crisis. Econ Geogr 94: 118–140.
    [45] Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29: 131–163. doi: 10.1023/A:1007465528199
    [46] Gamage P (2016) New development: Leveraging 'big data' analytics in the public sector. Public Money Manage 36: 385–390. doi: 10.1080/09540962.2016.1194087
    [47] García S, Fernández A, Herrera F (2009) Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems. Appl Soft Comput 9: 1304–1314. doi: 10.1016/j.asoc.2009.04.004
    [48] Garcia-Almanza AL, Tsang EP (2006) The repository method for chance discovery in financial forecasting, In International Conference on Knowledge-based Intelligent Information and Engineering Systems.
    [49] Gonzalez RT, Padilha CA, Barone DAC (2015) Ensemble system based on genetic algorithm for stock market forecasting. In 2015 IEEE Congress on Evolutionary Computation (CEC). IEEE.
    [50] Gou J, Ma H, Ou W, et al. (2019) A generalized mean distance-based k-nearest neighbor classifier. Expert Syst Appl 115: 356–372. doi: 10.1016/j.eswa.2018.08.021
    [51] Goyal K, Kumar S (2020) Financial literacy: A systematic review and bibliometric analysis. Int J Consum Stud 45: 80–105. doi: 10.1111/ijcs.12605
    [52] Guo S, He H, Huang X (2019) A multi-stage self-adaptive classifier ensemble model with application in credit scoring. IEEE Access 7: 78549–78559.
    [53] Han J, Pei J, Kamber M (2000) Data Mining: Concepts and Techniques.
    [54] Han J, Cheng H, Xin D, et al. (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discovery 15: 55–86. doi: 10.1007/s10618-006-0059-1
    [55] He H, Fan Y (2021) A novel hybrid ensemble model based on tree-based method and deep learning method for default prediction. Expert Syst Appl 176: 114899. doi: 10.1016/j.eswa.2021.114899
    [56] He S, Zheng J, Lin J, et al. (2020) Classification-based fraud detection for payment marketing and promotion. Comput Syst Sci Eng 35: 141–149. doi: 10.32604/csse.2020.35.141
    [57] Howe D, Costanzo M, Fey P, et al. (2008) The future of biocuration. Nature 455: 47–50. doi: 10.1038/455047a
    [58] Hssina B, Merbouha A, Ezzikouri H, et al. (2014) A comparative study of decision tree ID3 and c4.5. Int J Adv Comput Sci Appl 4.
    [59] Hsu YS, Lin SJ (2014) An emerging hybrid mechanism for information disclosure forecasting. Int J Mach Learn Cybern 7: 943–952. doi: 10.1007/s13042-014-0295-4
    [60] Huang C, Gao F, Jiang H (2014) Combination of biorthogonal wavelet hybrid kernel OCSVM with feature weighted approach based on EVA and GRA in financial distress prediction. Math Probl Eng 2014: 1–12.
    [61] Huang Q, Wang T, Tao D, et al. (2015) Biclustering learning of trading rules. IEEE T Cybern 45: 2287–2298.
    [62] Huang X, Tang H (2021) Measuring multi-volatility states of financial markets based on multifractal clustering model. J Forecast.
    [63] Iqbal R, Doctor F, More B, et al. (2020) Big data analytics: Computational intelligence techniques and application areas. Technol Forecast Soc 153: 119253. doi: 10.1016/j.techfore.2018.03.024
    [64] Jagadish HV, Gehrke J, Labrinidis A, et al. (2014) Big data and its technical challenges. Commun ACM 57: 86–94.
    [65] Rutkowski L, Jaworski M, Pietruczuk L, et al. (2014) The cart decision tree for mining data streams. Infor Sci.
    [66] Julia D, Pereira A, Silva RE (2018) Designing financial strategies based on artificial neural networks ensembles for stock markets. 1–8.
    [67] Kanhere P, Khanuja HK (2015) A methodology for outlier detection in audit logs for financial transactions. In 2015 International Conference on Computing Communication Control and Automation. IEEE.
    [68] Kercheval AN, Zhang Y (2015) Modelling high-frequency limit order book dynamics with support vector machines. Quant Financ 15: 1315–1329. doi: 10.1080/14697688.2015.1032546
    [69] Kewat P, Sharma R, Singh U, et al. (2017) Support vector machines through financial time series forecasting. In 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA). IEEE.
    [70] Kilimci ZH (2019) Borsa tahmini için derin topluluk modelleri (DTM) ile finansal duygu analizi. Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi.
    [71] Kim SY, Upneja A (2021) Majority voting ensemble with a decision trees for business failure prediction during economic downturns. J Innovation Knowl 6: 112–123. doi: 10.1016/j.jik.2021.01.001
    [72] Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62: 32–43. doi: 10.1016/j.eswa.2016.06.016
    [73] Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32: 995–1003. doi: 10.1016/j.eswa.2006.02.016
    [74] Kotsiantis SB (2011) Decision trees: a recent overview. Artif Intell Rev 39: 261–283.
    [75] Kum HC, Ahalt S, Carsey TM (2011) Dealing with data: Governments records. Science 332: 1263–1263. doi: 10.1126/science.332.6035.1263-a
    [76] Kumar DA, Murugan S (2013) Performance analysis of indian stock market index using neural network time series model. In 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering. IEEE.
    [77] Lee I (2017) Big data: Dimensions, evolution, impacts, and challenges. Bus Horizons 60: 293–303. doi: 10.1016/j.bushor.2017.01.004
    [78] Lee TK, Cho JH, Kwon DS, et al. (2019) Global stock market investment strategies based on financial network indicators using machine learning techniques. Expert Syst Appl 117: 228–242. doi: 10.1016/j.eswa.2018.09.005
    [79] Li H, Sun J, Sun BL (2009) Financial distress prediction based on OR-CBR in the principle of k-nearest neighbors. Expert Syst Appl 36: 643–659. doi: 10.1016/j.eswa.2007.09.038
    [80] Li L, Wang J, Li X (2020) Efficiency analysis of machine learning intelligent investment based on k-means algorithm. IEEE Access 8: 147463–147470.
    [81] Li ST, Ho HF (2009) Predicting financial activity with evolutionary fuzzy case-based reasoning. Expert Syst Appl 36: 411–422. doi: 10.1016/j.eswa.2007.09.049
    [82] Li T, Li J, Liu Z, et al. (2018) Differentially private naive bayes learning over multiple data sources. Inf Sci 444: 89–104. doi: 10.1016/j.ins.2018.02.056
    [83] Li X, Wang F, Chen X (2015) Support vector machine ensemble based on choquet integral for financial distress prediction. Int J Pattern Recognit Artif Intell 29: 1550016. doi: 10.1142/S0218001415500160
    [84] Liang D, Tsai CF, Dai AJ, et al. (2017) A novel classifier ensemble approach for financial distress prediction. Knowl Inf Syst 54: 437–462. doi: 10.1007/s10115-017-1061-1
    [85] Liao SH, Chu PH, Hsiao PY (2012) Data mining techniques and applications - a decade review from 2000 to 2011. Expert Syst Appl 39: 11303–11311. doi: 10.1016/j.eswa.2012.02.063
    [86] Lin A, Shang P, Feng G, et al. (2012) Application of empirical mode decomposition combined with k-nearest neighbors approach in financial time series forecasting. Fluct Noise Lett 11: 1250018. doi: 10.1142/S0219477512500186
    [87] Lin CS, Chiu SH, Lin TY (2012) Empirical mode decomposition-based least squares support vector regression for foreign exchange rate forecasting. Econ Model 29: 2583–2590. doi: 10.1016/j.econmod.2012.07.018
    [88] Lin G, Lin A, Cao J (2021) Multidimensional KNN algorithm based on EEMD and complexity measures in financial time series forecasting. Expert Syst Appl 168: 114443. doi: 10.1016/j.eswa.2020.114443
    [89] Liu J, Lin CMM, Chao F (2019) Gradient boost with convolution neural network for stock forecast. In Adv Intell Syst Comput, 155–165.
    [90] Liu M, Luo K, Zhang J, et al. (2021) A stock selection algorithm hybridizing grey wolf optimizer and support vector regression. Expert Syst Appl 179: 115078. doi: 10.1016/j.eswa.2021.115078
    [91] Liu W, Zhao J, Wang D (2021) Data mining for energy systems: Review and prospect. WIREs Data Min Knowl Discovery 11.
    [92] Jan CL (2018) An effective financial statements fraud detection model for the sustainable development of financial markets: Evidence from taiwan. Sustainability 10: 513. doi: 10.3390/su10020513
    [93] Loukeris N, Eleftheriadis I, Livanis E (2013) A novel approach on hybrid support vector machines into optimal portfolio selection. In IEEE Int Symposium Signal Proc Inf Technol. IEEE.
    [94] Luintel KB, Khan M, Leon-Gonzalez R, et al. (2016) Financial development, structure and growth: New data, method and results. J Int Financ Mark Inst Money 43: 95–112. doi: 10.1016/j.intfin.2016.04.002
    [95] Luo B, Lin Z (2011) A decision tree model for herd behavior and empirical evidence from the online p2p lending market. Inf Syst e-Bus Manage 11: 141–160. doi: 10.1007/s10257-011-0182-4
    [96] Ma Y, Xu B, Xu X (2017) Real estate confidence index based on real estate news. Emerg Mark Financ Tr 54: 747–760. doi: 10.1080/1540496X.2016.1232193
    [97] Malliaris AG, Malliaris M (2015) What drives gold returns? a decision tree analysis. Financ Res Lett 13: 45–53. doi: 10.1016/j.frl.2015.03.004
    [98] Mazzarisi P, Barucca P, Lillo F, et al. (2020) A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market. Eur J Oper Res 281: 50–65. doi: 10.1016/j.ejor.2019.07.024
    [99] Miró-Julià M, Fiol-Roig G, Isern-Deyà AP (2010) Decision trees in stock market analysis: Construction and validation. In Trends Applied Intelligent Systems-international Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems.
    [100] Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE T Pattern Anal 36: 2227–2240.
    [101] Naranjo R, Santos M (2019) A fuzzy decision system for money investment in stock markets based on fuzzy candlesticks pattern recognition. Expert Syst Appl 133: 34–48. doi: 10.1016/j.eswa.2019.05.012
    [102] Nardo M, Petracco-Giudici M, Naltsidis M (2015) Walking down Wall Street with a tablet: A survey of stock market predictions using the web. J Econ Surv 30: 356–369. doi: 10.1111/joes.12102
    [103] Al Nasseri A, Tucker A, de Cesare S (2015) Quantifying StockTwits semantic terms' trading behavior in financial markets: An effective application of decision tree algorithms. Expert Syst Appl 42: 9192–9210. doi: 10.1016/j.eswa.2015.08.008
    [104] Nassirtoussi AK, Aghabozorgi S, Wah TY, et al. (2014) Text mining for market prediction: A systematic review. Expert Syst Appl 41: 7653–7670. doi: 10.1016/j.eswa.2014.06.009
    [105] Näf J, Paolella MS, Polak P (2019) Heterogeneous tail generalized COMFORT modeling via Cholesky decomposition. J Multivariate Anal 172: 84–106. doi: 10.1016/j.jmva.2019.02.004
    [106] Ng A, Jordan M (2002) On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Dietterich T, Becker S, Ghahramani Z (eds), Advances in Neural Information Processing Systems 14. MIT Press. https://proceedings.neurips.cc/paper/2001/file/7b7a53e239400a13bd6be6c91c4f6c4e-Paper.pdf
    [107] Ng KH, Khor KC (2016) StockProF: a stock profiling framework using data mining approaches. Inf Syst e-Bus Manage 15: 139–158.
    [108] Nie CX (2020) A network-based method for detecting critical events of correlation dynamics in financial markets. EPL (Europhys Lett) 131: 50001.
    [109] Ohana JJ, Ohana S, Benhamou E, et al. (2021) Explainable AI (XAI) models applied to the multi-agent environment of financial markets. In Explainable and Transparent AI and Multi-Agent Systems. Springer International Publishing, 189–207.
    [110] Olson DL (2006) Data mining in business services. Serv Bus 1: 181–193.
    [111] Oussous A, Benjelloun FZ, Lahcen AA, et al. (2018) Big data technologies: A survey. J King Saud University - Comput Inf Sci 30: 431–448.
    [112] Pan I, Bester D (2018) Fuzzy bayesian learning. IEEE T Fuzzy Syst 26: 1719–1731.
    [113] Paolella MS, Polak P, Walker PS (2019) Regime switching dynamic correlations for asymmetric and fat-tailed conditional returns. J Econometrics 213: 493–515. doi: 10.1016/j.jeconom.2019.07.002
    [114] Patrizio A (2018) Idc: Expect 175 zettabytes of data worldwide by 2025. https://www.networkworld.com/article/3325397/idc-expect-175-zettabytes-of-data-worldwide-by-2025.html.
    [115] Pei S, Shen T, Wang X, et al. (2020) 3dacn: 3d augmented convolutional network for time series data. Inf Sci 513: 17–29. doi: 10.1016/j.ins.2019.11.040
    [116] Peng Y, Wang G, Kou G, et al. (2011) An empirical study of classification algorithm evaluation for financial risk prediction. Appl Soft Comput 11: 2906–2915. doi: 10.1016/j.asoc.2010.11.028
    [117] Philip DJ, Sudarsanam N, Ravindran B (2018) Improved insights on financial health through partially constrained hidden markov model clustering on loan repayment data. ACM SIGMIS Database DATABASE Adv Inf Syst 49: 98–113.
    [118] Provost F, Fawcett T (2013) Data science and its relationship to big data and data-driven decision making. Big Data 1: 51–59. doi: 10.1089/big.2013.1508
    [119] Qian B, Rasheed K (2006) Stock market prediction with multiple classifiers. Appl Intell 26: 25–33. doi: 10.1007/s10489-006-0001-7
    [120] Quinlan JR (1986) Induction of decision trees. Mach Learn 1: 81–106.
    [121] Raudys Š (2000) How good are support vector machines? Neural Networks 13: 17–19.
    [122] Rokade A, Malhotra A, Wanchoo A (2016) Enhancing portfolio returns by identifying high growth companies in Indian stock market using artificial intelligence. In 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE.
    [123] Rosati R, Romeo L, Goday CA (2020) Machine learning in capital markets: Decision support system for outcome analysis. IEEE Access 8: 109080–109091.
    [124] Roshan WDS, Gopura RARC, Jayasekara AGB, et al. (2016) Financial market forecasting by integrating wavelet transform and k-means clustering with support vector machine. In International Conference on Artificial Life and Robotics, 2016.
    [125] Roychowdhury S, Shroff N, Verdi RS (2019) The effects of financial reporting and disclosure on corporate investment: A review. J Account Econ 68: 101246. doi: 10.1016/j.jacceco.2019.101246
    [126] Rudin C, Daubechies I, Schapire RE, et al. (2004) The dynamics of adaboost: Cyclic behavior and convergence of margins. J Mach Learn Res 5: 1557–1595.
    [127] Ryans JP (2020) Textual classification of SEC comment letters. Rev Account Stud 26: 37–80.
    [128] Saidane M, Lavergne C (2009) Optimal prediction with conditionally heteroskedastic factor analysed hidden markov models. Comput Econ 34: 323–364. doi: 10.1007/s10614-009-9181-7
    [129] Salzberg SL (1994) C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach Learn 16: 235–240.
    [130] Samworth RJ (2012) Optimal weighted nearest neighbour classifiers. Annal Stat 40.
    [131] Schumaker RP, Chen H (2009) Textual analysis of stock market prediction using breaking financial news. ACM T Inf Syst 27: 1–19.
    [132] Seong N, Nam K (2021) Predicting stock movements based on financial news with segmentation. Expert Syst Appl 164: 113988. doi: 10.1016/j.eswa.2020.113988
    [133] Shamim S, Zeng J, Shariq SM, et al. (2019) Role of big data management in enhancing big data decision-making capability and quality among Chinese firms: A dynamic capabilities view. Inform Manage 56: 103135. doi: 10.1016/j.im.2018.12.003
    [134] Shin HW, Sohn SY (2004) Segmentation of stock trading customers according to potential value. Expert Syst Appl 27: 27–33. doi: 10.1016/j.eswa.2003.12.002
    [135] Si YW, Yin J (2013) OBST-based segmentation approach to financial time series. Eng Appl Artif Intel 26: 2581–2596. doi: 10.1016/j.engappai.2013.08.015
    [136] Sinaga KP, Yang MS (2020) Unsupervised k-means clustering algorithm. IEEE Access 8: 80716–80727.
    [137] Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14: 199–222. doi: 10.1023/B:STCO.0000035301.49549.88
    [138] Soni S (2011) Applications of anns in stock market prediction: A survey. Int J Comput Sci Eng Technol 2: 71–83.
    [139] Sreedharan M, Khedr AM, El Bannany M (2020) A comparative analysis of machine learning classifiers and ensemble techniques in financial distress prediction. In 2020 17th International Multi-Conference on Systems, Signals & Devices (SSD). IEEE, 653–657.
    [140] Sun H, Rong W, Zhang J, et al. (2017) Stacked denoising autoencoder based stock market trend prediction via k-nearest neighbour data selection. In International Conference on Neural Information Processing. Springer, 882–892.
    [141] Sun J, Lang J, Fujita H, et al. (2018a) Imbalanced enterprise credit evaluation with DTE-SBD: Decision tree ensemble based on SMOTE and bagging with differentiated sampling rates. Inf Sci 425: 76–91. doi: 10.1016/j.ins.2017.10.017
    [142] Sun J, Li H, Fujita H, et al. (2020) Class-imbalanced dynamic financial distress prediction based on adaboost-SVM ensemble combined with SMOTE and time weighting. Inform Fusion 54: 128–144. doi: 10.1016/j.inffus.2019.07.006
    [143] Sun SL, Wei YJ, Wang SY (2018b) AdaBoost-LSTM ensemble learning for financial time series forecasting. In International Conference on Computational Science. Springer, 590–597.
    [144] Talebi H, Hoang W, Gavrilova ML (2014) Multi-scale foreign exchange rates ensemble for classification of trends in forex market. Proc Comput Sci 29: 2065–2075. doi: 10.1016/j.procs.2014.05.190
    [145] Tang L, Pan PH, Yao YY (2018a) EPAK: A computational intelligence model for 2-level prediction of stock indices. Int J Comput Commun 13: 268–279. doi: 10.15837/ijccc.2018.2.3187
    [146] Tang XB, Liu GC, Yang J, et al. (2018b) Knowledge-based financial statement fraud detection system: based on an ontology and a decision tree. Knowl Organ 45: 205–219. doi: 10.5771/0943-7444-2018-3-205
    [147] Tsai CF (2014) Combining cluster analysis with classifier ensembles to predict financial distress. Inform Fusion 16: 46–58. doi: 10.1016/j.inffus.2011.12.001
    [148] Tsai CF, Chiou YJ (2009) Earnings management prediction: A pilot study of combining neural networks and decision trees. Expert Syst Appl 36: 7183–7191. doi: 10.1016/j.eswa.2008.09.025
    [149] Vaghela VB, Vandra KH, Modi NK (2014) MR-MNBC: MaxRel based feature selection for the multi-relational naive Bayesian classifier. In Nirma University International Conference on Engineering, 1–9.
    [150] Wang B, Huang H, Wang X (2011a) A support vector machine based MSM model for financial short-term volatility forecasting. Neural Comput Appl 22: 21–28. doi: 10.1007/s00521-011-0742-z
    [151] Wang JZ, Wang JJ, Zhang ZG, et al. (2011b) Forecasting stock indices with back propagation neural network. Expert Syst Appl 38: 14346–14355.
    [152] Wang L, Zhu J (2008) Financial market forecasting using a two-step kernel learning method for the support vector regression. Ann Oper Res 174: 103–120. doi: 10.1007/s10479-008-0357-7
    [153] Wang Q, Xu W, Zheng H (2018) Combining the wisdom of crowds and technical analysis for financial market prediction using deep random subspace ensembles. Neurocomputing 299: 51–61. doi: 10.1016/j.neucom.2018.02.095
    [154] Webb GI, Zheng Z (2004) Multistrategy ensemble learning: reducing error by combining ensemble learning techniques. IEEE T Knowl Data En 16: 980–991.
    [155] Weng B, Lu L, Wang X, et al. (2018) Predicting short-term stock prices using ensemble methods and online data sources. Expert Syst Appl 112: 258–273. doi: 10.1016/j.eswa.2018.06.016
    [156] Wu XD, Kumar V, Quinlan JR, et al. (2007) Top 10 algorithms in data mining. Knowl Inf Syst 14: 1–37.
    [157] Xing FZ, Cambria E, Welsch RE (2017) Natural language based financial forecasting: a survey. Artif Intell Rev 50: 49–73. doi: 10.1007/s10462-017-9588-9
    [158] Xu Y, Yang C, Peng S, et al. (2020) A hybrid two-stage financial stock forecasting algorithm based on clustering and ensemble learning. Appl Intell 50: 3852–3867. doi: 10.1007/s10489-020-01766-5
    [159] Yan L, Bai B (2016) Correlated industries mining for Chinese financial news based on LDA trained with research reports. In 2016 16th International Symposium on Communications and Information Technologies (ISCIT). IEEE, 131–135.
    [160] Yang R, Yu L, Zhao Y, et al. (2020) Big data analytics for financial market volatility forecast based on support vector machine. Int J Inf Manag 50: 452–462. doi: 10.1016/j.ijinfomgt.2019.05.027
    [161] Yeo B, Grant D (2018) Predicting service industry performance using decision tree analysis. Int J Inf Manag 38: 288–300. doi: 10.1016/j.ijinfomgt.2017.10.002
    [162] Yoo PD, Kim MH, Jan T (2005) Machine learning techniques and use of event information for stock market prediction: A survey and evaluation. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC06). IEEE. 2: 835–841.
    [163] Zhang Y, Yu G, Jin ZQ (2013) Violations detection of listed companies based on decision tree and k-nearest neighbor. In 2013 International Conference on Management Science and Engineering 20th Annual Conference Proceedings, 1671–1676.
    [164] Wu KP, Wu YP, Lee HM (2014) Stock trend prediction by using k-means and aprioriall algorithm for sequential chart pattern mining. J Inf Sci Eng 30: 653–667.
    [165] Zemke S (1999) Nonlinear index prediction. Physica A 269: 177–183.
    [166] Zhang C, Jiang J (2017) A financial early warning algorithm based on ensemble learning. In 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA). IEEE. doi: 10.1109/ciapp.2017.8167192
    [167] Zhang H, Li SF (2010) Forecasting volatility in financial markets. Key Eng Mater 439: 679–682. doi: 10.4028/www.scientific.net/KEM.439-440.679
    [168] Zhang JL, Härdle WK (2010) The bayesian additive classification tree applied to credit risk modelling. Comput Stat Data An 54: 1197–1205. doi: 10.1016/j.csda.2009.11.022
    [169] Zhang N, Lin A, Shang P (2017) Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting. Physica A 477: 161–173. doi: 10.1016/j.physa.2017.02.072
    [170] Zhao QJ, Sun Q, Che WG (2014) The application of Bayesian discrimination in the analysis on media sector stock. Appl Mech Mater 488: 1310–1313. doi: 10.4028/www.scientific.net/AMM.488-489.1310
    [171] Zhao Y (2021) Sports enterprise marketing and financial risk management based on decision tree and data mining. J Healthc Eng 2021: 1–8.
    [172] Guo ZQ, Wang HQ, Liu Q (2012) Financial time series forecasting using LPP and SVM optimized by PSO. Soft Comput 17: 805–818.
    [173] Zhu X, Che WG (2014) Research of outliers in time series of stock prices based on improved k-means clustering algorithm. Wit Trans Inf Commun Technol 46: 633–641.
    [174] Zhu Y, Xie C, Wang GJ, et al. (2016) Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China's SME credit risk in supply chain finance. Neural Comput Appl 28: 41–50. doi: 10.1007/s00521-016-2304-x
    [175] Zhu Z, Liu N (2021) Early warning of financial risk based on k-means clustering algorithm. Complexity 2021: 1–12.
    [176] Zhuang Y, Xu Z, Tang Y (2015) A credit scoring model based on bayesian network and mutual information. In 2015 12th Web Information System and Application Conference (WISA).
    [177] Mirsadeghpour Zoghi SM, Saneie M, Tohidi G, et al. (2021) The effect of underlying distribution of asset returns on efficiency in DEA models. J Intell Fuzzy Syst 40: 10273–10283. doi: 10.3233/JIFS-202332
    [178] Özorhan MO, Toroslu İH, Şehitoğlu OT (2018) Short-term trend prediction in financial time series data. Knowl Inf Syst 61: 397–429. doi: 10.1007/s10115-018-1303-x
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)