
Forecasting the movements of Bitcoin prices: an application of machine learning algorithms

  • Cryptocurrencies, such as Bitcoin, are among the most controversial and complex technological innovations in today's financial system. This study aims to forecast the movements of Bitcoin prices with a high degree of accuracy. To this end, four Machine Learning (ML) algorithms are applied, namely Support Vector Machines (SVM), the Artificial Neural Network (ANN), Naïve Bayes (NB) and Random Forest (RF), with logistic regression (LR) as the benchmark model. To test these algorithms, a discrete dataset was created and used alongside the existing continuous dataset. Algorithm performance was evaluated with the F statistic, the accuracy statistic, the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Root Absolute Error (RAE). A t test was used to compare the performance of the SVM, ANN, NB and RF with that of the LR. Empirical findings reveal that, in the continuous dataset, the RF has the highest forecasting performance and the NB the lowest. In the discrete dataset, the ANN has the highest performance and the NB the lowest. Furthermore, the discrete dataset improves the overall forecasting performance of all algorithms (models) estimated.

    Citation: Hakan Pabuçcu, Serdar Ongan, Ayse Ongan. Forecasting the movements of Bitcoin prices: an application of machine learning algorithms[J]. Quantitative Finance and Economics, 2020, 4(4): 679-692. doi: 10.3934/QFE.2020031



    The rapid development of digital currencies during the last decade is one of the most controversial and ambiguous innovations in the modern global economy. Advancing technology changes the structure of economies, financial markets and payment methods. The world's financial markets have become more digital than ever before, and a cashless society is around the corner. Today's technology enables people to create their own money (digital cryptocurrency), and the functions of central banks as lenders of last resort are being discussed and questioned. Bitcoin, as a financial phenomenon, and other cryptocurrencies are in fact data treated like money. Users (called "miners") send and receive these cryptocurrencies (data) electronically from their computers in peer-to-peer network systems to pay for things, if other parties are willing to accept such payments. In 2019, the market capitalization of 2957 cryptocurrencies reached $221 billion (Bitcoin: $147 billion) and the number of miners reached 42 million. The price of Bitcoin increased drastically from $0.0008 per coin at its launch in January 2009 to $10,168 in February 2020. Hence, Bitcoin and other cryptocurrencies have become extremely popular owing to their growing number of users and their huge gains. On the other hand, the price series of Bitcoin and other cryptocurrencies, like those of other financial assets, exhibit chaotic fluctuations. Asymmetric information problems in financial markets, increasing economic and political uncertainties and the changing behavior of miners may make the prices of cryptocurrencies hard for investors to predict. Forecasting cryptocurrencies may well be more difficult than forecasting conventional assets: although they are popular with investors, very little is known about them, how they work and how they are created (mined), since they are not physical currencies. Accordingly, accurately forecasting their prices may minimize potential losses and risks for users.

    This study aims to forecast the movements of Bitcoin prices with a high degree of accuracy. To this end, machine learning (henceforth, ML) algorithms are applied, which do not require the strict assumptions of traditional methods (e.g., regression analysis, discriminant analysis, cluster analysis). While traditional models use the whole dataset to investigate causal relations, ML algorithms normally split the dataset into training and testing sets. Hence, ML allows computers to "learn" and make predictions. Although both approaches try to increase accuracy by minimizing some loss function, ML does so using nonlinear algorithms (Butner et al., 2019; Makridakis et al., 2018). This does not mean that ML algorithms always outperform traditional models. However, ML algorithms developed to address specific problems may provide better forecasts for large datasets. All of this makes ML algorithms very popular among scholars.

    Many studies empirically compare the forecasting performance of traditional models and ML algorithms. In some studies traditional models outperform ML algorithms, while in others the reverse holds. For instance, Jang & Lee (2018) compare the Bayesian neural network with traditional models in forecasting Bitcoin prices and find that the Bayesian neural network performs better. Similarly, McNally et al. (2018) compare the accuracy rates of ML algorithms with the autoregressive integrated moving average (ARIMA) model for forecasting Bitcoin prices and find that ML outperforms the ARIMA model. Rebane et al. (2018) find that the recurrent neural network (RNN) outperforms the ARIMA model in forecasting the prices of cryptocurrencies. Nguyen & Le (2019) apply the ARIMA model and ML algorithms to forecast Bitcoin prices and find that ML algorithms outperform the ARIMA model. Yao et al. (2019) examine the impact of news articles on Bitcoin prices and find that ML algorithms perform better than traditional models. However, Felizardo et al. (2019) compare the performance of the ARIMA with that of the RF and the SVM in forecasting Bitcoin prices and find that the ARIMA model outperforms the ML algorithms. Chen et al. (2020) compare traditional models, such as logistic regression and discriminant analysis, with ML algorithms and find that traditional models perform better in forecasting Bitcoin prices. Other studies compare different ML algorithms against each other. For instance, Ji et al. (2019) compare the deep neural network (DNN) and long short-term memory (LSTM) for forecasting Bitcoin prices and find that the LSTM slightly outperforms the DNN. Kwon et al. (2019) compare the LSTM and gradient boosting algorithms and find that the LSTM performs better. Furthermore, Miller et al. (2019) apply the nonparametric regression method of smoothing splines to 1-minute Bitcoin price data and find that this method performs better than unconditional trading strategies. Lahmiri & Bekiros (2020) use deep learning techniques to forecast the prices of Bitcoin, Digital Cash and Ripple and find that long short-term memory (LSTM) neural network topologies perform better than the generalized regression neural architecture. Huang (2019) uses a classification tree-based model with 124 technical indicators to investigate cryptocurrency return predictability and finds that this model has strong predictive power. Corbet et al. (2019) find that the variable-length moving average rule performs best with buy signals for Bitcoin. Atsalakis et al. (2019) use a hybrid neuro-fuzzy controller, namely PATSOS, to predict the direction of change in the daily price of Bitcoin and find that the performance of the PATSOS system is robust enough to be used for all cryptocurrencies. Adcock & Gradojevic (2019) find that neural networks perform better than various competing models in predicting Bitcoin returns. Shu & Zhu (2020) use an adaptive multilevel time series detection methodology to predict bubbles in Bitcoin and find that this methodology is robust enough to be used not only for cryptocurrencies but also in financial markets. Balcilar et al. (2017) use a non-parametric causality-in-quantiles test to investigate the causal relation between trading volume and Bitcoin returns and volatility; they reveal the importance of modelling nonlinearity and of accounting for causal relationships. Gyamerah (2019) uses GARCH models to evaluate the volatility of Bitcoin returns and finds that the t-GARCH-NIG performs best in predicting volatility. Panagiotidis et al. (2018) use the least absolute shrinkage and selection operator (LASSO) framework to examine the effects of various factors on Bitcoin returns. They find that gold returns have the most important effect on Bitcoin returns.

    This study differs from the studies mentioned above in three respects. First, we apply four different ML algorithms simultaneously to compare their performance. Second, we use nine technical input parameters, following Armano et al. (2005), Atsalakis & Valavanis (2009), Kara et al. (2011) and Kim (2003). Third, besides the existing continuous dataset, a discrete dataset was created and used. Together, these enable us to determine which ML algorithm offers higher forecasting performance in the continuous and discrete datasets, both separately and comparatively.

    This study is organized as follows. Sections 2 and 3 describe the data preparation and the empirical methodology, respectively. Section 4 reports the empirical findings obtained from the continuous and discrete datasets. Finally, Section 5 presents the discussion and conclusion.

    In this study, the output is the direction of the change in the closing price relative to the previous day. We coded upward and downward movements as +1 and −1, respectively, and used the same output for the continuous and discrete datasets. Closing, high and low prices were used to compute the technical indicators and the output, as reported in Table 1. The continuous (existing) dataset, covering 2008–2019, was normalized for all ML models (n = 1935). We used model validation to compare the performance of the models with the benchmark model and to test the significance of the differences. The validation dataset consists of n = 100 Bitcoin observations between June 2020 and October 2020. It was divided into 10 sub-datasets of 10 samples each. For each sub-dataset, the movement predictions of each estimated model and the corresponding accuracy statistics were calculated. The average accuracy of each model over these 10 sub-datasets was compared with that of the LR using a t test. The accuracy statistics were calculated for both the continuous and discrete datasets.
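    The labelling and validation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, an unchanged close is treated as −1 (the text does not say how ties are coded), and the t statistic assumes a paired test on the ten sub-dataset accuracies, which the text does not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev


def direction_labels(closes):
    """+1 if the close rose versus the previous day, otherwise -1 (ties count as -1 here)."""
    return [1 if today > prev else -1 for prev, today in zip(closes, closes[1:])]


def chunk_accuracies(y_true, y_pred, n_chunks=10):
    """Accuracy of each equally sized, consecutive sub-dataset."""
    size = len(y_true) // n_chunks
    accuracies = []
    for k in range(n_chunks):
        lo, hi = k * size, (k + 1) * size
        hits = sum(t == p for t, p in zip(y_true[lo:hi], y_pred[lo:hi]))
        accuracies.append(hits / size)
    return accuracies


def paired_t(acc_benchmark, acc_model):
    """Paired t statistic of the (benchmark - model) accuracy differences (assumed test form)."""
    diffs = [b - m for b, m in zip(acc_benchmark, acc_model)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

    With this sign convention, a model that beats the benchmark yields a negative t value.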

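    Table 1.  Selected technical indicators.
    Indicators Formula
    Simple 14 days moving average (MA) $\frac{C_t + C_{t-1} + \cdots + C_{t-14}}{14}$
    Simple 14 days weighted moving average (WMA) $\frac{n\,C_t + (n-1)\,C_{t-1} + \cdots + C_{t-14}}{n + (n-1) + \cdots + 1}$
    Momentum (Mom) $C_t - C_{t-n}$
    Stochastic K% (K%) $\frac{C_t - LL_{t-n}}{HH_{t-n} - LL_{t-n}} \times 100$
    Stochastic D% (D%) $\frac{\sum_{i=0}^{n-1} K_{t-i}\%}{n}$
    Relative strength index (RSI) $100 - \frac{100}{1 + \left(\sum_{i=0}^{n-1} Up_{t-i}/n\right) / \left(\sum_{i=0}^{n-1} Dw_{t-i}/n\right)}$
    Moving average convergence/divergence (MACD) $MACD(n)_{t-1} + \frac{2}{n+1} \times \left(DIFF_t - MACD(n)_{t-1}\right)$
    Larry William's R% (LW) $\frac{H_n - C_t}{H_n - L_n} \times 100$
    Accumulation/distribution oscillator (A/D) $\frac{H_t - C_{t-1}}{H_t - L_t}$
    Note: Source: (Kara et al., 2011). *n is the number of days, taken as 10 here; $C_t$ is the closing price, $L_t$ the low price and $H_t$ the high price at time t. $DIFF_t = EMA(12)_t - EMA(26)_t$, where EMA is the exponential moving average, $EMA(k)_t = EMA(k)_{t-1} + \alpha\,(C_t - EMA(k)_{t-1})$ and $\alpha$ is the correction factor. $LL_t$ is the lowest low and $HH_t$ the highest high for the last t days. $M_t = (H_t + L_t + C_t)/3$, $SM_t = \left(\sum_{i=1}^{n} M_{t-i+1}\right)/n$, $D_t = \left(\sum_{i=1}^{n} |M_{t-i+1} - SM_t|\right)/n$; $Up_t$ and $Dw_t$ are the upward and downward price changes at time t, respectively…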


    For creating the discrete dataset, the continuous dataset was converted to −1 or +1 by applying the discretization process. +1 and −1 indicate upward and downward movements, respectively (Patel et al., 2015). This new dataset represents the trend of indicators. The discretization process of each technical input indicator is explained in the following paragraphs.

    The moving average (MA) and the weighted moving average (WMA) represent average price changes over a certain period. The MA, the simplest and most widely used indicator, indicates the general direction of the trend. In this paper, 14-day MA and WMA were used for short-term forecasting. If the current Bitcoin price is above the MA or WMA, the trend is upward and the value is labeled +1. If the current Bitcoin price is below the MA or WMA, the trend is downward and the value is labeled −1. Financial time series such as Bitcoin prices exhibit speculative movements, so long-run predictions may not be accurate. Hence, in our technical analyses, we used short-term moving averages of 7–14 days. The exponential moving average (EMA) assigns more weight to the most recent data; it thus smooths the data while giving more importance to the current trend.
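    A minimal sketch of the MA/WMA discretization rule, under stated assumptions: the function names are hypothetical, the window uses exactly the last n closes, and a price equal to the average is labeled −1 because the text does not cover that case.

```python
def simple_ma(closes, n=14):
    """Mean of the last n closing prices."""
    window = closes[-n:]
    return sum(window) / len(window)


def weighted_ma(closes, n=14):
    """Linearly weighted mean: the newest close gets the largest weight, the oldest weight 1."""
    window = closes[-n:]
    weights = range(1, len(window) + 1)
    return sum(w * c for w, c in zip(weights, window)) / sum(weights)


def price_vs_average(price, average):
    """+1 when the price is above the average (upward trend), otherwise -1."""
    return 1 if price > average else -1
```

    For a steadily rising series the WMA lies above the MA, because recent (higher) prices carry more weight, and the current price lies above both, so both indicators are labeled +1.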

    Momentum (Mom) is an indicator that represents the effect of price changes and provides information on the sustainability of the current trend. If the momentum value is positive, the trend is "upward" and labeled +1. If the momentum value is negative, the trend is "downward" and labeled −1. The main problem in determining the momentum boundary line is sharp rises and slumps at time "t", since such changes can shift the boundary line. Momentum is one of the leading indicators; it measures the velocity of changes in security prices over a specific period of time. It compares the prices at times t and t−1.
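    The momentum rule can be sketched in a few lines. The lag n = 10 follows the note to Table 1; the function names are hypothetical, and a momentum of exactly zero is labeled −1 here as an assumption.

```python
def momentum(closes, n=10):
    """Difference between the latest close and the close n periods earlier."""
    return closes[-1] - closes[-1 - n]


def momentum_label(closes, n=10):
    """+1 for positive momentum (upward trend), otherwise -1."""
    return 1 if momentum(closes, n) > 0 else -1
```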

    The stochastic indicators K%, D% and LW are clear trend indicators. The stochastic oscillator, one of the most important indicators, measures a security's momentum and identifies overbought and oversold levels. It uses a bounded range of values from 0 to 100. The LW, developed by Larry Williams, is very similar to the stochastic oscillator and is used in the same way. It compares a security's closing price with its high-low range over time. If the value of an indicator at time "t" is greater than its value at time "t−1", the trend is "upward" and labeled +1; if it is lower, the trend is "downward" and labeled −1. A stochastic oscillator tends to vary around some mean price level, since it uses an asset's price history to signal overbought and oversold conditions.
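    A hedged sketch of K%, LW and the shared trend rule follows. It uses the Table 1 forms with the highest high and lowest low over the last n periods; the function names are hypothetical, and the code assumes the high-low range is not zero. Table 4 reports LW between −100 and 0, so the authors' sign convention for LW may differ from the positive form used here.

```python
def stochastic_k(closes, highs, lows, n=10):
    """Position of the latest close within the n-period high-low range, 0 to 100."""
    hh, ll = max(highs[-n:]), min(lows[-n:])
    return (closes[-1] - ll) / (hh - ll) * 100


def williams_r(closes, highs, lows, n=10):
    """Distance of the latest close from the n-period high, as a share of the range."""
    hh, ll = max(highs[-n:]), min(lows[-n:])
    return (hh - closes[-1]) / (hh - ll) * 100


def trend_label(current, previous):
    """+1 if the indicator rose since the previous period, otherwise -1."""
    return 1 if current > previous else -1
```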

    The relative strength index (RSI) charts the speed and scale of directional changes in values. It measures the speed and magnitude of recent price changes to determine whether a security is overbought or oversold, and different value ranges of the RSI indicate different trend behavior. If the RSI is lower than 30, it is labeled +1; if it is higher than 70, it is labeled −1. For values between 30 and 70, if the RSI at time "t" is higher than at time "t−1", the trend is "upward" and labeled +1, and vice versa.
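    The three-zone RSI rule can be sketched as below. The RSI itself follows the Table 1 form with simple averages of upward and downward changes; the function names are hypothetical, and returning 100 when there are no downward changes is an assumption made to avoid division by zero.

```python
def rsi(closes, n=10):
    """RSI from the last n price changes (needs at least n + 1 closes)."""
    changes = [b - a for a, b in zip(closes[-n - 1:-1], closes[-n:])]
    up = sum(c for c in changes if c > 0) / n
    down = sum(-c for c in changes if c < 0) / n
    if down == 0:
        return 100.0  # assumption: no losses in the window
    return 100 - 100 / (1 + up / down)


def rsi_label(rsi_now, rsi_prev):
    """Oversold (<30) -> +1, overbought (>70) -> -1, otherwise follow the RSI direction."""
    if rsi_now < 30:
        return 1
    if rsi_now > 70:
        return -1
    return 1 if rsi_now > rsi_prev else -1
```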

    The moving average convergence/divergence (MACD) indicator is related to price movements. The MACD, developed by Gerald Appel, shows the relationship between two moving averages of a security's price. It is calculated from the difference between a short and a long exponential moving average (EMA). If the MACD increases, prices tend to increase, and if the MACD decreases, prices tend to decrease. If the value of the MACD at time "t" is greater than its value at time "t−1", the trend is "upward" and labeled +1, and vice versa.
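    The EMA recursion and the MACD trend label can be sketched as follows. The 12 and 26 period EMAs and n = 10 come from the note to Table 1; the smoothing factor 2/(k+1) matches the factor shown in the MACD row, while seeding each EMA with the first observation and the function names are assumptions of this sketch.

```python
def ema_series(values, k):
    """EMA(k)_t = EMA(k)_{t-1} + alpha * (value_t - EMA(k)_{t-1}), seeded with the first value."""
    alpha = 2 / (k + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def macd_series(closes, fast=12, slow=26, n=10):
    """Smooth DIFF_t = EMA(fast)_t - EMA(slow)_t with an n-period EMA."""
    diff = [f - s for f, s in zip(ema_series(closes, fast), ema_series(closes, slow))]
    return ema_series(diff, n)


def macd_label(macd):
    """+1 if the MACD rose since the previous period, otherwise -1."""
    return 1 if macd[-1] > macd[-2] else -1
```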

    Figure 1.  Forecasting mechanisms.

    Following the calculation of the nine technical input parameters, we apply our ML algorithms. The ANN (Artificial Neural Network) has been commonly used in forecasting price movements. Researchers prefer this algorithm because of the flexibility of the multilayer perceptron (MLP) (Mallqui & Fernandes, 2019). In this study, tangent sigmoid and logistic transfer functions are used for the hidden and output layers, respectively. A threshold was applied to the output to predict the up and down movements of Bitcoin prices. Several configurations were tried to determine the best parameter settings for the ANN. The parameter levels tested are reported in Table 2.

    Table 2.  Parameter settings for ANN.
    Parameter Level
    Number of neurons in hidden layer (n) 5, …, 50
    Iteration (ep) 250, 500, …, 2000
    Momentum constant (mc) 0.1, 0.2, …, 0.9
    Learning rate (lr) 0.1, 0.2, 0.3
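    To make the ANN output stage concrete, the sketch below runs one forward pass of a single-hidden-layer network with a tangent sigmoid hidden layer and a logistic output, then applies a threshold. The weights are placeholders, the function names are hypothetical, and the threshold of 0.5 is an assumption: the text says a threshold was used but does not give its value.

```python
from math import exp, tanh


def logistic(z):
    """Logistic transfer function used for the output neuron."""
    return 1 / (1 + exp(-z))


def mlp_predict(x, w_hidden, b_hidden, w_out, b_out, threshold=0.5):
    """Return (label, score): label is +1 if the output score exceeds the threshold, else -1."""
    hidden = [tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    score = logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return (1 if score > threshold else -1), score
```

    Training (choosing the weights by backpropagation with the learning rate and momentum constant of Table 2) is outside the scope of this sketch.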


    The SVM (Support Vector Machine), proposed by Vapnik (1995), is based on structural risk minimization and maximizes the margin between negative and positive samples. The SVM constructs a hyperplane that separates the classes of the problem (Kara et al., 2011). It is not a stochastic model: it always gives the same results when the same dataset is processed. In this study, different levels of parameter settings were used to determine the best estimator, as reported in Table 3.

    Table 3.  Parameter settings for SVM.
    Parameter | Level (polynomial) | Level (RBF-Gaussian)
    Kernel function degree (d) | 1, 2, 3, 4 | -
    Kernel function Gamma coefficient (γ) | - | 0, 0.1, 0.2, …, 5.0
    Regularization parameter (c) | 1, 10, 100 | 1, 10, 100
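    The two kernel families in Table 3 can be illustrated as plain functions. The exact kernel forms are not given in the text, so the commonly used definitions are assumed here: (x·y + 1)^d for the polynomial kernel and exp(−γ‖x − y‖²) for the RBF (Gaussian) kernel. The function and parameter names are hypothetical.

```python
from math import exp


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree plays the role of d in Table 3."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    """exp(-gamma * squared Euclidean distance); gamma is the Gamma coefficient."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

    The regularization parameter c does not appear in the kernel; it controls the trade-off between margin width and training errors when the SVM is fitted.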


    The NB (Naïve Bayes) is a machine learning classification algorithm based on the conditional probability principle known as Bayes' theorem. The simplicity of its calculation and usage is its main advantage over other machine learning algorithms. The algorithm uses a Bayesian classifier to forecast the probability that a sample belongs to a specific class of the given dataset. The NB requires no additional parameter settings to construct the forecasting model.
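    As an illustration of the conditional-probability idea, the sketch below fits a Bernoulli Naïve Bayes classifier to +1/−1 features such as those of the discrete dataset. It is not the authors' implementation: the function names are hypothetical, Laplace smoothing is an added assumption, and the training data must contain both classes.

```python
from math import log


def fit_bernoulli_nb(X, y):
    """Return, per class, the log prior and the smoothed P(feature = +1 | class)."""
    model = {}
    for cls in (1, -1):
        rows = [x for x, label in zip(X, y) if label == cls]
        prior = len(rows) / len(X)
        # Laplace smoothing keeps every probability strictly between 0 and 1
        p_up = [(sum(1 for r in rows if r[j] == 1) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))]
        model[cls] = (log(prior), p_up)
    return model


def predict_bernoulli_nb(model, x):
    """Pick the class with the larger log posterior under feature independence."""
    def score(cls):
        log_prior, p_up = model[cls]
        return log_prior + sum(log(p if xi == 1 else 1 - p) for xi, p in zip(x, p_up))
    return max(model, key=score)
```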

    The RF (Random Forest) is a very efficient classification algorithm whose results can readily be compared with those of other classification algorithms. ID3 (Quinlan, 1986), C4.5 (Quinlan, 1988) and CART (Breiman, 1984) are the most powerful and commonly used decision-tree based classification algorithms. The RF is an ensemble-learning algorithm based on the idea that a single classifier may not be capable of determining the class of test data. In this study, the number of randomly selected features was varied from 3 to 100 and the number of trees from 3 to 300 to determine the best parameter setting.
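    The ensemble idea can be sketched with a deliberately simplified forest: each "tree" is a one-split stump trained on a bootstrap sample with a random subset of +1/−1 features, and the forest predicts by majority vote. A real Random Forest grows full decision trees, so this is only an illustration of bagging plus random feature selection; all names are hypothetical.

```python
import random


def fit_stump(X, y, features):
    """Pick the (feature, sign) pair among `features` with the most training hits."""
    best = None
    for j in features:
        for sign in (1, -1):
            hits = sum(sign * row[j] == label for row, label in zip(X, y))
            if best is None or hits > best[0]:
                best = (hits, j, sign)
    return best[1], best[2]


def fit_forest(X, y, n_trees=25, n_features=3, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Majority vote of the stumps; a tied vote is reported as -1."""
    votes = sum(sign * x[j] for j, sign in forest)
    return 1 if votes > 0 else -1
```

    Using an odd number of trees avoids tied votes.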

    The LR (Logistic Regression) is a popular technique to model the probability of discrete (i.e., binary or multinomial) outcomes. In this study, this technique, as a benchmark model, was used to compare the performances of machine learning algorithms.

    To test the algorithms mentioned above and compare their performance, F statistics are calculated from the counts of true and false positives (TP, FP) and true and false negatives (TN, FN), following the equations below (Patel et al., 2015).

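    $\mathrm{Precision}_{\mathrm{positive}} = \frac{TP}{TP + FP}$ (1)
    $\mathrm{Precision}_{\mathrm{negative}} = \frac{TN}{TN + FN}$ (2)
    $\mathrm{Recall}_{\mathrm{positive}} = \frac{TP}{TP + FN}$ (3)
    $\mathrm{Recall}_{\mathrm{negative}} = \frac{TN}{TN + FP}$ (4)
    $\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$ (5)
    $F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (6)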
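    Equations (1)–(6) can be computed directly from the four counts. The sketch below is a minimal illustration with hypothetical function names; it reports the F statistic separately for the positive and negative classes, since the text does not state how the two are combined into a single value.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for +1 (up) and -1 (down) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    return tp, fp, tn, fn


def f_measure(precision, recall):
    """Equation (6): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_report(tp, fp, tn, fn):
    """Equations (1)-(5), plus equation (6) applied to each class."""
    precision_pos, precision_neg = tp / (tp + fp), tn / (tn + fn)
    recall_pos, recall_neg = tp / (tp + fn), tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f_positive": f_measure(precision_pos, recall_pos),
        "f_negative": f_measure(precision_neg, recall_neg),
    }
```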

    Unlike econometric models, machine learning algorithms do not require stationarity tests. To test the performance of the selected algorithms, the mean absolute error (MAE), root mean square error (RMSE) and root absolute error (RAE) are used in addition to the F statistics. The continuous dataset was normalized for all models estimated and divided into two parts: training (75%) and testing (25%). Furthermore, a new validation dataset (n = 100) was used to test the statistical significance of the performance differences between the estimated models and the benchmark model.
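    The error statistics can be sketched as below for classes coded 0 and 1. MAE and RMSE follow their standard definitions. The text gives no formula for the RAE; the SVM rows of Table 6 show RAE values close to MAE divided by a constant near 0.496, which is what an absolute error measured relative to a constant mean predictor would produce, so that relative form is assumed here. This is an inference from the reported numbers, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def relative_absolute_error(y_true, y_pred):
    """Total absolute error divided by that of always predicting the mean of y_true (assumed RAE form)."""
    baseline = mean(y_true)
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / sum(abs(t - baseline) for t in y_true)
```

    The relative form is undefined when all true values are identical, because the baseline error is then zero.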

    In this section, the estimated model parameters are reported for continuous and discrete datasets, respectively. Descriptive statistics for inputs are reported in Table 4.

    Table 4.  Descriptive statistics for selected indicators.
    Indicator Minimum Maximum Mean Standard dev.
    MA 158.407 16866.037 2501.552 3395.42
    WMA 176.498 17802.757 2507.79 3402.815
    Mom −5578 8212.55 23.237 920.879
    K% 0 100 54.624 29.336
    D% 6.337 93.153 54.343 22.571
    RSI 10.954 93.491 52.549 14.209
    MACD −1479.221 2520.715 12.44 292.976
    LW −100 0 −45.376 29.33
    A/D −0.879 1.521 0.399 60.185


    The best parameter combinations are determined experimentally for each forecasting algorithm. The best parameter combinations estimated for the continuous data are reported in Tables 5–8.

    Table 5.  Best three-parameter combinations for ANN.
    Learning rate (lr) Iteration (ep) momentum constant (mc) Hidden neuron (n) Accuracy MAE RMSE RAE
    1 0.3 500 0.2 6 0.843 0.203 0.341 0.409
    2 0.3 500 0.2 8 0.841 0.201 0.360 0.405
    3 0.3 500 0.2 7 0.835 0.201 0.349 0.404

    Table 6.  Best three-parameter combinations for SVM.
    Kernel function d γ c Accuracy MAE RMSE RAE
    1 Polynomial 2 - 100 0.808 0.192 0.438 0.387
    2 Polynomial 1 - 30 0.804 0.196 0.443 0.395
    3 Polynomial 2 - 20 0.802 0.198 0.445 0.339
    4 RBF (Gaussian) - 0.1 20 0.733 0.266 0.516 0.538
    5 RBF (Gaussian) - 0.1 10 0.729 0.271 0.520 0.546
    6 RBF (Gaussian) - 0.1 40 0.717 0.283 0.532 0.571

    Table 7.  NB classification parameters.
    Accuracy MAE RMSE RAE
    1 0.626 0.368 0.572 0.743
    2 (Gaussian) 0.717 0.283 0.461 0.571

    Table 8.  Best three-parameter combinations for RF.
    Feature Number of tree Accuracy MAE RMSE RAE
    1 3 297 0.884 0.191 0.297 0.384
    2 8 251 0.882 0.180 0.293 0.362
    3 6 267 0.880 0.184 0.293 0.370


    Test results in Table 5 indicate that the accuracy levels and error statistics are within acceptable levels. The best accuracy level for the ANN is 0.843. This means that we are able to forecast the movements of Bitcoin prices with a high degree of accuracy. After training, the best number of hidden neurons, momentum constant and learning rate are found to be 6, 0.2 and 0.3, respectively. Test results of the best three SVM models for each of the polynomial and Gaussian kernel functions are reported in Table 6.

    The accuracy statistics are used to determine the best estimated model. Polynomial and radial basis (Gaussian) kernel functions are used. The best accuracy level, 0.808, is obtained with the second-degree polynomial kernel, as reported in Table 6. The test results of the NB are reported in Table 7.

    Test results indicate that the Gaussian NB classifier is the better of the two NB variants, with an accuracy level of 0.717. Test results for the RF model are reported in Table 8.

    In Table 8, the number of trees is a tuning parameter of the RF. It ranged from 50 to 300 during the parameter selection process, and 1 to 10 features were used to train the trees. The best accuracy level, 0.884, is obtained for the RF with 3 features and 297 trees. Performance comparisons of the models described above are reported in Table 9.

    Table 9.  Comparison of the best models.
    TP FP ROC F-Stat. Rank
    ANN 0.843 0.149 0.910 0.843 2
    SVM 0.808 0.191 0.809 0.808 3
    NB 0.717 0.278 0.826 0.717 4
    RF 0.884 0.118 0.949 0.884 1
    LR 0.781 0.832 0.828 0.562 (Benchmark)


    Test results in Table 9 indicate that the Gaussian NB model has the lowest performance, with an F statistic of 0.717, while the RF model has the highest, with an F statistic of 0.884. The performance differences between the ANN, RF, SVM and NB algorithms and the LR model are statistically significant, and all four perform better than the LR model. The results of the t tests are reported in Table 15.

    Table 10.  Best three-parameter combinations for ANN.
    Learning rate (lr) Iteration (ep) momentum constant (mc) Hidden neuron (n) Accuracy MAE RMSE RAE
    1 0.3 500 0.2 20 0.9483 0.072 0.206 0.480
    2 0.1 500 0.1 20 0.9463 0.077 0.207 0.512
    3 0.1 500 0.1 20 0.9395 0.086 0.214 0.546

    Table 11.  Best three parameter combinations for SVM.
    Kernel function d γ c Accuracy MAE RMSE RAE
    1 Polynomial 3 - 1 0.9463 0.054 0.232 0.358
    2 Polynomial 3 - 2 0.9442 0.056 0.236 0.372
    3 Polynomial 2 - 1 0.9421 0.058 0.240 0.386
    4 RBF (Gaussian) - 0.2 1 0.9483 0.052 0.227 0.344
    5 RBF (Gaussian) - 0.2 100 0.9463 0.054 0.232 0.358
    6 RBF (Gaussian) - 0.1 10 0.9442 0.056 0.236 0.372

    Table 12.  NB classification parameters.
    Accuracy MAE RMSE RAE
    1 0.8822 0.136 0.310 0.905
    2 (Gaussian) 0.8822 0.134 0.309 0.905

    Table 13.  Best three-parameter combinations for RF.
    Feature Number of tree Accuracy MAE RMSE RAE
    1 10 79 0.9462 0.076 0.205 0.509
    2 8 71 0.9438 0.078 0.208 0.513
    3 10 69 0.9390 0.085 0.212 0.538

    Table 14.  Comparison of the best models.
    TP FP ROC F-Stat. Rank
    ANN 0.948 0.557 0.931 0.941 1
    SVM 0.948 0.610 0.669 0.938 3
    NB 0.882 0.167 0.901 0.902 4
    RF 0.946 0.557 0.923 0.939 2
    LR 0.858 0.873 0.681 0.854 (Benchmark)

    Table 15.  t test results of model comparisons in terms of benchmark.
    Dataset Model Mean (Accuracy) N Std. Dev. t
    Continuous LR 0.551 10 0.719
    ANN 0.826 10 0.017 −13.658*
    RF 0.854 10 0.056 −9.467*
    SVM 0.754 10 0.064 −11.809*
    NB 0.657 10 0.040 −4.262**
    Discrete LR 0.623 10 0.032
    ANN 0.850 10 0.037 −13.208*
    RF 0.835 10 0.012 −16.582*
    SVM 0.786 10 0.053 −7.398*
    NB 0.674 10 0.027 −2.908*
    Note: * shows statistical significance at the 0.01 level; ** shows statistical significance at the 0.05 level.


    The best parameter combinations for the discrete dataset were determined experimentally for each forecasting algorithm. The best three parameter combinations of all models except the NB are reported in Tables 10–13.

    Test results in Table 10 indicate that accuracy levels and error statistics calculated are within acceptable levels. The best accuracy level is determined as 0.948 for the ANN. After training processes, hidden neurons, momentum constant and learning rates are found to be 20, 0.2 and 0.3, respectively. Test results for the SVM model are reported in Table 11.

    The error statistics and the accuracy statistic are used to determine the best estimated models. Polynomial and radial basis (Gaussian) kernel functions are used. The best accuracy level, 0.9483, is obtained with the Gaussian kernel function, a Gamma coefficient of 0.2 and a regularization parameter of 1. Test results of the NB model are reported in Table 12.

    Test results in Table 12 indicate that the best accuracy level for the NB classifiers is 0.882, with lower MAE and RMSE obtained by fitting the multivariate Bernoulli distribution. Test results of the RF model are reported in Table 13.

    The number of trees was a tuning parameter of the RF algorithm. It ranged from 50 to 300 during the parameter selection process, and 1 to 10 features were used to train the trees. The best accuracy level, 0.946, is obtained for the RF with 10 features and 79 trees. Performance comparisons of these models are reported in Table 14.

    Test results in Table 14 indicate that the NB with the multivariate Bernoulli distribution has the lowest performance, with an F statistic of 0.902, while the ANN has the highest, with an F statistic of 0.941. For the discrete dataset, the ANN, RF, SVM and NB algorithms achieved higher accuracy and F statistics than the LR model, as shown in Figure 2. The performance differences are statistically significant for all compared groups, as shown in Table 15. Hence, the ANN, RF, SVM and NB algorithms produced better predictions of Bitcoin movements than the benchmark model.

    Figure 2.  F statistics with continuous and discrete data for all models.

    This study aims to forecast the movements of Bitcoin prices with a high degree of accuracy. To this end, four Machine Learning algorithms are applied, namely the Artificial Neural Network, Random Forest, Support Vector Machines and Naïve Bayes, with logistic regression (LR) as the benchmark model. To test these algorithms, a discrete dataset was created and used alongside the existing continuous dataset.

    Empirical findings reveal that, in the continuous dataset, the RF has the highest forecasting performance and the NB the lowest. In the discrete dataset, the ANN has the highest performance and the NB again the lowest. Furthermore, the discrete dataset improves the overall forecasting performance of all models estimated. The RF has become more popular than the ANN owing to its ease of use. However, these comparisons may change with new datasets. It should also be noted that it is hard to consider all combinations in a single study. Hence, the performance of Machine Learning algorithms increases over time.

    Each of the nine technical parameters used in this study can also be considered as an estimator. However, these parameters were used after considering their trend characteristics rather than directly as estimators, since this transformation may increase the forecasting performance. This means that, with the transformation applied in this study, real-time expert systems may help investors make more profitable and safer investments. Furthermore, although it is widely accepted that preprocessing data is not necessary when ML algorithms are used, this study reveals that preprocessing data increases forecasting performance. In this study, the algorithms classify Bitcoin price movements as up or down. For future forecasts, however, it is suggested that more than two categories and different algorithms may be used.
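
    The trend-based transformation described above can be sketched as follows. This section does not list the exact rule applied to each of the nine parameters, so the two rules below (price above its simple moving average, and positive momentum) are illustrative assumptions, as are the function names, window lengths and prices.

```python
def simple_moving_average(prices, window):
    """Return the SMA values aligned with prices[window - 1:]."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def trend_signal_sma(prices, window):
    """+1 where the price is above its SMA (up trend), otherwise -1."""
    sma = simple_moving_average(prices, window)
    closes = prices[window - 1 :]
    return [1 if c > m else -1 for c, m in zip(closes, sma)]


def trend_signal_momentum(prices, lag):
    """+1 where the price rose over the last `lag` periods, otherwise -1."""
    return [
        1 if prices[i] - prices[i - lag] > 0 else -1
        for i in range(lag, len(prices))
    ]


# Hypothetical closing prices (not study data).
prices = [100.0, 102.0, 101.5, 105.0, 103.0, 99.0]

sma_signal = trend_signal_sma(prices, window=3)
mom_signal = trend_signal_momentum(prices, lag=2)
```

    Each continuous indicator is thereby replaced by a discrete up/down signal, which is the kind of input a Bernoulli-type classifier expects.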

    This study shows the need for more empirical studies using other techniques to obtain more accurate forecasts of Bitcoin price movements, which exhibit chaotic and nonlinear characteristics (fluctuations) under increasing economic uncertainty. In addition to the nine technical parameters used in this study, macroeconomic parameters such as exchange rates, interest rates and government policy implementations are proposed as new inputs (variables) for these models, because all these variables may easily affect financial markets involving cryptocurrencies.

    The authors declare no conflicts of interest in this paper.



  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)