We analyze how the sentiment of financial news can be used to predict stock returns and build profitable trading strategies. Combining textual analysis of financial news headlines with statistical methods, we build multi-class classification models to predict stock returns. The main contribution of this paper is twofold. First, we develop a performance evaluation metric for comparing multi-class classification methods that takes into account both the precision and the accuracy of the models and methods. By maximizing the metric, we find optimal combinations of models and methods and select the best approach for prediction and decision-making. Second, this metric enables us to construct profitable option trading strategies, which can also serve as an assessment tool for analyzing the models' predictive power. We apply our methodology to historical data for Apple stock and to financial news headlines from Reuters from January 1, 2012, to May 31, 2019. During validation (May 31, 2018, to May 31, 2019), our models consistently outperformed the market, with two-class one-stage models yielding returns between 30% and 45%, compared to the S&P 500 index's 1.73% return over the same period.
Citation: Jiawei He, Roman N. Makarov, Jake Tuero, Zilin Wang. Performance evaluation metric for statistical learning trading strategies. Data Science in Finance and Economics, 2024, 4(4): 570-600. doi: 10.3934/DSFE.2024024