Outliers can cause significant forecasting errors, so it is essential to reduce their impact without losing the information they carry. Dropping observations from the dataset inevitably loses information. Two alternative procedures are therefore considered here: the Fast Minimum Covariance Determinant and Iteratively Reweighted Least Squares. Both are used to estimate factor models that are robust to outliers, and the forecasting ability of the robust approaches is compared on a large dataset widely used in economics. The dataset includes observations from the 2009 crisis and the COVID-19 pandemic, some of which can be considered outliers. The comparison is carried out at different sampling frequencies and horizons, in-sample and out-of-sample, on relevant variables such as GDP, Unemployment Rate, and Prices for both the US and the EU.
Citation: Fausto Corradin, Monica Billio, Roberto Casarin. Forecasting Economic Indicators with Robust Factor Models[J]. National Accounting Review, 2022, 4(2): 167-190. doi: 10.3934/NAR.2022010
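For readers unfamiliar with Iteratively Reweighted Least Squares, the following is a minimal Python sketch of the general idea on a toy straight-line regression. It is not the factor-model estimator used in the paper. The Huber weight function, the tuning constant 1.345, the median-absolute-residual scale, and all function names and data are illustrative assumptions.

```python
from statistics import median


def huber_weight(residual, scale, c=1.345):
    """Huber weight: 1 for small standardized residuals, c/|u| for large ones."""
    u = abs(residual) / scale
    return 1.0 if u <= c else c / u


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the line y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def irls_line(x, y, n_iter=50, tol=1e-10):
    """Alternate between weighted fits and residual-based weight updates."""
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # start from ordinary least squares
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale (assumption): median absolute residual / 0.6745
        scale = median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break  # at least half the residuals are exactly zero
        w = [huber_weight(r, scale) for r in resid]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w


# Toy data (hypothetical): exact line y = 1 + 2x with one outlier at index 5
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] += 50.0

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_rob, b_rob, weights = irls_line(x, y)
print("OLS :", a_ols, b_ols)
print("IRLS:", a_rob, b_rob)
```

In this sketch the outlying observation stays in the sample but receives a small weight, so it is down-weighted rather than dropped, which is the motivation given in the abstract.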