Sensors are prone to malfunction, leading to blank or erroneous measurements that cannot be ignored in most practical applications. Data users are therefore always looking for efficient methods to substitute missing values with accurate estimates. Traditionally, empirical methods have been used for this purpose, but with the increasing accessibility and effectiveness of Machine Learning (ML) methods, it is plausible that the former will be replaced by the latter. In this study, we aimed to provide some insight into this question, using the network of meteorological stations installed and operated by the GIS Research Unit of the Agricultural University of Athens in Nemea, Greece, as a test site for the estimation of daily average solar radiation. Routine weather parameters from ten stations over a period spanning 1,548 days were collected, curated, and used for the training, calibration, and validation of different iterations of two empirical equations and three iterations each of Random Forest (RF) and Recurrent Neural Network (RNN) models. The results indicated that ML methods, and especially RNNs, are in general more accurate than their empirical counterparts. However, the investment in technical knowledge, time, and processing capacity that their implementation requires means that they are not a panacea, and the choice of the best method depends on the case at hand. Future research directions could include the examination of more location-specific models or the integration of readily available spatiotemporal indicators to increase model generalization.
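As an illustration of the empirical side of this comparison, the sketch below fills gaps in a daily solar radiation series from routine temperature data. The abstract does not name the two empirical equations used, so a Hargreaves-Samani type relation, Rs = k * sqrt(Tmax - Tmin) * Ra, is assumed here purely as an example, with the extraterrestrial radiation Ra computed from the standard FAO-56 formulas. The function names, the record layout (`doy`, `tmax`, `tmin`, `rs`), and the through-the-origin least-squares calibration of k are hypothetical choices and not the authors' implementation. Units are MJ m^-2 day^-1.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1 (FAO-56)


def extraterrestrial_radiation(day_of_year, latitude_deg):
    """Daily extraterrestrial radiation Ra (MJ m^-2 day^-1), FAO-56 form.

    Assumes a non-polar latitude so that the sunset hour angle is defined.
    """
    phi = math.radians(latitude_deg)
    angle = 2 * math.pi * day_of_year / 365
    dr = 1 + 0.033 * math.cos(angle)           # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)     # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def _predictor(record, latitude_deg):
    """sqrt(Tmax - Tmin) * Ra for one daily record (assumed model term)."""
    temp_range = max(record["tmax"] - record["tmin"], 0.0)
    return math.sqrt(temp_range) * extraterrestrial_radiation(
        record["doy"], latitude_deg
    )


def calibrate_k(records, latitude_deg):
    """Least-squares k (through the origin) using only days with valid Rs."""
    num = den = 0.0
    for r in records:
        if r["rs"] is None:
            continue
        x = _predictor(r, latitude_deg)
        num += x * r["rs"]
        den += x * x
    return num / den


def fill_gaps(records, latitude_deg):
    """Return copies of the records with missing Rs replaced by estimates."""
    k = calibrate_k(records, latitude_deg)
    filled = []
    for r in records:
        if r["rs"] is None:
            estimate = k * _predictor(r, latitude_deg)
            filled.append({**r, "rs": estimate, "estimated": True})
        else:
            filled.append({**r, "estimated": False})
    return filled
```

A record with `rs` set to `None` is treated as a missing or invalid measurement. In practice the calibration and validation days would be kept separate, as in the study's training, calibration, and validation split, and an ML model such as RF or an RNN would be trained on the same predictors for comparison.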
Citation: Konstantinos X Soulis, Evangelos E Nikitakis, Aikaterini N Katsogiannou, Dionissios P Kalivas. Examination of empirical and Machine Learning methods for regression of missing or invalid solar radiation data using routine meteorological data as predictors[J]. AIMS Geosciences, 2024, 10(4): 939-964. doi: 10.3934/geosci.2024044