Analysis of Data Splitting on Streamflow Prediction using Random Forest

  • Received: 07 June 2024 Revised: 13 July 2024 Accepted: 17 July 2024 Published: 26 July 2024
  • This study uses random forest (RF) to forecast streamflow in the Kesinga River basin. A total of 169 monthly data points covering the years 1991–2004 were used to build a streamflow prediction model. The dataset was split into training and testing subsets at four ratios: 50/50, 60/40, 70/30, and 80/20. The resulting models were evaluated using three statistical indices: the root mean square error (RMSE), the mean absolute error (MAE), and the correlation coefficient (CC). The training/testing ratio had a substantial impact on the RF model's predictive ability, and the models performed best at a ratio of 60/40. The findings identify suitable dataset ratios for precise streamflow prediction, which will benefit hydraulic engineers during the design and engineering stages of water projects.
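
To make the splitting ratios and the three evaluation indices concrete, the following is a minimal sketch in plain Python. It is not the authors' code. It assumes an ordered (non-shuffled) split with the training count rounded down, which the abstract does not specify, and it uses the common definitions of RMSE, MAE, and the Pearson correlation coefficient. The helper names (`split_by_ratio`, `rmse`, `mae`, `correlation_coefficient`) are hypothetical, and fitting the RF model itself is left out.

```python
import math


def split_by_ratio(records, train_fraction):
    """Split an ordered sequence into a leading training part and a trailing
    testing part. Assumption: ordered split, training count rounded down."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Training/testing sizes for the 169 records at each ratio in the abstract.
N_RECORDS = 169
split_sizes = {}
for fraction in (0.5, 0.6, 0.7, 0.8):
    train, test = split_by_ratio(list(range(N_RECORDS)), fraction)
    split_sizes[fraction] = (len(train), len(test))

for fraction, (n_train, n_test) in split_sizes.items():
    print(f"{fraction:.0%} training: {n_train} train / {n_test} test")
```

In a full workflow, a regressor would be fitted on each training part and the three indices computed on the matching testing part, so that the ratios can be compared on equal terms.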

    Citation: Diksha Puri, Parveen Sihag, Mohindra Singh Thakur, Mohammed Jameel, Aaron Anil Chadee, Mohammad Azamathulla Hazi. Analysis of Data Splitting on Streamflow Prediction using Random Forest[J]. AIMS Environmental Science, 2024, 11(4): 593-609. doi: 10.3934/environsci.2024029

