Research article Special Issues

Divide-and-train: A new approach to improve the predictive tasks of bike-sharing systems

  • Received: 08 January 2024 Revised: 02 March 2024 Accepted: 14 May 2024 Published: 02 July 2024
  • Bike-sharing systems (BSSs) have become commonplace in most cities worldwide as an important part of many smart cities. These systems continuously generate large volumes of data. The effectiveness of a BSS depends on making decisions at the right time, so there is a vital need to build predictive models on BSS data to improve the decision-making process. The overwhelming majority of BSS users register before using the service, which means many BSSs have prior knowledge of user data such as age, gender, and other relevant details. Machine learning and deep learning models are used to predict urban flows, trip duration, and other quantities. The standard practice is to train these models on the entire dataset, even though the biking patterns of different users are intuitively distinct; for instance, a user's age influences trip duration. The existence of such distinct user patterns motivated this work. We propose divide-and-train, a new method for training predictive models on station-based BSS datasets that divides the original dataset according to the values of a given attribute and then trains the models on the resulting sub-datasets. The proposed method was validated on different machine learning and deep learning models, each trained on both the complete and the split datasets, and the resulting improvements in the evaluation metrics were reported. Results demonstrated that the proposed method outperformed the conventional training approach. Specifically, the root mean squared error (RMSE) and mean absolute error (MAE) improved for both trip duration and distance prediction, with an average accuracy of 85% across the divided sub-datasets for the best-performing model, random forest.
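The divide-and-train idea can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration, not the authors' code: the trips are synthetic, the field names `age_group`, `distance_km`, and `duration_min` are hypothetical, and a one-feature least-squares line stands in for the random forest and other models evaluated in the paper so the example runs on the Python standard library alone. The sketch splits the training data on one attribute, fits one model per sub-dataset, routes each test trip to the model for its attribute value, and compares RMSE and MAE against a single model trained on the whole dataset.

```python
"""Divide-and-train sketch (hypothetical fields, synthetic data, linear stand-in model)."""
import math
import random
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_global(rows):
    """Conventional approach: one model fitted on the entire dataset."""
    return fit_line([r["distance_km"] for r in rows], [r["duration_min"] for r in rows])


def divide_and_train(rows, attribute):
    """Split rows on each value of `attribute` and fit one model per sub-dataset."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r)
    return {value: train_global(subset) for value, subset in groups.items()}


def predict(model, row):
    a, b = model
    return a + b * row["distance_km"]


def make_trips(n, seed):
    """Synthetic trips; the per-group speeds below are hypothetical."""
    rng = random.Random(seed)
    speed_kmh = {"young": 15.0, "middle": 12.0, "senior": 8.0}
    rows = []
    for _ in range(n):
        group = rng.choice(list(speed_kmh))
        dist = rng.uniform(0.5, 8.0)
        duration = 60.0 * dist / speed_kmh[group] + rng.gauss(0.0, 1.0)
        rows.append({"age_group": group, "distance_km": dist, "duration_min": duration})
    return rows


train, test = make_trips(2000, seed=1), make_trips(500, seed=2)
global_model = train_global(train)
sub_models = divide_and_train(train, "age_group")

y = [r["duration_min"] for r in test]
y_global = [predict(global_model, r) for r in test]
# Route each trip to its sub-model; fall back to the global model for unseen values.
y_split = [predict(sub_models.get(r["age_group"], global_model), r) for r in test]

print(f"global: RMSE={rmse(y, y_global):.2f}  MAE={mae(y, y_global):.2f}")
print(f"split:  RMSE={rmse(y, y_split):.2f}  MAE={mae(y, y_split):.2f}")
```

Keying sub-models by attribute value means a trip whose value never appeared in training has no dedicated model, which is why the sketch falls back to the global model. Swapping `fit_line` for a stronger regressor leaves the split, route, and compare structure unchanged.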

    Citation: Ahmed Ali, Ahmad Salah, Mahmoud Bekhit, Ahmed Fathalla. Divide-and-train: A new approach to improve the predictive tasks of bike-sharing systems[J]. Mathematical Biosciences and Engineering, 2024, 21(7): 6471-6492. doi: 10.3934/mbe.2024282



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)



Figures and Tables

Figures (7) / Tables (9)
