Traffic signal control (TSC) plays a crucial role in enhancing traffic capacity. In recent years, researchers have demonstrated improved performance by using deep reinforcement learning (DRL) to optimize TSC. However, existing DRL frameworks predominantly rely on manually crafted state, action, and reward designs, which limit direct information exchange between the DRL agent and the environment. To overcome this challenge, we propose a novel design method that maintains consistency among states, actions, and rewards, named the uniformity state-action-reward (USAR) method for TSC. The USAR method relies on: 1) updating the action selection for the next time step using a formula based on the state perceived by the agent at the current time step, thereby encouraging rapid convergence to the optimal strategy from state perception to action; and 2) integrating the state representation with the reward function design, allowing the efficacy of past action strategies to be assessed precisely from the received feedback rewards. This consistency-preserving design jointly optimizes the TSC strategy through the updates and feedback among the Markov elements. Furthermore, the proposed method incorporates a residual block into the DRL model. The block introduces an additional pathway between the input and output layers to transfer feature information, thus promoting the flow of information across different network layers. To assess the effectiveness of our approach, we conducted a series of simulation experiments using Simulation of Urban MObility (SUMO). The USAR method with a residual block outperformed the other methods and achieved the best performance on several evaluation metrics.
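The residual pathway mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's network: the layer sizes, the weights, and the helper names `dense`, `relu`, and `residual_block` are all assumptions chosen for illustration. It shows only the general idea of a skip connection, in which the block's input is added back to the transformed features so that information can bypass the intermediate layers.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    """Hypothetical two-layer residual block: relu(x + F(x)).

    F(x) is dense -> relu -> dense. The second layer must map back to
    the dimension of x so the skip connection can be added.
    """
    hidden = relu(dense(x, w1, b1))
    transformed = dense(hidden, w2, b2)
    # Skip connection: the input is added to the transformed features.
    return relu([xi + ti for xi, ti in zip(x, transformed)])


# Illustrative (made-up) state vector and weights.
state = [1.0, -2.0, 0.5]
w1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]           # 3 inputs -> 2 hidden units
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]]                # 2 hidden units -> 3 outputs
b2 = [0.0, 0.0, 0.0]

features = residual_block(state, w1, b1, w2, b2)

# With the second layer zeroed out, the block reduces to relu(state):
# the input still reaches the output through the skip pathway alone.
zero_w2 = [[0.0, 0.0] for _ in range(3)]
passthrough = residual_block(state, w1, b1, zero_w2, b2)
```

The second call illustrates the design choice: even when the learned transformation contributes nothing, the skip pathway still carries the input features forward.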
Citation: Bao-Lin Ye, Peng Wu, Lingxi Li, Weimin Wu. Uniformity of Markov elements in deep reinforcement learning for traffic signal control[J]. Electronic Research Archive, 2024, 32(6): 3843-3866. doi: 10.3934/era.2024174