
Given current limitations in intelligence and processing capability, machine learning systems cannot yet handle the full range of driving scenarios and therefore cannot completely substitute for human drivers in practical applications. Recognizing the robustness and adaptability that human drivers demonstrate in complex environments, this work incorporates a driving intervention mechanism into autonomous driving training. By integrating these interventions into the Proximal Policy Optimization (PPO) algorithm, drivers can intervene during training to correct the vehicle's irrational behaviors when necessary, significantly accelerating the improvement of model performance. A human-centric experience replay mechanism is developed to increase the efficiency with which driving intervention data are used. To evaluate the impact of driving intervention on agent performance, experiments were conducted at four intervention frequencies in lane-change and congested-road scenarios. The results show that the designed intervention mechanism markedly improves model performance in the early stages of training and enables the model to escape local optima through timely interventions. Although a higher intervention frequency typically yields better performance, an excessively high intervention rate can reduce training efficiency. To assess practical applicability, a comprehensive test route containing lane changes, traffic signals, and congested sections was devised, and the trained model was evaluated under various traffic conditions. The results show that the model adapts to different traffic flows, navigates the test segment safely, and maintains speeds close to the target. These findings highlight the model's robustness and potential for real-world application, emphasizing the critical role of human intervention in enhancing the safety and reliability of autonomous driving systems.
Citation: Gaosong Shi, Qinghai Zhao, Jirong Wang, Xin Dong. Research on reinforcement learning based on PPO algorithm for human-machine intervention in autonomous driving[J]. Electronic Research Archive, 2024, 32(4): 2424-2446. doi: 10.3934/era.2024111
The field of transportation has experienced significant transformations in recent years, primarily due to the rapid advancement of autonomous driving technology [1]. Deep Reinforcement Learning (DRL) stands as a cornerstone technology in the autonomous driving decision-making process. It enables vehicles to make informed decisions within complex and dynamically changing traffic environments through the use of both simulation and real-world training [2,3].
Imitation Learning (IL) and DRL are key machine learning methodologies for end-to-end autonomous driving, in which raw sensor data serve as input and the network directly outputs the vehicle's final control commands [4,5,6]. IL aims to mimic human driver behavior by replicating observed control actions in specific scenarios [7]. The behavior-planning runtime assurance method delineated by Peng et al. [8] is based on imitation learning and a responsibility-sensitive safety model; the framework effectively ensures the safety of behavior planning for autonomous vehicles in complex situations and demonstrates notable real-time efficacy. However, training deep neural networks requires a large volume of data. Eraqi et al. [9] introduced a novel IL approach named Dynamic Conditional Imitation Learning (DCIL) and assessed its effectiveness through experiments on the CARLA simulator with promising outcomes; nonetheless, performance may degrade significantly when such a system is deployed in novel or uncharted environments. To tackle the intricacies of end-to-end autonomous driving in dense urban settings, Teng et al. [10] proposed Hierarchical Interpretable Imitation Learning (HIIL), which improves the vehicle's ability to navigate complex and adverse conditions but requires segmenting the task into sub-tasks and solving them hierarchically. For semi-structured driving scenarios, Ahn et al. [11] introduced a technique called "Imitation Learning for Autonomous Driving using Foresight Points", which improves driving performance but still relies on foresight points to determine the vehicle's trajectory. Despite IL's advantages, such as high sample efficiency and fast convergence, it faces two primary challenges. First, an imitation-based system can replicate driver errors or hazardous maneuvers, thereby increasing the risk of accidents. Second, although IL models reproduce known behaviors well, they struggle in unfamiliar situations, which limits their decision-making capabilities [12].
DRL is a methodology that advances driving strategies through interactive learning between an agent and its environment [13,14]. The fundamental goal of this approach is to maximize the total reward accumulated from the agent's experiences [15]. Khalil and Mouftah [16] developed an innovative method that integrates multimodal fusion with latent deep reinforcement learning to enhance the perception abilities of autonomous vehicles in urban settings. Zhang et al. [17] presented a technique for urban autonomous driving based on a lexicographic (dictionary-ordering) actor-critic deep reinforcement learning strategy; it allows multiple objectives to be considered and balanced against one another, and the lexicographic ordering effectively addresses the challenges of continuous action domains. Furthermore, Du et al. [18] introduced a trajectory planner leveraging deep reinforcement learning to devise an automated parking system, and the parking agent's trajectory planning performance was examined through simulation experiments. Despite significant progress in DRL, challenges persist, primarily concerning computational and learning efficiency [19,20]. The interaction between the agent and the environment is often inefficient, so model training requires substantial computation and time [21]. Additionally, the construction of the reward function is of paramount importance: an inadequately designed reward function can negatively impact the algorithm's convergence rate [22].
To overcome these hurdles, a synthesis of IL and DRL has been adopted, featuring a driver intervention mechanism to enhance the capabilities of autonomous driving systems. During the training of DRL models, the integration of driver-driving experiences dynamically enriches the model learning process. Success during this learning phase is continually assessed using driver experiences, with interventions applied as needed for model adjustments. A mechanism for driver-guided experience replay is instituted, allowing the model to iteratively refine towards an optimal configuration. This methodology not only boosts the interaction efficiency between the model and the environment but also maintains the exploratory nature of DRL. Consequently, the model's learning is not solely reliant on the insights derived from driving experiences.
Within the framework of the Proximal Policy Optimization (PPO) algorithm, this paper introduces a human-computer interaction mechanism to establish the algorithmic structure presented herein, as depicted in Figure 1.
The agent monitors the data gathered from the environment to ascertain the necessity for intervention. Upon the occurrence of intervention, actions directed by human guidance supersede those dictated by the policy. This process of action selection is encapsulated by the following representation:
$a_t = \chi\, a_t^{Human} + (1-\chi)\, a_t^{DRL}$  (1)
where $a_t^{Human}$ is the guidance action given by the human, $a_t^{DRL}$ is the action given by the policy network, and $\chi$ is the human intervention flag, which equals 0 in the absence of human intervention and 1 when human intervention is present.
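The arbitration in Eq (1) amounts to a simple gating rule. The sketch below is illustrative only: it assumes the human command arrives from a steering interface and the policy action from the PPO network, and the function and argument names (select_action, human_intervening, etc.) are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def select_action(policy_action: np.ndarray,
                  human_action: np.ndarray,
                  human_intervening: bool) -> np.ndarray:
    """Eq (1): a_t = chi * a_t_human + (1 - chi) * a_t_drl, with chi in {0, 1}."""
    chi = 1.0 if human_intervening else 0.0
    return chi * human_action + (1.0 - chi) * policy_action
```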
The primary aim of Proximal Policy Optimization (PPO) is to increase the expected reward while limiting the magnitude of each update to the policy parameters [23,24]. To assess the performance of the current policy relative to a baseline policy, an advantage function A(π) is employed. PPO optimizes the policy parameters θ by maximizing the following objective function:
$A_t = \hat{G}_t - V_{\theta_v}(s_t)$  (2)
$G_t = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}\, (1 - d_{t+k})$  (3)
$L(\theta) = \mathbb{E}\left[\min\left(\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}\, A_t,\ \mathrm{clip}\left(\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right) A_t\right)\right]$  (4)
where $G_t$ represents the discounted return, $V_{\theta_v}(s_t)$ is the value estimated by the current value network at state $s_t$, $\gamma$ is the discount factor, $r$ is the reward, $d_t$ is the termination flag (equal to 1 for a terminal state and 0 otherwise), $\pi_\theta(a_t|s_t)$ denotes the probability that the current policy takes action $a_t$ in state $s_t$, $\theta_{old}$ denotes the parameters of the previous policy, and $\varepsilon$ is a hyperparameter that controls the magnitude of policy updates.
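A minimal sketch of Eqs (3) and (4) is given below, assuming a PyTorch implementation; the function names and the default clip parameter are illustrative, not the authors' code. The return computation uses a standard backward recursion with termination masking, consistent with the role of $d_t$ in Eq (3).

```python
import torch

def discounted_returns(rewards, dones, gamma=0.99):
    """Eq (3): discounted return with termination masking (backward recursion)."""
    G, out = 0.0, []
    for r, d in zip(reversed(rewards), reversed(dones)):
        G = r + gamma * (1.0 - d) * G
        out.append(G)
    return torch.tensor(list(reversed(out)), dtype=torch.float32)

def ppo_clip_objective(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Eq (4): clipped surrogate objective, averaged over the sampled time steps."""
    ratio = torch.exp(log_prob_new - log_prob_old)              # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return torch.min(unclipped, clipped).mean()                 # maximized during training
```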
Directly optimizing the objective function in Eq (4) can lead to considerable variance in the gradient updates. During the initial phases of training, unstable gradient updates can induce oscillations in the optimization process, which complicates the convergence of the policy parameters to optimal values. To address this issue, the objective function is modified as follows:
$L_{new}(\theta) = -L(\theta) + \lambda_1 M\left(V_\theta(s_t), G_t\right) - \lambda_2 H\left(\pi_\theta(a_t|s_t)\right)$  (5)
where $\lambda_1$ and $\lambda_2$ are weight coefficients, $M$ is the mean-squared-error loss term used to improve the prediction of the state-value function and thus estimate the advantage function more accurately, and $H$ denotes the entropy regularization term, included to encourage the policy to retain a degree of randomness in situations of high uncertainty. The expression for $M$ is as follows:
$M\left(V_\theta(s_t), G_t\right) = \frac{1}{2n} \sum_{i=1}^{n} \left( V_\theta(s_t)_i - \hat{G}_{t,i} \right)^2$  (6)
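As a hedged sketch, Eqs (5) and (6) combine into a single scalar loss as shown below; the weighting defaults lam1 and lam2 are placeholders, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def ppo_total_loss(surrogate, values, returns, entropy, lam1=0.5, lam2=0.01):
    """Eq (5): L_new = -L(theta) + lambda1 * M(V, G) - lambda2 * H(pi).

    surrogate -- clipped surrogate L(theta) from Eq (4)
    values    -- critic estimates V_theta(s_t)
    returns   -- discounted returns G_t from Eq (3)
    entropy   -- mean policy entropy H(pi_theta(a_t | s_t))
    """
    value_loss = 0.5 * F.mse_loss(values, returns)   # Eq (6): mean-squared error with the 1/2 factor
    return -surrogate + lam1 * value_loss - lam2 * entropy
```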
During the initial training stages of autonomous driving, and when the model encounters local optima, a driving intervention mechanism is activated to influence the system's decision-making. This process utilizes the expertise of a human driver to reduce instability throughout the training phase. Should the system display irrational driving behavior, the human driver steps in to steer the system towards more sensible choices, thereby enhancing the safety and dependability of the driving actions [25]. Given that the model has yet to master effective strategies in the early stages of training, the insights from the human driver are deemed significantly more vital than the model's exploratory efforts. The exploratory experiences of the model and the driving experiences of the human are cataloged separately in buffers A and B, respectively [26]. Throughout the learning phase of the model, samples are extracted from both buffers in accordance with the designated sampling rules:
$N = N_{agent} + N_{human}$  (7)
$N_{agent} = \alpha N$  (8)
$N_{human} = (1-\alpha) N$  (9)
$\alpha = \max\left(\frac{R_{agent}}{R_{human} + R_{agent}},\ 0.2\right)$  (10)
where N represents the total number of samples taken during one iteration of model learning, Rhuman stands for the average reward value from the human driver's driving experiences, and Ragent represents the average reward over the last 10 episodes obtained by the model.
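The sampling rule in Eqs (7)–(10) can be sketched as follows; buffer handling details (plain Python lists, uniform sampling) are assumptions, and the guard on the denominator is an implementation choice not specified in the text.

```python
import random

def mixed_batch(agent_buffer, human_buffer, batch_size,
                r_agent_avg, r_human_avg, alpha_floor=0.2):
    """Eqs (7)-(10): split one learning batch between agent and human experience."""
    denom = r_human_avg + r_agent_avg
    alpha = alpha_floor if denom <= 0 else max(r_agent_avg / denom, alpha_floor)  # Eq (10)
    n_agent = int(round(alpha * batch_size))                                      # Eq (8)
    n_human = batch_size - n_agent                                                # Eq (9)
    batch = random.sample(agent_buffer, min(n_agent, len(agent_buffer)))
    batch += random.sample(human_buffer, min(n_human, len(human_buffer)))
    return batch, alpha
```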
When α surpasses the predetermined proximity threshold, reliance on the human driver's experience is reduced, and buffer B is consequently disregarded. At this point, the model has not yet converged satisfactorily, and its driving performance remains inferior to that of the human driver. This transition marks the commencement of the second phase of training, characterized by sporadic driver interventions. During this phase, although the model handles straightforward tasks competently, it is prone to errors and negative rewards in certain complex situations. To mitigate this, prompt driver intervention is deployed whenever the model verges on entering a detrimental state [27]. The driving data obtained through such interventions are stored in a buffer, with each training sample structured as follows:
$D = \{\, s_t,\ a_t,\ s_{t+1},\ r_t,\ I \,\}$  (11)
where $s_t$ represents the current state, $a_t$ the current action, $s_{t+1}$ the next state, $r_t$ the reward, and $I$ is a marker that distinguishes model-generated from human driver-generated driving data.
To guarantee that autonomous vehicles adequately respond to human driver interventions in critical scenarios, a driver intervention term is incorporated into the loss function of the PPO algorithm. The conventional loss function of the PPO algorithm is designed to maximize cumulative rewards. However, in certain perilous situations, this objective could cause the agent to depend excessively on environmental rewards, thus neglecting the crucial interventions made by human driving experts. To rectify this imbalance, the loss function has been revised to assign greater significance to driver intervention signals throughout the training phase. The revised loss function is presented as follows:
$L_{new}(\theta) = -L(\theta) + \lambda_1 M\left(V_\theta(s_t), G_t\right) - \lambda_2 H\left(\pi_\theta(a_t|s_t)\right) - \lambda_3 \left\| a_t - \pi_\theta(a_t|s_t) \right\|^2$  (12)
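A possible way to realize the extra term in Eq (12) is shown below. Restricting the imitation term to intervened samples (via the marker I from Eq (11)) and the lambda defaults are our assumptions; the paper does not specify these details.

```python
import torch

def guided_ppo_loss(surrogate, values, returns, entropy,
                    human_actions, policy_actions, intervened_mask,
                    lam1=0.5, lam2=0.01, lam3=1.0):
    """Eq (12): PPO loss with an added term pulling the policy towards human actions."""
    value_loss = 0.5 * torch.mean((values - returns) ** 2)             # Eq (6)
    imitation = ((human_actions - policy_actions) ** 2).sum(dim=-1)    # ||a_t - pi_theta(.)||^2 per step
    imitation = (imitation * intervened_mask.float()).mean()           # only where the marker I == 1
    return -surrogate + lam1 * value_loss - lam2 * entropy + lam3 * imitation
```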
Following the outlined methodology, the algorithmic framework depicted in Figure 2 forms the central theme of this manuscript. The term "State" pertains to the input derived from the camera, which is an RGB image with dimensions of 160 × 80 × 3. "Action" corresponds to the output action, confined within the range of [-1, 1].
To safeguard driving safety, the reward function incorporates the following distance to the vehicle ahead [28]. If the gap to the preceding vehicle becomes perilously short, the negative reward grows rapidly. The reward term for the following distance is computed as follows:
$r_{fro} = -\left(\frac{1}{\max\{d_f,\ d'\}}\right)^{2}$  (13)
where df is the distance between the front of the vehicle and the rear of the preceding vehicle, d' is a small positive constant to ensure a positive denominator.
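A direct reading of Eq (13) as code, with an illustrative value for the small constant d':

```python
def r_following_distance(d_front: float, d_min: float = 0.5) -> float:
    """Eq (13): penalty that grows rapidly as the gap to the vehicle ahead shrinks.
    d_min plays the role of d' and keeps the denominator positive (value illustrative)."""
    return -(1.0 / max(d_front, d_min)) ** 2
```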
The reward terms are designed to motivate the agent to execute smooth lane-changing maneuvers. The smoothness of these maneuvers can be quantified by observing the deviation in the steering angle. The formula for calculating the reward based on steering angle deviation is as follows:
$r_{smo} = -e^{\left|\delta_t - \delta_{t-1}\right|}$  (14)
where δt and δt-1 represent the current time-step steering angle and the previous time-step steering angle, respectively.
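Eq (14) translates directly into the following sketch:

```python
import math

def r_steering_smoothness(delta_t: float, delta_prev: float) -> float:
    """Eq (14): exponential penalty on the steering-angle change between consecutive steps."""
    return -math.exp(abs(delta_t - delta_prev))
```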
The reward mechanism is designed to prompt the agent to keep its position centered within the current lane, thus preventing the vehicle from approaching too closely to the lane boundaries and facilitating its ability to remain near the centerline following a successful lane change [29,30]. Recognizing that a vehicle will inevitably stray from the lane's center during the process of changing lanes, this specific reward is conferred only upon the successful completion of the lane change maneuver. In an effort to further encourage the vehicle to cover longer distances, a cooperative driving distance term is integrated into the reward function, articulated as follows:
$r_{cen} = -\frac{1}{1 + e^{-|d_i|}}$  (15)
$r_{dis} = \left\| d_{cur} - d_{old} \right\|^{2}$  (16)
where $d_i$ represents the distance from the lane centerline, while $d_{cur}$ and $d_{old}$ denote the distances to the target point at the current and previous time steps, respectively.
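A literal sketch of Eqs (15) and (16); note that, as stated above, Eq (15) is applied only after the lane change has been completed.

```python
import math

def r_lane_centering(d_center: float) -> float:
    """Eq (15): sigmoid-shaped penalty on the distance from the lane centerline."""
    return -1.0 / (1.0 + math.exp(-abs(d_center)))

def r_driving_distance(d_cur: float, d_old: float) -> float:
    """Eq (16): squared change in the distance to the target point between time steps."""
    return (d_cur - d_old) ** 2
```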
Human intervention aims to correct the behavior of the Deep Reinforcement Learning (DRL) agent and avert disastrous outcomes. During sporadic intervention periods, a human's decision to intervene signals that the current state is deemed adverse. To maximize accumulated reward, the DRL agent is therefore encouraged to limit its exposure to detrimental states, which in turn reduces the need for human intervention [31]. A reward penalty is applied at the first time step of each intervention incident to dissuade the agent from entering such states. To motivate the agent to complete lane changes and merge into the target lane, an additional positive reward is granted, encouraging the agent to pursue its task objectives. To avert collisions, negative rewards are issued when the agent's maneuvers are likely to cause a collision, with the magnitude of the penalty proportional to the severity of the anticipated collision. Incorporating all these elements, the comprehensive reward function for lane changing is established as follows:
$r_{change} = \omega_1 r_{fro} + \omega_2 r_{smo} + \omega_3 r_{cen} + \omega_4 r_{dis} + r_{gui} + r_{col} + r_{fin}$  (17)
where $\omega_1$–$\omega_4$ are weight coefficients, $r_{gui}$ is the intervention penalty, $r_{col}$ is the collision penalty, and $r_{fin}$ is the reward for completing the lane change.
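Eq (17) is a weighted sum of the shaping terms plus the event-based bonuses and penalties; the weight values in the sketch below are placeholders, not the tuned coefficients used in the experiments.

```python
def r_lane_change(r_fro, r_smo, r_cen, r_dis, r_gui, r_col, r_fin,
                  w=(1.0, 1.0, 1.0, 1.0)):
    """Eq (17): composite lane-change reward."""
    w1, w2, w3, w4 = w
    return w1 * r_fro + w2 * r_smo + w3 * r_cen + w4 * r_dis + r_gui + r_col + r_fin
```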
In scenarios involving following another vehicle, the reward function should take into account variables such as maintaining a safe following distance, matching the speed of the vehicle ahead, and ensuring smooth following behavior [32,33]. To incentivize the agent vehicle to exhibit smooth following behavior, reward terms can be formulated to penalize behaviors such as abrupt braking, rapid acceleration, sharp steering, or frequent speed changes. The calculation of rewards could be structured as follows:
$r'_{smo} = -e^{\left| k\,\Delta a_x + \Delta a_y \right|}$  (18)
where $\Delta a_x$ represents the change in the vehicle's longitudinal acceleration, $\Delta a_y$ represents the change in lateral acceleration, and $k$ is a constant greater than 1. Because lateral acceleration has a greater impact on the smoothness of the vehicle's motion, it receives higher attention.
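Eq (18) as written becomes the following sketch; the value of k used here is illustrative only.

```python
import math

def r_follow_smoothness(delta_ax: float, delta_ay: float, k: float = 2.0) -> float:
    """Eq (18): penalizes combined changes in longitudinal (delta_ax) and lateral
    (delta_ay) acceleration; k is the constant greater than 1 described in the text."""
    return -math.exp(abs(k * delta_ax + delta_ay))
```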
Additional positive rewards are awarded to the agent upon successfully completing a lane change and entering the target lane, thereby motivating the agent to accomplish objectives within the task. To deter collisions, negative rewards are dispensed if the agent's actions pose a risk of leading to potential collisions, with the penalties calibrated according to the severity of the possible collisions [34]. Taking into account the aforementioned factors, the ultimate comprehensive reward function is formulated as follows:
$r_{follow} = \omega_3 r'_{smo} + \omega_5 r_{fro} + r_{gui} + r_{col} + r_{fin}$  (19)
The process described encapsulates the entire workflow of this manuscript. Figure 3 illustrates the research methodology employed in this study.
To evaluate the efficacy of the PPO algorithm when augmented with driving intervention, a series of experiments were performed. These experiments encompassed maneuvers such as left lane changes, right lane changes, and car-following scenarios. The trials were executed using the simulation software Carla-Town01, with the experimental scenario showcased in Figure 4(a). The experimental bench is shown in Figure 4(b).
The performance of experimental equipment, especially the computational capabilities of CPUs and GPUs, has a significant impact on the time and efficiency of model training. The detailed specifications of the experimental configuration are presented in Table 1.
Assembly | Specifications |
CPU | Intel i9-12900H |
GPU | NVIDIA RTX3050Ti |
Memory | 16G |
Driving equipment | Kraton 900 |
The experimental scenarios are designed to reproduce a range of real-world driving situations; the speeds of the ego and obstacle vehicles and the positions of the obstacle vehicles are listed in Table 2.
Scene | Initial lane | Ego vehicle speed (m/s) | Obstacle vehicle speed (m/s) | Obstacle vehicle location (forward, m)
Left lane change | left lane | 5 | 3 | 10 |
Right lane change | right lane | 5 | 3 | 10 |
Following | randomization | - | 1–6 | 5–15 |
During the experimental phase, the driver's interventions were not scheduled at fixed moments but were instead governed by the intervention protocols defined for the four intervention schemes below. Interventions were initiated through manual steering adjustments, thereby guiding the agent's actions. The primary aim was to keep the intelligent agent on the road and to reduce collisions with road boundaries or nearby vehicles. Interventions ceased once the driver observed that the agent was navigating correctly and behaving acceptably. The objective of these experiments is to explore how human-guided interventions enhance the agent's performance. Four intervention schemes were evaluated:
Experiment A: No intervention was applied throughout the training process.
Experiment B: Within every block of 100 episodes, interventions were applied continuously for 10 consecutive episodes.
Experiment C: Within every block of 100 episodes, interventions were applied continuously for 30 consecutive episodes.
Experiment D: Within every block of 100 episodes, interventions were applied continuously for 50 consecutive episodes.
Figure 5 depicts the convergence times of four different driving intervention experiments across three scenarios, showcasing the training efficiency of intelligent agents under varying levels of driving intervention. The results indicate that in the absence of driving intervention, the time required for the intelligent agent to converge is the longest. This suggests that without intervention, intelligent agents take a longer duration to adapt and learn how to perform tasks effectively. In the lane change scenarios (Figure 5(a), (b)), Experiment C exhibits the shortest training time, implying that introducing a moderate degree of driving intervention positively influences the training of the intelligent agent. However, it is important to acknowledge that excessive intervention might diminish the exploration capability of the intelligent agent, rendering it less adaptable to future unforeseen circumstances and consequently increasing the time to convergence. In the car-following scenario, Experiment D demonstrates the shortest training duration, which may be ascribed to the complexities associated with speed control in dense traffic conditions, necessitating more precise control measures. The challenge faced by intelligent agents in discovering suitable control strategies underscores the significance of human intervention in enhancing learning and adaptation processes.
Figure 6 illustrates the progression of the average reward value for every 100 episodes across four experiments in three distinct scenarios. Initially, in the absence of driving intervention, the intelligent agent predominantly remains in a negative state during the early training phases, reflected by low reward values. This scenario likely arises because the intelligent agents are required to independently navigate complex driving tasks without any intervention, necessitating extensive exploration and trial-and-error, which in turn leads to suboptimal performance at the outset. However, the introduction of driving intervention marks a significant improvement in the agent's early performance. This improvement suggests that driving intervention offers essential guidance and feedback during the initial training stages, thereby facilitating the intelligent agent's quicker adaptation to the tasks at hand. Such early performance enhancement is particularly crucial for autonomous driving systems in real-world applications, as it has the potential to mitigate risks and safety concerns.
Upon completion of training without driving intervention, the agent's reward value is observed to be lower compared to the scenarios where driving intervention is applied. This outcome underscores the considerable positive impact of driving intervention on further enhancing model performance. By expediting the learning process, driving intervention not only accelerates the pace at which intelligent agents acquire new skills but also elevates their performance levels in executing given tasks.
Figure 7 illustrates the dynamics between the average intervention rate during training across three scenarios and the corresponding test pass rates. As depicted in Figure 7(a), the intervention rates vary, while Figure 7(b) showcases that pass rates are notably higher in instances where driving intervention is employed, compared to scenarios devoid of such interventions. A notable observation is that when the intervention rate is minimal, the improvement in model performance is similarly limited. Conversely, as the intervention rate escalates, there's a discernible uptick in the model's pass rate, indicating a positive correlation between the rate of intervention and model efficacy.
However, it is observed that when the intervention rate falls within the 0.2 to 0.5 range, the marginal gains in model performance begin to diminish with further increases in the intervention rate. This pattern suggests that, for the methodology proposed in this manuscript, maintaining the intervention rate within the 0.2–0.5 spectrum is optimal. Moreover, to mitigate the risk of excessive fatigue for the individual conducting interventions, it's advisable to keep the intervention rate between 0.2 and 0.3 for lane-changing scenarios. For scenarios involving following another vehicle, adjusting the intervention rate to fall between 0.3 and 0.5 is recommended. This strategic modulation of intervention rates aims to balance the dual objectives of optimizing model performance and minimizing the intervener's strain.
Table 3 comprehensively presents the data for various metrics measured during the experiment. The calculation of the reward values is based on the average reward over the last 100 training cycles. The data from the table reveals that appropriate driving interventions can significantly enhance performance metrics throughout the training process. Such improvements exhibit notable variations depending on the frequency of interventions. Moreover, experimental data also indicate that the response to intervention measures varies across different driving scenarios.
Value type | Intervention plan | Left lane change | Right lane change | Following |
Reward | A | -0.20 | 0.07 | 0.48
Reward | B | -0.11 | 0.09 | 0.61
Reward | C | -0.08 | 0.11 | 0.68
Reward | D | -0.13 | 0.10 | 0.69
Episode | A | 1892 | 1586 | 1926
Episode | B | 978 | 893 | 1455
Episode | C | 791 | 852 | 886
Episode | D | 889 | 796 | 874
Training time (s) | A | 49,807 | 48,932 | 89,320
Training time (s) | B | 32,832 | 36,932 | 80,019
Training time (s) | C | 23,465 | 31,275 | 61,023
Training time (s) | D | 26,321 | 32,321 | 60,693
Passing rate (%) | A | 83 | 82 | 81
Passing rate (%) | B | 86 | 85 | 82
Passing rate (%) | C | 96 | 96 | 91
Passing rate (%) | D | 95 | 96 | 93
Table 4 selects the intervention plans that show the most significant improvement in performance across various experimental scenarios and compares them with data from experiments without interventions. Interventions not only significantly enhance the model's performance on specific tasks but also accelerate the model's adaptation to complex driving environments.
Scene | Intervention plan | Episode enhancement (%) | Training time enhancement (%) | Passing rate enhancement (%)
Left lane change | C | 57.89 | 52.89 | 14.46 |
Right lane change | C | 43.75 | 36.08 | 17.07 |
Following | D | 55 | 31.68 | 14.81 |
The study investigates the influence of human proficiency and driving qualifications on the enhancement of algorithm performance. Six participants were enlisted for this experiment, categorized based on their driving qualifications and proficiency. Two participants possessing a driver's license were deemed qualified participants. Two individuals without a driver's license were considered unqualified participants. Meanwhile, two licensed participants who received 10 minutes of preparatory training to familiarize themselves with the environment and equipment prior to the experiment were labeled as proficient participants. The intervention protocol followed Experiment C (30 consecutive intervention episodes in every block of 100 episodes), with the experimental scenario focused on lane changing.
Driver proficiency emerges as a potential factor impacting experimental outcomes, and Table 5 delineates the results from drivers of varying proficiency levels. Notably, as driving proficiency increases, the model's required training duration diminishes, and its pass rate escalates. Despite these trends, the improvements in training duration and pass rate are modest. Remarkably, even training with unqualified drivers successfully accomplishes lane-changing tasks. This outcome suggests that while the model benefits from the driver's experience, it predominantly relies on its inherent exploratory learning capabilities. Consequently, driving proficiency exerts a relatively minor influence on the model's performance.
Participants | Age | Proficiency | Training time | Passing rate (%) |
A | 22 | Unqualified driver | 8.5 h | 92 |
B | 22 | Unqualified driver | 8.6 h | 93 |
C | 24 | Qualified driver | 8.3 h | 93 |
D | 25 | Qualified driver | 8.1 h | 95 |
E | 24 | Skilled driver | 7.8 h | 95 |
F | 25 | Skilled driver | 7.7 h | 96 |
In this section, we elaborate on the training specifics for the left lane change scenario in Experiment C. A notable observation is that due to the driver's inability to initiate intervention immediately at each training cycle's beginning, the intervention rate did not achieve 1 even under continuous intervention, as illustrated in Figure 8. During the stages of intermittent intervention, as the model's performance enhances, the frequency of interventions gradually diminishes. Consequently, the model progressively reduces its reliance on driving interventions, primarily depending on its capability for autonomous exploration.
Figure 9 displays the evolution of reward values throughout the training process. A significant increase in reward value is observed during the intervention period. Following the cessation of intervention, there is a noticeable decrease in reward value, yet it remains substantially higher than pre-intervention levels. With intermittent intervention, even after the discontinuation of intervention measures, the reward value stabilizes at a high level. This outcome underscores the beneficial impact of driver intervention measures on augmenting model performance, highlighting the effectiveness of integrating human expertise with autonomous learning processes to achieve optimal outcomes in complex driving scenarios.
Figure 10 illustrates the progression in the number of survival steps per episode following driver intervention. There is a marked increase in the survival steps per episode for the model post-intervention, underscoring the importance of intervention in enhancing the model's ability to navigate complex driving tasks. Driver intervention, both direct and controlled, equips the model with improved mechanisms to handle uncertainties and potential hazards, thereby prolonging its operational duration within scenarios.
Notably, after ceasing the intervention, there's a decrease in the number of survival steps per episode, yet the performance remains significantly enhanced relative to scenarios devoid of any intervention. This observation affirms the effectiveness of driver intervention in boosting model performance. Following the discontinuation of intervention, the survival steps exhibit a stepwise improvement across every subsequent set of 100 episodes, indicating the model's gradual adjustment to previous interventions and its ongoing enhancement in survival capabilities throughout this adaptation period. The survival capacity of the model in the intermittent intervention stage demonstrates a trend toward stabilization, highlighting the critical role of driving intervention in bolstering system stability and safety.
Figure 11 displays the trajectory data garnered from 100 tests conducted on the trained model within lane-changing scenarios. The vehicle adeptly executed a complex lane change, showcasing a smooth trajectory throughout the operation. In the context of autonomous driving systems, the ability to successfully navigate lane changes is paramount, necessitating that intelligent agents change lanes safely and efficiently in response to the surrounding traffic conditions. The smooth trajectory depicted in Figure 11 signifies that the trained model possesses high control precision and stability for this particular task. Through the application of driving intervention, the model gains a deeper comprehension of the appropriate timing and methodology for executing lane changes, thereby facilitating smooth and effective maneuvers. This outcome underscores the significance of integrating human insights with autonomous learning to refine the operational capabilities of autonomous driving models, ensuring they can adeptly handle the complexities of real-world driving scenarios.
Figure 12 illustrates the variation in reward values throughout the training process. During the initial phase of training, without the implementation of driving intervention, the fluctuation in reward values remains relatively minor. However, upon introducing driving intervention, a significant positive shift in the reward values is observed, leading to an increased range of fluctuation. As the process transitions into the intermittent intervention stage, the variability of the reward values gradually begins to stabilize around 0.3. This pattern suggests that driving intervention not only enhances the performance of the model but also contributes to its stability. The ability of driving intervention to mitigate extreme fluctuations in reward values indicates its effectiveness in guiding the model towards more consistent and reliable performance outcomes.
Figure 13 captures the variation in steering angle over the course of training. Initially, the steering angle is subject to significant fluctuations, reflecting the model's struggle to stabilize its driving behavior. However, as intervention measures are introduced, these fluctuations begin to diminish, signaling an enhancement in the smoothness of the vehicle's driving, particularly during lane-changing maneuvers. This trend suggests that the interventions not only aid in refining the model's decision-making processes but also contribute significantly to the improvement of its operational smoothness. The decrease in steering angle variability is indicative of the model's growing proficiency in executing lane changes with greater control accuracy and stability, underscoring the value of targeted intervention in facilitating the development of more refined and reliable autonomous driving behaviors.
In the training setup, the lane length is limited to 50 m. The lane in which the vehicle is positioned is randomly determined. Within each training batch, obstacle vehicles are strategically placed at distances ranging from 5 to 15 m ahead of the vehicle's lane and the adjacent lanes. The speeds of these randomly generated obstacle vehicles vary between 1 to 6 m/s. The rate of intervention during the training process is depicted in Figure 14(a), while Figure 14(b) illustrates the corresponding reward values.
Given the heightened risk of the vehicle remaining in a potentially hazardous state of imminent collision for extended periods in car-following scenarios, a significant level of driver vigilance is necessitated. These situations are inherently more complex, leading to pronounced fluctuations in reward values during the periods of intermittent intervention. However, as the model's performance incrementally improves, the necessity for intensive driver focus diminishes. Consequently, as demonstrated in Figure 14(c), the fluctuations in reward values begin to stabilize. This progression underscores the critical role of driver intervention in navigating complex scenarios and enhancing the model's capability to predict and avoid collisions, thereby reducing the demand for continuous human oversight and facilitating a transition towards more autonomous operational stability.
To facilitate autonomous vehicles in decelerating to an appropriate speed as they approach a stop line upon encountering a red or yellow traffic signal, the following acceleration control equation (Eq (20)) can be formulated:
$a = \frac{v^2}{2\,(d_l - p\,v)}$  (20)
where a is the acceleration, and v represents the current vehicle speed. dl is the distance of the vehicle from the stop line, and p is a hyperparameter less than 1.
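Read as the required deceleration magnitude, Eq (20) can be sketched as follows; the value of p and the assumption that $d_l > p\,v$ are illustrative.

```python
def deceleration_for_stop_line(v: float, d_l: float, p: float = 0.5) -> float:
    """Eq (20): a = v^2 / (2 * (d_l - p * v)); p < 1 shortens the effective stopping
    distance so the vehicle halts before the stop line (assumes d_l > p * v)."""
    return (v ** 2) / (2.0 * (d_l - p * v))
```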
To ensure meticulous management of the vehicle's acceleration dynamics, the vehicle within the Carla simulation environment undergoes a series of continuous acceleration and deceleration maneuvers. Throughout these maneuvers, data pertaining to speed, acceleration, throttle, and brake inputs are meticulously logged, as showcased in Figure 15(a). To accurately describe the interplay among these variables, a third-order polynomial regression model is adopted. This model serves as the cornerstone for executing controlled acceleration strategies, with the ultimate goal of achieving the precise halting of the vehicle at signalized intersections. The mathematical formulation of this model is articulated as follows:
$h_\psi(x) = \psi_0 + \psi_1 \phi_1(x) + \psi_2 \phi_2(x) + \psi_3 \phi_3(x)$  (21)
$J(\psi) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\psi(x_i) - z_i \right)^2$  (22)
where x represents the information of the vehicle's speed and acceleration, and zi corresponds to the braking or acceleration input.
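A least-squares fit of Eqs (21) and (22) might look like the sketch below. For simplicity it assumes a single scalar feature x and polynomial basis functions phi_k(x) = x^k; the paper's exact choice of basis and the way speed and acceleration are stacked into x are not specified, so these are assumptions.

```python
import numpy as np

def fit_pedal_model(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Eqs (21)-(22): fit h_psi(x) = psi0 + psi1*phi1(x) + psi2*phi2(x) + psi3*phi3(x)
    by minimizing the squared error J(psi); returns psi = [psi0, psi1, psi2, psi3]."""
    design = np.vander(x, N=4, increasing=True)        # columns: 1, x, x^2, x^3
    psi, *_ = np.linalg.lstsq(design, z, rcond=None)   # least-squares solution of Eq (22)
    return psi

def predict_pedal(psi: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate the fitted model h_psi at new speed/acceleration samples."""
    return np.vander(x, N=4, increasing=True) @ psi
```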
The relationships between speed, acceleration, and braking obtained by training the model of Eqs (21) and (22) are illustrated in Figure 15(b). These relationships enable precise regulation of vehicle acceleration, allowing more accurate and responsive control over the vehicle's dynamics. Through the application of the fitted model, autonomous vehicles can adjust their acceleration and braking inputs with a high degree of accuracy, ensuring smoother transitions and safer driving behavior, especially in complex scenarios such as approaching signalized intersections.
In the Carla simulation environment, the ego vehicle is instantiated with a target speed of 5 m/s. Obstacle vehicles are spawned ahead of the ego vehicle, with the gaps between successive vehicles progressively decreasing. To evaluate the trained model, different numbers of vehicles (5, 10, and 15) were generated within 100 meters in front of the ego vehicle and set to Carla's self-driving mode. The testing segment extends over a total distance of 200 meters and includes a variety of driving scenarios, such as lane changes, traffic lights, and congested sections, as depicted in Figure 16. This comprehensive setup is designed to rigorously assess the model's ability to navigate complex driving environments and to reveal its effectiveness and potential areas for improvement.
Figure 17 showcases the trajectory variations of vehicles across three experimental setups. In varying traffic conditions, the vehicles demonstrate an effective ability to adapt to the surrounding traffic environment, maintaining stable trajectories throughout. Figure 18 captures the speed variations observed in these experiments. The red and green dots within the figure symbolize the states of red and green traffic signals encountered by the vehicle. Speeds tend to oscillate around the 5 m/s mark, with the vehicles undertaking lane changes 1, 2, and 1 times in each of the three experiments, respectively. The approach to red traffic lights is marked by a smooth deceleration to a halt, whereas green lights permit uninterrupted passage. In sections marked by traffic congestion, notable speed fluctuations are observed, particularly as traffic density increases.
To adhere to the target speed while averting collisions, vehicles engage in a continuous cycle of acceleration and deceleration, thereby adjusting their proximity to the vehicle ahead. The impact of speed fluctuations is more evident under conditions of higher traffic density, especially when dealing with traffic volumes of 10 and 15 vehicles, which notably extend the duration required to traverse the test segment. The most prolonged passage time is recorded at a traffic density of 10, attributed to prolonged halts at red lights compounded by increased vehicle density. Conversely, at a traffic density of 5, despite encountering two red lights, the passage time is minimized due to the lower volume of traffic. In each scenario, vehicles efficiently navigate through intersections while prioritizing safety, underscoring the model's robust adaptability to diverse traffic situations and its capability to balance efficiency with safety considerations.
Given the current limitations in intelligence and capabilities, machine learning cannot yet fully supplant human involvement in practical applications due to its inability to navigate various situations autonomously. This article introduces an innovative approach that marries human driving expertise with the exploratory learning process of reinforcement learning, culminating in a human-guided reinforcement learning framework. This framework integrates driving intervention within the PPO algorithm and introduces a human-guided experience playback mechanism. The utility of driving intervention in hastening the enhancement of model performance was validated by examining the effects of varying intervention frequencies on agent performance, underscoring the importance of fine-tuning intervention rates to optimize benefits. The model's adept handling of diverse driving scenarios further underscores the practical viability of the proposed method, nudging us closer to realizing a safer and more dependable autonomous driving system.
However, the training regimen necessitates continuous driver vigilance and intervention, presenting a challenge to the driver's endurance and focus, as extended periods of intervention could induce fatigue, potentially impairing model performance. Moreover, the efficacy of the model can be affected by the driver's skill level and subjective decision-making regarding the timing of interventions. Future research will concentrate on how to maintain effective driving intervention mechanisms while minimizing reliance on drivers and alleviating the burdens associated with driving interventions, thereby ensuring the sustainable development of autonomous driving technologies.
The authors declare they have not used artificial intelligence (AI) tools in the creation of this article.
This work is supported by the National Natural Science Foundation of China (Grant 52175236).
The authors declare there are no conflicts of interest.
[1] I. Yaqoob, L. U. Khan, S. M. A. Kazmi, M. Imran, N. Guizani, C. S. Hong, Autonomous driving cars in smart cities: Recent advances, requirements, and challenges, IEEE Network, 34 (2020), 174–181. https://doi.org/10.1109/MNET.2019.1900120
[2] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Sallab, S. Yogamani, et al., Deep reinforcement learning for autonomous driving: a survey, IEEE Trans. Intell. Transp. Syst., 23 (2022), 4909–4926. https://doi.org/10.1109/TITS.2021.3054625
[3] L. Anzalone, P. Barra, S. Barra, A. Castiglione, M. Nappi, An end-to-end curriculum learning approach for autonomous driving scenarios, IEEE Trans. Intell. Transp. Syst., 23 (2022), 19817–19826. https://doi.org/10.1109/TITS.2022.3160673
[4] J. Hua, L. Zeng, G. Li, Z. Ju, Learning for a Robot: Deep reinforcement learning, imitation learning, transfer learning, Sensors, 21 (2021), 1278. https://doi.org/10.3390/s21041278
[5] K. Makantasis, M. Kontorinaki, I. Nikolos, Deep reinforcement-learning-based driving policy for autonomous road vehicles, IET Intell. Transp. Syst., 14 (2019), 13–24. https://doi.org/10.1049/iet-its.2019.0249
[6] L. L. Mero, D. Yi, M. Dianati, A. Mouzakitis, A survey on imitation learning techniques for end-to-end autonomous vehicles, IEEE Trans. Intell. Transp. Syst., 23 (2022), 14128–14147. https://doi.org/10.1109/TITS.2022.3144867
[7] A. Hussein, M. M. Gaber, E. Elyan, C. Jayne, Imitation learning: A survey of learning methods, ACM Comput. Surv., 50 (2017), 1–35. https://doi.org/10.1145/3054912
[8] Y. Peng, G. Tan, H. Si, RTA-IR: A runtime assurance framework for behavior planning based on imitation learning and responsibility-sensitive safety model, Expert Syst. Appl., 232 (2023). https://doi.org/10.1016/j.eswa.2023.120824
[9] H. M. Eraqi, M. N. Moustafa, J. Honer, Dynamic conditional imitation learning for autonomous driving, IEEE Trans. Intell. Transp. Syst., 23 (2022), 22988–23001. https://doi.org/10.1109/TITS.2022.3214079
[10] S. Teng, L. Chen, Y. Ai, Y. Zhou, Z. Xuanyuan, X. Hu, Hierarchical interpretable imitation learning for end-to-end autonomous driving, IEEE Trans. Intell. Transp. Syst., 8 (2023), 673–683. https://doi.org/10.1109/TIV.2022.3225340
[11] J. Ahn, M. Kim, J. Park, Autonomous driving using imitation learning with a look ahead point for semi-structured environments, Sci. Rep., 12 (2022), 21285. https://doi.org/10.1038/s41598-022-23546-6
[12] B. Zheng, S. Verma, J. Zhou, I. W. Tsang, F. Chen, Imitation learning: Progress, taxonomies and challenges, IEEE Trans. Neural Networks Learn. Syst., (2022), 1–16. https://doi.org/10.1109/TNNLS.2022.3213246
[13] Z. Wu, K. Qiu, H. Gao, Driving policies of V2X autonomous vehicles based on reinforcement learning methods, IET Intell. Transp. Syst., 14 (2020), 331–337. https://doi.org/10.1049/iet-its.2019.0457
[14] C. You, J. Lu, D. Filev, P. Tsiotras, Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning, Rob. Auton. Syst., 114 (2019), 1–18. https://doi.org/10.1016/j.robot.2019.01.003
[15] D. Zhang, X. Han, C. Deng, Review on the research and practice of deep learning and reinforcement learning in smart grids, CSEE J. Power Energy Syst., 4 (2018), 362–370. https://doi.org/10.17775/CSEEJPES.2018.00520
[16] Y. H. Khalil, H. T. Mouftah, Exploiting multi-modal fusion for urban autonomous driving using latent deep reinforcement learning, IEEE Trans. Veh. Technol., 72 (2023), 2921–2935. https://doi.org/10.1109/TVT.2022.3217299
[17] H. Zhang, Y. Lin, S. Han, K. Lv, Lexicographic actor-critic deep reinforcement learning for urban autonomous driving, IEEE Trans. Veh. Technol., 72 (2023), 4308–4319. https://doi.org/10.1109/TVT.2022.3226579
[18] Z. Du, Q. Miao, C. Zong, Trajectory planning for automated parking systems using deep reinforcement learning, Int. J. Automot. Technol., 21 (2020), 881–887. https://doi.org/10.1007/s12239-020-0085-9
[19] E. O. Neftci, B. B. Averbeck, Reinforcement learning in artificial and biological systems, Nat. Mach. Intell., 1 (2019), 133–143. https://doi.org/10.1038/s42256-019-0025-4
[20] M. L. Littman, Reinforcement learning improves behavior from evaluative feedback, Nature, 521 (2015), 445–451. https://doi.org/10.1038/nature14540
[21] E. O. Neftci, B. B. Averbeck, Reinforcement learning in artificial and biological systems, Nat. Mach. Intell., 1 (2019), 133–143. https://doi.org/10.1038/s42256-019-0025-4
[22] C. Zhu, Y. Cai, J. Zhu, C. Hu, J. Bi, GR(1)-guided deep reinforcement learning for multi-task motion planning under a stochastic environment, Electronics, 11 (2022), 3716. https://doi.org/10.3390/electronics11223716
[23] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, preprint, arXiv: 1707.06347. https://doi.org/10.48550/arXiv.1707.06347
[24] W. Guan, Z. Cui, X. Zhang, Intelligent smart marine autonomous surface ship decision system based on improved PPO algorithm, Sensors, 22 (2022), 5732. https://doi.org/10.3390/s22155732
[25] J. Han, K. Jo, W. Lim, Y. Lee, K. Ko, E. Sim, et al., Reinforcement learning guided by double replay memory, J. Sens., 2021 (2021), 1–8. https://doi.org/10.1155/2021/6652042
[26] H. Liu, A. Trott, R. Socher, C. Xiong, Competitive experience replay, preprint, arXiv: 1902.00528. https://doi.org/10.48550/arXiv.1902.00528
[27] X. Wang, H. Xiang, Y. Cheng, Q. Yu, Prioritised experience replay based on sample optimization, J. Eng., 2020 (2020), 298–302. https://doi.org/10.1049/joe.2019.1204
[28] A. Karalakou, D. Troullinos, G. Chalkiadakis, M. Papageorgiou, Deep reinforcement learning reward function design for autonomous driving in lane-free traffic, Systems, 11 (2023), 134. https://doi.org/10.3390/systems11030134
[29] B. Geng, J. Ma, S. Zhang, Ensemble deep learning-based lane-changing behavior prediction of manually driven vehicles in mixed traffic environments, Electron. Res. Arch., 31 (2023), 6216–6235. https://doi.org/10.3934/era.2023315
[30] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, Hindsight experience replay, preprint, arXiv: 1707.01495.
[31] J. Wu, Z. Huang, Z. Hu, C. Lu, Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human intervention for autonomous driving, Engineering, 21 (2023), 75–91. https://doi.org/10.1016/j.eng.2022.05.017
[32] F. Pan, H. Bao, Preceding vehicle following algorithm with human driving characteristics, Proc. Inst. Mech. Eng., Part D: J. Automob. Eng., 235 (2021), 1825–1834. https://doi.org/10.1177/0954407020981546
[33] Y. Zhou, R. Fu, C. Wang, Learning the car-following behavior of drivers using maximum entropy deep inverse reinforcement learning, J. Adv. Transp., 2020 (2020), 1–13. https://doi.org/10.1155/2020/4752651
[34] S. Lee, D. Ngoduy, M. Keyvan-Ekbatani, Integrated deep learning and stochastic car-following model for traffic dynamics on multi-lane freeways, Transp. Res. Part C Emerging Technol., 106 (2019), 360–377. https://doi.org/10.1016/j.trc.2019.07.023