Research article Special Issues

CrowdGAIL: A spatiotemporal aware method for agent navigation

  • Received: 08 October 2022 Revised: 07 December 2022 Accepted: 11 December 2022 Published: 20 December 2022
  • Agent navigation is a crucial task in today's service settings and automated factories. Many existing approaches hand-craft rules that regulate an agent's behavior in a particular scenario. However, not every situation can be anticipated in advance, which can lead to poor performance in real-world applications. In this paper, we propose CrowdGAIL, a method that learns an instructing policy from expert behaviors and can train 'human-like' agents for navigation problems without manually specifying any reward function or predefined rules. First, the model is built on generative adversarial imitation learning (GAIL), which imitates as closely as possible how humans take actions and move toward a target. By comparison, we show the advantage of proximal policy optimization (PPO) over trust region policy optimization (TRPO), and we therefore adopt GAIL-PPO as our base. Second, we design a Sequential DemoBuffer that is compatible with the model's internal long short-term memory (LSTM) structure, so that spatiotemporal guidance is applied to the agent's next step. Third, we demonstrate the model's potential for socially aware behavior in a multi-agent scenario by taking into account collision avoidance with humans as well as social comfort distance. Finally, experiments on a dataset generated with CrowdNav verify how closely the trajectories produced by our model resemble those of humans and how well it guides multiple agents without collisions. Under the same evaluation metrics, CrowdGAIL achieves better results than the classic Social-GAN.
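
    To make the idea of a sequence-aware demonstration buffer more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that expert demonstrations are stored as whole trajectories and that training samples are contiguous fixed-length windows, so that a recurrent (LSTM) model sees steps in temporal order. The class name `SequentialDemoBuffer`, the `Step` layout (2D position, 2D velocity action), and the parameters `seq_len` and `capacity` are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical step layout: ((x, y) position, (vx, vy) action).
Step = Tuple[Tuple[float, float], Tuple[float, float]]


class SequentialDemoBuffer:
    """Illustrative buffer that samples contiguous windows of expert steps."""

    def __init__(self, seq_len: int, capacity: int = 1000) -> None:
        if seq_len < 1:
            raise ValueError("seq_len must be at least 1")
        self.seq_len = seq_len
        self.capacity = capacity
        self.trajectories: List[List[Step]] = []

    def add_trajectory(self, steps: Sequence[Step]) -> None:
        # A trajectory shorter than one window cannot yield a sample, so skip it.
        if len(steps) < self.seq_len:
            return
        self.trajectories.append(list(steps))
        # Evict the oldest trajectory once the capacity is exceeded.
        if len(self.trajectories) > self.capacity:
            self.trajectories.pop(0)

    def sample(self, batch_size: int, rng: random.Random) -> List[List[Step]]:
        if not self.trajectories:
            raise ValueError("buffer is empty")
        batch = []
        for _ in range(batch_size):
            traj = rng.choice(self.trajectories)
            # Pick a start index so the window stays inside the trajectory.
            start = rng.randrange(len(traj) - self.seq_len + 1)
            batch.append(traj[start:start + self.seq_len])
        return batch


if __name__ == "__main__":
    buffer = SequentialDemoBuffer(seq_len=4)
    # Synthetic expert walking along the x-axis at unit speed (made-up data).
    expert = [((float(t), 0.0), (1.0, 0.0)) for t in range(10)]
    buffer.add_trajectory(expert)
    for window in buffer.sample(batch_size=2, rng=random.Random(0)):
        print([state for state, _action in window])
```

    Sampling windows instead of independent state-action pairs preserves the temporal order of the expert's steps, which is what a recurrent model would need in order to use past positions when judging or predicting the next step.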

    Citation: Longchao Da, Hua Wei. CrowdGAIL: A spatiotemporal aware method for agent navigation[J]. Electronic Research Archive, 2023, 31(2): 1134-1146. doi: 10.3934/era.2023057



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)