Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (4): 105-115. DOI: 10.16097/j.cnki.1009-6744.2024.04.011

• Systems Engineering Theory and Methods •

Generative Adversarial Imitation Learning Based Bicycle Behaviors Simulation on Road Segments

WEI Shuqiao1, NI Ying*1, SUN Jian1, QIU Hongtong2

  1. Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804, China; 2. Traffic Management Research Institute of the Ministry of Public Security, Wuxi 214151, Jiangsu, China
  • Received: 2024-04-24 Revised: 2024-06-10 Accepted: 2024-06-17 Online: 2024-08-25 Published: 2024-08-21
  • About author: WEI Shuqiao (1994- ), male, from Tai'an, Shandong, PhD candidate.
  • Supported by:
    National Key Research and Development Program of China (2019YFB1600200); National Natural Science Foundation of China (52072262).

Abstract: To accurately reproduce the disturbance behavior of bicycles on road segments and meet the needs of autonomous driving simulation testing, this paper proposes a Position Reward Augmented Generative Adversarial Imitation Learning (PRA-GAIL) method for training the simulation model. Because disturbance behavior on urban roads is mainly generated by electric bicycles, electric bicycles are selected as the research object. In the constructed simulation environment, Generative Adversarial Imitation Learning (GAIL) is used to update the simulation model so that the simulated trajectories gradually approach the real trajectories, while a position reward and a Lagrangian constraint method are added to address the homogenization and uncontrollable behavior found in existing simulation methods. On the test set, the average per-step displacement error of GAIL and PRA-GAIL is 61.7% and 65.8% lower, respectively, than that of the commonly used behavioral cloning method. At the behavior level, compared with GAIL, PRA-GAIL significantly reduces the KL divergence between the simulated and real acceleration distributions, and the percentage errors in the numbers of line-crossing and overtaking events decrease by 7.2% and 20.2%, respectively. Adding safety constraints with the Lagrangian method reduces the number of agents exhibiting risky behavior by 75.8% compared with the commonly used reward augmentation method. At the trajectory level, in the full simulation environment, the average per-step displacement error of PRA-GAIL is 17.5% lower than that of GAIL. The model realistically reproduces the maneuvering space of cyclists during overtaking, which indicates that PRA-GAIL is well suited to bicycle behavior simulation. The proposed modifications effectively improve simulation performance, and the resulting model realistically reproduces the disturbance behavior of bicycles on road segments and can be applied to autonomous driving simulation testing.
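The evaluation quantities named in the abstract (average displacement error per step, KL divergence between acceleration distributions, and percentage error of event counts) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the histogram binning, the acceleration range, the smoothing constant `eps`, and the KL direction (real relative to simulated) are all assumptions made for illustration.

```python
import numpy as np

def average_displacement_error(sim_xy, real_xy):
    """Mean Euclidean distance per time step between two (T, 2) trajectories."""
    sim_xy = np.asarray(sim_xy, dtype=float)
    real_xy = np.asarray(real_xy, dtype=float)
    return float(np.mean(np.linalg.norm(sim_xy - real_xy, axis=1)))

def kl_divergence_hist(real_samples, sim_samples, bins=20,
                       value_range=(-3.0, 3.0), eps=1e-8):
    """KL(P_real || P_sim) between histogram estimates of two 1-D samples.

    The bin count, value range (assumed here to be m/s^2), and eps smoothing
    are illustrative assumptions, not values from the paper.
    """
    p, _ = np.histogram(real_samples, bins=bins, range=value_range)
    q, _ = np.histogram(sim_samples, bins=bins, range=value_range)
    # Smooth and normalise so that empty bins do not produce log(0).
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return float(np.sum(p * np.log(p / q)))

def percentage_error(sim_count, real_count):
    """Absolute percentage error of a simulated event count (e.g. overtakes)."""
    return abs(sim_count - real_count) / real_count * 100.0

if __name__ == "__main__":
    real = [[0.0, 0.0], [0.0, 0.0]]
    sim = [[0.0, 0.0], [3.0, 4.0]]
    print(average_displacement_error(sim, real))  # distances are 0 and 5
    print(percentage_error(90, 100))
```

A lower value of each quantity means the simulated behavior is closer to the observed behavior, which is the sense in which the abstract reports reductions.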

Key words: traffic engineering, bicycle behavior, reinforcement learning, generative adversarial imitation learning, autonomous driving test, microscopic traffic simulation
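The abstract contrasts a Lagrangian safety constraint with fixed-weight reward augmentation. The generic idea behind a Lagrangian constraint in reinforcement learning can be sketched as follows: the penalty weight is not a hand-tuned constant but a multiplier that grows while the measured safety cost exceeds a limit and shrinks toward zero otherwise. The sketch below shows only that generic dual-ascent update; the names `lagrangian_reward`, `update_multiplier`, `cost_limit`, and `lr` are hypothetical, and the paper's actual cost definition, position reward, and update rule are not given in the abstract.

```python
def lagrangian_reward(imitation_reward, cost, lam):
    """Reward seen by the policy: imitation reward minus the weighted safety cost."""
    return imitation_reward - lam * cost

def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Dual ascent step: raise lam when the constraint is violated, never below 0."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))

if __name__ == "__main__":
    lam = 0.0
    # Hypothetical mean safety costs observed over successive training iterations.
    for mean_cost in [1.0, 0.8, 0.5, 0.2, 0.1]:
        lam = update_multiplier(lam, mean_cost, cost_limit=0.2)
        print(round(lam, 3), round(lagrangian_reward(1.0, mean_cost, lam), 3))
```

With a fixed penalty weight, the trade-off between imitation and safety has to be tuned by hand; with the multiplier update, the weight adapts to how often the constraint is violated.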

CLC Number: