Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (2): 105-113. DOI: 10.16097/j.cnki.1009-6744.2024.02.011

• Intelligent Transportation Systems and Information Technology •

Collaborative Study of Decision-making and Trajectory Planning for Autonomous Driving Based on Soft Actor-Critic Algorithm

TANG Bin*1, LIU Guangyao1, JIANG Haobin1, TIAN Ning1, MI Wei1, WANG Chunhong2

  1. Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, Jiangsu, China; 2. Jiangsu Gangyang Steering System Co., Ltd., Taizhou 225318, Jiangsu, China
  • Received: 2023-12-31  Revised: 2024-02-17  Accepted: 2024-02-26  Online: 2024-04-25  Published: 2024-04-25
  • About the first author: TANG Bin (1983- ), male, born in Taixing, Jiangsu Province; associate professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (52225212); Six Talent Peaks Project of Jiangsu Province (2019-GDZB-084); Science and Technology Support Program of Taizhou (TG202307).



Abstract: To address the slow learning and the limited safety and rationality of autonomous driving decision-making based on conventional Deep Reinforcement Learning (DRL), this paper proposed a collaborative decision-making and trajectory planning method for autonomous driving based on the Soft Actor-Critic (SAC) algorithm. The collaborative decision-planning agent was designed by combining the SAC algorithm with a rule-based planning method. A preprocessing network combining a Self-Attention Mechanism (SAM) and a Gated Recurrent Unit (GRU) was constructed to improve the agent's understanding of traffic scenarios and its learning speed. The action space was designed according to the concrete implementation of the planning module to improve the executability of the decision results. The reward function was designed with the idea of information feedback: driving-condition constraints were imposed on the agent, and the planned trajectory information was fed back to the decision-making module, so that the collaboration between decision-making and planning improved the safety and rationality of decisions. Dynamic traffic scenarios were built in the CARLA autonomous driving simulation platform to train the agent, and the proposed collaborative method was compared with a conventional SAC-based decision-planning method in different scenarios. The results showed that the learning speed of the proposed agent increased by 25.10%; the average speed generated from its decisions was higher and closer to the expected road speed, the speed variation rate was smaller, and the path length and the curvature variation rate were also smaller than those of the conventional method.
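The paper publishes no code, so the following minimal PyTorch sketch only illustrates one plausible reading of the SAM + GRU preprocessing network and the SAC policy head described above: a self-attention layer pools per-vehicle features at each time step, a GRU summarizes the time dimension, and a squashed-Gaussian actor samples actions. All module names, feature dimensions, the observation layout, and the two-dimensional action space are assumptions made for illustration, not the authors' implementation.

    # Illustrative sketch only; dimensions and observation layout are assumed.
    import torch
    import torch.nn as nn

    class PreprocessNet(nn.Module):
        """Self-attention across surrounding vehicles, then a GRU over time."""
        def __init__(self, feat_dim=8, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Linear(feat_dim, embed_dim)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
            self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

        def forward(self, x):
            # x: (batch, time, vehicles, feat_dim) -- assumed observation layout
            b, t, v, f = x.shape
            e = self.embed(x.reshape(b * t, v, f))
            a, _ = self.attn(e, e, e)                 # attend across vehicles
            pooled = a.mean(dim=1).reshape(b, t, -1)  # one vector per time step
            _, h = self.gru(pooled)                   # summarize the time dimension
            return h.squeeze(0)                       # (batch, hidden_dim)

    class GaussianActor(nn.Module):
        """Squashed-Gaussian SAC policy head on top of the preprocessing network."""
        def __init__(self, hidden_dim=128, act_dim=2):
            super().__init__()
            self.pre = PreprocessNet(hidden_dim=hidden_dim)
            self.mu = nn.Linear(hidden_dim, act_dim)
            self.log_std = nn.Linear(hidden_dim, act_dim)

        def forward(self, obs):
            h = self.pre(obs)
            mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
            dist = torch.distributions.Normal(mu, log_std.exp())
            u = dist.rsample()                        # reparameterized sample
            action = torch.tanh(u)                    # squash to [-1, 1]
            # tanh change-of-variables correction for the log-probability
            logp = (dist.log_prob(u) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
            return action, logp

    obs = torch.randn(4, 10, 6, 8)  # 4 samples, 10 steps, 6 vehicles, 8 features
    actor = GaussianActor()
    action, logp = actor(obs)
    print(action.shape, logp.shape)  # torch.Size([4, 2]) torch.Size([4])

In a full SAC agent this actor would be trained jointly with twin soft Q-critics and an entropy temperature term, and the action output would be mapped to whatever interface the rule-based planning module expects; those pieces are omitted here.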

Key words: intelligent transportation, autonomous driving, soft actor-critic algorithm, collaborative decision and planning, deep reinforcement learning
