Journal of Transportation Systems Engineering and Information Technology, 2024, Vol. 24, Issue (3): 127-139. DOI: 10.16097/j.cnki.1009-6744.2024.03.013

• Intelligent Transportation Systems and Information Technology •

Eco-driving Under Mixed Autonomy at Signalized Intersection: A Deep Reinforcement Learning Model

XIN Qi*1, WANG Jiaqi1, YANG Wenke1, XU Meng2, YUAN Wei1

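  1. School of Automobile, Chang'an University, Xi'an 710064, China; 2. School of Systems Science, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2023-12-14 Revised: 2024-01-25 Accepted: 2024-01-29 Online: 2024-06-25 Published: 2024-06-23
  • About the author: XIN Qi (1987- ), male, native of Xianyang, Shaanxi, associate professor
  • Supported by:
    National Natural Science Foundation of China (52002035); Fundamental Research Funds for the Central Universities, CHD (300102223501, 300102223205)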

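Abstract: Dynamic programming models that impose pass-through and safety constraints become highly complex under mixed traffic and heavy flow, and may even have no feasible solution. To address this problem, this paper proposes a deep reinforcement learning model for optimizing the eco-driving trajectories of connected and autonomous vehicles (CAVs) on signalized road segments with mixed traffic. The model defines rewards and penalties of different magnitudes and uses the twin delayed deep deterministic policy gradient (TD3) algorithm to optimize the trajectories of CAVs approaching a signalized intersection in mixed traffic flow. First, the gap to the preceding vehicle, speed difference, speed, distance to the intersection, queue length, and signal phase and timing are selected as the agent state to characterize driving safety and traffic efficiency. In particular, the intersection queue length is added to the state so that CAVs avoid having to stop temporarily behind queues of human-driven vehicles. Second, a multi-objective reward function is built from the agent state and the expected arrival time at the intersection to jointly optimize the efficiency, energy consumption, comfort, and safety of CAVs in mixed traffic; this removes the coupling between the constraints and the solution complexity of dynamic programming models. Simulation training and testing show that vehicle waiting time at the intersection decreases significantly as the CAV penetration rate increases. Energy consumption is reduced by about 5.47% compared with no control, by about 4.42% compared with the dynamic programming model, and by about 2.91% compared with a trajectory planning model based on the deep deterministic policy gradient (DDPG). Moreover, under fluctuating traffic demand and signal cycles, the proposed model enables CAVs to pass through the signalized intersection without stopping.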

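Key words: intelligent transportation, trajectory optimization, twin delayed deep deterministic policy gradient, signalized intersection, connected and autonomous vehicle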
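The agent state described in the abstract can be pictured as a fixed-length, normalized feature vector fed to the actor and critic networks. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the field names, the encoding of signal phase and timing as a green flag plus remaining phase time, and the scaling constants `v_max`, `d_max`, and `cycle_s` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CavState:
    """Hypothetical observation of one CAV approaching a signalized intersection.

    Field names, units, and encodings are illustrative assumptions.
    """
    gap_m: float           # gap to the preceding vehicle (CAV or HDV), m
    speed_diff_mps: float  # leader speed minus ego speed, m/s
    speed_mps: float       # ego speed, m/s
    dist_to_stop_m: float  # distance from the ego vehicle to the stop line, m
    queue_length_m: float  # queue length at the stop line (the augmented feature), m
    is_green: bool         # signal phase for the ego vehicle's movement
    time_left_s: float     # remaining time in the current phase, s

    def to_vector(self, v_max: float = 15.0, d_max: float = 300.0,
                  cycle_s: float = 90.0) -> List[float]:
        """Scale features to comparable ranges using assumed constants."""
        return [
            self.gap_m / d_max,
            self.speed_diff_mps / v_max,
            self.speed_mps / v_max,
            self.dist_to_stop_m / d_max,
            self.queue_length_m / d_max,
            1.0 if self.is_green else 0.0,
            self.time_left_s / cycle_s,
        ]


if __name__ == "__main__":
    s = CavState(gap_m=30.0, speed_diff_mps=-3.0, speed_mps=12.0,
                 dist_to_stop_m=150.0, queue_length_m=45.0,
                 is_green=True, time_left_s=18.0)
    print(s.to_vector())
```

Keeping queue length as its own feature, rather than folding it into the distance term, lets the policy see that the effective stopping point moves upstream when human-driven vehicles are queued.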

Abstract: Dynamic programming models with pass-through and safety constraints often become computationally expensive, and may even have no feasible solution, under mixed autonomy and heavy traffic. This paper proposes a deep reinforcement learning based eco-driving trajectory optimization model for Connected and Autonomous Vehicles (CAVs) in mixed autonomy. The model combines compound reward shaping with the twin delayed deep deterministic policy gradient (TD3) algorithm to optimize CAV trajectories upstream of a signalized intersection in mixed traffic. The vehicular gap, speed difference, speed, distance to the intersection, queue length, and signal phase and timing are selected as the agent state to describe driving safety and mobility. The queue length is added to the state representation to reduce the chance that a CAV has to stop behind a queue of human-driven vehicles. A multi-objective reward function based on the agent state and the anticipated arrival time at the intersection is established to optimize CAV mobility, energy efficiency, comfort, and safety. Unlike the dynamic programming model, whose computational complexity is strongly tied to its constraints, the proposed model decouples the two. Simulation-based training and testing show that vehicle delay at the intersection decreases significantly as the CAV penetration rate increases. Energy consumption is reduced by 5.47%, 4.42%, and 2.91% relative to the uncontrolled scenario, a dynamic programming-based trajectory optimization model, and a deep deterministic policy gradient (DDPG)-based trajectory optimization model, respectively. In addition, the proposed model enables CAVs to cross the signalized intersection without stopping and is robust to variations in traffic demand and signal cycle.
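A multi-objective reward of the kind described above can be sketched as a weighted sum of per-objective terms, one of which checks whether the anticipated arrival time falls in a green interval. Everything below is an assumption for illustration, not the paper's reward: the weights in `WEIGHTS`, the constant-speed arrival estimate, the fixed two-phase signal in `arrives_on_green`, the traction-power proxy for energy, and the time-headway threshold `h_min`.

```python
# Hypothetical weights; the paper's actual reward terms and values are not
# given in the abstract.
WEIGHTS = {"efficiency": 1.0, "arrival": 1.0, "energy": 0.5,
           "comfort": 0.2, "safety": 2.0}


def arrives_on_green(t_arr: float, is_green: bool, time_left: float,
                     green_s: float, red_s: float) -> bool:
    """Check whether an arrival t_arr seconds from now falls in green.

    Assumes a fixed two-phase cycle that starts at green onset.
    """
    cycle = green_s + red_s
    # Current position within the cycle, measured from the start of green.
    offset = (green_s - time_left) if is_green else (green_s + red_s - time_left)
    return (offset + t_arr) % cycle < green_s


def reward(speed: float, accel: float, jerk: float, gap: float,
           dist_to_stop: float, is_green: bool, time_left: float,
           green_s: float = 30.0, red_s: float = 60.0, v_max: float = 15.0,
           a_max: float = 3.0, j_max: float = 5.0, h_min: float = 1.5,
           w: dict = WEIGHTS) -> float:
    eps = 0.1  # avoids division by zero when the vehicle is (nearly) stopped
    t_arr = dist_to_stop / max(speed, eps)  # anticipated arrival at constant speed
    r_eff = speed / v_max                                     # mobility
    r_arr = 1.0 if arrives_on_green(t_arr, is_green, time_left,
                                    green_s, red_s) else -1.0  # arrival timing
    r_energy = -max(accel, 0.0) * speed / (a_max * v_max)     # crude power proxy
    r_comfort = -(jerk / j_max) ** 2                          # penalize jerk
    r_safety = -1.0 if gap / max(speed, eps) < h_min else 0.0  # time headway
    return (w["efficiency"] * r_eff + w["arrival"] * r_arr
            + w["energy"] * r_energy + w["comfort"] * r_comfort
            + w["safety"] * r_safety)
```

With the defaults, a vehicle 100 m from the stop line at 10 m/s with 5 s of red remaining is predicted to arrive 5 s into the next green, so the arrival term is positive; the same vehicle facing 5 s of remaining green would arrive on red and is penalized, which pushes the policy to adjust speed early rather than stop at the line.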

CLC number:
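The abstract names the twin delayed deep deterministic policy gradient (TD3) algorithm as the learner. The sketch below shows the three ingredients that distinguish TD3 from DDPG in its standard published form (Fujimoto et al., 2018): target policy smoothing, clipped double-Q targets, and delayed, Polyak-averaged target updates. It is not the authors' training code; the plain Python callables standing in for neural networks, the scalar normalized action in [-1, 1] (for example, a longitudinal acceleration command), and the default hyperparameters are assumptions.

```python
import random
from typing import Callable, List, Optional

# Hypothetical stand-ins for target networks: any callables with these shapes.
ActorFn = Callable[[object], float]           # state -> action
CriticFn = Callable[[object, float], float]   # (state, action) -> Q-value


def td3_target(reward: float, next_state: object, done: bool,
               actor_target: ActorFn, q1_target: CriticFn, q2_target: CriticFn,
               gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
               a_low: float = -1.0, a_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """Bootstrapped critic target y for one transition (s, a, r, s', done)."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clip the result to the valid action range.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(a_low, min(a_high, actor_target(next_state) + noise))
    # Clipped double-Q learning: bootstrap from the smaller target critic.
    q_next = min(q1_target(next_state, next_action),
                 q2_target(next_state, next_action))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * q_next


def soft_update(target: List[float], online: List[float],
                tau: float = 0.005) -> List[float]:
    """Polyak-average target parameters toward the online parameters."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target, online)]


def actor_update_due(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: actor and targets update every policy_delay critic steps."""
    return step % policy_delay == 0
```

Clipping the smoothing noise keeps the target action close to the target policy's output, and taking the minimum of two critics counteracts the Q-value overestimation that plain DDPG is prone to, which is the usual motivation for preferring TD3 over DDPG.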