Journal of Transportation Systems Engineering and Information Technology ›› 2023, Vol. 23 ›› Issue (3): 110-122. DOI: 10.16097/j.cnki.1009-6744.2023.03.013

• Intelligent Transportation Systems and Information Technology •

• About the author: HAN Lei (1994- ), male, from Lu'an, Anhui, China; Ph.D. candidate

Variable Speed Limit Control Based on Improved Dueling Double Deep Q Network Under Mixed Traffic Environment

HAN Leia, ZHANG Lun*a, GUO Wei-anb,c   

  1. a. Key Laboratory of Road and Traffic Engineering of the Ministry of Education; b. College of Electronic and Information Engineering; c. Sino-German College of Applied Sciences, Tongji University, Shanghai 201804, China
  • Received:2023-03-13 Revised:2023-04-20 Accepted:2023-04-23 Online:2023-06-25 Published:2023-06-23
  • Supported by:
    National Natural Science Foundation of China (71771176, U20A20330); Natural Science Foundation of Shanghai, China (20692191200)



Abstract: Existing variable speed limit (VSL) control strategies suffer from poor flexibility, slow response, and a heavy reliance on driver compliance and on traffic flow prediction models. Additionally, it is difficult to achieve effective control by relying solely on variable message signs (VMS) to post speed limits to drivers in a mixed traffic environment where connected and automated vehicles (CAVs) and human-driven vehicles (HDVs) coexist. To this end, this paper proposes a VSL control strategy based on an improved dueling double deep Q network (IPD3QN) for the mixed traffic environment, i.e., IPD3QN-VSL. The strategy combines the ability of deep reinforcement learning to adapt automatically to complex environments, without requiring a traffic flow prediction model, with the controllability of CAVs. First, a prioritized experience replay mechanism is introduced into the dueling double deep Q network (D3QN) framework to improve the network's convergence speed and parameter-update efficiency. Meanwhile, a novel adaptive ε-greedy algorithm is proposed to balance exploration and exploitation during learning, improving both exploration efficiency and stability. The proposed strategy aims to minimize the total time spent (TTS) by vehicles on the freeway section. Real-time traffic data and the speed limit of the previous control cycle are used as inputs to the IPD3QN algorithm, and a reward function is constructed to guide the algorithm to generate the dynamic speed limit executed in the VSL control area. The strategy disseminates speed limits to CAVs via infrastructure-to-vehicle (I2V) communication, while HDVs make decisions based on the limits posted on the VMS and on the behavior of surrounding CAVs. Finally, the effectiveness of the IPD3QN-VSL control strategy is verified under different conditions and compared with no control, feedback VSL control, and D3QN-VSL control in terms of control performance.
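The D3QN backbone named above combines a dueling architecture, which splits the Q function into a state value and action advantages, with double Q-learning, which selects the next action using the online network but evaluates it with the target network. A minimal numpy sketch of these two mechanics (function names and toy values are illustrative, not taken from the paper):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean()

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net picks the next action,
    the target net scores it, which reduces overestimation bias."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))          # selection (online network)
    return reward + gamma * q_target_next[a_star]   # evaluation (target network)

# The dueling aggregation is invariant to a constant shift of the advantages,
# so only their relative ordering matters.
q1 = dueling_q(1.0, np.array([0.5, -0.5]))
q2 = dueling_q(1.0, np.array([10.5, 9.5]))
print(np.allclose(q1, q2))  # True
```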
Analysis results indicate that the proposed strategy achieves remarkable control performance at a 30% CAV penetration rate, effectively improving traffic efficiency at the bottleneck and reducing the spatiotemporal extent of congestion in both stable and fluctuating demand scenarios. Compared with the suboptimal D3QN-VSL control, the proposed strategy improves TTS by 14.46% and 10.36% in the stable and fluctuating traffic demand scenarios, respectively.
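Prioritized experience replay, which the strategy introduces into D3QN, replays transitions with large temporal-difference errors more often than uniform sampling would. A simplified proportional-priority buffer (a linear scan instead of the usual sum-tree; all names and parameters are illustrative, not from the paper):

```python
import random

class SimplePER:
    """Proportional prioritized replay: P(i) ∝ p_i ** alpha, p_i = |TD error| + eps."""
    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:          # evict the oldest when full
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=k)
        return [self.data[i] for i in idx], idx

    def update(self, indices, td_errors):            # refresh priorities after a learning step
        for i, e in zip(indices, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha

# Toy usage: a surprising transition (large TD error) dominates the sampling.
buf = SimplePER(capacity=100)
buf.add(("s0", "a0", -1.0, "s1"), td_error=0.1)
buf.add(("s1", "a1", -0.5, "s2"), td_error=5.0)
batch, idx = buf.sample(1)
```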

Key words: intelligent transportation, variable speed limit control, improved dueling double deep Q network, mixed traffic flow, connected and automated vehicles, deep reinforcement learning
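The abstract does not specify how the proposed adaptive ε-greedy algorithm balances exploration and exploitation, so the sketch below assumes one common adaptive scheme: shrink ε while performance keeps improving and grow it again when learning stalls. All parameters and the `recent_gain` signal are illustrative assumptions:

```python
def adaptive_epsilon(eps, recent_gain, eps_min=0.01, eps_max=1.0,
                     decay=0.95, boost=1.05):
    """Adapt the exploration rate from a performance signal.
    `recent_gain` is the change in average episode reward (hypothetical)."""
    if recent_gain > 0:
        eps *= decay   # still improving: exploit more
    else:
        eps *= boost   # stalled: explore more
    return min(eps_max, max(eps_min, eps))  # keep ε inside [eps_min, eps_max]

# Toy trajectory: three improving episodes, one regression, one recovery.
eps = 1.0
for gain in [0.5, 0.4, 0.1, -0.2, 0.3]:
    eps = adaptive_epsilon(eps, gain)
```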

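The control objective is the total time spent (TTS) by vehicles on the freeway section. A common discrete-time formulation (assumed here, not quoted from the paper) accumulates vehicle count times step length, and uses the negative per-step contribution as the reward so that maximizing return minimizes TTS:

```python
def tts(vehicle_counts, step_seconds):
    """Total time spent: sum over steps of (vehicles on section) * (step length), in veh·h."""
    return sum(n * step_seconds for n in vehicle_counts) / 3600.0

def step_reward(n_vehicles, step_seconds):
    """Per-step reward: negative TTS contribution, so return maximization minimizes TTS."""
    return -n_vehicles * step_seconds / 3600.0

# Toy data: vehicles counted on the section over four 60-second control steps.
counts = [120, 140, 135, 110]
total = tts(counts, 60)
print(round(total, 3))  # 8.417
```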