Journal of Transportation Systems Engineering and Information Technology, 2026, Vol. 26, Issue (1): 45-54. DOI: 10.16097/j.cnki.1009-6744.2026.01.005

• Intelligent Transportation Systems and Information Technology •



  • About the author: WANG Fujian (1969—), male, born in Fuyang, Anhui Province; associate professor, Ph.D.

A Reinforcement Learning Signal Control Method Based on Dynamic Decision Intervals in Mixed Traffic Environments

WANG Fujian1a, MA Jiahao1b, LI Tinghao1b, MA Dongfang*2   

  1a. Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; 1b. Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; 2. Institute of Ocean Sensing and Networking, Ocean College, Zhejiang University, Zhoushan 316021, Zhejiang, China
  • Received: 2025-10-17  Revised: 2025-12-03  Accepted: 2025-12-17  Online: 2026-02-25  Published: 2026-02-13
  • Supported by:
    National Natural Science Foundation of China (52172334); Opening Foundation of Zhejiang Intelligent Transportation Engineering Technology Research Center (2023ERCITZJ-KF09).



Abstract: Connected and Automated Vehicles (CAVs) offer novel data sources and optimization opportunities for traffic signal control. However, existing methods are generally limited in two respects: first, most rely on fixed decision intervals, which struggle to adapt to dynamic variations in traffic flow, leading to insufficient global optimality of control strategies; second, they lack in-depth modeling of the complex interaction characteristics of mixed traffic flow in low-penetration scenarios, which restricts robustness in practical applications. To address these issues, this paper proposes a dynamic decision interval signal control method based on Proximal Policy Optimization (PPO). The approach first constructs a multi-source traffic state representation that integrates information from both CAVs and Regular Vehicles (RVs) by employing Convolutional Neural Networks (CNNs) and a multi-head attention mechanism. It then designs a multi-discrete action space that combines dynamic decision intervals with phase selection to adaptively generate signal control strategies, thereby balancing decision efficiency and control flexibility. In the reward function, a multi-objective adaptive weighting mechanism over cumulative delay, queue length, and the standard deviation of delay is introduced to jointly optimize traffic efficiency and fairness. Simulation tests based on a real-world road network demonstrate the control effectiveness of the proposed model. The results indicate that under varying traffic demands, the proposed method reduces both average waiting time and average queue length by over 8.50% compared with traditional discrete control methods. Notably, the method maintains stable control performance even when the CAV penetration rate is as low as 20%, validating its effectiveness and strong adaptability in mixed traffic environments.
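The multi-discrete action space and the multi-objective reward summarized in the abstract can be sketched as follows. Every concrete value here (the phase set, the candidate decision intervals, the weights) is an illustrative assumption rather than the paper's actual design, and the paper's adaptive weighting scheme is replaced by fixed placeholder weights:

```python
import statistics
from dataclasses import dataclass

# Illustrative candidate sets; the paper does not fix these concrete values here.
PHASES = ["NS_through", "NS_left", "EW_through", "EW_left"]  # signal phases
INTERVALS = [5, 10, 15, 20]                                  # decision intervals (s)

@dataclass
class SignalAction:
    phase: str       # phase to serve next
    interval_s: int  # seconds until the agent re-decides

def decode_action(phase_idx: int, interval_idx: int) -> SignalAction:
    """Map the two heads of a multi-discrete action to one control decision."""
    return SignalAction(PHASES[phase_idx], INTERVALS[interval_idx])

def reward(delays, queues, w_delay=1.0, w_queue=1.0, w_fair=1.0):
    """Negative weighted sum of cumulative delay, total queue length, and the
    standard deviation of per-vehicle delays (the fairness term)."""
    fairness = statistics.pstdev(delays) if len(delays) > 1 else 0.0
    return -(w_delay * sum(delays) + w_queue * sum(queues) + w_fair * fairness)
```

For example, head outputs (2, 1) decode to serving EW_through and re-deciding after 10 s; a policy network with two categorical output heads (one over PHASES, one over INTERVALS) would produce exactly such index pairs.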

Key words: intelligent transportation, traffic engineering, deep reinforcement learning, mixed traffic environment, dynamic decision interval, traffic signal control

CLC Number: