Journal of Transportation Systems Engineering and Information Technology, 2026, Vol. 26, Issue (1): 45-54. DOI: 10.16097/j.cnki.1009-6744.2026.01.005

• Intelligent Transportation Systems and Information Technology •



  • About the author: WANG Fujian (1969—), male, born in Fuyang, Anhui Province; associate professor, Ph.D.

A Reinforcement Learning Signal Control Method Based on Dynamic Decision Intervals in Mixed Traffic Environments

WANG Fujian1a, MA Jiahao1b, LI Tinghao1b, MA Dongfang*2   

  1a. Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; 1b. Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China; 2. Institute of Ocean Sensing and Networking, Ocean College, Zhejiang University, Zhoushan 316021, Zhejiang, China
  • Received: 2025-10-17  Revised: 2025-12-03  Accepted: 2025-12-17  Online: 2026-02-25  Published: 2026-02-13
  • Supported by:
    National Natural Science Foundation of China (52172334); Opening Foundation of Zhejiang Intelligent Transportation Engineering Technology Research Center (2023ERCITZJ-KF09).



Abstract: Connected and Automated Vehicles (CAVs) offer novel data sources and optimization opportunities for traffic signal control. However, existing methods are generally limited in two respects: first, most rely on fixed decision intervals, which struggle to adapt to dynamic variations in traffic flow, leading to insufficient global optimality of control strategies; second, they lack in-depth modeling of the complex interaction characteristics of mixed traffic flow in low-penetration scenarios, which restricts robustness in practical applications. To address these issues, this paper proposes a dynamic decision interval signal control method based on Proximal Policy Optimization (PPO). The approach first constructs a multi-source traffic state representation that integrates information from both CAVs and Regular Vehicles (RVs) by employing Convolutional Neural Networks (CNNs) and a multi-head attention mechanism. It then designs a multi-discrete action space that combines dynamic decision intervals with phase selection to adaptively generate signal control strategies, thereby balancing decision efficiency and control flexibility. In the reward function, a multi-objective adaptive weighting mechanism over cumulative delay, queue length, and the standard deviation of delay is introduced to jointly optimize traffic efficiency and fairness. Simulation tests based on a real-world road network demonstrate the control effectiveness of the proposed model. The results indicate that under varying traffic demands, the proposed method reduces both average waiting time and average queue length by over 8.50% compared with traditional discrete control methods. Notably, the method maintains stable control performance even when the CAV penetration rate is as low as 20%, validating its effectiveness and strong adaptability in mixed traffic environments.
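The multi-discrete action space and the multi-objective reward summarized in the abstract can be sketched as follows. Every concrete value here (the phase set, the candidate decision intervals, the weights) is an illustrative assumption rather than the paper's actual design, and the paper's adaptive weighting scheme is replaced by fixed placeholder weights:

```python
import statistics
from dataclasses import dataclass

# Illustrative candidate sets; the paper does not fix these concrete values here.
PHASES = ["NS_through", "NS_left", "EW_through", "EW_left"]  # signal phases
INTERVALS = [5, 10, 15, 20]                                  # decision intervals (s)

@dataclass
class SignalAction:
    phase: str       # phase to serve next
    interval_s: int  # seconds until the agent re-decides

def decode_action(phase_idx: int, interval_idx: int) -> SignalAction:
    """Map the two heads of a multi-discrete action to one control decision."""
    return SignalAction(PHASES[phase_idx], INTERVALS[interval_idx])

def reward(delays, queues, w_delay=1.0, w_queue=1.0, w_fair=1.0):
    """Negative weighted sum of cumulative delay, total queue length, and the
    standard deviation of per-vehicle delays (the fairness term)."""
    fairness = statistics.pstdev(delays) if len(delays) > 1 else 0.0
    return -(w_delay * sum(delays) + w_queue * sum(queues) + w_fair * fairness)
```

For example, head outputs (2, 1) decode to serving EW_through and re-deciding after 10 s; a policy network with two categorical output heads (one over PHASES, one over INTERVALS) would produce exactly such index pairs.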

Key words: intelligent transportation, traffic engineering, deep reinforcement learning, mixed traffic environment, dynamic decision interval, traffic signal control

CLC Number: