Intelligent Signal Control Method Under Hybrid Action Representation Reinforcement Learning for Urban Intersections

doi:10.16097/j.cnki.1009-6744.2025.04.008

Journal of Transportation Systems Engineering and Information Technology, 2025, Vol. 25, Issue (4): 73-83. DOI: 10.16097/j.cnki.1009-6744.2025.04.008

• Intelligent Transportation Systems and Information Technology •

WANG Pangwei*¹, WANG Simiao¹, LEI Fangshu², XU Jinghui¹, WANG Zipeng¹, WANG Li¹

1. Beijing Key Lab of Urban Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China; 2. Beijing Key Laboratory of Urban Transport Simulation and Decision Making Support, Beijing Transport Institute, Beijing 100073, China

Received: 2025-05-22 Revised: 2025-06-16 Accepted: 2025-06-20 Online: 2025-08-25 Published: 2025-08-25
About the author: WANG Pangwei (1982-), male, from Yangquan, Shanxi; professor, PhD.
Supported by: Open Fund Project of the State Key Lab of Intelligent Transportation System (2024-A001); Science and Technology Innovation Program of Xiongan New Area, Ministry of Science and Technology (2022XAGG0126).

Abstract

Signal control based on a single discrete or continuous action often fails to adapt to the spatiotemporal variability of traffic flow at urban intersections, and existing reinforcement learning (RL) methods cannot simultaneously address scalability and action dependence in hybrid action spaces. To address these problems, this paper proposes an intelligent signal control method for urban intersections based on hybrid action representation reinforcement learning. First, the action space of the intersection agent is defined as a discrete signal phase selection paired with a continuous green duration for that phase, and the state space and reward function are designed consistently with each other. Second, a discrete action embedding table and a conditional variational autoencoder (CVAE) are used to construct a continuous, decodable representation space, which transforms the original hybrid-action policy learning problem into a continuous policy learning problem in the latent action representation space. Third, proximal policy optimization (PPO) is used to train the policy in the latent action representation space, and a decoder maps each output action back to the original hybrid action for real-time interaction with the environment. Finally, the method is tested and validated on real-world data from the Beijing High-Level Autonomous Driving Demonstration Zone. Comparative tests over different time periods show that, relative to the best-performing baseline model, the proposed method reduces average delay time, average queue length, and average number of stops by 2.57% to 14.84%, 4.00% to 9.15%, and 7.25% to 20.69%, respectively, indicating that it optimizes urban intersection signal control effectively.

Key words: intelligent transportation, traffic signal control, representation learning, hybrid action space, proximal policy optimization
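
The decoding step described in the abstract, in which a continuous latent action is mapped back to a signal phase and a green duration, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The phase count, embedding values, green-time bounds, state size, and the one-layer decoder with random weights are all assumptions chosen for illustration, and the function name `decode_hybrid_action` is hypothetical. In the actual method the embedding table and CVAE decoder would be learned, and the latent action would come from the PPO policy.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_PHASES = 4             # assumed number of signal phases
EMBED_DIM = 2              # assumed size of each phase embedding
LATENT_DIM = 2             # assumed size of the latent variable for green time
STATE_DIM = 8              # assumed size of the intersection state vector
G_MIN, G_MAX = 10.0, 60.0  # assumed green-time bounds in seconds

# Stand-in for a learned embedding table: one fixed vector per phase.
embedding_table = np.array(
    [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
)
# Stand-in for trained decoder parameters (random here, learned in practice).
decoder_weights = rng.normal(size=STATE_DIM + EMBED_DIM + LATENT_DIM)


def decode_hybrid_action(state, latent_phase, latent_duration):
    """Map a continuous latent action to (phase index, green time in seconds)."""
    # Discrete part: choose the phase whose embedding is nearest to the latent vector.
    distances = np.linalg.norm(embedding_table - latent_phase, axis=1)
    phase = int(np.argmin(distances))

    # Continuous part: toy decoder conditioned on the state and the chosen phase.
    features = np.concatenate([state, embedding_table[phase], latent_duration])
    squashed = np.tanh(features @ decoder_weights)  # value in [-1, 1]

    # Rescale to the allowed green-time interval.
    green_time = G_MIN + 0.5 * (squashed + 1.0) * (G_MAX - G_MIN)
    return phase, float(green_time)


state = rng.normal(size=STATE_DIM)
latent_phase = np.array([-0.8, 0.9])           # lies closest to the embedding of phase 2
latent_duration = rng.normal(size=LATENT_DIM)
phase, green_time = decode_hybrid_action(state, latent_phase, latent_duration)
print(phase, green_time)
```

Because the policy only ever outputs a continuous vector, adding a phase changes the embedding table but not the policy's output dimension, and conditioning the duration decoder on the chosen phase keeps the green time dependent on the phase it belongs to.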

CLC Number: U491

WANG Pangwei, WANG Simiao, LEI Fangshu, XU Jinghui, WANG Zipeng, WANG Li. Intelligent Signal Control Method Under Hybrid Action Representation Reinforcement Learning for Urban Intersections[J]. Journal of Transportation Systems Engineering and Information Technology, 2025, 25(4): 73-83.


Link to this article: http://www.tseit.org.cn/CN/10.16097/j.cnki.1009-6744.2025.04.008

http://www.tseit.org.cn/CN/Y2025/V25/I4/73


