Journal of Transportation Systems Engineering and Information Technology ›› 2025, Vol. 25 ›› Issue (4): 73-83. DOI: 10.16097/j.cnki.1009-6744.2025.04.008

• Intelligent Transportation Systems and Information Technology •

Intelligent Signal Control Method for Urban Intersections Based on Hybrid Action Representation Reinforcement Learning

王庞伟*1, 王思淼1, 雷方舒2, 徐京辉1, 王子鹏1, 王力1

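  1. Beijing Key Laboratory of Urban Road Traffic Intelligent Control Technology, North China University of Technology, Beijing 100144; 2. Beijing Key Laboratory of Urban Traffic Operation Simulation and Decision Support, Beijing Transport Institute, Beijing 100073
  • Received: 2025-05-22 Revised: 2025-06-16 Accepted: 2025-06-20 Issue date: 2025-08-25 Published online: 2025-08-25
  • About the author: WANG Pangwei (born 1982), male, from Yangquan, Shanxi; professor, Ph.D.
  • Funding:
    Open Fund Project of the State Key Laboratory of Intelligent Transportation with Vehicle-Road Integration (2024-A001); Xiongan New Area Science and Technology Innovation Special Project of the Ministry of Science and Technology (2022XAGG0126).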

Intelligent Signal Control Method Under Hybrid Action Representation Reinforcement Learning for Urban Intersections

WANG Pangwei*1, WANG Simiao1, LEI Fangshu2, XU Jinghui1, WANG Zipeng1, WANG Li1   

  1. Beijing Key Lab of Urban Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China; 2. Beijing Key Laboratory of Urban Transport Simulation and Decision Making Support, Beijing Transport Institute, Beijing 100073, China
  • Received: 2025-05-22 Revised: 2025-06-16 Accepted: 2025-06-20 Online: 2025-08-25 Published: 2025-08-25
  • Supported by:
    Project of State Key Lab of Intelligent Transportation System (2024-A001); Science and Technology Innovation Program of Xiongan New Area (2022XAGG0126).

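Abstract: In urban intersection environments, signal control with a single discrete or continuous action struggles to cope with the spatiotemporal variation of traffic flow, and existing reinforcement learning methods cannot address scalability and action dependence in a hybrid action space at the same time. To tackle both problems, this paper proposes an intelligent signal control method for urban intersections based on hybrid action representation reinforcement learning. First, the action space of the intersection agent is defined as a discrete signal phase selection together with its corresponding continuous green duration, and the state space and reward function are designed consistently with it. Second, a discrete action embedding table and a conditional variational autoencoder are used to build a continuous, decodable representation space, which converts the original hybrid action policy learning problem into a continuous policy learning problem in a latent action representation space. Third, proximal policy optimization is used to train the policy in the latent action representation space, and a decoder maps each output action back to the original hybrid action for real-time interaction with the environment. Finally, the method is tested and validated on real data from the Beijing High-Level Autonomous Driving Demonstration Zone. Comparative tests over different time periods show that, relative to the best baseline model, the proposed method reduces average delay time, average queue length, and average number of stops by 2.57% to 14.84%, 4.00% to 9.15%, and 7.25% to 20.69%, respectively, achieving good signal control optimization for urban intersections.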

Keywords: intelligent transportation, traffic signal control, representation learning, hybrid action space, proximal policy optimization

Abstract: Traffic signal control based on purely discrete or purely continuous actions often fails to adapt to the spatiotemporal variability of traffic flows at urban intersections, and existing reinforcement learning (RL) approaches struggle to handle scalability and action dependence in hybrid action spaces at the same time. To address these challenges, this paper proposes a hybrid action representation reinforcement learning method for intelligent traffic signal control at urban intersections. Firstly, the action space of each intersection agent is formulated as the selection of a discrete signal phase combined with the corresponding continuous green duration, and the state space and reward function are designed to be consistent with it. Secondly, a conditional variational autoencoder (CVAE) is employed alongside a discrete action embedding table to encode the original hybrid action space into a continuous latent representation, transforming the hybrid policy learning problem into a tractable continuous policy optimization task. Thirdly, proximal policy optimization (PPO) is used to train the policy within the latent space, and the learned actions are decoded back into the original hybrid action space for real-time interaction with the environment. Finally, experimental evaluations using real-world data from the Beijing High-Level Autonomous Driving Demonstration Zone show that, compared with the best-performing benchmark model, the proposed approach reduces the average delay time, average queue length, and average number of stops by 2.57% to 14.84%, 4.00% to 9.15%, and 7.25% to 20.69%, respectively, demonstrating its effectiveness in optimizing urban intersection signal control.

Key words: intelligent transportation, traffic signal control, representation learning, hybrid action space, proximal policy optimization
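
The decoding step described in the abstract, where a continuous latent action is mapped back to a signal phase and a green duration, might look roughly like the sketch below. It is an illustration under stated assumptions and not the authors' implementation. The nearest-neighbor lookup in the embedding table is one common way to recover a discrete action from a continuous vector. The random `embedding_table` and `decoder_weights` stand in for parameters that would be learned. The dimensions, the green-time bounds, and the function name `decode_hybrid_action` are all hypothetical.

```python
import numpy as np

# All sizes and bounds below are illustrative assumptions, not values from the paper.
N_PHASES = 4                # assumed number of signal phases
EMBED_DIM = 3               # assumed size of each phase embedding
LATENT_DIM = 2              # assumed size of the continuous latent variable
STATE_DIM = 8               # assumed size of the intersection state vector
G_MIN, G_MAX = 10.0, 60.0   # assumed green-time bounds in seconds

rng = np.random.default_rng(0)
# Stand-ins for learned parameters: a discrete action embedding table and a
# single linear layer playing the role of a trained CVAE decoder.
embedding_table = rng.normal(size=(N_PHASES, EMBED_DIM))
decoder_weights = rng.normal(size=STATE_DIM + EMBED_DIM + LATENT_DIM)


def decode_hybrid_action(state, phase_query, latent_duration):
    """Map a latent action to a (phase index, green time) pair."""
    # Discrete part: choose the phase whose embedding is closest to the query.
    distances = np.linalg.norm(embedding_table - phase_query, axis=1)
    phase = int(np.argmin(distances))
    # Continuous part: decode conditioned on the state and the chosen phase,
    # then rescale the tanh output from [-1, 1] to [G_MIN, G_MAX].
    decoder_input = np.concatenate([state, embedding_table[phase], latent_duration])
    squashed = np.tanh(decoder_input @ decoder_weights)
    green_time = G_MIN + 0.5 * (squashed + 1.0) * (G_MAX - G_MIN)
    return phase, float(green_time)


state = rng.normal(size=STATE_DIM)
# Stand-in for the output of a policy trained in the latent space (e.g. with PPO).
latent_action = rng.normal(size=EMBED_DIM + LATENT_DIM)
phase, green_time = decode_hybrid_action(
    state, latent_action[:EMBED_DIM], latent_action[EMBED_DIM:]
)
print(phase, round(green_time, 1))
```

Because the duration is decoded conditioned on the selected phase's embedding, the continuous parameter depends on the discrete choice, which is the action dependence the abstract refers to. Adding a phase adds only one row to the table, so the policy's output dimension stays fixed.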

CLC number: