交通运输系统工程与信息 ›› 2026, Vol. 26 ›› Issue (3): 192-202.DOI: 10.16097/j.cnki.1009-6744.2026.03.018

• 智能交通系统与信息技术 •

基于双智能体的交通信号与车辆轨迹动态权重联合优化

蒋贤才* ,魏贺迪,张馨月   

  1. 东北林业大学,土木与交通学院,哈尔滨150040
  • 收稿日期:2025-10-28 修回日期:2025-11-18 接受日期:2025-11-27 出版日期:2026-06-25 发布日期:2026-06-23
  • 作者简介: 蒋贤才(1974—),男,重庆人,教授,博士。
  • 基金资助:
    黑龙江省自然科学基金 (PL2024E012)。

Joint Optimization of Traffic Signals and Vehicle Trajectory Dynamic Weights Based on Dual-agent

JIANG Xiancai*, WEI Hedi, ZHANG Xinyue   

  1. School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China
  • Received:2025-10-28 Revised:2025-11-18 Accepted:2025-11-27 Online:2026-06-25 Published:2026-06-23
  • Supported by:
    Heilongjiang Provincial Natural Science Foundation of China (PL2024E012).

摘要: 既有基于深度强化学习的交通信号与网联自动驾驶汽车(Connected and Automated Vehicles, CAV)轨迹联合优化存在优化目标冲突问题,为此,本文设计信号灯与CAV双智能体,提出基于双深度Q网络(Double Deep Q-Network, DDQN)的动态权重联合优化框架(Dynamic-weighted joint Optimization with DDQN, DOP-DDQN)。通过构建融合信号灯与CAV的联合奖励函数,实现交通信号对通行效率以及CAV对安全、能耗和效率的平衡;借助模糊逻辑动态调节联合奖励中信号灯与CAV的权重,将目标冲突转化为优先级动态分配问题,以自适应交通状态调整优化的侧重目标。仿真结果表明:DOP-DDQN较MaxPressure,平均排队、行程时间和油耗分别降低15.97%~23.74%、8.69%~9.89%和4.19%~9.53%;相较于其他同类方法,上述3个指标则分别下降4.06%~15.19%、2.30%~6.62%和1.42%~5.11%。进一步研究表明,DOP-DDQN控制成效随CAV渗透率升高显著增强,但当CAV渗透率超过0.6后,控制成效提升幅度趋于平缓。

关键词: 智能交通, 动态权重联合优化, 双深度Q网络, 交通信号, CAV轨迹

Abstract: Existing deep-reinforcement-learning methods for jointly optimizing traffic signals and the trajectories of connected and automated vehicles (CAVs) suffer from conflicting optimization objectives. To address this conflict, this paper designs two agents, one for the traffic signal and one for the CAVs, and proposes a dynamic-weight joint optimization framework based on the Double Deep Q-Network (Dynamic-weighted joint Optimization with DDQN, DOP-DDQN). A joint reward function that integrates the signal and the CAVs balances the signal's objective of traffic efficiency against the CAVs' objectives of safety, energy consumption, and efficiency. Fuzzy logic dynamically adjusts the weights of the signal and the CAVs in the joint reward, which converts the objective conflict into a dynamic priority-allocation problem and lets the optimization focus adapt to the traffic state. Simulation results show that, compared with MaxPressure, DOP-DDQN reduces average queue length, travel time, and fuel consumption by 15.97% to 23.74%, 8.69% to 9.89%, and 4.19% to 9.53%, respectively. Compared with other similar methods, the same three indicators decrease by 4.06% to 15.19%, 2.30% to 6.62%, and 1.42% to 5.11%, respectively. Further analysis indicates that the control performance of DOP-DDQN improves markedly as the CAV penetration rate rises, but the gain levels off once the penetration rate exceeds 0.6.
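
The abstract names two mechanisms without giving their formulas: a fuzzy-logic weight that shifts priority between the signal reward and the CAV reward, and the Double DQN update. The Python sketch below illustrates one way these pieces could fit together. It is not the authors' implementation: the congestion input, the membership functions, the rule outputs (0.3, 0.5, 0.7), and all function names are assumptions made for illustration only. Only the Double DQN target (action chosen by the online network, evaluated by the target network) follows the standard definition of that algorithm.

```python
def _clamp01(x):
    """Clip a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def signal_weight(congestion):
    """Fuzzy (zero-order Sugeno) weight of the signal reward.

    `congestion` is a hypothetical normalized congestion level in [0, 1],
    e.g. queue length divided by lane capacity. Memberships and rule
    outputs are illustrative assumptions, not values from the paper.
    """
    x = _clamp01(congestion)
    low = _clamp01(1.0 - 2.0 * x)                # 1 at x=0, 0 from x=0.5
    medium = _clamp01(1.0 - abs(2.0 * x - 1.0))  # peaks at x=0.5
    high = _clamp01(2.0 * x - 1.0)               # 0 until x=0.5, 1 at x=1
    # Assumed rules: heavier congestion -> more weight on the signal agent.
    numerator = low * 0.3 + medium * 0.5 + high * 0.7
    denominator = low + medium + high
    return numerator / denominator


def joint_reward(r_signal, r_cav, congestion):
    """Convex combination of the two agents' rewards with a dynamic weight."""
    w = signal_weight(congestion)
    return w * r_signal + (1.0 - w) * r_cav


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Standard Double DQN target.

    The online network's Q-values pick the next action; the target
    network's Q-values evaluate it.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Free-flow vs. saturated conditions shift priority between the agents.
    print(signal_weight(0.0), signal_weight(1.0))
    print(joint_reward(r_signal=-4.0, r_cav=-1.0, congestion=0.5))
    print(double_dqn_target(1.0, 0.9, [0.2, 0.8], [0.5, 0.1]))
```

Under these assumed rules the signal weight rises with congestion, so the joint reward favors queue clearance when the intersection is saturated and favors CAV safety and energy terms when traffic is light.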

Key words: intelligent transportation, dynamic weight joint optimization, double deep Q-network, traffic signal, CAV trajectory
