Joint Optimization of Traffic Signals and Vehicle Trajectory Dynamic Weights Based on Dual-agent

doi:10.16097/j.cnki.1009-6744.2026.03.018

Journal of Transportation Systems Engineering and Information Technology, 2026, Vol. 26, Issue (3): 192-202.

• Intelligent Transportation Systems and Information Technology •

JIANG Xiancai^*, WEI Hedi, ZHANG Xinyue

School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China

Received: 2025-10-28 Revised: 2025-11-18 Accepted: 2025-11-27 Online: 2026-06-25 Published: 2026-06-23
About the author: JIANG Xiancai (born 1974), male, from Chongqing, professor, PhD.
Supported by:
Heilongjiang Provincial Natural Science Foundation of China (PL2024E012).

Abstract

Abstract: Existing joint optimization of traffic signals and connected and automated vehicle (CAV) trajectories based on deep reinforcement learning suffers from conflicting optimization objectives. To address this conflict, this paper designs a dual-agent system consisting of a traffic signal agent and a CAV agent, and proposes a dynamic-weight joint optimization framework based on the Double Deep Q-Network (DDQN), named Dynamic-weighted joint Optimization with DDQN (DOP-DDQN). A joint reward function that integrates the traffic signal and the CAVs lets the signal agent pursue traffic efficiency while the CAV agent balances safety, energy consumption, and efficiency. Fuzzy logic dynamically adjusts the weights of the traffic signal and the CAVs in the joint reward, which converts the objective conflict into a dynamic priority allocation problem and adapts the optimization focus to the traffic state. Simulation results show that, compared with MaxPressure, DOP-DDQN reduces average queue length, travel time, and fuel consumption by 15.97%~23.74%, 8.69%~9.89%, and 4.19%~9.53%, respectively; compared with other similar methods, these three indicators decrease by 4.06%~15.19%, 2.30%~6.62%, and 1.42%~5.11%, respectively. Further analysis indicates that the control effectiveness of DOP-DDQN increases significantly with the CAV penetration rate, but the improvement levels off once the penetration rate exceeds 0.6.

Key words: intelligent transportation, dynamic weight joint optimization, double deep Q-network, traffic signal, CAV trajectory
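
The abstract names two mechanisms: a fuzzy-logic weight that balances the signal reward against the CAV reward, and a Double DQN learner. The following is a minimal sketch of both ideas, not the authors' implementation. The normalized `congestion` input, the membership functions, the rule outputs (0.3, 0.5, 0.8), and the convex-combination form of the joint reward are all illustrative assumptions, since the abstract does not specify them. Only the Double DQN target follows the standard published form (action chosen by the online network, evaluated by the target network).

```python
def _clamp01(x):
    """Clamp a value into the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def _tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), peak value 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def signal_weight(congestion):
    """Fuzzy weight of the signal reward (hypothetical rule base).

    congestion: assumed normalized traffic-state indicator in [0, 1],
    for example a scaled queue length (an assumption, not from the paper).
    """
    x = _clamp01(congestion)
    # Membership degrees for the fuzzy sets low / medium / high.
    mu_low = _clamp01((0.5 - x) / 0.5)
    mu_med = _tri(x, 0.0, 0.5, 1.0)
    mu_high = _clamp01((x - 0.5) / 0.5)
    # Assumed rule outputs: heavier congestion gives the signal more priority.
    rules = [(mu_low, 0.3), (mu_med, 0.5), (mu_high, 0.8)]
    # Weighted-average defuzzification (zero-order Sugeno style).
    total = sum(mu for mu, _ in rules)
    return sum(mu * w for mu, w in rules) / total


def joint_reward(r_signal, r_cav, congestion):
    """Assumed convex combination of the two agents' rewards."""
    w = signal_weight(congestion)
    return w * r_signal + (1.0 - w) * r_cav


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Standard Double DQN target for one transition.

    The online network selects the next action; the target network
    supplies the value of that action.
    """
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

Under these assumed rules, a free-flowing intersection gives the CAV objectives most of the weight, while a congested one shifts priority to the signal; this is one way to read the abstract's "dynamic priority allocation", and the paper's actual rule base may differ.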

CLC Number: U491

JIANG Xiancai, WEI Hedi, ZHANG Xinyue. Joint Optimization of Traffic Signals and Vehicle Trajectory Dynamic Weights Based on Dual-agent[J]. Journal of Transportation Systems Engineering and Information Technology, 2026, 26(3): 192-202.

Link to this article: http://www.tseit.org.cn/CN/10.16097/j.cnki.1009-6744.2026.03.018

http://www.tseit.org.cn/CN/Y2026/V26/I3/192
