Journal of Transportation Systems Engineering and Information Technology ›› 2026, Vol. 26 ›› Issue (1): 65-75. DOI: 10.16097/j.cnki.1009-6744.2026.01.007

• Intelligent Transportation Systems and Information Technology •

  • About the author: JIANG Xiancai (1974—), male, from Liangping, Chongqing, China; Professor.

Multi-agent Proximal Policy Optimization Algorithm for Collaborative Merging of Human-Machine Hybrid Driving

JIANG Xiancai*, QU Yue, WEI Hedi   

  1. School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China
  • Received: 2025-08-19 Revised: 2025-08-31 Accepted: 2025-09-04 Online: 2026-02-25 Published: 2026-02-13
  • Supported by:
    Heilongjiang Provincial Natural Science Foundation of China (PL2024E012).



Abstract: To balance safety and efficiency in the cooperative control of connected and automated vehicles (CAVs) and human-driven vehicles (HDVs) in expressway merging areas, this paper proposes Priority-SAAM MAPPO, an algorithm for collaborative control of mixed traffic flow in merging areas that integrates priority-based safety supervision and action masking, built on Multi-Agent Proximal Policy Optimization (MAPPO). The algorithm introduces a two-layer (static and dynamic) action-mask filtering rule, establishes a priority index based on task urgency, spatial criticality, and temporal risk, and optimizes the coordination between the policy and value networks through Proximal Policy Optimization (PPO) clipping and long-horizon return estimation with Generalized Advantage Estimation (GAE). Simulation results show that Priority-SAAM MAPPO converges well in both basic and complex heterogeneous scenarios, with stable joint optimization of the policy and value networks. In terms of safety, the collision risk rate is below 4% in the basic heterogeneous scenario, half that of MAPPO, and about 8% in the complex heterogeneous scenario, outperforming MAPPO (12%) and QMIX (a deep multi-agent reinforcement learning algorithm based on monotonic value function factorization; 18%). In terms of efficiency, the average reward exceeds that of the benchmark algorithms, and the spatiotemporal density in the merging area shifts from disordered fluctuation to a regular distribution, markedly improving the orderliness of the traffic flow; this verifies the algorithm's effectiveness and robustness for cooperative control of mixed traffic flow in merging areas. Further analysis indicates that Priority-SAAM MAPPO is well suited to merging control of mixed traffic flows with high traffic density and strongly heterogeneous HDV behavior.
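The three generic ingredients the abstract names, GAE for long-horizon advantage estimation, the PPO clipped surrogate objective, and action masking that zeroes out filtered actions, can be sketched as below. This is a minimal, framework-free illustration of those standard components only, not the authors' Priority-SAAM MAPPO implementation; the mask vector stands in for the paper's static/dynamic filtering rules, whose details are not given here.

```python
import math

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` carries one extra bootstrap entry: len(values) == len(rewards) + 1."""
    adv = [0.0] * len(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def ppo_clip_loss(new_logp, old_logp, adv, eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    total = 0.0
    for ln, lo, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(ln - lo)                       # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1-eps, 1+eps]
        total += min(ratio * a, clipped * a)
    return -total / len(adv)

def masked_policy(logits, mask):
    """Action masking: actions filtered out (mask == 0) get zero probability."""
    exps = [math.exp(l) if m else 0.0 for l, m in zip(logits, mask)]
    s = sum(exps)
    return [e / s for e in exps]
```

In a MAPPO-style trainer each agent's sampled action log-probabilities and GAE advantages would feed `ppo_clip_loss`, while `masked_policy` is applied before sampling so that unsafe merging actions can never be selected.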

Key words: intelligent transportation, mixed traffic, multi-agent proximal policy optimization, cooperative control, on-ramp merging area
