Journal of Transportation Systems Engineering and Information Technology ›› 2026, Vol. 26 ›› Issue (1): 65-75. DOI: 10.16097/j.cnki.1009-6744.2026.01.007

• Intelligent Transportation Systems and Information Technology •

  • About the author: JIANG Xiancai (1974—), male, from Liangping, Chongqing, China; Professor.

Multi-agent Proximal Policy Optimization Algorithm for Collaborative Merging of Human-Machine Hybrid Driving

JIANG Xiancai*, QU Yue, WEI Hedi   

  1. School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China
  • Received: 2025-08-19 Revised: 2025-08-31 Accepted: 2025-09-04 Online: 2026-02-25 Published: 2026-02-13
  • Supported by:
    Heilongjiang Provincial Natural Science Foundation of China (PL2024E012).



Abstract: To balance safety and efficiency in the cooperative control of connected and automated vehicles (CAVs) and human-driven vehicles (HDVs) in expressway merging areas, this paper proposes Priority-SAAM MAPPO, an algorithm for collaborative control of mixed traffic flow in merging areas that integrates priority-based safety supervision and action masking, built on Multi-Agent Proximal Policy Optimization (MAPPO). The algorithm introduces a two-layer (static and dynamic) action-mask filtering rule, establishes a priority index based on task urgency, spatial criticality, and temporal risk, and optimizes the coordination between the policy and value networks through Proximal Policy Optimization (PPO) clipping and long-horizon return estimation with Generalized Advantage Estimation (GAE). Simulation results show that Priority-SAAM MAPPO converges well in both basic and complex heterogeneous scenarios, with stable joint optimization of the policy and value networks. In terms of safety, the collision risk rate is below 4% in the basic heterogeneous scenario, half that of MAPPO, and about 8% in the complex heterogeneous scenario, outperforming MAPPO (12%) and QMIX (a deep multi-agent reinforcement learning algorithm based on monotonic value function factorization; 18%). In terms of efficiency, the average reward exceeds that of the benchmark algorithms, and the spatiotemporal density in the merging area shifts from disordered fluctuation to a regular distribution, markedly improving the orderliness of the traffic flow; this verifies the algorithm's effectiveness and robustness for cooperative control of mixed traffic flow in merging areas. Further analysis indicates that Priority-SAAM MAPPO is well suited to merging control of mixed traffic flows with high traffic density and strongly heterogeneous HDV behavior.
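The three generic ingredients the abstract names, GAE for long-horizon advantage estimation, the PPO clipped surrogate objective, and action masking that zeroes out filtered actions, can be sketched as below. This is a minimal, framework-free illustration of those standard components only, not the authors' Priority-SAAM MAPPO implementation; the mask vector stands in for the paper's static/dynamic filtering rules, whose details are not given here.

```python
import math

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` carries one extra bootstrap entry: len(values) == len(rewards) + 1."""
    adv = [0.0] * len(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def ppo_clip_loss(new_logp, old_logp, adv, eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    total = 0.0
    for ln, lo, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(ln - lo)                       # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1-eps, 1+eps]
        total += min(ratio * a, clipped * a)
    return -total / len(adv)

def masked_policy(logits, mask):
    """Action masking: actions filtered out (mask == 0) get zero probability."""
    exps = [math.exp(l) if m else 0.0 for l, m in zip(logits, mask)]
    s = sum(exps)
    return [e / s for e in exps]
```

In a MAPPO-style trainer each agent's sampled action log-probabilities and GAE advantages would feed `ppo_clip_loss`, while `masked_policy` is applied before sampling so that unsafe merging actions can never be selected.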

Key words: intelligent transportation, mixed traffic, multi-agent proximal policy optimization, cooperative control, on-ramp merging area
