Reinforcement Learning-based Dynamic Control Strategies for Metro Station Pedestrian Flows

Journal of Transportation Systems Engineering and Information Technology, 2026, Vol. 26, Issue (3): 338-347. DOI: 10.16097/j.cnki.1009-6744.2026.03.030

LIU Shaobo^*1,2, SU Wei^1,2

1. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China; 2. Engineering Research Center of Transportation Information and Safety, Ministry of Education, Wuhan 430063, China

Received: 2026-02-09 Revised: 2026-03-26 Accepted: 2026-04-20 Online: 2026-06-25 Published: 2026-06-23
About the author: LIU Shaobo (b. 1985), male, native of Luoyang, Henan; associate professor, Ph.D.
Supported by:
National Natural Science Foundation of China (52172308); Science and Technology Project of the Department of Transport of Hubei Province, China (2025-69-3-6).

Abstract

This study addresses the dynamic, coordinated control of pedestrian flow within metro stations during peak hours and sudden passenger surges. It proposes a reinforcement learning-based method for adaptively adjusting operating strategies, formulating station-level pedestrian flow control as a Markov decision process. Because traditional fixed strategies struggle to cope with real-time fluctuations in passenger flow and the uncertainty of congestion, the state space is built from crowd density in key areas and average passenger travel time, and a multi-objective reward function is designed that jointly accounts for safety, efficiency, and operating cost. Using Proximal Policy Optimization (PPO) within a simulation environment, the agent learns online and dynamically adjusts combinations of strategies such as entrance inflow control, train headway adjustment, and deployment of additional queue-guidance barriers. Applying reinforcement learning to coordinated in-station pedestrian flow control shifts the problem from static offline optimization to dynamic online decision-making. In experiments against an offline (static) optimization benchmark, the PPO policy reduces high-density risk and service delays: the share of high-density states in the security screening area falls from 22% to 10%, and the share of security queue times exceeding 3 min falls from 23% to 13%. These results indicate that the method improves throughput efficiency while staying within safety thresholds and accounting for relative operating cost.

Keywords: intelligent transportation, pedestrian crowd management, deep reinforcement learning, pedestrian flow modeling and simulation, proximal policy optimization
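The abstract describes the Markov decision process only in words, so the following is a minimal Python sketch of how its three ingredients could be laid out: a state made of key-area densities and mean travel time, a joint action over the three control levers, and a weighted multi-objective reward. It is not the authors' formulation. The density threshold, the headway options, the weights, and the cost proxy are all hypothetical values chosen for illustration; only the 3 min queue limit comes from the abstract.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Tuple

# Assumed values: the abstract gives no numeric density threshold or weights.
DENSITY_LIMIT = 2.0                  # persons/m^2 treated as "high density" (assumption)
QUEUE_LIMIT_MIN = 3.0                # minutes; the 3 min limit is quoted in the abstract
HEADWAY_OPTIONS = (3.0, 4.0, 5.0)    # candidate train headways in minutes (assumption)
MAX_HEADWAY = max(HEADWAY_OPTIONS)


@dataclass(frozen=True)
class State:
    """Observation: density of each key area plus mean passenger travel time."""
    area_density: Tuple[float, ...]  # persons/m^2, one entry per key area
    mean_travel_time: float          # minutes


@dataclass(frozen=True)
class Action:
    """One joint setting of the three control levers named in the abstract."""
    inflow_limit: bool               # entrance inflow control on/off
    headway_min: float               # train headway in minutes
    barriers: bool                   # additional queue-guidance barriers deployed


# Discrete joint action set: every combination of the three levers.
ACTIONS = [
    Action(inflow, headway, barriers)
    for inflow, headway, barriers in product((False, True), HEADWAY_OPTIONS, (False, True))
]


def reward(state: State, action: Action,
           w_safety: float = 1.0, w_eff: float = 0.1, w_cost: float = 0.05) -> float:
    """Negative weighted sum of safety, efficiency, and cost penalties (illustrative)."""
    # Safety: total density excess above the limit across key areas.
    safety_pen = sum(max(0.0, d - DENSITY_LIMIT) for d in state.area_density)
    # Efficiency: longer mean travel time is worse.
    eff_pen = state.mean_travel_time
    # Cost proxy: each active measure costs 1 unit; shorter headways cost more.
    cost_pen = (float(action.inflow_limit) + float(action.barriers)
                + (MAX_HEADWAY - action.headway_min))
    return -(w_safety * safety_pen + w_eff * eff_pen + w_cost * cost_pen)


def share_above(values: Iterable[float], limit: float) -> float:
    """Fraction of samples strictly above `limit` (e.g. queue times over 3 min)."""
    samples = list(values)
    if not samples:
        raise ValueError("need at least one sample")
    return sum(v > limit for v in samples) / len(samples)
```

An evaluation in the spirit of the reported results would call `share_above(queue_times, QUEUE_LIMIT_MIN)` on simulated security queue times and `share_above(densities, DENSITY_LIMIT)` on sampled security-area densities, once for the learned policy and once for the offline benchmark.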

CLC number: U293.13

LIU Shaobo, SU Wei. Reinforcement Learning-based Dynamic Control Strategies for Metro Station Pedestrian Flows[J]. Journal of Transportation Systems Engineering and Information Technology, 2026, 26(3): 338-347.

Link to this article: http://www.tseit.org.cn/CN/10.16097/j.cnki.1009-6744.2026.03.030

http://www.tseit.org.cn/CN/Y2026/V26/I3/338
