Journal of Transportation Systems Engineering and Information Technology ›› 2026, Vol. 26 ›› Issue (3): 338-347. DOI: 10.16097/j.cnki.1009-6744.2026.03.030

• Systems Engineering Theory and Methods •

Research on Reinforcement Learning-Based Dynamic Passenger Flow Control Strategies for Metro Stations

刘少博*1,2 ,苏蔚1,2   

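  1. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China; 2. Engineering Research Center of Transportation Information and Safety, Ministry of Education, Wuhan 430063, China
  • Received: 2026-02-09 Revised: 2026-03-26 Accepted: 2026-04-20 Online: 2026-06-25 Published: 2026-06-23
  • About the author: LIU Shaobo (born 1985), male, from Luoyang, Henan; associate professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (52172308); Science and Technology Project of the Department of Transport of Hubei Province (2025-69-3-6).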

Reinforcement Learning-based Dynamic Control Strategies for Metro Station Pedestrian Flows

LIU Shaobo*1,2, SU Wei1,2   

  1. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China; 2. Engineering Research Center of Transportation Information and Safety, Ministry of Education, Wuhan 430063, China
  • Received: 2026-02-09 Revised: 2026-03-26 Accepted: 2026-04-20 Online: 2026-06-25 Published: 2026-06-23
  • Supported by:
    National Natural Science Foundation of China (52172308); Department of Transport of Hubei Province, China (2025-69-3-6).

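Abstract: Focusing on the dynamic, coordinated control of pedestrian flow in metro stations during peak hours and sudden passenger surges, this paper proposes a reinforcement learning-based method for adaptively adjusting operating strategies, formulating in-station pedestrian flow control as a Markov decision process. Because traditional fixed strategies struggle to cope with real-time fluctuations in passenger flow and the uncertainty of congestion, the paper constructs a state space comprising key-area density and average passenger travel time, and designs a multi-objective reward function that balances safety, efficiency, and economy. Using the Proximal Policy Optimization (PPO) algorithm, the model learns online and dynamically adjusts combinations of strategies such as entrance inflow control, train departure headway, and additional guidance barriers. Applying reinforcement learning to coordinated in-station pedestrian flow control shifts the approach from static offline optimization to dynamic online decision-making. In experiments comparing reinforcement learning with an offline optimization scheme, the PPO policy reduces high-density risk and service delay: the share of high-density conditions in the security screening area falls from 22% to 10%, and the share of cases with a security queue time longer than 3 min falls from 23% to 13%. The method improves throughput efficiency while keeping within safety thresholds and accounting for relative operating cost.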

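Keywords: intelligent transportation, pedestrian flow control, deep reinforcement learning, pedestrian flow simulation modeling, proximal policy optimization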

Abstract: This study addresses the dynamic, coordinated control of pedestrian flow within metro stations, specifically during peak hours and sudden passenger surges. Station-level pedestrian flow control is formulated as a Markov decision process, a state space is constructed around key-area crowd density and travel time, and a multi-objective reward function is designed that jointly accounts for safety, efficiency, and operating cost. Using the Proximal Policy Optimization (PPO) algorithm within a simulation environment, the agent performs online joint optimization over entrance inflow control, train headway adjustment, queue-guidance barrier deployment, and other strategies. Applying reinforcement learning in this way enables coordinated pedestrian flow control through dynamic online decision-making rather than traditional static offline optimization. Compared with a static optimization benchmark, the PPO policy reduces high-density risk and service delays: the share of high-density states in the security screening area decreases from 22% to 10%, and the proportion of passengers with security queue times longer than 3 minutes drops from 23% to 13%. These results indicate that the proposed method improves operational efficiency while satisfying safety thresholds and balancing relative operating costs.

Key words: intelligent transportation, pedestrian crowd management, deep reinforcement learning, pedestrian flow modeling and simulation, proximal policy optimization
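
The abstract describes a state built from key-area density and travel time, a joint action over three control levers, and a reward that trades off safety, efficiency, and cost. The following is a minimal Python sketch of how such a formulation might be encoded. It is not the authors' implementation: the names `StationState`, `ControlAction`, `reward`, and `high_density_share`, as well as every threshold and weight (2.0 persons per square metre, 300 s target time, 180 s base headway, the weight tuple), are assumptions chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StationState:
    """Hypothetical observation: density per key area plus mean travel time."""
    densities: dict[str, float]  # persons per square metre, keyed by area name
    mean_travel_time: float      # seconds


@dataclass
class ControlAction:
    """Hypothetical joint action over the three levers named in the abstract."""
    inflow_rate: float  # fraction of entrance inflow admitted, in [0, 1]
    headway: float      # train headway in seconds
    barriers: int       # number of additional guidance barriers deployed


def reward(
    state: StationState,
    action: ControlAction,
    *,
    density_limit: float = 2.0,   # assumed safety threshold
    target_time: float = 300.0,   # assumed acceptable travel time
    base_headway: float = 180.0,  # assumed nominal headway
    weights: tuple[float, float, float] = (1.0, 0.5, 0.1),  # assumed weights
) -> float:
    """Return a weighted penalty (always <= 0) over safety, efficiency, cost."""
    w_safe, w_eff, w_cost = weights
    # Safety: only density in excess of the threshold is penalised.
    safety_pen = sum(max(0.0, d - density_limit) for d in state.densities.values())
    # Efficiency: relative overshoot of the target travel time.
    eff_pen = max(0.0, state.mean_travel_time / target_time - 1.0)
    # Cost: shorter headways, extra barriers, and restricted inflow all cost more.
    cost_pen = (
        max(0.0, base_headway / action.headway - 1.0)
        + 0.1 * action.barriers
        + (1.0 - action.inflow_rate)
    )
    return -(w_safe * safety_pen + w_eff * eff_pen + w_cost * cost_pen)


def high_density_share(densities: list[float], density_limit: float) -> float:
    """Fraction of time steps whose density exceeds the limit."""
    if not densities:
        return 0.0
    return sum(d > density_limit for d in densities) / len(densities)
```

Under these assumed defaults, a state with 2.5 persons per square metre at security screening, on-target travel time, and no interventions yields a reward of -0.5, driven entirely by the safety term. The helper `high_density_share` mirrors the kind of metric the abstract reports (share of high-density states), although the paper's own density threshold is not given here.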
