Journal of Transportation Systems Engineering and Information Technology ›› 2023, Vol. 23 ›› Issue (5): 279-289. DOI: 10.16097/j.cnki.1009-6744.2023.05.029

• Transportation Organization Optimization Theory and Methods •

Research on High-speed Railway Timetable Rescheduling Under Interruptions Based on Reinforcement Learning

PANG Zi-shuai a, WANG Li-wen a, PENG Qi-yuan *a,b

  1. Southwest Jiaotong University: a. School of Transportation and Logistics; b. National United Engineering Laboratory of Integrated and Intelligent Transportation, Chengdu 611756, China
  • Received: 2023-06-30 Revised: 2023-08-30 Accepted: 2023-09-05 Online: 2023-10-25 Published: 2023-10-23
  • About the author: PANG Zi-shuai (1988- ), male, born in Heze, Shandong Province, China; Ph.D. candidate
  • Supported by:
    National Key Research and Development Program of China (2022YFB4300502)

High-speed Railway Timetable Rescheduling Under Random Interruptions Based on Reinforcement Learning

PANG Zi-shuai a, WANG Li-wen a, PENG Qi-yuan *a,b

  1. a. School of Transportation and Logistics; b. National United Engineering Laboratory of Integrated and Intelligent Transportation, Southwest Jiaotong University, Chengdu 611756, China
  • Received:2023-06-30 Revised:2023-08-30 Accepted:2023-09-05 Online:2023-10-25 Published:2023-10-23
  • Supported by:
    National Key Research and Development Program of China (2022YFB4300502)

Abstract: Studying train timetable rescheduling under interruptions is of great significance for improving real-time dispatching decision-making and train operation efficiency on high-speed railways. This paper studies train timetable rescheduling under interruptions with a data-driven optimization method, aiming to improve the real-time applicability of rescheduling models. Considering train operation constraints and taking the minimization of train delay as the objective, a real-time timetable rescheduling method is proposed based on the reinforcement learning Proximal Policy Optimization (PPO) model. A train operation simulation environment is built, in which the PPO agent continuously interacts with the environment and greedily searches for the policy that optimizes the objective function. The performance and efficiency of the PPO model are tested on randomly generated interruption cases and on interruption cases extracted from actual operation data of the Wuhan-Guangzhou high-speed railway in China. The results show that the PPO model outperforms other common reinforcement learning models as well as the dispatchers' on-site decisions (obtained from historical data), reducing train delays by at least 13%; the PPO model converges significantly faster than other commonly used reinforcement learning models; and the solution obtained by PPO is within about 2% of the optimal solution while being computed markedly faster, which makes it better suited to real-time decision-making.
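As a point of reference, one common way to write the delay-minimization objective mentioned above is sketched below. The abstract does not give the paper's exact formulation, so the train set $T$, the station set $S_i$ visited by train $i$, and the arrival-time symbols are assumptions for illustration only:

$$\min \; Z=\sum_{i\in T}\sum_{s\in S_i}\max\left(a_{i,s}-\bar{a}_{i,s},\,0\right)$$

where $a_{i,s}$ is the rescheduled arrival time of train $i$ at station $s$ and $\bar{a}_{i,s}$ is its planned arrival time; only late arrivals contribute to the objective.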

Keywords: railway transportation, timetable rescheduling, PPO model, high-speed train, train operation interruptions

Abstract: Research on high-speed train timetable rescheduling under interruption conditions is of significant importance for enhancing the real-time dispatching capabilities of railways and improving train operation efficiency. This study employs a data-driven optimization approach, specifically deep reinforcement learning, to explore methods for reconstructing train operation trajectories under interruptions. Using the Proximal Policy Optimization (PPO) model while considering train operation constraints, we propose a train rescheduling approach that minimizes train delays. We establish a train operation simulation environment in which the PPO agent continuously interacts with the environment, searching for the policy with minimal delay. To evaluate the PPO model's performance and efficiency, we conduct tests using scenarios with random interruptions and actual data from the Wuhan-Guangzhou high-speed railway in China. The results demonstrate that the train rescheduling scheme derived from the PPO model outperforms those obtained from other common reinforcement learning models and the decisions made by on-site dispatchers, reducing train delays by about 13%. PPO also converges significantly faster than other commonly used reinforcement learning models. Although the solution obtained by PPO is about 2% away from the optimal solution in quality, the PPO model computes this near-optimal solution significantly faster, which makes it a more practical choice for real-time decision-making.
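For readers who want a concrete picture of the agent-environment loop described above, the sketch below pairs a deliberately simplified dispatching environment with the off-the-shelf PPO implementation from stable-baselines3. Everything about the environment (the name ToyReschedulingEnv, the observation of per-train run times and delays, the dispatch-one-train action, and the negative-added-delay reward) is an illustrative assumption; the paper's own simulation environment, operation constraints, and reward design are not reproduced here.

```python
# Minimal sketch, not the authors' implementation: a toy dispatching
# environment plus stable-baselines3's PPO (clipped-surrogate policy gradient).
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyReschedulingEnv(gym.Env):
    """Dispatch N delayed trains one by one over a single shared resource.

    Observation: remaining run time and current delay of each train.
    Action:      index of the train to dispatch next.
    Reward:      negative delay added in this step, so maximising return
                 minimises the total accumulated train delay.
    """

    def __init__(self, n_trains: int = 5, seed: int = 0):
        super().__init__()
        self.n_trains = n_trains
        self.rng = np.random.default_rng(seed)
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(2 * n_trains,), dtype=np.float32)
        self.action_space = spaces.Discrete(n_trains)

    def _obs(self):
        return np.concatenate([self.run_time, self.delay]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.run_time = self.rng.uniform(5.0, 15.0, self.n_trains)  # minutes
        self.delay = self.rng.uniform(0.0, 30.0, self.n_trains)     # initial delay
        self.dispatched = np.zeros(self.n_trains, dtype=bool)
        return self._obs(), {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= 3 * self.n_trains      # safety cap on episode length
        action = int(action)
        if self.dispatched[action]:
            # Re-dispatching an already handled train: small penalty, state unchanged.
            return self._obs(), -1.0, False, truncated, {}
        self.dispatched[action] = True
        waiting = ~self.dispatched
        # Every waiting train is pushed back by the occupation time of the
        # dispatched one, so the step reward is the negative delay added now.
        added_delay = float(self.run_time[action] * waiting.sum())
        self.delay[waiting] += self.run_time[action]
        terminated = bool(self.dispatched.all())
        return self._obs(), -added_delay, terminated, truncated, {}


if __name__ == "__main__":
    env = ToyReschedulingEnv()
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=20_000)              # agent-environment interaction loop
    obs, _ = env.reset(seed=42)
    terminated = truncated = False
    total_added_delay = 0.0
    while not (terminated or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_added_delay -= reward
    print(f"Delay added by the learned dispatching order: {total_added_delay:.1f} min")
```

In this toy setting, maximising the PPO return is equivalent to minimising the total delay the dispatching order adds, which mirrors, in a very reduced form, the delay-minimization objective and the interaction loop described in the abstract.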

Key words: railway transportation, timetable rescheduling, proximal policy optimization (PPO), high-speed railway, train operation interruptions

CLC Number: