Journal of Transportation Systems Engineering and Information Technology, 2025, Vol. 25, Issue (4): 254-264. DOI: 10.16097/j.cnki.1009-6744.2025.04.023

• System Engineering Theory and Method •

Real-time Robust Optimization of Target Speed Profiles for Urban Rail Trains Considering Load Uncertainty

ZHU Qinyue*, LI Jiyuan, LI Hongyi, QIAN Shuyang, ZHAO Yahui

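  1. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2025-03-07 Revised: 2025-05-26 Accepted: 2025-06-03 Online: 2025-08-25 Published: 2025-08-25
  • About the author: ZHU Qinyue (born 1970), female, native of Wuxi, Jiangsu; Professor, PhD.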

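Abstract: During automatic train operation in urban rail transit, uncertain changes in passenger load affect normal train operation. To address this, this paper proposes a real-time robust optimization method for train target speed profiles that accounts for load uncertainty. The method consists of three parts: model design, model training, and model validation. First, a reinforcement learning model of train driving is built on a Markov decision process, with a reward design that balances robust optimization of performance metrics and of the control strategy. Second, Potential-Based Reward Shaping (PBRS) is used to improve the convergence of model training, and a Deep Q-Network (DQN) estimates the value function so that the model responds in real time to changes in train load. Finally, the effectiveness of the model is validated with simulation cases based on train operation scenarios on an operating line of the Beijing Subway. Simulation results show that the DQN-PBRS algorithm has an average computation time of 26 ms, so target speeds can be generated in real time. Under extreme loads and load changes, the generated target speed profiles are more robust than those produced by the DQN algorithm, and train energy consumption is reduced by more than 5%. A sensitivity analysis of the algorithm's key hyperparameters identifies the hyperparameter combination that yields the best training performance.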

Key words: railway transportation, target speed profile optimization, deep reinforcement learning, urban rail train, load uncertainty
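The abstract pairs PBRS with DQN value estimation. As a rough illustration only, the Python sketch below shows where a potential-based shaping term F(s, s') = γΦ(s') − Φ(s) enters a DQN temporal-difference target. The state fields, the remaining-distance potential, the discount factor of 0.99 and all function names are assumptions made for this example, not the paper's design. A plain list of Q-values stands in for the target network.

```python
from dataclasses import dataclass
from typing import Sequence

GAMMA = 0.99  # discount factor; an assumed value, not taken from the paper


@dataclass(frozen=True)
class TrainState:
    """Hypothetical MDP state: position along the section and current speed."""
    position_m: float  # distance travelled since the departure station (m)
    speed_mps: float   # current speed (m/s)


def potential(state: TrainState, section_length_m: float) -> float:
    """Hypothetical potential Phi(s): rises from -1 at departure to 0 at the next station."""
    remaining_m = max(section_length_m - state.position_m, 0.0)
    return -remaining_m / section_length_m


def shaping_term(s: TrainState, s_next: TrainState,
                 section_length_m: float, done: bool) -> float:
    """PBRS term F(s, s') = gamma * Phi(s') - Phi(s), with Phi(terminal) taken as 0."""
    phi_next = 0.0 if done else potential(s_next, section_length_m)
    return GAMMA * phi_next - potential(s, section_length_m)


def td_target(reward: float, s: TrainState, s_next: TrainState,
              next_q_values: Sequence[float], section_length_m: float,
              done: bool) -> float:
    """DQN-style target y = r + F(s, s') + gamma * max_a' Q(s', a').

    next_q_values stands in for the target network's outputs at s_next.
    """
    shaped_reward = reward + shaping_term(s, s_next, section_length_m, done)
    if done:
        return shaped_reward  # no bootstrapping from a terminal state
    return shaped_reward + GAMMA * max(next_q_values)


if __name__ == "__main__":
    s0 = TrainState(position_m=0.0, speed_mps=0.0)
    s1 = TrainState(position_m=100.0, speed_mps=5.0)
    print(td_target(-1.0, s0, s1, [0.5, 2.0, 1.0], 1000.0, done=False))
```

Because F telescopes along an episode, the discounted sum of shaping terms equals −Φ(s₀) when Φ of the terminal state is 0. This is why potential-based shaping can speed up learning without changing which policy is optimal. The actual potential and reward terms used in the paper are not given in this abstract.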