Journal of Transportation Systems Engineering and Information Technology, 2024, Vol. 24, Issue (2): 249-262. DOI: 10.16097/j.cnki.1009-6744.2024.02.025

• Systems Engineering Theory and Methods •

An Interpretable Machine Learning Framework-based Approach for Predicting Passenger Flow Distribution in Train Riding Sections

SUN Guofeng1a, JING Yun*1a,1b, LI Hebi2, TIAN Zhiqiang3a,3b, TIAN Xiaopeng3a

  1a. School of Traffic and Transportation, 1b. Frontiers Science Center for Smart High-speed Railway Systems, Beijing Jiaotong University, Beijing 100044, China; 2. Railway Science and Technology Research and Development Center, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; 3a. School of Traffic and Transportation, 3b. Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Received: 2023-11-14 Revised: 2024-02-16 Accepted: 2024-02-23 Online: 2024-04-25 Published: 2024-04-25
  • About the author: SUN Guofeng (1994- ), male, from Tongwei, Gansu, PhD candidate.
  • Supported by:
    National Natural Science Foundation of China (52372300, 72161023); Fundamental Research Funds for the Central Universities of Ministry of Education of China (2023YJS146).

Abstract: To explain how the features of passenger transport products affect the prediction of passenger flow distribution over train riding sections, we propose a prediction method for high-speed railway trains based on an interpretable machine learning framework. First, we propose a prediction framework built on gradient-boosted tree models and construct prediction models with four of them: GBDT, XGBoost, LightGBM, and CatBoost. Second, we compute feature contribution importance and use the SHapley Additive exPlanations (SHAP) method to optimize the feature variables and to reveal the non-linear relationships between single and interaction features and the predicted passenger flow distribution. A case study of trains between Beijing South and Shanghai Hongqiao shows that all four models predict the distribution accurately: the coefficients of determination of GBDT, XGBoost, LightGBM, and CatBoost on the test set are 0.9664, 0.9601, 0.9680, and 0.9715, respectively. After feature optimization, the features ranked by contribution importance are benchmark train, ticket price, travel time, date, day of the week, train number, and departure time, and the CatBoost-7 model reaches a coefficient of determination of 0.9458 on the validation set. Date and benchmark train show a non-linear positive correlation with the predicted passenger flow distribution, while travel time shows a non-linear negative correlation. Benchmark trains with short travel time, high ticket price, and departure on the hour have a positive effect on the prediction. The results can serve as a reference for the design of high-speed railway passenger transport products.

Key words: railway transportation, passenger flow distribution forecast, interpretable machine learning, train riding sections, non-linear relationship
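
The two quantities the abstract relies on, the coefficient of determination and Shapley-value feature attribution, can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration: `toy_model`, its coefficients, and the three input features are hypothetical and are not the paper's models or data, and the attribution uses a single baseline point rather than the background-data expectation used by the SHAP library.

```python
from itertools import permutations
from math import factorial


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition keeps its baseline value. All feature
    orderings are enumerated, so this only scales to a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # add feature i to the coalition
            value = model(current)
            phi[i] += value - prev     # marginal contribution of feature i
            prev = value
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_model(features):
    """Hypothetical predictor with one interaction term (illustration only).

    features = [is_benchmark_train, ticket_price, travel_time_hours]
    """
    benchmark, price, hours = features
    return (100.0 + 50.0 * benchmark + 0.2 * price
            - 30.0 * hours + 0.1 * benchmark * price)


x = [1.0, 600.0, 4.5]         # hypothetical train to explain
baseline = [0.0, 550.0, 5.5]  # hypothetical reference train
phi = shapley_values(toy_model, x, baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("prediction gap:", toy_model(x) - toy_model(baseline))
```

The attributions sum to the gap between the explained prediction and the baseline prediction, which is the additivity property that lets SHAP rank features by contribution. The interaction term is split between the two features involved, which is why single and interaction effects can be examined separately.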

CLC Number: