Journal of Transportation Systems Engineering and Information Technology, 2025, Vol. 25, Issue (5): 236-247. DOI: 10.16097/j.cnki.1009-6744.2025.05.021

• Systems Engineering Theory and Methods •

  • About the author: SHEN Pengju (born 2000), male, from Weinan, Shaanxi; PhD candidate.

Anomaly Detection and Incremental Clustering Method for High-speed Train Delay Data

SHEN Pengju1a,1b, SONG Liying*1a,1b, LIN Chuanqian2   

  1a. School of Traffic and Transportation, 1b. Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport of Ministry of Transport, Beijing Jiaotong University, Beijing 100044, China; 2. Department of Transport Supervision and Management, National Railway Administration of the People's Republic of China, Beijing 100891, China
  • Received: 2025-06-20; Revised: 2025-08-18; Accepted: 2025-08-21; Online: 2025-10-25; Published: 2025-10-25
  • Supported by:
    Guangxi Science and Technology Major Special Project: Research on Key Technologies for the Transportation Organization Scheme of Key Cargo Types in River-Sea Intermodal Transport (桂科AA23062021-2).

Abstract: Anomalous samples in high-speed train operation data are difficult to identify in real time, and the cluster structure changes as the data evolve. To address both problems, this paper proposes a Posterior Classification-based Incremental Dirichlet Process Mixture Model (PC-IDPMM) for incremental clustering and anomaly detection. The method works in two stages. In the offline stage, a clustering model is built and anomalous samples are identified. In the online stage, new samples are quickly assigned to existing clusters by posterior probability, and density-based clustering extracts potential new structures, so that the model can both expand its structure and update its parameters. The model is validated on measured data from the Guangzhou-Shenzhen high-speed railway. The results show that PC-IDPMM keeps the cluster structure consistent while stably updating the statistical features of the main clusters, and reaches an area under the curve (AUC) of 90.55%, outperforming several offline methods. In terms of computational efficiency, training time and memory consumption are about 85% and 80% lower, respectively, than those of the offline models. PC-IDPMM can also issue real-time anomaly warnings from a train's data at preceding stations, before the train completes its route, which helps the dispatching system intervene at the early stage of a delay and reduces the cumulative delay from 572 min to 320 min. These results confirm the method's real-time capability and practical value in high-frequency data environments.

Key words: intelligent transportation, anomaly detection, incremental clustering, train delays, Dirichlet process
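
The online stage described in the abstract can be illustrated with a minimal sketch, assuming one-dimensional delay values in minutes and Gaussian clusters. Everything named here is hypothetical and not taken from the paper: the `Component` class, the functions `classify`, `extract_new`, and `run_online`, the concentration `alpha`, the flat `base_density`, and the thresholds `eps` and `min_pts`. The assignment rule is a Chinese-restaurant-process style weighting (cluster size times likelihood, compared against the weight of opening a new cluster), and a simple gap-based grouping stands in for the density clustering step; neither is claimed to be the authors' exact formulation.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One 1-D Gaussian cluster kept as running statistics (illustrative)."""
    n: int        # number of samples absorbed so far
    mean: float   # running mean of the delay, in minutes
    m2: float     # running sum of squared deviations (Welford)

    @property
    def var(self) -> float:
        # Floor the variance so a very tight cluster never gives a zero width.
        return max(self.m2 / max(self.n - 1, 1), 1e-3)

    def pdf(self, x: float) -> float:
        z = (x - self.mean) ** 2 / (2 * self.var)
        return math.exp(-z) / math.sqrt(2 * math.pi * self.var)

    def update(self, x: float) -> None:
        # Welford's online update: no stored samples are needed.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)


def classify(x, comps, alpha=1.0, base_density=1 / 120):
    """Return the index of the best existing cluster, or None if the
    'open a new cluster' weight dominates (assumed flat base density)."""
    weights = [c.n * c.pdf(x) for c in comps]
    best = max(range(len(comps)), key=weights.__getitem__)
    return best if weights[best] >= alpha * base_density else None


def extract_new(buffer, eps=3.0, min_pts=4):
    """Group buffered points whose sorted neighbours lie within eps.
    Groups with at least min_pts points become new clusters; the rest
    stay in the buffer as candidate anomalies."""
    groups, cur = [], []
    for x in sorted(buffer):
        if cur and x - cur[-1] > eps:
            groups.append(cur)
            cur = []
        cur.append(x)
    if cur:
        groups.append(cur)
    new, left = [], []
    for g in groups:
        if len(g) >= min_pts:
            mu = sum(g) / len(g)
            new.append(Component(len(g), mu, sum((v - mu) ** 2 for v in g)))
        else:
            left.extend(g)
    return new, left


def run_online(stream, comps, alpha=1.0, base_density=1 / 120,
               eps=3.0, min_pts=4):
    """Process samples one by one; label -1 means 'unassigned on arrival'."""
    labels, buffer = [], []
    for x in stream:
        k = classify(x, comps, alpha, base_density)
        if k is None:
            buffer.append(x)
            new, buffer = extract_new(buffer, eps, min_pts)
            comps.extend(new)          # structural expansion
            labels.append(-1)
        else:
            comps[k].update(x)         # parameter update
            labels.append(k)
    return labels, buffer


# Two clusters assumed to come from an offline fit (made-up values).
offline = [Component(200, 1.0, 199.0), Component(50, 8.0, 196.0)]
demo_labels, demo_buffer = run_online(
    [0.5, 1.2, 9.0, 45.0, 30.0, 31.0, 29.5, 30.5, 2.0], offline)
print(demo_labels, demo_buffer, len(offline))
```

In this sketch, samples that no existing cluster explains are held in a buffer instead of being forced into a cluster. Once enough of them lie close together they are promoted to a new cluster, and isolated ones remain flagged as candidate anomalies. Because each cluster is summarized by its count, mean, and squared deviations, the update never revisits past data, which is the general reason an incremental scheme can be cheaper than refitting offline.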
