Journal of Transportation Systems Engineering and Information Technology, 2018, Vol. 18, Issue (5): 121-128.

• Systems Engineering Theory and Methods •

Real-time Forecasting of Urban Rail Transit Ridership at the Station Level Based on Improved KNN Algorithm

HUAN Ning 1, XIE Qiao 2, YE Hong-xia 2, YAO En-jian* 1

  1. MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044, China; 2. Guangzhou Metro Group Co., Ltd., Guangzhou 510030, China
  • Received: 2018-04-11 Revised: 2018-06-18 Online: 2018-10-25 Published: 2018-10-26
  • About the author: HUAN Ning (1994-), male, from Weihai, Shandong; PhD candidate.
  • Supported by:

    Fundamental Research Funds for the Central Universities (2017YJS104); Construction Project of the National Engineering Laboratory for System Safety and Operation Assurance of Urban Rail Transit (2016582).

Abstract:

To address the high dimensionality, heavy noise, and frequent fluctuation of real-time station entry data, this paper proposes a real-time forecasting method for urban rail transit entry ridership at the station level, based on an improved K-nearest-neighbor (KNN) algorithm. First, an autocorrelation analysis of historical time-of-day passenger flow data determines the state vector that characterizes each sample. Second, the pattern-matching step that selects nearest-neighbor samples is adapted to the data: a key point segmentation method removes noise perturbations from the raw time series, and dynamic time warping provides a similarity measure that accounts for the shape of the series. Third, distance-based weights and a trend coefficient, derived from the flow differences between samples, are used to extrapolate the entry volume of upcoming intervals, which yields a rolling real-time forecast. Finally, the accuracy of the model is evaluated on automated fare collection (AFC) data from the Guangzhou Metro passenger flow data warehouse. For all 159 stations in the network, the mean absolute percentage error (MAPE) of all-day entry volume forecasts at 5 min granularity averages 11.6%, indicating that the method can provide reliable data support for network status monitoring.

Key words: urban traffic, real-time forecasting, short-term prediction, K-nearest-neighbor, station entry passenger flow, dynamic time warping, non-parametric regression
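The pattern-matching and weighting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formulas, so the inverse-distance weights, the definition of the trend coefficient (here the neighbor's growth ratio from its last observed interval to the next one), and the names `dtw_distance` and `knn_forecast` are all assumptions made for illustration. The key point denoising step is omitted.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def knn_forecast(current, history, k=3, eps=1e-6):
    """Forecast the next interval from the k most similar historical samples.

    current : entry volumes of the most recent intervals (the state vector)
    history : list of (state_vector, next_value) pairs from past days
    """
    current = np.asarray(current, dtype=float)
    dists = np.array([dtw_distance(current, np.asarray(s, dtype=float))
                      for s, _ in history])
    nearest = np.argsort(dists)[:k]

    # Assumed weighting: inverse DTW distance, normalized to sum to 1.
    weights = 1.0 / (dists[nearest] + eps)
    weights /= weights.sum()

    preds = []
    for idx in nearest:
        state, nxt = history[idx]
        # Assumed trend coefficient: the neighbor's growth ratio,
        # applied to the latest observed volume of the current day.
        ratio = nxt / max(state[-1], eps)
        preds.append(current[-1] * ratio)
    return float(np.dot(weights, preds))
```

In a rolling forecast, `knn_forecast` would be called again each time a new 5 min count arrives, with `current` shifted forward by one interval.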

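The reported accuracy figure is a MAPE computed per station and then averaged over the network. The sketch below shows one way to compute such a figure. The abstract does not say how intervals with zero observed entries are treated, so skipping them is an assumption here, and the function names and dictionary layout are hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error (in percent) for one station."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumption: intervals with zero observed entries are skipped,
    # since the percentage error is undefined there.
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100.0)

def network_mean_mape(actual_by_station, predicted_by_station):
    """Average of the station-level MAPE values across all stations."""
    scores = [mape(actual_by_station[s], predicted_by_station[s])
              for s in actual_by_station]
    return float(np.mean(scores))
```

With toy inputs such as `mape([100, 200], [110, 180])`, each interval is off by 10% of the observed value, so the station-level MAPE is 10. A station whose observed counts are all zero would yield `nan` under this sketch and would need separate handling.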