Journal of Transportation Systems Engineering and Information Technology, 2023, Vol. 23, Issue (6): 42-50. DOI: 10.16097/j.cnki.1009-6744.2023.06.005

• Intelligent Transportation Systems and Information Technology •

多种缺失模式下交通数据组合近似填补方法

郭凤香1,黄金涛1,陈昱光1,郭延永2,刘攀*2   

  1. 昆明理工大学,交通工程学院,昆明 650504;2. 东南大学,交通学院,南京 210096
  • 收稿日期:2023-08-09 修回日期:2023-08-28 接受日期:2023-09-12 出版日期:2023-12-25 发布日期:2023-12-23
  • About the author: GUO Feng-xiang (1979- ), female, native of Hailin, Heilongjiang Province, professor, Ph.D.
  • 基金资助:
    国家杰出青年科学基金(51925801);国家重点研发计划 (2018YFB1600900)

Combined Approximate Method for Traffic Data Imputation Under Multiple Missing Modes

GUO Feng-xiang1, HUANG Jin-tao1, CHEN Yu-guang1, GUO Yan-yong2, LIU Pan*2

  1. Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650504, China; 2. School of Transportation, Southeast University, Nanjing 210096, China
  • Received:2023-08-09 Revised:2023-08-28 Accepted:2023-09-12 Online:2023-12-25 Published:2023-12-23
  • Supported by:
    National Science Fund for Distinguished Young Scholars of China (51925801); National Key Research and Development Program of China (2018YFB1600900)

摘要: 随着智能交通系统中采集和应用的基础数据规模不断扩大,数据缺失问题的重要性也日益凸显。针对交通数据中常出现的数据随机缺失和连续缺失问题,本文提出基于鲸鱼优化算法优化最小二乘支持向量机的组合近似填补方法(Combined Approximate Filling, CAF)。考虑缺失数据整体变化趋势的同时,参考数据的波动特征,根据多重填补思想对缺失值分别使用单变量填补和多变量填补,然后引入图片识别中自适应阈值分割法对不同时段下的差异值进行动态划分处理,最后利用不同时段的动态差异度阈值将单变量填补和多变量填补的结果进行结合,完成缺失值的高精度近似填补。为验证填补方法的性能,利用云南省玉溪市大量实车轨迹处理数据设计多组实验。实验结果表明,在小样本数据中,CAF填补方法能够适应多种场合的填补工作,该方法总体优于其他方法,在不同缺失率下均表现良好,尤其是随机缺失填补,最大 RMSE 为0.365。实验还证明了该方法在不同缺失类型和不同数据离散度下数据填补效果相比于其他方法优势更加明显。

关键词: 智能交通, 数据填补, 最小二乘支持向量机, 轨迹数据, 差异度

Abstract: As the volume of basic data collected and used in intelligent transportation systems keeps growing, handling missing data has become increasingly important. To address the random and continuous missing patterns common in traffic data, this paper proposes a Combined Approximate Filling (CAF) method based on a least squares support vector machine optimized by the whale optimization algorithm. To account for both the overall trend of the missing data and the fluctuation characteristics of the reference data, the method follows the idea of multiple imputation and fills each missing value by univariate imputation and by multivariate imputation separately. The adaptive threshold segmentation method from image recognition is then introduced to dynamically partition the difference values in different time periods. Finally, the univariate and multivariate results are combined using the dynamic dissimilarity threshold of each period to complete a high-precision approximate imputation. To verify the performance of the method, several groups of experiments are designed using a large amount of processed real-vehicle trajectory data from Yuxi City, Yunnan Province, China. The results show that, on small-sample data, the CAF method adapts to a variety of imputation scenarios and performs well under different missing rates, especially for random missing data, where the maximum RMSE is 0.365. The experiments also show that the proposed method is more effective than some traditional methods under different missing types and different degrees of data dispersion.
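
The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact combination rule, so the rule used here (average the two estimates when their difference is at or below the period's threshold, otherwise keep the multivariate estimate) is an assumption chosen only for illustration. The adaptive threshold is computed with Otsu's method, one common adaptive threshold segmentation technique, and the names `otsu_threshold`, `combine_imputations`, and `rmse` are hypothetical.

```python
import numpy as np


def otsu_threshold(values, bins=64):
    """Otsu's method on a 1-D sample: choose the cut that maximizes
    the between-class variance of the two resulting groups."""
    values = np.asarray(values, dtype=float)
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)                # share of samples at or below each cut
    w1 = 1.0 - w0                          # share of samples above each cut
    mu_cum = np.cumsum(weights * centers)  # cumulative first moment
    mu_total = mu_cum[-1]
    between = np.zeros_like(w0)
    valid = (w0 > 0) & (w1 > 0)
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    # Upper edge of the last bin assigned to the low-difference class.
    return edges[1:][np.argmax(between)]


def combine_imputations(univariate, multivariate, period_ids):
    """Combine two imputation results using one threshold per time period.

    ASSUMPTION (not from the paper): small differences are averaged,
    large differences fall back to the multivariate estimate.
    """
    univariate = np.asarray(univariate, dtype=float)
    multivariate = np.asarray(multivariate, dtype=float)
    period_ids = np.asarray(period_ids)
    diff = np.abs(univariate - multivariate)
    combined = np.empty_like(univariate)
    for period in np.unique(period_ids):
        mask = period_ids == period
        threshold = otsu_threshold(diff[mask])
        agree = diff[mask] <= threshold
        combined[mask] = np.where(
            agree,
            (univariate[mask] + multivariate[mask]) / 2.0,
            multivariate[mask],
        )
    return combined


def rmse(truth, estimate):
    """Root mean square error, the accuracy measure quoted in the abstract."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))
```

In this sketch the threshold is recomputed for each period, which mirrors the abstract's point that the dissimilarity threshold is dynamic across time periods rather than fixed for the whole series.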

Key words: intelligent transportation; data imputation; least squares support vector machine; trajectory data; degree of difference

CLC number: