Journal of Transportation Systems Engineering and Information Technology, 2016, Vol. 16, Issue 3: 81-87.

Section: Intelligent Transportation Systems and Technology

Random Forest Based Operational Missing Data Imputation for Highway Tunnel

QIAN Chao (a)*, CHEN Jian-xun (b), LUO Yan-bin (b), DAI Liang (a)

  a. School of Electronic & Control Engineering; b. School of Highway, Chang’an University, Xi’an 710064, China
  • Received: 2015-11-13  Revised: 2016-03-29  Online: 2016-06-25  Published: 2016-06-27
  • About the author: QIAN Chao (1984-), male, a native of Xinyi, Jiangsu; lecturer and postdoctoral researcher.
  • Funding: 973 Program (2013CB036003); National Natural Science Foundation of China (51408054); Fundamental Research Funds for the Central Universities (310832161006).

Abstract:

Real-time, complete acquisition and in-depth mining of tunnel operational data, such as environmental conditions and traffic status, are the foundation for improving emergency response capability and enabling early warning for operational safety. This paper proposes a missing data imputation method based on the Random Forest algorithm. The incomplete data set is first partitioned according to its missing-data characteristics. A Random Forest regression model is then built to impute the missing values iteratively, and a stopping criterion for the iteration is defined. The optimal combination of two parameters, the number of decision trees and the number of variables randomly sampled at each split node, is determined by minimizing the normalized root mean square error (NRMSE). Imputation results on a highway tunnel operational data set with missing values show that the method is accurate and robust. Compared with the KNN, SVD, MICE and PPCA imputation methods, it reduces the NRMSE by more than 25%. In addition, parallel computation greatly improves imputation efficiency. This offsets the method's slow imputation speed and ensures that imputation is both effective and timely.
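The iterative Random Forest imputation summarized in the abstract can be sketched in Python with NumPy and scikit-learn. This is a minimal sketch in the style of the missForest algorithm, not the authors' implementation. Several details are assumptions: the mean-value starting guess, the column visiting order, the relative-change stopping rule, and the hypothetical function name `rf_impute`. The visiting order (fewest missing values first) is only one possible reading of "partitioned according to its missing-data characteristics". The parameter `n_trees` corresponds to the number of decision trees, and `mtry` to the number of variables sampled at each split. Setting `n_jobs=-1` lets scikit-learn fit the trees in parallel, which stands in for the paper's parallel computation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_impute(x, n_trees=100, mtry="sqrt", max_iter=10, seed=0):
    """missForest-style iterative imputation of a numeric matrix containing NaNs.

    n_trees: number of decision trees per forest.
    mtry: variables sampled at each split (scikit-learn's max_features).
    Assumes no column is entirely missing.
    """
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    x_imp = x.copy()
    # Initial guess (assumption): fill each column's gaps with its observed mean.
    col_means = np.nanmean(x, axis=0)
    x_imp[mask] = np.take(col_means, np.nonzero(mask)[1])
    # Assumed partitioning: visit incomplete columns from fewest to most missing.
    counts = mask.sum(axis=0)
    order = [j for j in np.argsort(counts, kind="stable") if counts[j] > 0]
    if not order:
        return x_imp  # nothing to impute
    prev, prev_diff = x_imp.copy(), np.inf
    for _ in range(max_iter):
        for j in order:
            miss = mask[:, j]
            others = np.delete(x_imp, j, axis=1)  # predictors: all other columns
            model = RandomForestRegressor(
                n_estimators=n_trees, max_features=mtry,
                n_jobs=-1, random_state=seed)  # n_jobs=-1: parallel tree fitting
            model.fit(others[~miss], x_imp[~miss, j])
            x_imp[miss, j] = model.predict(others[miss])
        # Stopping rule (assumption): relative squared change between passes;
        # stop as soon as it grows and keep the previous pass.
        diff = np.sum((x_imp - prev) ** 2) / np.sum(x_imp ** 2)
        if diff > prev_diff:
            return prev
        prev, prev_diff = x_imp.copy(), diff
    return x_imp
```

For example, `rf_impute(x_obs, n_trees=100)` returns a copy of `x_obs` with every NaN filled. Observed cells are left unchanged, because only the originally missing entries are ever overwritten.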

Key words: highway transportation, missing data imputation, Random Forest, highway tunnel, operation management
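The abstract says the number of trees and the number of variables sampled at each split are chosen by minimum NRMSE. That step can be sketched as a small grid search. Everything here is an assumption for illustration:

- the NRMSE definition: the square root of the mean squared error divided by the variance of the true values, both taken over the hidden cells;
- the evaluation protocol: hiding a random fraction of cells in a complete sample;
- the hypothetical function names `nrmse` and `select_rf_params`;
- the stand-in imputer: scikit-learn's `IterativeImputer` wrapped around `RandomForestRegressor`. Its tolerance-based convergence rule is scikit-learn's own, not the paper's.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def nrmse(x_true, x_imp, mask):
    """sqrt(MSE / variance of true values), both over the masked cells."""
    t, p = x_true[mask], x_imp[mask]
    return float(np.sqrt(np.mean((t - p) ** 2) / np.var(t)))


def select_rf_params(x_complete, tree_grid, mtry_grid, miss_rate=0.1, seed=0):
    """Grid-search (n_trees, mtry) by the NRMSE of re-imputing hidden cells."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x_complete.shape) < miss_rate  # assumed missingness protocol
    x_missing = np.where(mask, np.nan, x_complete)
    scores = {}
    for n_trees, mtry in itertools.product(tree_grid, mtry_grid):
        imputer = IterativeImputer(
            estimator=RandomForestRegressor(
                n_estimators=n_trees, max_features=mtry,
                n_jobs=-1, random_state=seed),
            max_iter=5, imputation_order="ascending", random_state=seed)
        x_imp = imputer.fit_transform(x_missing)
        scores[(n_trees, mtry)] = nrmse(x_complete, x_imp, mask)
    best = min(scores, key=scores.get)  # combination with the lowest NRMSE
    return best, scores
```

Integer `mtry` values must not exceed the number of columns minus one, because each column is predicted from the remaining columns.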
