Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (1): 124-131. DOI: 10.16097/j.cnki.1009-6744.2024.01.012

• Intelligent Transportation Systems and Information Technology •

An Improved Isolation Forest Algorithm for Detecting Anomalous Floating Car Data Considering Passenger-Carrying Status

任其亮1,徐韬*1,2,刘媛1,程龙春2   

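  1. Chongqing Jiaotong University, Chongqing 400074; 2. Chongqing Design Group Limited Company, Chongqing 400050
  • Received: 2023-09-12 Revised: 2023-10-11 Accepted: 2023-10-13 Online: 2024-02-25 Published: 2024-02-12
  • About the author: REN Qiliang (1978- ), male, born in Laiwu, Shandong; professor, Ph.D.
  • Supported by:
    National Social Science Fund of China (21BJY038)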

Anomaly Data Detection Algorithm of Improvement Isolated Forest for Floating Car Data Collection Considering Passenger Carrying Status

REN Qiliang1, XU Tao*1,2, LIU Yuan1, CHENG Longchun2   

  1. Chongqing Jiaotong University, Chongqing 400074, China; 2. Chongqing Design Group Limited Company, Chongqing 400050, China
  • Received: 2023-09-12 Revised: 2023-10-11 Accepted: 2023-10-13 Online: 2024-02-25 Published: 2024-02-12
  • Supported by:
    National Social Science Fund of China (21BJY038)

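Abstract: To improve the detection of anomalous records in floating car data and to analyze detection performance under different passenger-carrying states, this paper proposes a floating car data anomaly detection algorithm based on S-DTA-IIForest (Summation & Difference Third Order Average & Improvement-Isolation Forest). A two-dimensional S-DTA feature vector is constructed from the sum of two adjacent terms (S) and the third-order summation mean difference (DTA). An improved isolation forest algorithm, IIForest, is then proposed with differential cumulative updating and dynamic discrimination. A stop threshold parameter is set to avoid the problem that, when the anomaly score of a new sample exceeds the stop threshold, only the sample is updated and the isolation forest model is not. A discrimination parameter is designed for each binary tree, and a tree stops growing once its discrimination falls within the stop interval, which improves convergence. Model accuracy is compared using the area under the ROC (Receiver Operating Characteristic) curve, AUC (Area Under ROC Curve), and the F1-score, and the method is validated with a case study on Xuefu Road in the central urban area of Chongqing. The results show that the combined S-DTA-IIForest algorithm achieves an AUC of 86.63% and an F1-score of 0.89. Its AUC is 32.4% higher than that of the traditional isolation forest (IForest) and its running efficiency is 1.29% higher, so it converges faster and is more accurate. Under the passenger-carrying condition, the model's AUC and F1-score are 7.7% and 10.8% higher, respectively, than under the non-carrying condition, so the combined algorithm detects anomalies in passenger-carrying data more accurately. The anomaly rate of non-carrying data is 71.4% higher than that of passenger-carrying data.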

Key words: intelligent transportation, anomaly data detection, improved isolation forest, floating car data, S-DTA algorithm

Abstract: To improve the detection of anomalous data in floating car data and to analyze model detection performance under different passenger-carrying statuses, this paper proposes a floating car data anomaly detection algorithm based on the summation and difference of the third-order average and the improved isolation forest (S-DTA-IIForest). A two-dimensional S-DTA feature vector is constructed from the sum of two adjacent terms (S) and the third-order summation mean difference (DTA). Then, an improved isolation forest (IIForest) algorithm with differential cumulative updating and dynamic discrimination is proposed. A stop threshold parameter is set to avoid the problem that, when the outlier score of a new sample is greater than the stop threshold, only the sample is updated and the isolation forest model is not. In addition, a discrimination parameter is designed for each binary tree, and the growth of a tree is stopped when its discrimination falls within the stop interval, which improves the convergence of the algorithm. Finally, the area under the ROC (Receiver Operating Characteristic) curve (AUC) and the F1-score are used as indicators to compare model accuracy, and a case study is conducted on Xuefu Road in Chongqing, China. The experimental results show that the AUC and F1-score of the proposed S-DTA-IIForest combined algorithm are 86.63% and 0.89, respectively. The AUC is 32.4% higher than that of the traditional isolation forest (IForest), and the operating efficiency is 1.29% higher, so the algorithm converges faster and is more accurate. When the vehicle is carrying passengers, the AUC and F1-score of the model are 7.7% and 10.8% higher, respectively, than when it is not, so the combined algorithm detects anomalies in passenger-carrying data more accurately. The anomaly rate of data recorded without passengers is 71.4% higher than that of data recorded with passengers.

Key words: intelligent transportation, anomaly data detection, improved isolation forest, floating car data, Summation & Difference Third Order Average (S-DTA) algorithm
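
The two indicators named in the abstract, AUC and F1-score, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.6 decision threshold are all hypothetical, chosen only to show how the two metrics are computed from anomaly scores and ground-truth labels (1 = anomalous, 0 = normal).

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random anomaly outscores a random normal point.

    Ties count as one half (the Mann-Whitney U formulation of the ROC area).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1-score after flagging every sample whose score is >= threshold as anomalous."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 0]
toy_scores = [0.35, 0.42, 0.81, 0.55, 0.48, 0.77, 0.30, 0.62]

if __name__ == "__main__":
    print("AUC:", auc_score(toy_labels, toy_scores))
    print("F1 :", f1_score(toy_labels, toy_scores, threshold=0.6))
```

AUC is threshold-free and depends only on how the scores rank anomalies against normal samples, whereas the F1-score depends on the chosen decision threshold, which is why the two are usually reported together.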

CLC number: