Journal of Transportation Systems Engineering and Information Technology ›› 2016, Vol. 16 ›› Issue (2): 98-103.

• Intelligent Transportation Systems and Information Technology •

Expressway Toll Data Clustering Based on Cluster-Shape Balance Evaluation

DU Jin*1,2, HAO Jun3, FAN Hai-wei1

  1. School of Information Engineering, Chang’an University, Xi’an 710064; 2. Shaanxi Road Traffic Detection and Equipment Engineering Research Center, Xi’an 710064; 3. Xi’an Railway Bureau, Xi’an 710054
  • Received: 2015-11-03 Revised: 2015-12-11 Online: 2016-04-25 Published: 2016-04-25
  • About the author: DU Jin (1974-), male, from Xi’an, Shaanxi, lecturer, Ph.D.
  • Supported by:

    National Natural Science Foundation of China (51278058); Shaanxi Provincial Transport Department Science and Technology Program (13-39X); The Fundamental Research Funds for the Central Universities (CHD2011JC02)

Expressway Toll Data Clustering Based on Evaluation with Balance of Clusters’ Shapes

DU Jin1,2, HAO Jun3, FAN Hai-wei1

  1. School of Information Engineering, Chang’an University, Xi’an 710064, China; 2. Shaanxi Road Traffic Detection and Equipment Engineering Research Center, Xi’an 710064, China; 3. Xi’an Railway Bureau, Xi’an 710054, China
  • Received: 2015-11-03 Revised: 2015-12-11 Online: 2016-04-25 Published: 2016-04-25

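Abstract:

Expressway toll data form a high-dimensional, massive data set whose distribution characteristics are unknown, so it is difficult to choose the algorithm and parameters best suited to clustering such data. To address this problem, a clustering evaluation index based on cluster-shape balance, IBCS, is proposed; it gives a balanced, comprehensive evaluation of several aspects of each cluster, including shape, distribution, density, and size. The index adaptively adjusts the neighborhood confidence interval according to the sparsity of the data set in order to measure the scattering and separation of the cluster structure. Measuring density gives IBCS the ability to select an algorithm suited to the data set, and measuring cluster size avoids partitions in which cluster sizes differ too greatly. Comparison experiments with several candidate algorithms on UCI data sets verify that the index is flexible and effective and that it yields the correct number of clusters and a reasonable partition. Finally, IBCS-based clustering of toll data from the Xi’an-Baoji expressway shows that the clustering pattern is best with the K-means algorithm and 5 clusters.

Key words: intelligent transportation, data mining, clustering algorithm, pattern evaluation index, expressway toll data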

Abstract:

Expressway toll data are high-dimensional and massive, and their distribution is unknown, so it is hard to decide which algorithm and which parameters are most suitable for clustering this kind of data set. To address this problem, IBCS, a clustering evaluation index based on the balance of clusters’ shapes, is proposed. It evaluates several cluster shape features, including outline, distribution, density, and size, in a balanced and comprehensive way. The index adaptively adjusts the neighborhood confidence interval according to the sparsity of the data set in order to measure the scattering and separation of the cluster structure. Evaluating density gives IBCS the ability to select an algorithm for a given data set, and evaluating size avoids partitions whose clusters differ too greatly in size. Comparison experiments on UCI data sets with several candidate algorithms show that IBCS is feasible and effective, and that it yields the correct number of clusters and a reasonable clustering pattern. Finally, IBCS-based clustering of Xi’an-Baoji expressway toll data indicates that the clustering scheme with the K-means algorithm and 5 clusters is optimal.

Key words: intelligent transportation, data mining, clustering algorithm, clustering evaluation index, expressway toll data
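
The selection procedure outlined in the abstract (cluster with each candidate setting, score each result with an evaluation index, keep the best-scoring one) can be illustrated with a minimal sketch. The abstract does not give the IBCS formula, so the sketch substitutes the standard silhouette coefficient as a stand-in index; it is not IBCS and does not measure density or size balance. The function names (`kmeans`, `silhouette`, `select_k`) and the toy data are hypothetical and only K-means with varying cluster numbers is tried.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (stand-in index; NOT the paper's IBCS)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def select_k(points, candidates):
    """Score every candidate cluster number and return the best one."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data (invented for illustration): three well-separated square blobs.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

best_k, scores = select_k(points, range(2, 6))
print(best_k, scores)
```

To compare several algorithms as the abstract describes, the same loop could run over (algorithm, parameter) pairs instead of cluster numbers alone, with the index being the only shared yardstick.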

CLC number: