Journal of Transportation Systems Engineering and Information Technology, 2015, Vol. 15, Issue (6): 46-53.

• Intelligent Transportation Systems and Information Technology •

A Method for Identifying Transit Commuters Based on a Naïve Bayesian Classifier

SUN Shi-chao, YANG Dong-yuan*

  1. Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 200092, China
  • Received: 2015-04-09  Revised: 2015-09-07  Published: 2015-12-25  Online: 2015-12-25
  • About the author: SUN Shi-chao (1988-), male, from Dalian, Liaoning; Ph.D. candidate.
  • Funding: National Natural Science Foundation of China (51478350).

Identification of Transit Commuters Based on Naïve Bayesian Classifier

SUN Shi-chao, YANG Dong-yuan

  1. Key Laboratory of Road and Traffic Engineering of the MOE, Tongji University, Shanghai 200092, China
  • Received: 2015-04-09  Revised: 2015-09-07  Online: 2015-12-25  Published: 2015-12-25

Abstract:

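Identifying and extracting the card IDs of commuters in transit IC card (smartcard) data is a prerequisite for analyzing their transit travel behavior. Drawing on bus IC card transaction records from Xiamen together with a related questionnaire survey, this paper proposes a method for identifying transit commuters based on the Naïve Bayesian Classifier (NBC). First, transit travel information contained in both data sources (the questionnaire data and the IC card data), such as the time of the first card tap on weekdays and the number of weekdays per week on which the card is tapped, is used to establish Bayesian probabilistic relations with the category variable (commuter/non-commuter) that exists only in the survey data, and these relations are used to build and train the NBC model. Next, survey samples that were not used in training are used to test the prediction accuracy of the calibrated model; the prediction success rate for commuters reaches 88%. Finally, the validated NBC model is applied to identify commuters in the bus IC card data. The results show that the number of transit commuters in Xiamen lies between 260,000 and 320,000, and statistics for related indicators are reported.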
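The two features named above, the first weekday tap time and the number of weekdays per week with taps, must be derived per card from raw tap records before a classifier can use them. The following Python sketch shows one possible way to do this; it is not the authors' code. The `(card_id, datetime)` record layout, the hypothetical function name `weekday_features`, the choice to summarize a card's daily first-tap times by their median, and passing the observation length as `n_weeks` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def weekday_features(taps, n_weeks):
    """Summarize weekday tap behaviour per card (hypothetical helper).

    taps: iterable of (card_id, datetime) tap records (assumed layout).
    n_weeks: number of weeks covered by the data (assumed to be known).
    Returns {card_id: (median first-tap hour, weekday tap days per week)}.
    """
    first_tap = {}  # (card_id, date) -> earliest weekday tap on that date
    for card, ts in taps:
        if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday: skip weekends
            continue
        key = (card, ts.date())
        if key not in first_tap or ts < first_tap[key]:
            first_tap[key] = ts

    hours = defaultdict(list)  # card_id -> first-tap hour of each active weekday
    for (card, _day), ts in first_tap.items():
        hours[card].append(ts.hour + ts.minute / 60)

    # Each (card, date) key is one active weekday, so len(h) counts tap days.
    # Assumption: median aggregation of first-tap hours.
    return {card: (median(h), len(h) / n_weeks) for card, h in hours.items()}


# Hypothetical toy records (2015-03-02 is a Monday)
taps = [
    ("A", datetime(2015, 3, 2, 7, 30)),   # Monday first tap
    ("A", datetime(2015, 3, 2, 18, 0)),   # later tap, ignored for first-tap time
    ("A", datetime(2015, 3, 3, 7, 45)),   # Tuesday
    ("A", datetime(2015, 3, 7, 10, 0)),   # Saturday, excluded
    ("B", datetime(2015, 3, 4, 14, 15)),  # Wednesday
]
print(weekday_features(taps, n_weeks=1))  # {'A': (7.625, 2.0), 'B': (14.25, 1.0)}
```

The median is used so that an occasional late first trip does not shift a regular morning rider's typical start time; a mean or mode would be equally plausible choices under these assumptions.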

Keywords: urban traffic, IC card data, Naïve Bayesian Classifier, commuters

Abstract:

A Naïve Bayesian Classifier (NBC) is applied to identify transit commuters based on smartcard data and questionnaire survey data from Xiamen. First, we establish Bayesian probabilistic relations between the respondents' category variable (commuter/non-commuter) and the bus travel information contained in both the questionnaire and the smartcard data. The NBC model is then built and trained on the resulting conditional probabilities. Next, the prediction accuracy of the calibrated model is tested on questionnaire samples that were not used in training; the prediction success rate for commuters is 88%. Finally, the validated NBC model is applied to identify transit commuters in the smartcard data; the results show that the number of transit commuters in Xiamen may range from 260,000 to 320,000.
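The training and testing steps described above can be illustrated with a minimal from-scratch sketch of a categorical naive Bayes classifier. It assumes the shared travel features have already been discretized into categories. The class name `CategoricalNB`, the add-one (Laplace) smoothing, the feature labels and the toy data are all hypothetical and not taken from the paper, whose exact model settings the abstract does not state. The "success rate" is interpreted here, as an assumption, as the share of held-out commuters that the model labels as commuters.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over discrete features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumption: add-one smoothing by default

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v] = training rows of class c whose feature j equals v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # feature index j -> distinct values seen
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row, c):
        # Unnormalized log posterior: log P(c) + sum_j log P(x_j | c),
        # which relies on the naive assumption that features are
        # conditionally independent given the class.
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            k = len(self.values[j])
            num = self.counts[c][j][v] + self.alpha
            den = self.class_counts[c] + self.alpha * k
            score += math.log(num / den)
        return score

    def predict(self, row):
        return max(self.classes, key=lambda c: self.log_score(row, c))


def commuter_success_rate(model, X_test, y_test, positive="commuter"):
    """Share of held-out rows labelled `positive` that are also predicted `positive`."""
    hits = [model.predict(x) == positive for x, y in zip(X_test, y_test) if y == positive]
    return sum(hits) / len(hits)


# Hypothetical toy data: (first weekday tap time bin, weekday tap days per week bin)
X_train = [("early", "high"), ("early", "high"), ("early", "low"),
           ("late", "low"), ("late", "low"), ("early", "low")]
y_train = ["commuter", "commuter", "commuter",
           "non-commuter", "non-commuter", "non-commuter"]
model = CategoricalNB().fit(X_train, y_train)

# Held-out rows, standing in for survey samples not used in training
X_test = [("early", "high"), ("late", "low"), ("late", "low")]
y_test = ["commuter", "commuter", "non-commuter"]
print(model.predict(("early", "high")))              # commuter
print(commuter_success_rate(model, X_test, y_test))  # 0.5
```

Working in log space avoids underflow when many small conditional probabilities are multiplied, and smoothing keeps a feature value never seen with a class in training from forcing that class's probability to zero.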

Key words: urban traffic, smartcard data, Naïve Bayesian Classifier, commuters
