Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (2): 13-23. DOI: 10.16097/j.cnki.1009-6744.2024.02.002

• Forum on Comprehensive Transportation Systems •

Analysis of Residents' Travel Mode Choice in Medium-sized City Based on Machine Learning

LI Wenquan*, DENG Anxin, ZHENG Yan, YIN Zijuan, WANG Baifan

  1. School of Transportation, Southeast University, Nanjing 211189, China
  • Received: 2023-12-06 Revised: 2023-12-28 Accepted: 2024-01-03 Online: 2024-04-25 Published: 2024-04-25
  • About the author: LI Wenquan (1964- ), male, from Baofeng, Henan; professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (52272319).

Abstract: This paper investigates the travel characteristics of residents in a medium-sized city and the mechanisms by which different factors influence travel mode choice. Using resident travel data from a medium-sized city in China, it weighs the strengths and weaknesses of traditional discrete choice models and machine learning models in prediction accuracy and modeling rationality, together with the characteristics and efficiency of hyperparameter optimization algorithms, and proposes a travel mode choice prediction model in which a random forest is tuned by particle swarm optimization with an added mutation procedure (PSO-RF). Three performance indicators, namely prediction accuracy, the absolute error of predicted mode shares, and the expected simulation error, are used to quantify the statistical differences in predictive performance between PSO-RF, several other machine learning models, and the multinomial Logit model. The SHAP (SHapley Additive exPlanations) model is then used to analyze the nonlinear relationships between residents' travel mode choice and individual socio-economic attributes, travel attributes, and mode attributes. The results show that PSO-RF achieves the highest average overall prediction accuracy (0.856) and the lowest average absolute error of predicted mode shares (0.062) and average expected simulation error (0.306), and that the differences in these indicators between models are statistically significant. Distance has the most pronounced effect on mode choice: walking and private car travel are the most sensitive to distance, with their choice probabilities changing by more than 35% across distances. Travelers under 30 years old show larger gaps between the choice probabilities of different modes than other age groups. Gender, private car ownership, and bus IC card ownership significantly change the probabilities of choosing bus and private car.

Key words: urban traffic, travel mode choice, machine learning, medium-sized city, particle swarm optimization, SHAP model
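
The abstract describes tuning random forest hyperparameters with particle swarm optimization plus a mutation procedure, but does not give the algorithm's details. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything named here is an assumption made for illustration: the function `pso_with_mutation`, the random-reset form of the mutation, the coefficient values, and the placeholder objective `toy_validation_error`, which stands in for the cross-validated error of a random forest over two hypothetical hyperparameters (number of trees and maximum depth).

```python
import random


def pso_with_mutation(objective, bounds, n_particles=20, n_iters=60,
                      w=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    """Minimise `objective` over the box `bounds` using PSO with a mutation step.

    The mutation used here (re-drawing a coordinate uniformly at random with
    probability `p_mut`) is an assumed form; the paper's procedure may differ.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise positions uniformly inside the bounds, velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clip it back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
                # Mutation: occasionally re-draw this coordinate to keep diversity.
                if rng.random() < p_mut:
                    pos[i][d] = rng.uniform(lo, hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Placeholder objective with its minimum at (300 trees, depth 12)."""
    n_estimators, max_depth = params
    return ((n_estimators - 300) / 500) ** 2 + ((max_depth - 12) / 20) ** 2


# Hypothetical search ranges: number of trees, maximum tree depth.
BOUNDS = [(10, 500), (2, 30)]
best_params, best_err = pso_with_mutation(toy_validation_error, BOUNDS)
print(best_params, best_err)
```

In a real tuning run, the placeholder objective would be replaced by a function that rounds the particle's coordinates to integers, trains a random forest with those settings, and returns its validation error. The mutation step is what distinguishes this from plain PSO: it trades some convergence speed for a lower chance of the swarm collapsing onto a poor local optimum.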
