Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (6): 159-168. DOI: 10.16097/j.cnki.1009-6744.2024.06.014

• Systems Engineering Theory and Methods •

Travel Mode Choice Based on Hyperparameter Optimization and Ensemble Learning

LI Xiaodong, CAO Kerang, KUANG Haibo*

  1. Collaborative Innovation Center for Transport Studies, Dalian Maritime University, Dalian 116026, Liaoning, China
  • Received: 2024-07-17 Revised: 2024-10-28 Accepted: 2024-10-30 Online: 2024-12-25 Published: 2024-12-18
  • About the author: LI Xiaodong (1992- ), male, from Zhengzhou, Henan; assistant researcher, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (72174035); Postdoctoral Fellowship Program of CPSF (GZC20230343).

Abstract: Conventional travel mode choice models and machine learning models suffer from low prediction accuracy, complex hyperparameter optimization, and limited interpretability. To address these problems, this paper applies a genetic algorithm and Bayesian optimization, separately, to tune the hyperparameters of the extreme gradient boosting (XGBoost) model. The SHAP (SHapley Additive exPlanations) model is then integrated to visualize the nonlinear effects of travel mode attributes and individual characteristics on choice probability. The models are trained with 5-fold cross-validation to avoid overfitting and are validated on the Swissmetro dataset. The results show that enhancing the nonlinear expression of the utility function in discrete choice models improves prediction performance, but these models still fall short of machine learning models. The XGBoost models tuned by the genetic algorithm and by Bayesian optimization achieve higher accuracy, recall, and F1 score in travel choice prediction than conventional multinomial Logit models with linear or nonlinear utility functions, as well as standard random forest and XGBoost models. The XGBoost model tuned by the genetic algorithm attains the highest prediction accuracy, 0.781, outperforming the conventional model based on multiple grid searches, and genetic-algorithm tuning reduces training time by 81.4% compared with multiple grid searches. The cost and time of the different travel modes are important factors in mode choice: train and car are more sensitive to time, while Swissmetro is more sensitive to cost.

Key words: urban traffic, individual travel prediction, hyperparameter optimization, travel mode choice, explainable machine learning
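
The abstract names genetic-algorithm hyperparameter search but does not give the encoding, operators, or search ranges. The following is therefore only a minimal, standard-library sketch of the general idea, not the authors' implementation. The search space, the operators (truncation selection, uniform crossover, resampling mutation, elitism), and the `surrogate_cv_score` function are all assumptions made for illustration. In a real pipeline the fitness function would presumably return the mean 5-fold cross-validated accuracy of an XGBoost model trained with the candidate hyperparameters.

```python
import random

# Hypothetical search space: (low, high, is_integer) per hyperparameter.
# The names mirror common XGBoost parameters; the ranges are illustrative only.
SEARCH_SPACE = {
    "max_depth": (2, 10, True),
    "learning_rate": (0.01, 0.5, False),
    "subsample": (0.5, 1.0, False),
}


def random_individual(rng):
    """Draw one hyperparameter set uniformly from the search space."""
    return {
        name: rng.randint(lo, hi) if is_int else rng.uniform(lo, hi)
        for name, (lo, hi, is_int) in SEARCH_SPACE.items()
    }


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene from its range with probability `rate`."""
    fresh = random_individual(rng)
    return {n: fresh[n] if rng.random() < rate else v for n, v in ind.items()}


def genetic_search(fitness, pop_size=20, generations=15, elite=2, seed=0):
    """Maximize `fitness` over SEARCH_SPACE; returns (best, score, history)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness observed at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - elite:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = ranked[:elite] + children  # elitism keeps the best so far
    best = max(population, key=fitness)
    return best, fitness(best), history


def surrogate_cv_score(params):
    """Stand-in for mean 5-fold cross-validated accuracy (NOT a real model)."""
    return (
        1.0
        - 0.01 * (params["max_depth"] - 6) ** 2
        - (params["learning_rate"] - 0.1) ** 2
        - (params["subsample"] - 0.8) ** 2
    )


best_params, best_score, history = genetic_search(surrogate_cv_score)
print(best_params, round(best_score, 3))
```

Because the two best individuals are carried over unchanged each generation, the best fitness in `history` never decreases. Each fitness evaluation would correspond to one full cross-validation run, so in practice the scores are worth caching.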
