Journal of Transportation Systems Engineering and Information Technology ›› 2025, Vol. 25 ›› Issue (4): 137-146. DOI: 10.16097/j.cnki.1009-6744.2025.04.014

• Intelligent Transportation Systems and Information Technology •


Autonomous Driving Decision-making Method Based on Cooperative Reinforcement Learning of Large Language Model

WANG Xiang1a, REN Hao*2, TAN Guozhen1a, LI Jianping1a, WANG Jue1b, WANG Yanli1a   

  1a. School of Computer Science and Technology, 1b. School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China; 2. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
  • Received: 2025-03-20 Revised: 2025-04-26 Accepted: 2025-05-06 Online: 2025-08-25 Published: 2025-08-25
  • About the author: WANG Xiang (1997— ), male, born in Panjin, Liaoning, China; Ph.D. candidate.
  • Supported by:
    Key Program of the National Natural Science Foundation of China (U1808206).


Abstract: To address the problem that high-level decisions in current autonomous driving systems lack specific execution details and the ability to learn continuously, this paper applies the Large Language Model (LLM) to refine the decision-making stage of autonomous driving. Building on the strong reasoning ability of the LLM and the exploration ability of Reinforcement Learning (RL), a method is proposed in which the LLM cooperates with RL to refine driving decisions. First, based on the high-level action output by the RL policy, the reasoning ability of the LLM is used to predict the future trajectory points of the ego vehicle. Then, the output of the RL model is combined with the current state information to make a safe, collision-free, and interpretable prediction of the next state. Finally, the driving decision process described above is vectorized and stored in a memory module as driving experience, and this experience is updated periodically to achieve continual learning. The trajectory points predicted by the LLM provide a detailed motion path for the Proportional-Integral-Derivative (PID) controller and a basis for adjusting the vehicle's acceleration and speed, ensuring that the vehicle follows the planned path. In addition, trajectory prediction can evaluate and avoid potential collision risks, planning a safe path by analyzing the traffic state and historical data. Closed-loop experimental results show that the proposed decision-making method outperforms the baseline models on all evaluation indicators: compared with RL alone, a purely LLM-based decision-making method, and an LLM-based car-following model, the driving score is improved by 35.12, 14.33, and 12.28, respectively, and the method with the memory module improves the driving score by 25.59 over the method without it.
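
To make the pipeline described in the abstract concrete, the sketch below illustrates, in simplified Python, how the pieces could fit together: an RL policy proposes a high-level action, an LLM call (stubbed out here) refines it into future trajectory points with help from experience retrieved from a memory module, and a PID controller tracks the first predicted point by adjusting acceleration. This is a minimal illustrative sketch, not the authors' implementation; the state encoding, the function names (rl_high_level_action, llm_refine_to_trajectory), the memory representation, and the PID gains are all assumptions made for illustration.

# Minimal illustrative sketch of the LLM-coordinated RL decision loop (not the authors' code).
import math
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stores vectorized driving decisions; retrieves the most similar past cases."""
    experiences: list = field(default_factory=list)  # list of (state_vector, decision_text)

    def add(self, state_vec, decision_text):
        self.experiences.append((state_vec, decision_text))

    def retrieve(self, state_vec, k=1):
        def similarity(a, b):  # cosine similarity between state vectors
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a)) or 1.0
            nb = math.sqrt(sum(x * x for x in b)) or 1.0
            return dot / (na * nb)
        ranked = sorted(self.experiences, key=lambda e: similarity(e[0], state_vec), reverse=True)
        return [text for _, text in ranked[:k]]


def rl_high_level_action(state_vec):
    """Stand-in for the RL policy: returns a discrete high-level action."""
    return "LANE_KEEP" if state_vec[1] > 10.0 else "ACCELERATE"


def llm_refine_to_trajectory(state_vec, action, retrieved_cases):
    """Stand-in for the LLM call: turns a high-level action plus context into future
    (x, y) trajectory points. A real system would build a prompt from the state, the
    action, and the retrieved experience, then parse the LLM's reply."""
    x, v = state_vec[0], state_vec[1]
    dv = 1.0 if action == "ACCELERATE" else 0.0
    return [(x + (v + dv) * t, 0.0) for t in (0.5, 1.0, 1.5, 2.0)]


class PID:
    """Longitudinal PID controller that tracks the next trajectory point."""
    def __init__(self, kp=0.8, ki=0.05, kd=0.2, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, target_x, current_x):
        error = target_x - current_x
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative  # acceleration command


if __name__ == "__main__":
    memory = Memory()
    state = [0.0, 12.0]                                           # illustrative (position, speed) state vector
    action = rl_high_level_action(state)                          # 1) RL proposes a high-level action
    cases = memory.retrieve(state)                                 # 2) recall similar past decisions
    trajectory = llm_refine_to_trajectory(state, action, cases)   # 3) LLM predicts trajectory points
    accel = PID().control(trajectory[0][0], state[0])             # 4) PID tracks the first predicted point
    memory.add(state, f"{action} -> {trajectory}")                 # 5) store the decision as driving experience
    print(action, trajectory[0], round(accel, 3))

In the design described by the abstract, the stored decisions would be embedded as vectors and refreshed periodically so that retrieved experience keeps informing later decisions; the cosine-similarity lookup above is only a stand-in for that retrieval and continual-learning step.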

Key words: intelligent traffic, autonomous driving, large language model, reinforcement learning, continual learning, trajectory prediction

CLC Number: