Journal of Transportation Systems Engineering and Information Technology ›› 2025, Vol. 25 ›› Issue (4): 137-146. DOI: 10.16097/j.cnki.1009-6744.2025.04.014

• Intelligent Transportation Systems and Information Technology •


Autonomous Driving Decision-making Method Based on Cooperative Reinforcement Learning of Large Language Model

WANG Xiang1a, REN Hao*2, TAN Guozhen1a, LI Jianping1a, WANG Jue1b, WANG Yanli1a   

  1a. School of Computer Science and Technology, 1b. School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China; 2. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
  • Received: 2025-03-20 Revised: 2025-04-26 Accepted: 2025-05-06 Online: 2025-08-25 Published: 2025-08-25
  • About the author: WANG Xiang (1997— ), male, born in Panjin, Liaoning, China; Ph.D. candidate.
  • Supported by:
    Key Program of the National Natural Science Foundation of China (U1808206).


Abstract: To address the problem that high-level decisions in current autonomous driving systems lack specific execution details and the ability to learn continuously, this paper applies the Large Language Model (LLM) to refine the decision-making stage of autonomous driving. Building on the strong reasoning ability of the LLM and the exploration ability of Reinforcement Learning (RL), a method is proposed in which the LLM cooperates with RL to refine driving decisions. First, based on the high-level action output by the RL policy, the reasoning ability of the LLM is used to predict the future trajectory points of the ego vehicle. Then, the output of the RL model is combined with the current state information to make a safe, collision-free, and interpretable prediction of the next state. Finally, the driving decision process described above is vectorized and stored in a memory module as driving experience, and this experience is updated periodically to achieve continual learning. The trajectory points predicted by the LLM provide a detailed motion path for the Proportional-Integral-Derivative (PID) controller and a basis for adjusting the vehicle's acceleration and speed, ensuring that the vehicle follows the planned path. In addition, trajectory prediction can evaluate and avoid potential collision risks, planning a safe path by analyzing the traffic state and historical data. Closed-loop experimental results show that the proposed decision-making method outperforms the baseline models on all evaluation indicators: compared with RL alone, a purely LLM-based decision-making method, and an LLM-based car-following model, the driving score is improved by 35.12, 14.33, and 12.28, respectively, and the method with the memory module improves the driving score by 25.59 over the method without it.
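
To make the pipeline described in the abstract concrete, the sketch below illustrates, in simplified Python, how the pieces could fit together: an RL policy proposes a high-level action, an LLM call (stubbed out here) refines it into future trajectory points with help from experience retrieved from a memory module, and a PID controller tracks the first predicted point by adjusting acceleration. This is a minimal illustrative sketch, not the authors' implementation; the state encoding, the function names (rl_high_level_action, llm_refine_to_trajectory), the memory representation, and the PID gains are all assumptions made for illustration.

# Minimal illustrative sketch of the LLM-coordinated RL decision loop (not the authors' code).
import math
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stores vectorized driving decisions; retrieves the most similar past cases."""
    experiences: list = field(default_factory=list)  # list of (state_vector, decision_text)

    def add(self, state_vec, decision_text):
        self.experiences.append((state_vec, decision_text))

    def retrieve(self, state_vec, k=1):
        def similarity(a, b):  # cosine similarity between state vectors
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a)) or 1.0
            nb = math.sqrt(sum(x * x for x in b)) or 1.0
            return dot / (na * nb)
        ranked = sorted(self.experiences, key=lambda e: similarity(e[0], state_vec), reverse=True)
        return [text for _, text in ranked[:k]]


def rl_high_level_action(state_vec):
    """Stand-in for the RL policy: returns a discrete high-level action."""
    return "LANE_KEEP" if state_vec[1] > 10.0 else "ACCELERATE"


def llm_refine_to_trajectory(state_vec, action, retrieved_cases):
    """Stand-in for the LLM call: turns a high-level action plus context into future
    (x, y) trajectory points. A real system would build a prompt from the state, the
    action, and the retrieved experience, then parse the LLM's reply."""
    x, v = state_vec[0], state_vec[1]
    dv = 1.0 if action == "ACCELERATE" else 0.0
    return [(x + (v + dv) * t, 0.0) for t in (0.5, 1.0, 1.5, 2.0)]


class PID:
    """Longitudinal PID controller that tracks the next trajectory point."""
    def __init__(self, kp=0.8, ki=0.05, kd=0.2, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, target_x, current_x):
        error = target_x - current_x
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative  # acceleration command


if __name__ == "__main__":
    memory = Memory()
    state = [0.0, 12.0]                                           # illustrative (position, speed) state vector
    action = rl_high_level_action(state)                          # 1) RL proposes a high-level action
    cases = memory.retrieve(state)                                 # 2) recall similar past decisions
    trajectory = llm_refine_to_trajectory(state, action, cases)   # 3) LLM predicts trajectory points
    accel = PID().control(trajectory[0][0], state[0])             # 4) PID tracks the first predicted point
    memory.add(state, f"{action} -> {trajectory}")                 # 5) store the decision as driving experience
    print(action, trajectory[0], round(accel, 3))

In the design described by the abstract, the stored decisions would be embedded as vectors and refreshed periodically so that retrieved experience keeps informing later decisions; the cosine-similarity lookup above is only a stand-in for that retrieval and continual-learning step.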

Key words: intelligent traffic, autonomous driving, large language model, reinforcement learning, continual learning, trajectory prediction

CLC Number: