交通运输系统工程与信息 ›› 2026, Vol. 26 ›› Issue (2): 91-101.DOI: 10.16097/j.cnki.1009-6744.2026.02.009

• 智能交通系统与信息技术 •

世界模型协同强化学习的感潮河段船舶自主航行决策方法

翁金贤*1 ,丁海峰1 ,刘文2 ,石坤1 ,倪宝龙3   

  1. 上海海事大学交通运输学院,上海201306;2. 武汉理工大学航运学院,武汉430063;3. 中华人民共和国黄浦海事局,上海200086
  • 收稿日期:2025-12-25 修回日期:2026-01-29 接受日期:2026-02-05 出版日期:2026-04-25 发布日期:2026-04-20
  • 作者简介:翁金贤(1982—),男,江西广丰人,教授。
  • 基金资助:
    上海市学术带头人(23XD1421500)。

Autonomous Vessel Navigation Decision-Making Methods in Tidal River Reaches Based on Cooperative Reinforcement Learning of World Model

WENG Jinxian*1, DING Haifeng1, LIU Wen2, SHI Kun1, NI Baolong3   

  1. College of Transport and Communication, Shanghai Maritime University, Shanghai 201306, China; 2. School of Navigation, Wuhan University of Technology, Wuhan 430063, China; 3. Huangpu Maritime Safety Administration of the People's Republic of China, Shanghai 200086, China
  • Received:2025-12-25 Revised:2026-01-29 Accepted:2026-02-05 Online:2026-04-25 Published:2026-04-20
  • Supported by:
    Program of Shanghai Academic Research Leader (23XD1421500).

摘要: 针对传统强化学习方法在船舶自主航行决策中存在环境演化建模缺失与决策短视的问题,本文提出一种世界模型(World Model, WM)协同强化学习(Reinforcement Learning, RL)的船舶自主航行决策方法。首先,构建RL智能体作为决策主体生成船舶候选动作。其次,基于递归状态空间模型(Recurrent State Space Model, RSSM)构建WM,利用编码器将高维观测映射至潜在空间,结合水动力流场与船舶交互环境进行时序特征提取与环境演化建模。最后,针对传统RL策略的决策短视问题,构建基于群组风险的梯度投影模块,利用WM的前视推演能力对RL候选动作进行安全性校验,通过求解约束优化问题将高风险动作映射至安全可行域,实现对潜在RL高风险决策的动态修正。以上海黄浦江典型感潮河段为例进行闭环实验,结果表明:本文决策方法在多项评估指标上均优于其他模型,相较于传统RL方法,所提方法将高风险群组出现频次由3.2次·航程⁻¹降低至0.8次·航程⁻¹,时间偏差率由8.15%优化至5.56%,且舵角角速度标准差由5.2 (°)·s⁻¹降至3.2 (°)·s⁻¹。此外,消融实验表明,相比原方法,引入投影模块后高风险群组出现频次降低52.11%,时间偏差率由6.20%优化至4.32%,舵角角速度与纵向加速度标准差分别从4.21 (°)·s⁻¹和0.23 m·s⁻²降低至2.75 (°)·s⁻¹和0.18 m·s⁻²。
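The RSSM-based world model described in the abstract (an encoder mapping observations to a latent space, plus a recurrent transition enabling forward-looking rollout) can be illustrated with a minimal sketch. All dimensions, weight shapes, and the simplified transition cell below are illustrative assumptions for exposition, not the authors' trained implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

H, Z, A, O = 8, 4, 2, 16  # deterministic state, stochastic latent, action, observation dims

# Randomly initialized weights stand in for trained RSSM parameters.
W_rnn = rng.normal(0, 0.1, (H, H + Z + A))    # simplified recurrent cell (a GRU in practice)
W_prior = rng.normal(0, 0.1, (2 * Z, H))      # prior head: p(z_t | h_t)
W_enc = rng.normal(0, 0.1, (Z, O))            # encoder: observation -> latent features

def step_prior(h, z, a):
    """One RSSM transition: update the deterministic state, then sample the prior latent."""
    h_new = np.tanh(W_rnn @ np.concatenate([h, z, a]))
    stats = W_prior @ h_new
    mean, log_std = stats[:Z], stats[Z:]
    z_new = mean + np.exp(log_std) * rng.standard_normal(Z)
    return h_new, z_new

def imagine(h, z, actions):
    """Forward-looking rollout entirely in latent space, with no real observations."""
    traj = []
    for a in actions:
        h, z = step_prior(h, z, a)
        traj.append((h, z))
    return traj

h0, z0 = np.zeros(H), np.zeros(Z)
traj = imagine(h0, z0, [np.ones(A) * 0.1] * 5)  # imagine 5 steps ahead
```

At inference time, such latent rollouts are what let the world model score a candidate action sequence before the vessel commits to it.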

关键词: 水路运输, 船舶航行决策, 世界模型, 船舶, 风险评估, 航行安全

Abstract: Traditional reinforcement learning (RL) methods for autonomous vessel navigation often lack explicit modeling of environmental evolution and suffer from short-sighted decision-making. To address these issues, this paper proposes an autonomous navigation decision-making method in which a world model (WM) cooperates with reinforcement learning (WMRL). First, an RL agent is constructed as the primary decision-maker to generate candidate vessel actions. Second, a WM is built on the Recurrent State Space Model (RSSM): an encoder maps high-dimensional observations into a latent space, which integrates hydrodynamic flow fields and vessel interaction environments for temporal feature extraction and environmental evolution modeling. Finally, to counter the short-sighted decisions of traditional RL policies, a gradient projection module based on group risk is constructed. This module checks the safety of RL candidate actions by leveraging the forward-looking inference capability of the WM; by solving a constrained optimization problem, it maps high-risk actions into a safe feasible region, dynamically correcting potentially high-risk RL decisions. Closed-loop experiments were conducted on a typical tidal reach of the Huangpu River in Shanghai. The results indicate that the proposed method outperforms the other models across multiple evaluation metrics. Compared with the traditional RL method, it reduces the frequency of high-risk groups from 3.2 to 0.8 times·voyage⁻¹, improves the time deviation rate from 8.15% to 5.56%, and decreases the standard deviation of rudder angular velocity from 5.2 (°)·s⁻¹ to 3.2 (°)·s⁻¹.
Furthermore, ablation experiments demonstrate that, compared with the baseline, introducing the projection module reduces the frequency of high-risk groups by 52.11%, improves the time deviation rate from 6.20% to 4.32%, and reduces the standard deviations of rudder angular velocity and longitudinal acceleration from 4.21 (°)·s⁻¹ and 0.23 m·s⁻² to 2.75 (°)·s⁻¹ and 0.18 m·s⁻², respectively.
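The gradient projection step in the abstract (solving a constrained optimization problem to map a high-risk candidate action into the safe feasible region) can be sketched as repeated projections onto a linearized risk constraint. The risk function, threshold, and finite-difference gradient below are illustrative assumptions; the paper's module would use a group-risk measure predicted by the world model's rollout.

```python
import numpy as np

def project_to_safe(a, risk_fn, risk_limit, grad_eps=1e-4, max_iter=20):
    """Map a candidate action onto the safe set {a : risk_fn(a) <= risk_limit}
    by repeatedly projecting onto the linearized constraint
    risk_fn(a) + g · (a' - a) <= risk_limit, where g is the risk gradient."""
    a = np.asarray(a, dtype=float)
    for _ in range(max_iter):
        r = risk_fn(a)
        if r <= risk_limit:
            break  # action already in the safe feasible region
        # Finite-difference gradient of the risk surrogate w.r.t. the action.
        g = np.array([(risk_fn(a + grad_eps * e) - r) / grad_eps
                      for e in np.eye(a.size)])
        # Closed-form projection onto the half-space of the linearized constraint:
        # the minimum-norm correction that brings the linearized risk to the limit.
        a = a - (r - risk_limit) / (g @ g + 1e-8) * g
    return a

# Hypothetical 2-D action (e.g. rudder command, throttle) and toy risk surface.
risk = lambda a: 1.5 - abs(a[0])      # stand-in for a WM-predicted group risk
safe = project_to_safe([0.0, 0.0], risk, risk_limit=1.0)
```

Because each step moves the action by the minimum-norm correction along the risk gradient, the corrected command stays as close as possible to the RL agent's intent while satisfying the safety constraint.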

Key words: waterway transportation, vessel navigation decision-making, world model, vessel, risk assessment, navigation safety
