交通运输系统工程与信息 ›› 2026, Vol. 26 ›› Issue (2): 91-101.DOI: 10.16097/j.cnki.1009-6744.2026.02.009

• 智能交通系统与信息技术 •

世界模型协同强化学习的感潮河段船舶自主航行决策方法

翁金贤*1 ,丁海峰1 ,刘文2 ,石坤1 ,倪宝龙3   

  1. 上海海事大学交通运输学院,上海201306;2. 武汉理工大学航运学院,武汉430063;3. 中华人民共和国黄浦海事局,上海200086
  • 收稿日期:2025-12-25 修回日期:2026-01-29 接受日期:2026-02-05 出版日期:2026-04-25 发布日期:2026-04-20
  • 作者简介:翁金贤(1982—),男,江西广丰人,教授。
  • 基金资助:
    上海市学术带头人(23XD1421500)。

Autonomous Vessel Navigation Decision-Making Methods in Tidal River Reaches Based on Cooperative Reinforcement Learning of World Model

WENG Jinxian*1, DING Haifeng1, LIU Wen2, SHI Kun1, NI Baolong3   

  1. College of Transport and Communication, Shanghai Maritime University, Shanghai 201306, China; 2. School of Navigation, Wuhan University of Technology, Wuhan 430063, China; 3. Huangpu Maritime Safety Administration of the People's Republic of China, Shanghai 200086, China
  • Received:2025-12-25 Revised:2026-01-29 Accepted:2026-02-05 Online:2026-04-25 Published:2026-04-20
  • Supported by:
    Program of Shanghai Academic Research Leader (23XD1421500).

摘要: 针对传统强化学习方法在船舶自主航行决策中存在环境演化建模缺失与决策短视的问题,本文提出一种世界模型(World Model, WM)协同强化学习(Reinforcement Learning, RL)的船舶自主航行决策方法。首先,构建RL智能体作为决策主体生成船舶候选动作。其次,基于递归状态空间模型(Recurrent State Space Model, RSSM)构建WM,利用编码器将高维观测映射至潜在空间,结合水动力流场与船舶交互环境进行时序特征提取与环境演化建模。最后,针对传统RL策略的决策短视问题,构建基于群组风险的梯度投影模块,利用WM的前视推演能力对RL候选动作进行安全性校验,通过求解约束优化问题将高风险动作映射至安全可行域,实现对潜在RL高风险决策的动态修正。以上海黄浦江典型感潮河段为例进行闭环实验,结果表明:本文决策方法在多项评估指标上均优于其他模型,相较于传统RL方法,所提方法将高风险群组出现频次由3.2次·航程⁻¹降低至0.8次·航程⁻¹,时间偏差率由8.15%优化至5.56%,且舵角角速度标准差由5.2 (°)·s⁻¹降至3.2 (°)·s⁻¹。此外,消融实验表明,相比原方法,引入投影模块后高风险群组出现频次降低52.11%,时间偏差率由6.20%优化至4.32%,舵角角速度与纵向加速度标准差分别从4.21 (°)·s⁻¹和0.23 m·s⁻²降低至2.75 (°)·s⁻¹和0.18 m·s⁻²。
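The RSSM-based world model described in the abstract (an encoder mapping observations to a latent space, plus a recurrent transition enabling forward-looking rollout) can be illustrated with a minimal sketch. All dimensions, weight shapes, and the simplified transition cell below are illustrative assumptions for exposition, not the authors' trained implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

H, Z, A, O = 8, 4, 2, 16  # deterministic state, stochastic latent, action, observation dims

# Randomly initialized weights stand in for trained RSSM parameters.
W_rnn = rng.normal(0, 0.1, (H, H + Z + A))    # simplified recurrent cell (a GRU in practice)
W_prior = rng.normal(0, 0.1, (2 * Z, H))      # prior head: p(z_t | h_t)
W_enc = rng.normal(0, 0.1, (Z, O))            # encoder: observation -> latent features

def step_prior(h, z, a):
    """One RSSM transition: update the deterministic state, then sample the prior latent."""
    h_new = np.tanh(W_rnn @ np.concatenate([h, z, a]))
    stats = W_prior @ h_new
    mean, log_std = stats[:Z], stats[Z:]
    z_new = mean + np.exp(log_std) * rng.standard_normal(Z)
    return h_new, z_new

def imagine(h, z, actions):
    """Forward-looking rollout entirely in latent space, with no real observations."""
    traj = []
    for a in actions:
        h, z = step_prior(h, z, a)
        traj.append((h, z))
    return traj

h0, z0 = np.zeros(H), np.zeros(Z)
traj = imagine(h0, z0, [np.ones(A) * 0.1] * 5)  # imagine 5 steps ahead
```

At inference time, such latent rollouts are what let the world model score a candidate action sequence before the vessel commits to it.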

关键词: 水路运输, 船舶航行决策, 世界模型, 船舶, 风险评估, 航行安全

Abstract: Traditional reinforcement learning (RL) methods for autonomous vessel navigation often lack explicit modeling of environmental evolution and suffer from short-sighted decision-making. To address these issues, this paper proposes an autonomous navigation decision-making method in which a world model (WM) cooperates with reinforcement learning (WMRL). First, an RL agent is constructed as the primary decision-maker to generate candidate vessel actions. Second, a WM is built on the Recurrent State Space Model (RSSM): an encoder maps high-dimensional observations into a latent space, which integrates hydrodynamic flow fields and vessel interaction environments for temporal feature extraction and environmental evolution modeling. Finally, to counter the short-sighted decisions of traditional RL policies, a gradient projection module based on group risk is constructed. This module checks the safety of RL candidate actions by leveraging the forward-looking inference capability of the WM; by solving a constrained optimization problem, it maps high-risk actions into a safe feasible region, dynamically correcting potentially high-risk RL decisions. Closed-loop experiments were conducted on a typical tidal reach of the Huangpu River in Shanghai. The results indicate that the proposed method outperforms the other models across multiple evaluation metrics. Compared with the traditional RL method, it reduces the frequency of high-risk groups from 3.2 to 0.8 times·voyage⁻¹, improves the time deviation rate from 8.15% to 5.56%, and decreases the standard deviation of rudder angular velocity from 5.2 (°)·s⁻¹ to 3.2 (°)·s⁻¹.
Furthermore, ablation experiments demonstrate that, compared with the baseline, introducing the projection module reduces the frequency of high-risk groups by 52.11%, improves the time deviation rate from 6.20% to 4.32%, and reduces the standard deviations of rudder angular velocity and longitudinal acceleration from 4.21 (°)·s⁻¹ and 0.23 m·s⁻² to 2.75 (°)·s⁻¹ and 0.18 m·s⁻², respectively.
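The gradient projection step in the abstract (solving a constrained optimization problem to map a high-risk candidate action into the safe feasible region) can be sketched as repeated projections onto a linearized risk constraint. The risk function, threshold, and finite-difference gradient below are illustrative assumptions; the paper's module would use a group-risk measure predicted by the world model's rollout.

```python
import numpy as np

def project_to_safe(a, risk_fn, risk_limit, grad_eps=1e-4, max_iter=20):
    """Map a candidate action onto the safe set {a : risk_fn(a) <= risk_limit}
    by repeatedly projecting onto the linearized constraint
    risk_fn(a) + g · (a' - a) <= risk_limit, where g is the risk gradient."""
    a = np.asarray(a, dtype=float)
    for _ in range(max_iter):
        r = risk_fn(a)
        if r <= risk_limit:
            break  # action already in the safe feasible region
        # Finite-difference gradient of the risk surrogate w.r.t. the action.
        g = np.array([(risk_fn(a + grad_eps * e) - r) / grad_eps
                      for e in np.eye(a.size)])
        # Closed-form projection onto the half-space of the linearized constraint:
        # the minimum-norm correction that brings the linearized risk to the limit.
        a = a - (r - risk_limit) / (g @ g + 1e-8) * g
    return a

# Hypothetical 2-D action (e.g. rudder command, throttle) and toy risk surface.
risk = lambda a: 1.5 - abs(a[0])      # stand-in for a WM-predicted group risk
safe = project_to_safe([0.0, 0.0], risk, risk_limit=1.0)
```

Because each step moves the action by the minimum-norm correction along the risk gradient, the corrected command stays as close as possible to the RL agent's intent while satisfying the safety constraint.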

Key words: waterway transportation, vessel navigation decision-making, world model, vessel, risk assessment, navigation safety
