Journal of Transportation Systems Engineering and Information Technology ›› 2026, Vol. 26 ›› Issue (1): 283-294. DOI: 10.16097/j.cnki.1009-6744.2026.01.026

• Systems Engineering Theory and Methods •

Two-stage Reinforcement Learning Method for Stacking Decisions of Import Containers

SONG Liying*1a,1b, DENG Kunqi1a,1b, NING Wu2, SONG Haitao3, LI Siwei1a,1b

  1. Beijing Jiaotong University: a. School of Traffic and Transportation; b. Integrated Transport Research Center of China, Beijing 100044, China; 2. Guangxi Beibu Gulf International Port Group Co., Ltd., Nanning 530022, China; 3. Guangxi Qinzhou Bonded Port Area Honggang Wharf Co., Ltd., Qinzhou 535008, Guangxi, China
  • Received: 2025-09-25; Revised: 2025-11-30; Accepted: 2025-12-17; Online: 2026-02-25; Published: 2026-02-17
  • About the authors: SONG Liying (1978— ), female, born in Beijing; professor, Ph.D.
  • Supported by:
    Guangxi Science and Technology Major Special Project: Research on Key Technologies for Transportation Organization Schemes of Key Cargo Types in River-Sea Intermodal Transport (桂科AA23062021-2).

Two-stage Reinforcement Learning Method for Stacking Decisions of Import Containers

SONG Liying*1a,1b, DENG Kunqi1a,1b, NING Wu2, SONG Haitao3, LI Siwei1a,1b   

  1. a. School of Traffic and Transportation; b. Integrated Transport Research Center of China, Beijing Jiaotong University, Beijing 100044, China; 2. Guangxi Beibu Gulf International Port Group Co., Ltd., Nanning 530022, China; 3. Guangxi Qinzhou Bonded Port Honggang Wharf Co., Ltd., Qinzhou 535008, Guangxi, China
  • Received: 2025-09-25; Revised: 2025-11-30; Accepted: 2025-12-17; Online: 2026-02-25; Published: 2026-02-17
  • Supported by:
    Guangxi Science and Technology Major Special Project: Research on Key Technologies for the Transportation Organization Scheme of Key Cargo Types in River-Sea Intermodal Transport (桂科AA23062021-2).

Abstract: The import container stacking problem is highly complex because of the conflict between the unloading sequence and the retrieval sequence, together with yard resource constraints. To address this challenge, this paper proposes a two-stage stacking decision method based on deep reinforcement learning for automated yards with a perpendicular layout. The method models the stacking process as a Markov decision process and introduces a phased "block decision then slot decision" structure that effectively reduces the dimensionality of the state and action spaces. Combined with differentiated reward functions, it takes balanced block utilization, the number of relocations, and the retrieval travel distance as optimization objectives. In the algorithm design, the first stage uses a Deep Q-Network (DQN) for block selection, and the second stage uses a Dueling DQN to improve slot-selection efficiency under complex states. Experimental results show that the method produces a balanced stacking strategy across the entire yard and adapts stably to different yard densities and container batch scenarios: the average relocation rate is kept within 15% to 27%, and the maximum average number of bays moved is 3.84 bays per container, about 61.5% and 38.7% lower, respectively, than actual operational data. Compared with a single-stage DQN, two-stage Proximal Policy Optimization (PPO), and a heuristic algorithm, the proposed method shows clear advantages in convergence efficiency, decision quality, and robustness. This work not only validates the effectiveness of phased modeling and differentiated reward mechanisms for complex stacking problems, but also provides a generalizable solution for scheduling and resource optimization in large-scale automated yards.

Keywords: logistics engineering, stacking decision, reinforcement learning, import containers, two-stage method

Abstract: The stacking problem of import containers is highly complex due to conflicts between unloading and retrieval sequences and yard resource constraints. This study focuses on automated vertical yards and proposes a two-stage stacking decision method based on deep reinforcement learning. The process is modeled as a Markov decision process with a phased "block selection then slot selection" structure to reduce the dimensionality of the state and action spaces. A differentiated reward function is designed: block-level decisions promote balanced yard utilization, while slot-level decisions minimize relocations and retrieval distances. In the algorithm design, a Deep Q-Network (DQN) is used for block selection and a Dueling DQN for slot selection. Simulation results show that the proposed method produces balanced strategies across the yard and adapts well to different yard densities and container batch scenarios. The average relocation rate is controlled within 15% to 27%, and the maximum average retrieval distance is 3.84 bays per container, representing reductions of about 61.5% and 38.7% compared with historical yard data. Compared with a single-stage DQN, two-stage Proximal Policy Optimization (PPO), and heuristic optimization, the proposed method achieves faster convergence, fewer relocations, and shorter retrieval paths. These results confirm the effectiveness of phased modeling and differentiated rewards in complex stacking problems and provide a practical solution for intelligent scheduling and resource optimization in large-scale automated yards.
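The two-stage decision structure and the Dueling value decomposition described in the abstract can be sketched in a few lines of Python. This is an illustrative toy only, not the paper's implementation: greedy tabular scores stand in for the trained networks, and all names and numbers (`YARD_BLOCKS`, `SLOTS_PER_BLOCK`, `choose_block`, `choose_slot`, the fill fractions) are assumptions made for the sketch.

```python
YARD_BLOCKS = 4          # stage-1 action space: which block to stack in (assumed size)
SLOTS_PER_BLOCK = 6      # stage-2 action space: which slot within that block (assumed size)

def dueling_q(value, advantages):
    """Dueling decomposition: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Splitting Q into a state value and per-action advantages is what lets
    a Dueling DQN rank actions more stably in complex states.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def choose_block(block_fill):
    """Stage 1 (block selection): the balanced-utilization reward means
    emptier blocks get higher advantage, so a greedy policy picks them."""
    advantages = [-fill for fill in block_fill]  # emptier block -> larger advantage
    q_values = dueling_q(0.0, advantages)
    return max(range(YARD_BLOCKS), key=q_values.__getitem__)

def choose_slot(slot_scores):
    """Stage 2 (slot selection): scores proxy for fewer expected relocations
    and shorter retrieval distance; pick the highest-Q slot greedily."""
    q_values = dueling_q(0.0, slot_scores)
    return max(range(SLOTS_PER_BLOCK), key=q_values.__getitem__)

# Toy example: fractions of each block already occupied.
block_fill = [0.9, 0.4, 0.7, 0.2]
block = choose_block(block_fill)   # greedy stage 1 picks the emptiest block, index 3
slot = choose_slot([0.1, 0.5, 0.3, 0.2, 0.4, 0.0])  # stage 2 within the chosen block
```

Decomposing the decision this way is what shrinks the action space: instead of scoring every slot in the whole yard at once, stage 1 ranks only blocks and stage 2 ranks only the slots of the chosen block.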

Key words: logistics engineering, stacking decision, reinforcement learning, import containers, two-stage method

CLC Number: