A Benchmark Study of Deep Reinforcement Learning Algorithms for the Container Stowage Planning Problem

Yunqi Huang, Nishith Chennakeshava, Alexis Carras, Vladislav Neverov, Wei Liu, Aske Plaat, Yingjie Fan

TL;DR

This work targets the Container Stowage Planning Problem (CSPP), an NP-hard scheduling task with significant impact on maritime logistics. It introduces the Stowage Planning Gym Environment (SPGE) and two crane-scheduling extensions (SPGE-MC and SPAEC) to enable comprehensive, reproducible benchmarking of reinforcement learning (RL) methods. Five RL algorithms (DQN, QR-DQN, A2C, PPO, TRPO) are evaluated across eight scenarios of escalating complexity. Performance gaps widen as complexity grows: TRPO and PPO generally outperform the others in harder settings, while A2C and the value-based methods (DQN, QR-DQN) falter. The results also show that problem formulation (single-agent vs. multi-agent) can influence outcomes, with the single-agent SPGE-MC often better at reducing shifters, which offers practical guidance for deploying RL-based stowage planning in real terminals. The SPGE family offers a reusable platform for future research and benchmarking in maritime logistics.

Abstract

The container stowage planning problem (CSPP) is a critical component of maritime transportation and terminal operations, directly affecting supply chain efficiency. Owing to its complexity, CSPP has traditionally relied on human expertise. While reinforcement learning (RL) has recently been applied to CSPP, systematic benchmark comparisons across different algorithms remain limited. To address this gap, we develop a Gym environment that captures the fundamental features of CSPP and extend it to include crane scheduling in both multi-agent and single-agent formulations. Within this framework, we evaluate five RL algorithms (DQN, QR-DQN, A2C, PPO, and TRPO) under multiple scenarios of varying complexity. The results reveal distinct performance gaps with increasing complexity, underscoring the importance of algorithm choice and problem formulation for CSPP. Overall, this paper benchmarks multiple RL methods for CSPP while providing a reusable Gym environment with crane scheduling, thus offering a foundation for future research and practical deployment in maritime logistics.
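To make the setting concrete, the following is a minimal sketch of a Gym-style stowage environment. It is not the SPGE implementation: the class `ToyStowageEnv`, the policies `first_fit` and `fewest_blockers`, the reward of minus the number of shifters, and the rule that blocking containers return to their stack in the original order are all assumptions made for illustration. A "shifter" is taken here to be a yard container that sits on top of the one being retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStowageEnv:
    """Hypothetical toy environment, not SPGE itself."""
    yard: list                      # yard stacks, each listed bottom -> top (group labels)
    demand: list                    # groups required by vessel slots, in filling order
    invalid_penalty: float = -10.0  # assumed penalty for picking a stack without the group
    _stacks: list = field(default_factory=list, init=False)
    _next: int = field(default=0, init=False)

    def reset(self):
        self._stacks = [list(s) for s in self.yard]  # copy so the env can be reused
        self._next = 0
        return self._observe()

    def _observe(self):
        target = self.demand[self._next] if self._next < len(self.demand) else None
        return {"stacks": [tuple(s) for s in self._stacks], "target_group": target}

    def step(self, action):
        """action is the index of the yard stack to retrieve from."""
        target = self.demand[self._next]
        stack = self._stacks[action]
        if target not in stack:
            return self._observe(), self.invalid_penalty, False, {"shifters": 0}
        blockers = stack[::-1].index(target)   # containers above the topmost match
        del stack[len(stack) - 1 - blockers]   # blockers stay on the stack (assumption)
        self._next += 1
        done = self._next == len(self.demand)
        return self._observe(), -float(blockers), done, {"shifters": blockers}


def first_fit(obs):
    """Baseline: first stack that contains the required group."""
    target = obs["target_group"]
    return next(i for i, s in enumerate(obs["stacks"]) if target in s)


def fewest_blockers(obs):
    """Heuristic: stack where the required group has the fewest containers on top."""
    target = obs["target_group"]
    costs = {i: s[::-1].index(target) for i, s in enumerate(obs["stacks"]) if target in s}
    return min(costs, key=costs.get)


def run_episode(env, policy, max_steps=100):
    obs = env.reset()
    total_shifters = 0
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        total_shifters += info["shifters"]
        if done:
            break
    return total_shifters


yard = [["A", "B"], ["B", "A"], ["C"]]
demand = ["A", "B", "C", "A"]
print(run_episode(ToyStowageEnv(yard, demand), first_fit))        # 1
print(run_episode(ToyStowageEnv(yard, demand), fewest_blockers))  # 0
```

Even in this tiny instance the choice of policy changes the shifter count, which is the kind of difference a benchmark across RL algorithms is meant to measure at scale.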

Paper Structure

This paper contains 13 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Vessel structure [VANTWILLER2024841].
  • Figure 2: Stowage operation process [zhou2022emerging].
  • Figure 3: Visualization of SPGE and SPGE-MC. Cross-section views of vessel and yard bays. The upper and lower parts of the figure represent the current states of the vessel and yard slots, respectively. Each square represents a container slot, while different colors indicate different groups. Light-colored squares in the vessel must be filled with containers of the corresponding group, while white squares in the yard are empty slots. Numbers on the squares are unique slot IDs, while numbers below and to the left indicate rows and tiers. In (a) and (b), outlined squares are the target vessel slots to be filled. The difference is that in (b) the number of sequencers matches the number of cranes, and a time mechanism is introduced, which brings crane availability into consideration. A red highlight indicates that the crane associated with the sequencer is idle and the vessel slot can be filled at this step, while a gray highlight indicates that the slot cannot be filled because the crane is unavailable.
  • Figure (unnumbered): plot legend lists QR-DQN, DQN, A2C, PPO, TRPO.
  • Figure (unnumbered): plot legend lists QR-DQN, DQN, A2C, PPO, TRPO.
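
The crane-availability rule described in the Figure 3 caption (a target slot can be filled only when the crane tied to its sequencer is idle) can be sketched as below. This is an illustrative assumption, not the SPGE-MC code: the `Crane` class, the `busy_until` field, the integer time steps, and the fixed move duration are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Crane:
    busy_until: int = 0  # first time step at which the crane is idle again (assumed model)


def idle_cranes(cranes, now):
    """Indices of cranes whose sequencer may fill a vessel slot at time `now`."""
    return [i for i, crane in enumerate(cranes) if crane.busy_until <= now]


def assign_move(cranes, crane_id, now, duration):
    """Occupy a crane for `duration` steps; refuse if it is still busy."""
    if cranes[crane_id].busy_until > now:
        raise ValueError(f"crane {crane_id} is busy until step {cranes[crane_id].busy_until}")
    cranes[crane_id].busy_until = now + duration


cranes = [Crane(), Crane()]
assign_move(cranes, crane_id=0, now=0, duration=3)
print(idle_cranes(cranes, now=1))  # [1]
print(idle_cranes(cranes, now=3))  # [0, 1]
```

In this sketch, a slot whose crane index appears in `idle_cranes` plays the role of the red highlight in the figure, and any other slot plays the role of the gray highlight.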