
One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

Lei Gao, Shihong Huang, Shengjie Wang, Hong Ma, Feng Zhang, Hengda Bao, Qichang Chen, Weihua Zhou

TL;DR

This work presents One4Many-StablePacker (O4M-SP), an offline deep reinforcement learning (DRL) framework for the 3D bin packing problem (3D-BPP). It generalizes across bins of varied dimensions while explicitly enforcing stability constraints. It formulates the problem as a Markov decision process (MDP) with a novel state representation based on empty maximal spaces (EMS). It uses a weighted reward that combines loading rate and height difference to promote flatter packing, and pairs this with a stability checker. A tailored PPO-based training regime applies entropy control at critical decision nodes and a policy-drift mechanism in the initial decision steps to maintain exploration and avoid premature convergence. Empirical results show that O4M-SP outperforms multiple baselines across diverse instances and bin configurations, generalizes well to unseen bins, and handles stability constraints effectively, which supports practical deployment in logistics. The approach offers a train-once, deploy-many solution for real-world packing under stability requirements and lays the groundwork for extending to irregular shapes and multi-point contact stability.
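
The weighted reward described above can be illustrated with a minimal sketch. The summary does not give the exact formula, so the definitions here are assumptions. Loading rate is taken as total packed item volume over bin volume. The height difference metric is approximated as the max-minus-min spread of a floor heightmap, normalized by bin height. The weights `w_load` and `w_height` are hypothetical values.

```python
import numpy as np

def loading_rate(item_volumes, bin_dims):
    """Fraction of the bin volume occupied by packed items."""
    L, W, H = bin_dims
    return float(sum(item_volumes)) / (L * W * H)

def height_difference(heightmap, bin_height):
    """Hypothetical flatness metric: heightmap spread normalized by bin height."""
    hm = np.asarray(heightmap, dtype=float)
    return float(hm.max() - hm.min()) / bin_height

def weighted_reward(item_volumes, heightmap, bin_dims, w_load=1.0, w_height=0.5):
    """Reward higher utilization and penalize uneven (non-flat) layouts.

    w_load and w_height are illustrative weights, not values from the paper.
    """
    return (w_load * loading_rate(item_volumes, bin_dims)
            - w_height * height_difference(heightmap, bin_dims[2]))
```

For example, in a 10x10x10 bin, a layout of volume 200 spread evenly to height 2 scores 0.2. The same volume stacked to height 4 over half the floor scores 0.0. The flatter layout is preferred even though both layouts have the same loading rate.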

Abstract

The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. Existing learning-based approaches often neglect practical stability-related constraints and exhibit limitations in generalizing across diverse bin dimensions. To address these limitations, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP is its ability to handle various bin dimensions in a single training process while incorporating support and weight constraints common in practice. Our training method introduces two innovative mechanisms. First, it employs a weighted reward function that integrates loading rate and a new height difference metric for packing layouts, promoting improved bin utilization through flatter packing configurations. Second, it combines clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse, encouraging exploration at critical decision nodes during packing to avoid suboptimal solutions. Extensive experiments demonstrate that O4M-SP generalizes successfully across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, O4M-SP exhibits strong practical applicability by effectively addressing packing scenarios with stability constraints.
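
The clipped policy gradient with an entropy term can be sketched in NumPy as the standard PPO clipped surrogate plus an entropy bonus. This is the generic PPO actor objective, not the authors' exact loss. The per-sample `ent_coef` array is a hypothetical way to express stronger entropy regularization at critical decision steps. The policy-drifting mechanism is not reproduced because its details are not given here.

```python
import numpy as np

def categorical_entropy(probs):
    """Entropy of each categorical distribution (one row per sample)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=-1)

def ppo_actor_loss(new_logp, old_logp, advantages, probs, clip_eps=0.2, ent_coef=0.01):
    """PPO clipped surrogate loss with an entropy bonus (to be minimized).

    ent_coef may be a scalar or a per-sample array; a larger value on
    hypothetical "critical" steps strengthens exploration there.
    """
    ratio = np.exp(new_logp - old_logp)                # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = np.minimum(unclipped, clipped).mean()  # pessimistic bound
    entropy_bonus = (ent_coef * categorical_entropy(probs)).mean()
    return -(surrogate + entropy_bonus)
```

Clipping caps how far a single update can push the probability ratio. The entropy bonus counteracts the entropy collapse that the abstract identifies as a cause of premature convergence.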

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Space selection with different reward functions
  • Figure 2: Overview of our method. (a) Space selection: A heuristic for selecting the space for placement. (b) Stability checker: Enforces support and weight constraints. (c) O4M-SP framework: Extracts features from environmental information and uses actor and critic networks to generate action probabilities and value estimates. (d) Feature extractor: Eight identical encoder blocks, each comprising self-attention, cross-attention, feed-forward layers, residual connections, and layer normalization.
  • Figure 3: Variation of policy entropy with decision steps
  • Figure 4: Policy entropy curves
  • Figure 5: Visualizations for handling stability constraints
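
A stability checker that enforces support and weight constraints, as in Figure 2(b), can be sketched as follows. This sketch assumes axis-aligned boxes, a hypothetical minimum support ratio of 0.8, and a hypothetical per-box `max_load` capacity. The names `Box`, `overlap_area`, and `is_stable` are illustrative. The candidate item's weight is split across the boxes directly beneath it in proportion to contact area. Load transmitted further down to lower layers is ignored for brevity.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned box; (x, y, z) is its minimum corner."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    weight: float
    max_load: float = float("inf")  # hypothetical load capacity of the top face
    load: float = 0.0               # weight already resting on this box

def overlap_area(a, b):
    """Area where the footprints of a and b overlap in the x-y plane."""
    dx = min(a.x + a.l, b.x + b.l) - max(a.x, b.x)
    dy = min(a.y + a.w, b.y + b.w) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)

def is_stable(item, placed, min_support=0.8, eps=1e-9):
    """Check the support ratio and per-box load capacity for a candidate placement."""
    if item.z <= eps:
        return True  # the bin floor supports the entire base
    # Boxes whose top face lies at the item's bottom face and touches it.
    contacts = []
    for b in placed:
        if abs(b.z + b.h - item.z) <= eps:
            area = overlap_area(item, b)
            if area > 0:
                contacts.append((b, area))
    support = sum(area for _, area in contacts)
    if support < min_support * item.l * item.w:
        return False  # support constraint violated (includes floating items)
    for b, area in contacts:
        share = item.weight * area / support  # weight split by contact area
        if b.load + share > b.max_load:
            return False  # weight constraint violated
    return True
```

For example, a 2x2 item resting fully on a 4x4 base passes the check. The same item overhanging so that only half its base is supported fails the check, as does an item heavier than the base's `max_load`.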