An Efficient Deep Reinforcement Learning Model for Online 3D Bin Packing Combining Object Rearrangement and Stable Placement

Peiwen Zhou; Ziyan Gao; Chenghao Li; Nak Young Chong

TL;DR

This paper tackles online 3D bin packing (3D-BPP), an NP-hard problem in logistics, by combining a two-agent deep reinforcement learning (DRL) framework with a physics-based stability heuristic and an object rearrangement capability. It frames packing as a Markov decision process (MDP) $M=\langle S,A,P,R,\gamma\rangle$ with two policies, $\pi_o$ (orientation) and $\pi_p$ (placement), which operate on depth-based heightmaps to select $(o,p)$ actions. Key contributions include convexHull-based stability checks (convexHull-1 and convexHull-$\alpha$), integration of a physics heuristic into PPO-based DRL, and empirical validation showing higher space utilization with fewer training epochs while ensuring placement stability. The approach is aimed at online packing in warehouses, where it improves utilization and training efficiency under real-time constraints.
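
The $(o,p)$ action on a heightmap can be illustrated with a small sketch. It is not the authors' code: it assumes box dimensions are given in whole grid cells, that an orientation is any axis-aligned permutation of the box edges, and that a placed box rests on the highest cell under its footprint. The helper names `orientations`, `place`, and `space_utilization` are hypothetical.

```python
from itertools import permutations


def orientations(dims):
    """Axis-aligned orientations of a box; six when all three edges differ."""
    return sorted(set(permutations(dims)))


def place(heightmap, dims, row, col):
    """Return a new heightmap after dropping a box at cell (row, col).

    dims = (rows, cols, height) of the already-oriented box (assumed layout).
    """
    d_r, d_c, d_h = dims
    if row + d_r > len(heightmap) or col + d_c > len(heightmap[0]):
        raise ValueError("box does not fit inside the bin footprint")
    cells = [(r, c) for r in range(row, row + d_r) for c in range(col, col + d_c)]
    # The box comes to rest on the highest point beneath its footprint.
    base = max(heightmap[r][c] for r, c in cells)
    new_map = [line[:] for line in heightmap]
    for r, c in cells:
        new_map[r][c] = base + d_h
    return new_map


def space_utilization(packed_dims, bin_dims):
    """Packed volume divided by bin volume."""
    packed = sum(a * b * c for a, b, c in packed_dims)
    return packed / (bin_dims[0] * bin_dims[1] * bin_dims[2])
```

In this sketch an orientation policy would pick one entry of `orientations(dims)` and a placement policy would pick `(row, col)`; how the paper's networks score those choices is not covered here.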

Abstract

This paper presents an efficient deep reinforcement learning (DRL) framework for online 3D bin packing (3D-BPP). 3D-BPP is an NP-hard problem of practical importance in logistics, warehousing, and transportation, in which objects must be arranged optimally inside a bin. Traditional heuristic algorithms often fail to address dynamic and physical constraints in real-time scenarios. We introduce a novel DRL framework that integrates a reliable physics heuristic algorithm with object rearrangement and stable placement. Our experiments show that the proposed framework achieves higher space utilization rates, effectively minimizing the amount of wasted space, with fewer training epochs.

Paper Structure (17 sections, 2 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Heuristics in Bin Packing Problem
DRL in 3D-BPP
Stability check in 3D-BPP
Method
Stability Checking via Physics Heuristics
DRL for Bin Packing
Problem Formulation
State Definition
Action Definition
Reward Function
Physics Heuristics DRL Framework
Experiment and Result
Physics Heuristics Validation
...and 2 more sections

Figures (6)

Figure 1: Online 3D-BPP, where the agent can only observe an upcoming object and pack it on-the-fly.
Figure 2: The main idea of convexHull-1. The left image depicts a sliding window that matches the size of the incoming object, along with the portions of the scene objects contained within the window. The right image shows a zoomed-in view of the content inside the sliding window. To determine the stability of the object, we calculate the largest convex hull of the highest points within the window and then verify whether the center of the window lies within that hull. If it does, the object is deemed stable when placed at the center of the sliding window.
Figure 3: Multi-layer packing scenarios showcasing the difference between convexHull-1 and convexHull-$\alpha$ algorithms for checking the stability of the placement. (1) Both convexHull-1 and convexHull-$\alpha$ consider the arrangement to be stable. (2) Conversely, convexHull-1 might incorrectly assess the stability if the incoming object is significantly heavier than the object in the middle layer, as detailed in (3).
Figure 4: The pipeline of the DRL framework combined with object rearrangement and physics heuristics.
Figure 5: Six possible orientations of the packing object.
...and 1 more figures
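
The convexHull-1 check described in the Figure 2 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the heightmap is a grid of unit cells, uses the corners of the highest cells inside the window as the support points, and treats a window center lying exactly on the hull boundary as unstable. The function names and the tolerance `tol` are hypothetical, and the convexHull-$\alpha$ variant is not covered.

```python
def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def strictly_inside(hull, q):
    """True if q lies strictly inside a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    for i, a in enumerate(hull):
        b = hull[(i + 1) % len(hull)]
        if (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0]) <= 0:
            return False
    return True


def is_stable_convexhull1(heightmap, row, col, size_r, size_c, tol=1e-6):
    """Check whether the window center lies inside the hull of the highest cells."""
    window = [heightmap[r][col:col + size_c] for r in range(row, row + size_r)]
    top = max(max(line) for line in window)
    corners = []
    for dr, line in enumerate(window):
        for dc, h in enumerate(line):
            if top - h <= tol:  # this cell touches the object's underside
                r0, c0 = row + dr, col + dc
                corners += [(r0, c0), (r0 + 1, c0), (r0, c0 + 1), (r0 + 1, c0 + 1)]
    center = (row + size_r / 2, col + size_c / 2)
    return strictly_inside(convex_hull(corners), center)
```

On a heightmap with a 2x2 block of height 1 in one corner, a 2x3 footprint laid over the block is accepted because its center sits above the block, while a 2x4 footprint is rejected because its center falls on the block's edge.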
