A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning

Beomjoon Lee, Changjoo Nam

TL;DR

This work tackles the 2D bin packing problem under online and semi-online conditions with dual robotic manipulators. It introduces a hierarchical framework in which a low-level A3C policy chooses the precise placement position, while a high-level DFS-based beam-search planner explores packing orders, orientations, and repacking; both operate on a padded grid model of the bin. The approach achieves near-optimal bin utilization across diverse scenarios, with significant gains from repacking and parallel dual-arm execution, and its feasibility is demonstrated in physics-based simulation and real-robot experiments. The results suggest practical value for warehouse automation: coordinated packing and selective repacking in real time can improve throughput and reduce wasted space.

Abstract

We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where the complete information about the item set and their sizes is known in advance, is proven to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method is capable of handling diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.
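
The division of labor described in the abstract (a high-level search over packing order and orientation, a low-level choice of position) can be illustrated with a toy Python sketch. It is not the authors' implementation: exhaustive enumeration stands in for the heuristic search, a row-major first-fit rule stands in for the RL agent, and repacking is omitted. The function names (`first_fit`, `simulate`, `plan`) and the `(height, width)` size convention are assumptions made for illustration only.

```python
from itertools import permutations, product

import numpy as np


def first_fit(grid, size):
    """Stand-in for the RL position policy (assumption): return the first
    top-left cell, in row-major order, where the item fits, or None."""
    h, w = size
    rows, cols = grid.shape
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if not grid[r:r + h, c:c + w].any():
                return r, c
    return None


def simulate(bin_shape, items, order, rotations):
    """Forward-simulate one candidate sequence and return bin utilization,
    i.e. the fraction of occupied cells. Items that do not fit are skipped."""
    grid = np.zeros(bin_shape, dtype=np.uint8)
    for idx, rotate in zip(order, rotations):
        h, w = items[idx]
        if rotate:  # a 90-degree rotation swaps height and width
            h, w = w, h
        pos = first_fit(grid, (h, w))
        if pos is None:
            continue
        r, c = pos
        grid[r:r + h, c:c + w] = 1
    return float(grid.mean())


def plan(bin_shape, items):
    """Stand-in for the high-level search (assumption): try every order and
    orientation, and keep the candidate with the highest utilization."""
    best = (-1.0, None, None)
    for order in permutations(range(len(items))):
        for rotations in product((False, True), repeat=len(items)):
            util = simulate(bin_shape, items, order, rotations)
            if util > best[0]:
                best = (util, order, rotations)
    return best


# Two items in a 2x3 bin; the second only fits after rotation.
items = [(1, 3), (3, 1)]
best_util, best_order, best_rotations = plan((2, 3), items)
```

Exhaustive enumeration grows factorially with the number of items, which is why a bounded search such as the one named in the abstract is needed in practice; the sketch only shows how order and orientation decisions wrap around a position-selection step.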

Paper Structure

This paper contains 32 sections, 17 equations, 12 figures, 5 tables, 4 algorithms.

Figures (12)

  • Figure 1: Bin packing systems in six different scenarios. (a--c) illustrate configurations with a single manipulator, whereas (d--f) depict those involving dual manipulators. The number of known items (in green and red) varies across scenarios, and the number and placement of manipulators determine which items are accessible.
  • Figure 2: Illustration of the bin packing system. (a) The manipulator is denoted by $m$, $\mathcal{Z}$ represents the task space of the manipulator, and $C$ is the temporary storage area for unpacked items. (b) The bin is an open cuboid, and the next item $o$ is a solid cuboid. (c) The bin is modeled as a padded binary grid map $B$, where each cell corresponds to a low-level position action indexed by its coordinate in the image coordinate system. The action indices increase row-wise, from left to right and top to bottom, with the origin located at the top-left corner. The item is encoded as the rotated size vector $\mathbf{l}(\phi)$, and a checkerboard-patterned pixel indicates the top-left corner of the placed item.
  • Figure 3: Overall framework. Our bin packing system integrates a hierarchical algorithm with task and motion planning.
  • Figure 4: Actor-critic framework. The input consists of the current bin configuration $B$ and the size vector $\mathbf{l}(\phi)$ of the item.
  • Figure 5: Hierarchical algorithm. Each candidate sequence $\chi'$ is reordered into $\tilde{\chi}$, evaluated via forward simulation, and the best one is selected to generate the high-level action $a_{\text{high}}$.
  • ...and 7 more figures
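
The grid encoding described in the Figure 2(c) caption can be made concrete with a short Python sketch. Several details are assumptions that the caption does not state: the padding is one cell wide and marked as occupied, the action index runs over the padded map as `row * width + col`, item sizes are `(height, width)` pairs, and rotation is limited to 0 or 90 degrees. All function names are hypothetical.

```python
import numpy as np


def make_padded_bin(rows, cols, pad=1):
    """Binary grid map B: 0 = free, 1 = occupied.
    Padding cells are marked occupied (assumption)."""
    grid = np.ones((rows + 2 * pad, cols + 2 * pad), dtype=np.uint8)
    grid[pad:pad + rows, pad:pad + cols] = 0
    return grid


def rotated_size(size, phi_deg):
    """Size vector l(phi): a 90-degree rotation swaps height and width."""
    h, w = size
    return (w, h) if phi_deg % 180 == 90 else (h, w)


def cell_to_action(row, col, grid):
    """Row-major index with the origin at the top-left corner."""
    return row * grid.shape[1] + col


def action_to_cell(action, grid):
    """Inverse of cell_to_action: returns (row, col)."""
    return divmod(action, grid.shape[1])


def can_place(grid, action, size):
    """True if the item, with its top-left corner at the action's cell,
    stays inside the map and covers only free cells."""
    r, c = action_to_cell(action, grid)
    h, w = size
    if r + h > grid.shape[0] or c + w > grid.shape[1]:
        return False
    return not grid[r:r + h, c:c + w].any()


def place(grid, action, size):
    """Return a new grid with the item's footprint marked occupied."""
    if not can_place(grid, action, size):
        raise ValueError("infeasible placement")
    r, c = action_to_cell(action, grid)
    h, w = size
    new_grid = grid.copy()
    new_grid[r:r + h, c:c + w] = 1
    return new_grid


# A 4x5 bin becomes a 6x7 padded map; place a 2x3 item rotated by 90 degrees
# with its top-left corner at the first interior cell.
B = make_padded_bin(4, 5)
size = rotated_size((2, 3), 90)
a = cell_to_action(1, 1, B)
B_next = place(B, a, size)
```

Marking the padding as occupied means a single overlap check also rejects placements that would cross the bin wall, so no separate boundary test against the interior is needed.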