A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning
Beomjoon Lee, Changjoo Nam
TL;DR
This work tackles the 2D bin packing problem under online and semi-online conditions with dual robotic manipulators. It introduces a hierarchical framework that couples a low-level A3C policy for precise placement with a high-level DFS-based beam-search planner that explores packing orders, orientations, and repacking, all under a padded grid bin model. The approach yields near-optimal bin utilization across diverse scenarios, with significant gains from repacking and parallel dual-arm execution, and it is shown to be feasible in physics-based simulation and real-robot experiments. The results suggest practical value for warehouse automation: coordinated packing and selective repacking in real time can improve throughput and reduce wasted space.
Abstract
We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where complete information about the items and their sizes is known in advance, is known to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method handles diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.
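To make the division of labor described above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a plain occupancy grid for the bin, replaces the learned RL placement policy with a simple first-fit scan (`place_first_fit`), and uses an ordinary level-by-level beam search over item order and orientation scored by utilization. Repacking, padding, and dual-arm coordination are omitted, and all function names and the `beam_width` parameter are hypothetical.

```python
def fits(grid, x, y, w, h):
    """True if a w-by-h item with its corner cell at (x, y) stays inside the bin and overlaps nothing."""
    bin_h, bin_w = len(grid), len(grid[0])
    if x + w > bin_w or y + h > bin_h:
        return False
    return all(not grid[y + dy][x + dx] for dy in range(h) for dx in range(w))


def place_first_fit(grid, w, h):
    """Stand-in for the low-level policy (assumption): first free position in row-major order."""
    for y in range(len(grid)):
        for x in range(len(grid[0])):
            if fits(grid, x, y, w, h):
                return x, y
    return None


def occupy(grid, x, y, w, h):
    """Return a copy of the grid with the item's cells marked as occupied."""
    new_grid = [row[:] for row in grid]
    for dy in range(h):
        for dx in range(w):
            new_grid[y + dy][x + dx] = True
    return new_grid


def utilization(grid):
    """Fraction of bin cells that are occupied."""
    return sum(sum(row) for row in grid) / (len(grid) * len(grid[0]))


def beam_pack(bin_w, bin_h, items, beam_width=3):
    """High-level search: choose which item to pack next and in which orientation.

    items is a list of (width, height) pairs. Returns (best_utilization, plan),
    where each plan step is (item_index, width, height, (x, y)).
    """
    empty = [[False] * bin_w for _ in range(bin_h)]
    beam = [(empty, tuple(range(len(items))), [])]  # (grid, remaining indices, plan)
    best = (0.0, [])
    while beam:
        candidates = []
        for grid, remaining, plan in beam:
            for i in remaining:
                w, h = items[i]
                orientations = [(w, h)] if w == h else [(w, h), (h, w)]
                for ow, oh in orientations:
                    pos = place_first_fit(grid, ow, oh)
                    if pos is None:
                        continue  # this orientation does not fit anywhere
                    rest = tuple(j for j in remaining if j != i)
                    candidates.append(
                        (occupy(grid, pos[0], pos[1], ow, oh), rest, plan + [(i, ow, oh, pos)])
                    )
        # Keep only the highest-utilization partial packings for the next level.
        candidates.sort(key=lambda c: utilization(c[0]), reverse=True)
        beam = candidates[:beam_width]
        if beam and utilization(beam[0][0]) > best[0]:
            best = (utilization(beam[0][0]), beam[0][2])
    return best
```

For example, `beam_pack(4, 4, [(2, 4), (2, 2), (2, 2)])` searches over orders and orientations of three items in a 4-by-4 bin. In the framework described above, the first-fit scan would be replaced by the trained placement agent, and the search would also consider unpacking items that are already placed.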
