HERB: Human-augmented Efficient Reinforcement learning for Bin-packing
Gojko Perovic, Nuno Ferreira Duarte, Atabak Dehban, Gonçalo Teixeira, Egidio Falotico, José Santos-Victor
TL;DR
The paper tackles irregular 3D object packing by introducing HERB, a human-augmented reinforcement learning framework that first leverages human-like packing sequences and then learns continuous placement with reinforcement learning. It combines a Beam-3-based sequence planner, derived from human data, with a Soft Actor-Critic placement predictor that operates on heightmap observations to output a planar position and orientation, while the vertical position is inferred rather than predicted. The approach is evaluated in a PyBullet-based environment built around the BoxED dataset and is validated on a Baxter robotic system, showing improvements in packing robustness, efficiency, and transferability over purely geometric or RL-based baselines. The work highlights the value of incorporating human intuition into learning-based packing and demonstrates practical feasibility for autonomous robotic packing tasks in realistic settings.
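The summary above does not say how the vertical position is inferred. One common way to do it from heightmaps, shown here purely as an illustrative sketch and not as the authors' method, is to lower the object over its chosen footprint until its underside first touches the current contents of the container. The function name `infer_drop_height` and the grid-cell parametrization (`row`, `col`) are assumptions introduced for this example.

```python
import numpy as np


def infer_drop_height(container_hm, object_bottom_hm, row, col):
    """Hypothetical helper: resting height of an object dropped at (row, col).

    container_hm     : 2D array, current top-surface height of each container cell.
    object_bottom_hm : 2D array, height of the object's underside above its own
                       lowest point; np.inf where the object does not cover a cell.
    row, col         : top-left cell of the object's footprint in the container grid.
    """
    h, w = object_bottom_hm.shape
    H, W = container_hm.shape
    if row < 0 or col < 0 or row + h > H or col + w > W:
        raise ValueError("footprint falls outside the container")

    # Container surface directly beneath the object's footprint.
    patch = container_hm[row:row + h, col:col + w]

    # For each cell, the object's lowest point must sit at least
    # (surface height - underside offset) high to avoid penetration.
    # The resting height is the largest of these per-cell requirements.
    return float(np.max(patch - object_bottom_hm))


# Example: an empty 4x4 container with one 5 cm obstacle at cell (1, 1).
container = np.zeros((4, 4))
container[1, 1] = 0.05
flat_box = np.zeros((2, 2))  # flat-bottomed object covering 2x2 cells

z_on_obstacle = infer_drop_height(container, flat_box, 0, 0)
z_on_floor = infer_drop_height(container, flat_box, 2, 2)
```

With this kind of rule the learned policy only has to choose the planar position and orientation, which keeps the action space small. The sketch ignores stability and assumes the object has already been rotated to its chosen orientation before its underside heightmap is computed.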
Abstract
Packing objects efficiently is a fundamental problem in logistics, warehouse automation, and robotics. While traditional packing solutions focus on geometric optimization, packing irregular 3D objects presents significant challenges due to variations in shape and stability. Reinforcement Learning (RL) has gained popularity in robotic packing tasks, but training purely from simulation can be inefficient and computationally expensive. In this work, we propose HERB, a human-augmented RL framework for packing irregular objects. We first leverage human demonstrations to learn the best sequence of objects to pack, incorporating latent factors such as space optimization, stability, and object relationships that are difficult to model explicitly. Next, we train a placement algorithm that uses visual information to determine the optimal object positioning inside a packing container. Our approach is validated through extensive performance evaluations, analyzing both packing efficiency and latency. Finally, we demonstrate the real-world feasibility of our method on a robotic system. Experimental results show that our method outperforms geometric and purely RL-based approaches by leveraging human intuition, improving both packing robustness and adaptability. This work highlights the potential of combining human expertise with RL to tackle complex real-world packing challenges in robotic systems.
