HERB: Human-augmented Efficient Reinforcement learning for Bin-packing

Gojko Perovic, Nuno Ferreira Duarte, Atabak Dehban, Gonçalo Teixeira, Egidio Falotico, José Santos-Victor

TL;DR

The paper tackles irregular 3D object packing by introducing HERB, a human-augmented reinforcement learning framework that first leverages human-like packing sequences and then learns continuous placement with reinforcement learning. It combines a Beam-3-based sequence planner, derived from human data, with a Soft Actor-Critic placement predictor that operates on heightmap observations to output a 2D position and orientation, from which the vertical position is inferred. The approach is evaluated in a PyBullet-based environment built around the BoxED dataset and is validated on a Baxter robotic system, showing improvements in packing robustness, efficiency, and transferability over purely geometric or RL-based baselines. The work highlights the value of incorporating human intuition into learning-based packing and demonstrates practical feasibility for autonomous robotic packing tasks in realistic settings.

Abstract

Packing objects efficiently is a fundamental problem in logistics, warehouse automation, and robotics. While traditional packing solutions focus on geometric optimization, packing irregular 3D objects presents significant challenges due to variations in shape and stability. Reinforcement Learning (RL) has gained popularity in robotic packing tasks, but training purely from simulation can be inefficient and computationally expensive. In this work, we propose HERB, a human-augmented RL framework for packing irregular objects. We first leverage human demonstrations to learn the best sequence of objects to pack, incorporating latent factors such as space optimization, stability, and object relationships that are difficult to model explicitly. Next, we train a placement algorithm that uses visual information to determine the optimal object positioning inside a packing container. Our approach is validated through extensive performance evaluations, analyzing both packing efficiency and latency. Finally, we demonstrate the real-world feasibility of our method on a robotic system. Experimental results show that our method outperforms geometric and purely RL-based approaches by leveraging human intuition, improving both packing robustness and adaptability. This work highlights the potential of combining human expertise with RL to tackle complex real-world packing challenges in robotic systems.

Paper Structure

This paper contains 18 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Baxter® robot with the box and objects from the BoxED dataset. The heightmap images capturing the state of the box are taken with a depth camera (out of crop).
  • Figure 2: Complete block diagram of the proposed system at inference time. Given a list of unpacked objects, the Beam-3 algorithm sorts the candidates. Then, the projection of the object is concatenated with the state of the box, represented by a heightmap. Based on this, the policy predicts $x, y, \theta$, which are then used to estimate the vertical position $z$. The episode terminates when all objects have been placed successfully, or when an object exceeds the sides of the box or the vertical constraint (denoted by red lines).
  • Figure 3: Mean and standard deviation of Normalized Reward and Episode Length over three seeds for each setting. Higher is better. The reference in the Episode Length plot denotes the mean number of objects to pack, as set by the RL environment.
  • Figure 4: Final Compactness of the box, Compactness per step, and Stability per step for different approaches on the BoxED sequences.
  • Figure 5: Comparison of placement prediction across the CS0.9 and CS0.6 parameter settings on a BoxED pack. The top two rows show the objects and the corresponding heightmap from a real sensor; the bottom row shows the same pack executed in simulation.
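
The Figure 2 caption says that the vertical position $z$ is estimated from the predicted $x, y, \theta$ and that an episode ends when an object exceeds the sides of the box or the vertical constraint. The following is a minimal sketch of how such a step could work on a grid heightmap. It is not the authors' implementation. The function name `place_on_heightmap`, the flat-bottom assumption, the binary footprint mask (assumed to be already rotated by $\theta$), and the grid indexing by `row` and `col` are all illustrative assumptions.

```python
from typing import List, Optional

Grid = List[List[float]]


def place_on_heightmap(
    box_heightmap: Grid,
    footprint: List[List[int]],
    object_height: float,
    row: int,
    col: int,
    max_height: float,
) -> Optional[float]:
    """Return the resting height z of a flat-bottomed object, or None.

    Assumptions (hypothetical, not from the paper):
    - `footprint` is a binary mask of the object's top-down projection,
      already rotated by the predicted angle.
    - (`row`, `col`) is the heightmap cell where the mask's top-left
      corner is placed, i.e. a discretised version of the predicted x, y.
    - The object comes to rest on the tallest cell under its footprint.
    """
    n_rows, n_cols = len(box_heightmap), len(box_heightmap[0])

    # Collect the heightmap cells covered by the object's footprint.
    cells = [
        (row + i, col + j)
        for i, mask_row in enumerate(footprint)
        for j, occupied in enumerate(mask_row)
        if occupied
    ]

    # Invalid placement: part of the object would leave the box sideways.
    if any(not (0 <= r < n_rows and 0 <= c < n_cols) for r, c in cells):
        return None

    # A flat bottom rests on the highest surface beneath it.
    z = max((box_heightmap[r][c] for r, c in cells), default=0.0)

    # Invalid placement: the top of the object would exceed the height limit.
    if z + object_height > max_height:
        return None
    return z


# Usage: a 3x3 box with one raised cell in the centre.
box = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
bar = [[1, 1]]  # a 1x2 object footprint
z = place_on_heightmap(box, bar, object_height=1.0, row=1, col=0, max_height=5.0)
print(z)  # 2.0, because the bar rests on the raised centre cell
```

Returning `None` for both failure cases mirrors the two termination conditions in the caption. A real system would also need to handle non-flat object undersides and physical stability, which this sketch ignores.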