Efficient Reinforcement Learning of Task Planners for Robotic Palletization through Iterative Action Masking Learning
Zheng Wu, Yichuan Li, Wei Zhan, Changliu Liu, Yun-Hui Liu, Masayoshi Tomizuka
TL;DR
This work frames robotic palletization as an online 3D Bin Packing Problem with a buffer, where the large combinatorial action space hinders efficient RL training. It introduces a supervised action-masking pipeline that learns to predict valid, stable placements with a semantic-segmentation network (U-Net). A DAgger-like iterative refinement then aligns the learned mask with the state distribution the RL policy actually visits. Empirically, the learned mask (LearnedMask) markedly improves stability-aware action pruning over heuristic masks (IoU 89.2% vs. 76.6%) and accelerates RL training. It achieves higher pallet utilization across buffer sizes and has been deployed on a real-world Franka Panda prototype (72.0% space utilization). The work points to a practical, data-efficient path to robust RL-based task planning in complex, high-dimensional robotic settings, while acknowledging that trajectory planning must still be integrated for fully collision-free execution.
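The IoU figures compare a predicted placement mask against a ground-truth mask of valid, stable placements. The following is a minimal sketch of that metric. It assumes that a mask is a 2D grid of candidate (x, y) placements for a single item and that the segmentation network emits per-cell probabilities thresholded at 0.5. Both are assumptions, not details given here, and `binarize` and `mask_iou` are hypothetical helper names. The text also does not say whether IoU is averaged per sample or pooled over a dataset.

```python
from typing import List, Sequence

BoolGrid = List[List[bool]]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> BoolGrid:
    """Threshold per-cell probabilities (e.g. a sigmoid segmentation output) into a boolean mask."""
    return [[p >= threshold for p in row] for row in probs]


def mask_iou(pred: BoolGrid, target: BoolGrid) -> float:
    """Intersection over union of two equally sized boolean placement masks."""
    if len(pred) != len(target) or any(len(a) != len(b) for a, b in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(p and t)  # cell predicted valid AND actually valid
            union += int(p or t)   # cell marked valid by either mask
    # Two empty masks agree perfectly, so define IoU as 1.0 in that case.
    return inter / union if union else 1.0


# Toy 2x3 placement grid (hypothetical data): True means "placing the item here is stable".
target = [[True, True, False], [False, True, False]]
probs = [[0.9, 0.6, 0.7], [0.1, 0.8, 0.2]]
print(mask_iou(binarize(probs), target))  # 0.75
```

In the toy grid the prediction wrongly admits one unstable cell, so 3 of the 4 cells marked valid by either mask agree, giving an IoU of 0.75.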
Abstract
Robotic palletization systems are of paramount importance in logistics, addressing critical efficiency and precision demands in supply chain management. This paper investigates the application of Reinforcement Learning (RL) to task planning for such robotic systems. A vast action space is a significant impediment to applying off-the-shelf RL methods efficiently. To address it, our study introduces a novel method that uses supervised learning to iteratively prune and manage the action space. By reducing the complexity of the action space, our approach not only accelerates learning but also ensures effective and reliable task planning in robotic palletization. The experimental results underscore the efficacy of this method and its potential to improve the performance of RL applications in complex, high-dimensional environments such as logistics palletization.
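Pruning the action space for an RL policy is commonly implemented as action masking. The logits of actions that the mask marks invalid are excluded before the softmax, so the agent never samples, or spends exploration on, infeasible placements. The sketch below illustrates that general technique, not the authors' implementation. It assumes a discrete policy over a flattened action index (for example item × orientation × position, an assumed encoding), and `masked_softmax` and `sample_action` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Softmax over the valid actions only; masked-out actions get probability 0."""
    if len(logits) != len(valid):
        raise ValueError("logits and mask must have the same length")
    if not any(valid):
        raise ValueError("the mask rules out every action")
    # Subtract the largest valid logit for numerical stability.
    top = max(x for x, ok in zip(logits, valid) if ok)
    weights = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, valid)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits: Sequence[float], valid: Sequence[bool], rng: random.Random) -> int:
    """Sample a flattened action index; zero-weight (invalid) actions are never drawn."""
    probs = masked_softmax(logits, valid)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Four flattened candidate actions (hypothetical); the mask marks only 0 and 2 as stable.
logits = [2.0, 1.0, 0.5, 3.0]
valid = [True, False, True, False]
print([round(p, 3) for p in masked_softmax(logits, valid)])  # [0.818, 0.0, 0.182, 0.0]
print(sample_action(logits, valid, random.Random(0)))
```

Action 3 has the highest raw logit but is masked out, so all probability mass is renormalized over actions 0 and 2. This is how a more accurate mask directly shrinks the space the policy has to explore.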
