Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning
Neel Jawale, Byron Boots, Balakumar Sundaralingam, Mohak Bhardwaj
TL;DR
This work tackles dynamic non-prehensile object transport (the 'robot waiter' task) by learning from a small set of task-space demonstrations. It introduces Conservative Value MPC (CV-MPC), which trains an ensemble of end-effector value functions offline from demonstrations and, online, uses a pessimistic trajectory-return estimate within a GPU-accelerated MPC so that planning stays safe and robust despite limited data. The approach generalizes to unseen objects and can improve on suboptimal demonstrations, with robust real-world deployment from only 50–100 demonstrations. By integrating offline value learning with online MPC, CV-MPC reduces demonstrator burden and enables rapid training for dynamic manipulation tasks that depend on contact dynamics and friction constraints. The method is straightforward to integrate with existing MPC frameworks and opens avenues for applying offline-to-online learning to a broader class of dynamic non-prehensile tasks.
Abstract
We investigate the problem of teaching a robot manipulator to perform dynamic non-prehensile object transport, also known as the 'robot waiter' task, from a limited set of real-world demonstrations. We propose an approach that combines batch reinforcement learning (RL) with model-predictive control (MPC) by pretraining an ensemble of value functions from demonstration data and using them online within an uncertainty-aware MPC scheme to ensure robustness to limited data coverage. Our approach is straightforward to integrate with off-the-shelf MPC frameworks and enables learning solely from task-space demonstrations with sparsely labeled transitions, while leveraging MPC to ensure smooth joint-space motions and constraint satisfaction. We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task and demonstrate robust deployment of value functions learned from 50–100 demonstrations. Furthermore, our approach enables generalization to novel objects not seen during training and can improve upon suboptimal demonstrations. We believe that such a framework can reduce the burden of providing extensive demonstrations and facilitate rapid training of robot manipulators to perform non-prehensile manipulation tasks. Project videos and supplementary material can be found at: https://sites.google.com/view/cvmpc.
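The uncertainty-aware scoring described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lower-confidence-bound form (ensemble mean minus `beta` times the ensemble standard deviation), the function names, and the parameters `gamma` and `beta` are all assumptions chosen for illustration, and the actual pessimistic estimate used by CV-MPC may differ.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
ValueFn = Callable[[State], float]
# A rollout is (per-step rewards over the horizon, terminal state).
Rollout = Tuple[Sequence[float], State]


def pessimistic_value(state: State, ensemble: List[ValueFn], beta: float = 1.0) -> float:
    """Assumed lower confidence bound: ensemble mean minus beta * ensemble std."""
    estimates = [value_fn(state) for value_fn in ensemble]
    return mean(estimates) - beta * pstdev(estimates)


def pessimistic_return(rewards: Sequence[float], terminal_state: State,
                       ensemble: List[ValueFn], gamma: float = 0.99,
                       beta: float = 1.0) -> float:
    """Discounted reward sum over the horizon plus a pessimistic terminal value."""
    horizon = len(rewards)
    running = sum(gamma ** t * r for t, r in enumerate(rewards))
    terminal = pessimistic_value(terminal_state, ensemble, beta)
    return running + gamma ** horizon * terminal


def select_rollout(rollouts: List[Rollout], ensemble: List[ValueFn],
                   gamma: float = 0.99, beta: float = 1.0) -> int:
    """Index of the candidate rollout with the highest pessimistic return."""
    scores = [pessimistic_return(rewards, state, ensemble, gamma, beta)
              for rewards, state in rollouts]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `beta = 0` the score reduces to the plain ensemble mean. Increasing `beta` penalizes terminal states on which the ensemble members disagree, which is one common way to discourage a planner from exploiting value estimates in regions the demonstrations do not cover.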
