Data efficient Robotic Object Throwing with Model-Based Reinforcement Learning
Niccolò Turcato, Giulio Giacomuzzo, Matteo Terreran, Davide Allegro, Ruggero Carli, Alberto Dalla Libera
TL;DR
This work presents MC-PILOT, a Model-Based Reinforcement Learning framework that enables data-efficient robotic object throwing by learning a probabilistic dynamics model via Gaussian Process Regression and optimizing a release-velocity policy under release-delay uncertainty. It extends MC-PILCO by accommodating target-domain variation, gripper delays, and drag, using an augmented state and Monte Carlo policy optimization with the reparameterization trick. The method demonstrates rapid generalization to unseen targets and objects, outperforming analytical and Model-Free baselines in simulation and on a Franka Emika Panda manipulator, and it can adapt to new task requirements with minimal additional data. By explicitly modeling delays and environmental uncertainties, MC-PILOT offers a practical route to expanding robot workspace through Pick-and-Throw with high data efficiency and robustness.
Abstract
Pick-and-place (PnP) operations, featuring object grasping and trajectory planning, are fundamental in industrial robotics applications. Despite many advancements in the field, PnP is limited by workspace constraints, reducing flexibility. Pick-and-throw (PnT) is a promising alternative where the robot throws objects to target locations, leveraging extrinsic resources like gravity to improve efficiency and expand the workspace. However, PnT execution is complex, requiring precise coordination of high-speed movements and object dynamics. Solutions to the PnT problem are categorized into analytical and learning-based approaches. Analytical methods focus on system modeling and trajectory generation but are time-consuming and offer limited generalization. Learning-based solutions, in particular Model-Free Reinforcement Learning (MFRL), offer automation and adaptability but require extensive interaction time. This paper introduces a Model-Based Reinforcement Learning (MBRL) framework, MC-PILOT, which combines data-driven modeling with policy optimization for efficient and accurate PnT tasks. MC-PILOT accounts for model uncertainties and release errors, demonstrating superior performance in simulations and real-world tests with a Franka Emika Panda manipulator. The proposed approach generalizes rapidly to new targets, offering advantages over analytical and Model-Free methods.
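To make the role of release errors concrete, the following is a minimal illustrative sketch, not the MC-PILOT implementation. It assumes a drag-free point mass, a fixed release angle, and a gripper that keeps moving in a straight line at constant velocity while the release is delayed. All function names and numerical values are hypothetical. The sketch computes the nominal release speed for a target distance and then uses Monte Carlo sampling over a Gaussian release delay to estimate the mean landing error.

```python
import math
import random

G = 9.81  # gravitational acceleration [m/s^2]


def release_speed(distance, height, angle):
    """Release speed that lands a drag-free point mass `distance` metres away.

    The object is released at `height` metres above the landing plane with
    elevation `angle` (radians). Assumes no drag and no release delay.
    """
    denom = 2.0 * math.cos(angle) ** 2 * (height + distance * math.tan(angle))
    return math.sqrt(G * distance ** 2 / denom)


def landing_distance(speed, height, angle, delay=0.0):
    """Horizontal landing distance when the release happens `delay` seconds late.

    Illustrative assumption: during the delay the gripper keeps moving in a
    straight line at constant velocity, so only the release position shifts.
    """
    vx = speed * math.cos(angle)
    vy = speed * math.sin(angle)
    x0 = vx * delay            # horizontal shift of the release point
    h0 = height + vy * delay   # vertical shift of the release point
    t_flight = (vy + math.sqrt(vy ** 2 + 2.0 * G * h0)) / G
    return x0 + vx * t_flight


def expected_landing_error(distance, height, angle, delay_mean, delay_std,
                           n_samples=10_000, seed=0):
    """Monte Carlo estimate of the mean absolute landing error [m]."""
    rng = random.Random(seed)
    speed = release_speed(distance, height, angle)
    total = 0.0
    for _ in range(n_samples):
        delay = rng.gauss(delay_mean, delay_std)
        total += abs(landing_distance(speed, height, angle, delay) - distance)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical scenario: 2 m target, 0.5 m release height, 45 degree
    # release angle, and a 20 ms +/- 5 ms gripper delay.
    err = expected_landing_error(2.0, 0.5, math.pi / 4, 0.02, 0.005)
    print(f"mean landing error: {err:.3f} m")
```

Under these simplifying assumptions, a delay that is ignored when choosing the release speed produces a systematic landing offset, which gives some intuition for why a policy that accounts for release errors can be more accurate than a purely nominal ballistic computation.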
