
TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

TL;DR

This work introduces TossingBot, a robot system that learns to pick arbitrary objects from an unstructured bin and throw them into distant targets outside its reach. By coupling a physics-based ballistic controller with a learned residual velocity per grasp, the authors present Residual Physics, a hybrid end-to-end model that jointly optimizes grasping and throwing from visual input via self-supervised trial-and-error. The approach achieves high throughput (500+ mean picks per hour) and robust generalization to unseen objects and target locations, outperforming purely physics-based or purely data-driven baselines. Analyses reveal that the network implicitly learns meaningful object semantics and that supervising grasps with throwing success yields more stable, effective grasps for accurate throws.
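The residual-physics idea in the TL;DR can be shown with a minimal sketch. It rests on several assumptions that the summary does not state: the projectile is drag-free, the release angle is fixed at 45°, and the learned residual $\delta$ corrects only the release speed (magnitude). The function names `ballistic_release_speed` and `residual_physics_speed` are hypothetical. The analytical part supplies $\hat{v}$, and the learned part only has to supply the correction, so the commanded speed is $\hat{v} + \delta$.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def ballistic_release_speed(horizontal_dist: float, height_diff: float,
                            angle_rad: float = math.pi / 4, g: float = G) -> float:
    """Speed for a drag-free projectile released at `angle_rad` above horizontal
    to land `horizontal_dist` metres away and `height_diff` metres above the
    release point (negative if the target is lower). Illustrative assumption."""
    # Eliminating t from x = v cos(a) t and z = v sin(a) t - g t^2 / 2 gives
    # height_diff = d tan(a) - g d^2 / (2 v^2 cos^2(a)); solve for v.
    denom = 2.0 * math.cos(angle_rad) ** 2 * (
        horizontal_dist * math.tan(angle_rad) - height_diff)
    if denom <= 0:
        raise ValueError("target unreachable at this release angle")
    return math.sqrt(g * horizontal_dist ** 2 / denom)


def residual_physics_speed(horizontal_dist: float, height_diff: float,
                           predicted_residual: float) -> float:
    # Hybrid command: analytical estimate v_hat plus a learned residual delta
    # (here passed in directly in place of a network prediction).
    v_hat = ballistic_release_speed(horizontal_dist, height_diff)
    return v_hat + predicted_residual
```

The physics estimate already accounts for gravity and target distance. The residual therefore only needs to capture what the idealized model misses, such as the aerodynamic and grasp-offset effects shown in Figure 2.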

Abstract

We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g., initial pose of object in manipulator) to handling varying object-centric properties (e.g., mass distribution, friction, shape) and dynamics (e.g., aerodynamics). In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error. Within this formulation, we investigate the synergies between grasping and throwing (i.e., learning grasps that enable more accurate throws) and between simulation and deep learning (i.e., using deep networks to predict residuals on top of control parameters predicted by a physics simulator). The resulting system, TossingBot, is able to grasp and throw arbitrary objects into boxes located outside its maximum reach range at 500+ mean picks per hour (600+ grasps per hour with 85% throwing accuracy), and it generalizes to new objects and target locations. Videos are available at https://tossingbot.cs.princeton.edu
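The trial-and-error training in the abstract can be pictured as a labeling step that runs after each attempt. The following is a hypothetical scheme, not the paper's confirmed procedure. Following the TL;DR's note that grasps are supervised by throwing success, the grasp label is 1 only when the throw also lands in the target box. The residual label treats the observed landing point as if it had been the target. That hindsight-style relabeling is our assumption. `TrialOutcome`, `make_labels`, and the drag-free 45° estimate are illustrative names and simplifications.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

G = 9.81  # m/s^2


def ballistic_speed_45deg(horizontal_dist: float, height_diff: float, g: float = G) -> float:
    # Assumed drag-free launch at 45 degrees: v^2 = g d^2 / (d - dz).
    if horizontal_dist - height_diff <= 0:
        raise ValueError("landing point unreachable at 45 degrees")
    return math.sqrt(g * horizontal_dist ** 2 / (horizontal_dist - height_diff))


@dataclass
class TrialOutcome:
    grasp_held_object: bool  # gripper closed on an object
    landed_in_target: bool   # object ended up in the selected box
    executed_speed: float    # release speed actually commanded (m/s)
    landing_dist: float      # horizontal distance from release to observed landing point (m)
    landing_dz: float        # landing height relative to the release point (m)


def make_labels(outcome: TrialOutcome) -> Tuple[float, Optional[float]]:
    """Hypothetical self-supervised labels for one grasp-and-throw attempt."""
    # Grasp label depends on throw success, not only on holding the object.
    grasp_label = 1.0 if (outcome.grasp_held_object and outcome.landed_in_target) else 0.0

    # Nothing was thrown, so there is no residual to supervise.
    if not outcome.grasp_held_object:
        return grasp_label, None

    # Assumed hindsight relabeling: pretend the observed landing point was the
    # target. The "correct" residual for that target is the executed speed
    # minus the physics estimate for that point.
    v_hat_at_landing = ballistic_speed_45deg(outcome.landing_dist, outcome.landing_dz)
    return grasp_label, outcome.executed_speed - v_hat_at_landing
```

Under this assumed scheme, every held-and-thrown object yields a residual label, including throws that miss the box.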

Paper Structure

This paper contains 17 sections, 4 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: TossingBot learns to grasp arbitrary objects from an unstructured bin and to throw them into target boxes located outside its maximum kinematic reach range. The aerial trajectories of different objects are controlled by jointly optimizing grasping policies and throwing release velocities.
  • Figure 2: Projectile trajectories of a thrown ping pong ball (a), a screwdriver grasped and thrown by its handle (b), and the same screwdriver grasped and thrown by its shaft (c). The difference between (a) and (b) is largely due to aerodynamics, while the difference between (b) and (c) is largely due to grasping at different offsets from the object's center of mass (near the handle). Our goal is to learn joint grasping and throwing policies that can compensate for these differences to achieve accurate targeted throws.
  • Figure 3: Learning residual models and policies: (a) analytical solutions that determine action $a$ from state $s$; (b) data-driven policies that learn the direct mapping from states to actions; (c) hybrid models that combine analytical models with learning to predict future states $s_{t+1}$; (d) hybrid policies (like ours) that combine analytical solutions with learning to determine action $a$.
  • Figure 4: Overview. An RGB-D heightmap of the scene is fed into a perception module to compute spatial features $\mu$. In parallel, the target location $p$ is fed into a physics-based controller to provide an initial estimate $\hat{v}$ of the throwing release velocity, which is concatenated with $\mu$ and then fed into the grasping and throwing modules. The grasping module predicts the probability of grasp success for a dense pixel-wise sampling of horizontal grasps. The throwing module outputs a dense prediction of residuals (one per sampled grasp), which are added to $\hat{v}$ to obtain the final throwing release velocities. We rotate the input heightmap by 16 orientations to account for 16 grasping angles. The robot executes the grasp with the highest score, followed by a throw using its corresponding predicted velocity.
  • Figure 5: Objects used in simulated (top) and real (bottom) experiments, split by seen objects (left) and unseen objects (right). The center of mass of each simulated object is indicated with a red sphere (for illustration).
  • ...and 6 more figures
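The selection step in the Figure 4 caption can be sketched as follows. The sketch assumes the networks have already produced one grasp-success map and one residual map per rotation. Several details are illustrative and do not come from the caption: the function `select_action`, the nested-list layout `[rotation][row][col]`, and the assumption that the 16 rotations are evenly spaced over 360° (22.5° apart).

```python
from typing import Dict, List, Tuple

NUM_ROTATIONS = 16  # input heightmap rotated 16 times, one per grasp angle


def select_action(grasp_prob: List[List[List[float]]],
                  residual: List[List[List[float]]],
                  v_hat: float) -> Dict[str, object]:
    """Pick the highest-scoring grasp and pair it with its throw velocity.

    Both maps are indexed [rotation][row][col] and must have the same shape.
    """
    best: Tuple[float, int, int, int, float] = (float("-inf"), -1, -1, -1, 0.0)
    for k, (prob_map, res_map) in enumerate(zip(grasp_prob, residual)):
        for i, (prob_row, res_row) in enumerate(zip(prob_map, res_map)):
            for j, (p, r) in enumerate(zip(prob_row, res_row)):
                if p > best[0]:  # strict '>' keeps the first maximum on ties
                    best = (p, k, i, j, r)
    p, k, i, j, r = best
    return {
        "pixel": (i, j),                         # grasp location in the heightmap
        "angle_deg": k * 360.0 / NUM_ROTATIONS,  # assumed 22.5-degree spacing
        "grasp_prob": p,
        "release_speed": v_hat + r,              # physics estimate + learned residual
    }
```

Each pixel's residual is paired with its grasp score. This pairing lets the chosen throw depend on where and at what angle the object was grasped.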