Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects

Johannes Pitz, Lennart Röstel, Leon Sievers, Darius Burschka, Berthold Bäuml

TL;DR

This work addresses purely tactile in-hand object reorientation across diverse shapes by learning a shape-conditioned policy coupled with a tactile state estimator. It demonstrates that Basis Point Set (BPS) shape encoding, transformed by estimated pose, provides a robust 3D representation that enables learning with tactile feedback alone, avoiding visual sensors. The authors show strong sim2real transfer and generalization to novel objects, achieving high success rates on both seen and unseen shapes, including non-convex geometries. The approach advances autonomous, vision-free manipulation with potential real-world impact in manufacturing and robotic dexterity, and identifies current limits with small-featured objects, motivating future tactile sensing improvements.

Abstract

Reorienting diverse objects with a multi-fingered hand is a challenging task. Current methods in robotic in-hand manipulation are either object-specific or require permanent supervision of the object state from visual sensors. This is far from human capabilities and from what is needed in real-world applications. In this work, we address this gap by training shape-conditioned agents to reorient diverse objects in hand, relying purely on tactile feedback (via torque and position measurements of the fingers' joints). To achieve this, we propose a learning framework that exploits shape information in a reinforcement learning policy and a learned state estimator. We find that representing 3D shapes by vectors from a fixed set of basis points to the shape's surface, transformed by its predicted 3D pose, is especially helpful for learning dexterous in-hand manipulation. In simulation and real-world experiments, we show the reorientation of many objects with high success rates, on par with state-of-the-art results obtained with specialized single-object agents. Moreover, we show generalization to novel objects, achieving success rates of $\sim$90% even for non-convex shapes.
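
The shape representation described in the abstract (vectors from a fixed set of basis points to the shape's surface, transformed by the predicted pose) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `bps_vectors`, the use of sampled surface points in place of a true mesh distance query, and the choice of frame for the basis points are all assumptions made for illustration.

```python
import numpy as np

def bps_vectors(basis, surface_pts, x_hat, R_hat):
    """Hypothetical helper: vectors from each basis point to the nearest surface point.

    basis:       (B, 3) fixed basis points (assumed to live in the hand/world frame)
    surface_pts: (N, 3) points sampled on the object surface, in the object frame
                 (an approximation of the mesh; an assumption of this sketch)
    x_hat:       (3,)   estimated object position
    R_hat:       (3, 3) estimated object rotation matrix
    """
    # Move the surface samples into the frame of the basis points.
    world_pts = surface_pts @ R_hat.T + x_hat
    # All pairwise difference vectors, shape (B, N, 3).
    diff = world_pts[None, :, :] - basis[:, None, :]
    # For every basis point, pick the closest surface sample.
    nearest = np.argmin(np.linalg.norm(diff, axis=2), axis=1)
    return diff[np.arange(len(basis)), nearest]

# Example call with random data (values are illustrative only).
rng = np.random.default_rng(0)
basis_points = rng.uniform(-0.1, 0.1, size=(16, 3))
surface = rng.normal(size=(200, 3))
surface = 0.04 * surface / np.linalg.norm(surface, axis=1, keepdims=True)  # 4 cm sphere
V = bps_vectors(basis_points, surface, np.zeros(3), np.eye(3))
```

Because the basis points stay fixed while the object samples move with the estimated pose, the resulting feature array has a constant shape regardless of the object, which is what makes it convenient as a network input.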

Paper Structure (22 sections, 11 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Purely tactile, shape-conditioned in-hand reorientation with the torque-controlled DLR-Hand II [Butterfass2001] and Agile Justin [Bauml2014-cr] (right). Conditioned on a mesh input in the initial pose and a goal orientation (left), our learned agent autonomously reorients various objects towards the target without visual information or supporting surfaces (top: training object 7, bottom: out-of-distribution object 11). Shape information is encoded as vectors to the mesh, transformed by an estimate of the current object pose, which is predicted by a learned state estimator from the history of force/position measurements.
  • Figure 2: Shape-conditioned agent control architecture. The tactile state estimator $f$ predicts the system state $s_t$ recursively (see Figure 3). Based on the estimated object pose $(\hat{x}_t,\hat{R}_t)$ and the given mesh $\mathcal{M}$, the shape representation $\mathcal{S}_t=\mathcal{B}(\hat{x}_t, \hat{R}_t, \mathcal{M})=V_t$ is computed in each timestep $t$. This is fed to the policy $\pi$ together with the relative rotation to the goal $R_{\Delta}$, a stack of joint measurements $z_t$, and the predicted uncertainty $\sigma$, to produce actions $q_d$. These actions, produced at a frequency of $10$ Hz, are then low-pass filtered and given to an impedance controller running at $1000$ Hz for controlling the hand.
  • Figure 3: Estimator cell $f$. In each timestep, the BPS feature vector $V_{t-1}$ is computed from the estimated object pose $(\hat{x}_{t-1}, \hat{R}_{t-1})$ and mesh $\mathcal{M}$. The learned function $f_{\varphi}$ predicts the residual in object state $\delta_t$ from $V_{t-1}$, the previous latent state $l_{t-1}$, and the joint measurements $z_t$. Finally, state $s_t$ is obtained by (generalized) addition of $\delta_t$ to $s_{t-1}$.
  • Figure 4: Success rates $b$ during training of oracle agents with different shape representations, plotted against the total number of environmental steps. Left: Training on a single object (Cube, index 8). Middle: Training on cuboids with randomized axis-independent scaling in [4.5 cm, 9 cm]. Right: Training on the Geometric 8 object set with randomized scaling. Each line is the mean over three training runs, with shaded areas covering the min and max. We smooth the (binary) success signal for the individual runs.
  • Figure 5: Object sets used in the experiments: objects 0-7 (Geometric 8) are designed to have different geometric properties (convex, non-convex, varying number of edges, angle between adjacent surfaces, etc.). The training set consists of these eight geometric objects with randomized axis-independent scaling in [4.5 cm, 9 cm], effectively creating a much larger set of training objects. Out-of-distribution (OOD) test objects 8-12 are not present in the training set and are used for assessing generalization capabilities. Object 12 is the YCB apple [YCB].
  • ...and 5 more figures
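
The recursive update in the Figure 3 caption (a learned function predicts a residual $\delta_t$, which is added to $s_{t-1}$ by generalized addition) might look roughly like the sketch below. The captions do not say how the generalized addition is realized, so this sketch assumes the pose part of the state is a position plus a rotation matrix, that the residual is a translation plus a rotation vector, and that the rotation residual is applied by left-multiplication. The names `rotvec_to_matrix`, `apply_residual`, `estimator_step`, `shape_fn`, and the stub standing in for $f_{\varphi}$ are hypothetical.

```python
import numpy as np

def rotvec_to_matrix(w):
    """Rodrigues' formula: rotation vector (axis * angle) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def apply_residual(x_prev, R_prev, dx, dw):
    """Assumed form of the generalized addition: add translations, compose rotations."""
    return x_prev + dx, rotvec_to_matrix(dw) @ R_prev

def estimator_step(x_prev, R_prev, l_prev, z_t, shape_fn, f_phi):
    """One recursion of the estimator cell, following the Figure 3 caption."""
    V_prev = shape_fn(x_prev, R_prev)          # BPS features from the previous pose
    dx, dw, l_t = f_phi(V_prev, l_prev, z_t)   # learned residual and new latent state
    x_t, R_t = apply_residual(x_prev, R_prev, dx, dw)
    return x_t, R_t, l_t

# Stubs standing in for the shape encoding and the learned network (illustrative only).
def stub_shape(x, R):
    return np.zeros((4, 3))

def stub_f_phi(V, latent, z):
    return np.array([0.01, 0.0, 0.0]), np.zeros(3), latent + 1

x1, R1, l1 = estimator_step(np.zeros(3), np.eye(3), 0, np.zeros(12), stub_shape, stub_f_phi)
```

Predicting a residual rather than the full state keeps each network output small and lets the estimate be carried forward through time from the known initial pose.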