One-Shot Transfer of Long-Horizon Extrinsic Manipulation Through Contact Retargeting

Albert Wu, Ruocheng Wang, Sirui Chen, Clemens Eppner, C. Karen Liu

TL;DR

The paper addresses generalizing long-horizon extrinsic manipulation by retargeting contact requirements from a single demonstration. It introduces a primitive library of four short-horizon, goal-conditioned policies and an IK-based contact retargeting framework to map demonstrations to new scenes while preserving the sequence of contact configurations. Hardware experiments across four tasks, ten objects, and six environments achieve an overall success rate of $80.5\%$ (with $81.7\%$ on standard objects), demonstrating robustness to demonstration variation and environment changes. By enabling reliable chaining of primitives through contact retargeting, the approach offers a scalable path toward real-world extrinsic manipulation with limited demonstrations.

Abstract

Extrinsic manipulation, the use of environment contacts to achieve manipulation objectives, enables strategies that are otherwise impossible with a parallel jaw gripper. However, orchestrating a long-horizon sequence of contact interactions between the robot, object, and environment is notoriously challenging due to scene diversity, the large action space, and difficult contact dynamics. We observe that most extrinsic manipulation tasks are combinations of short-horizon primitives, each of which depends strongly on initializing from a desirable contact configuration to succeed. Therefore, we propose to generalize one extrinsic manipulation trajectory to diverse objects and environments by retargeting contact requirements. We prepare a single library of robust short-horizon, goal-conditioned primitive policies, and design a framework to compose state constraints stemming from the contact specifications of each primitive. Given a test scene and a single demo prescribing the primitive sequence, our method enforces the state constraints on the test scene and finds intermediate goal states using inverse kinematics. The goals are then tracked by the primitive policies. Using a 7+1 DoF robotic arm-gripper system, we achieved an overall success rate of 80.5% on hardware over 4 long-horizon extrinsic manipulation tasks, each with up to 4 primitives. Our experiments cover 10 objects and 6 environment configurations. We further show empirically that our method admits a wide range of demonstrations, and that contact retargeting is indeed the key to successfully combining primitives for long-horizon extrinsic manipulation. Code and additional details are available at stanford-tml.github.io/extrinsic-manipulation.
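The retargeting idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the paper solves for intermediate goals with inverse kinematics over robot and object states, whereas this toy reduces the state to a single object position along one axis and replaces IK with interval intersection. The `Scene` class, the `CONTACT_REQUIREMENTS` table, the two requirement functions, and the choice of which primitive needs which contact are all hypothetical and chosen only for illustration. What the sketch keeps from the source is the structure: the demo supplies only the primitive sequence, and each intermediate goal must satisfy the contact requirements of both the primitive that ends there and the one that starts there.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] of object centre positions (m)


@dataclass(frozen=True)
class Scene:
    """Toy 1-D test scene (assumption): an object on a table with a wall at wall_x."""
    wall_x: float
    object_width: float


def free_on_table(scene: Scene) -> Interval:
    # Hypothetical requirement: object lies anywhere between the origin and the wall.
    return (0.0, scene.wall_x - scene.object_width / 2)


def touching_wall(scene: Scene) -> Interval:
    # Hypothetical requirement: the object's far face is in contact with the wall.
    contact_x = scene.wall_x - scene.object_width / 2
    return (contact_x, contact_x)


# Illustrative mapping from primitive name to its contact requirement (assumed, not from the paper).
CONTACT_REQUIREMENTS: Dict[str, Callable[[Scene], Interval]] = {
    "pull": free_on_table,
    "push": free_on_table,
    "pivot": touching_wall,
    "grasp": touching_wall,
}


def retarget(sequence: List[str], scene: Scene) -> List[float]:
    """Return one intermediate goal per consecutive primitive pair in the test scene."""
    goals = []
    for current, following in zip(sequence, sequence[1:]):
        lo_a, hi_a = CONTACT_REQUIREMENTS[current](scene)
        lo_b, hi_b = CONTACT_REQUIREMENTS[following](scene)
        # The goal must satisfy both primitives' requirements, so intersect them.
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:
            raise ValueError(f"no state satisfies both {current!r} and {following!r}")
        goals.append((lo + hi) / 2)  # any point in the intersection would do
    return goals


# Primitive sequence of the retrieval task, as described for Figure 1.
demo_sequence = ["pull", "push", "pivot", "grasp"]
goals = retarget(demo_sequence, Scene(wall_x=0.75, object_width=0.10))
print(goals)
```

Changing `wall_x` or `object_width` moves the goals without touching the demo sequence, which is the sense in which a single demonstration transfers across scenes in this toy. In the actual method, each goal would then be handed to the corresponding goal-conditioned primitive policy for tracking.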

Paper Structure

This paper contains 19 sections, 1 theorem, 8 equations, 6 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 1

$\forall \bm{x}\in {}^{\mathcal{E}, \mathcal{O}}\mathcal{X}_{s} \cap \leftindex^{\mathcal{E},\mathcal{O}}\sigma_{i}^{\bm{x}} \cap \leftindex^{\mathcal{E},\mathcal{O}}\sigma^{\bm{x}}_{i+1}$, there exists $\bm{q}_1 \in \leftindex^{\mathcal{E},\mathcal{O}}\sigma_{i}^{\bm{q}\mid\bm{x}}$ and $\bm{q}_2 \i

Figures (6)

  • Figure 1: Retargeting the object retrieval task from a human demo (top) to oat (middle) and flapjack (bottom). This 4-primitive task involves pulling the object from between the obstacles, pushing it to the wall, pivoting against the wall to expose a graspable edge, and finally grasping the object. Each row shows a trajectory in temporal order from left to right. Please refer to our supplementary video and website for animations.
  • Figure 2: Approach overview. We prepare a primitive library and define each primitive's contact requirements online. Given a demo task trajectory and a test scene, we retarget the demo to the test scene by enforcing contact requirements. The demo's primitive sequence is then used to perform the task in the test scene.
  • Figure 3: Hardware setup. Fig. \ref{subfig:robot} shows the world frame, the 3 obstacles, and the wall at $(75\,\mathrm{cm}, 0^\circ)$. Fig. \ref{subfig:objects} shows the 13 objects, from left to right beginning with the frontmost row: camera*, onion*, meat*, salt†; cracker, cocoa, seasoning, flapjack, coffee†; oat, cereal, wafer, chocolate†. * = short objects (3). † = impossible objects (3). The rest are standard objects (7).
  • Figure 4: Extrinsic manipulation tasks. The numbers and colors denote the primitive sequence. Push: red. Pull: green. Pivot: yellow. Grasp: orange. An additional "pull" is necessary for short objects, as the a and b branches in Fig. \ref{subfig:grasping} illustrate.
  • Figure 5: Executing extrinsic manipulation tasks, in temporal order from left to right. From top row to bottom: avoidance on wafer, storage on cereal, grasping on cocoa and camera (short object). An extra pull (5th frame) is necessary to create clearance between the wall and gripper prior to grasping camera. Video is available in our supplementary material and on our website. Please refer to Table \ref{tab:task_results} for detailed task setups and Fig. \ref{fig:highlight_retrieval} for the retrieval task.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 1: Contact configuration
  • Definition 2: Contact configuration in $\mathcal{E},\mathcal{O}$
  • Definition 3: Freestanding object states
  • Proposition 1: Arbitrary robot-object contact switch at a freestanding state