Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Yecheng Jason Ma, Dinesh Jayaraman

TL;DR

This work designs a novel open-loop policy that warps actions from a small set of source demonstrations by anchoring them to semantic keypoint correspondences in the target scene, and deploys this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement.

Abstract

The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful robot experience. To address these challenges, we introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations (<=10) by anchoring them to semantic keypoint correspondences in the target scene. We show that this design is extremely data-efficient and robust even under significant spatial and semantic variations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is the first to perform many hours of autonomous multi-task play in the real world starting from only a handful of demonstrations. This produces a stream of data that consistently improves the performance of closed-loop imitation policies over time, ultimately yielding over 1000 expert-level trajectories and training policies competitive with those learned from human-collected demonstrations.
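
To make the idea of anchoring actions to keypoint correspondences concrete, here is a minimal sketch of one way a demonstrated trajectory could be warped. It is not the authors' implementation. It assumes that each waypoint in the demonstration is tied to one 3D keypoint, that the matching keypoint has already been located in the target scene, and that the offsets between matched keypoints are linearly interpolated across the timesteps between waypoints. The function name `warp_trajectory` and all parameters are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def warp_trajectory(
    positions: Sequence[Vec3],
    waypoint_steps: Sequence[int],
    source_keypoints: Sequence[Vec3],
    target_keypoints: Sequence[Vec3],
) -> List[Vec3]:
    """Shift demonstrated end-effector positions toward matched target keypoints.

    Assumptions (illustrative only, not taken from the paper):
      * waypoint_steps is strictly increasing and indexes into positions;
      * waypoint i is anchored to source_keypoints[i], whose correspondence
        in the target scene is target_keypoints[i];
      * offsets are linearly interpolated between consecutive waypoints and
        held constant before the first and after the last waypoint.
    """
    if not (len(waypoint_steps) == len(source_keypoints) == len(target_keypoints)):
        raise ValueError("each waypoint needs one source and one target keypoint")
    if not waypoint_steps:
        return list(positions)

    # Displacement of every anchor keypoint from the source to the target scene.
    offsets = [
        tuple(t - s for s, t in zip(src, tgt))
        for src, tgt in zip(source_keypoints, target_keypoints)
    ]

    warped: List[Vec3] = []
    for step, pos in enumerate(positions):
        # Number of waypoints whose timestep is <= the current step.
        k = bisect_right(waypoint_steps, step)
        if k == 0:
            off = offsets[0]
        elif k == len(waypoint_steps):
            off = offsets[-1]
        else:
            lo, hi = waypoint_steps[k - 1], waypoint_steps[k]
            w = (step - lo) / (hi - lo)
            off = tuple((1 - w) * a + w * b for a, b in zip(offsets[k - 1], offsets[k]))
        warped.append(tuple(p + o for p, o in zip(pos, off)))
    return warped
```

Under these assumptions the warped path passes through each waypoint shifted by exactly its keypoint's displacement, so the demonstrated motion stays attached to the objects it interacted with. The result is a fixed plan computed before execution, which matches the open-loop character described above.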

Paper Structure

This paper contains 16 sections, 16 figures, 3 tables, and 2 algorithms.

Figures (16)

  • Figure 1: Tether performs autonomous functional play in the real world for over 24 hours, streaming over 1000 successful trajectories for downstream policy learning.
  • Figure 2: Demonstration Summaries. Tether summarizes demonstrations into the initial frame, action sequence (red), waypoints (blue), and keypoints.
  • Figure 3: Policy Inference. During inference, Tether (left) computes correspondences (middle) and produces a warped trajectory action plan (right).
  • Figure 4: Autonomous Functional Play. Our iterative procedure runs Tether for multiple tasks and uses VLMs for plan generation and success detection.
  • Figure 5: Evaluation Tasks. Our tasks involve moving fruits and containers with in-distribution (orange) and out-of-distribution (green) objects, as well as challenging manipulation skills (purple).
  • ...and 11 more figures
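
The cycle of task selection, execution, evaluation, and improvement summarized in Figure 4 can be sketched as a simple loop. This is only an illustration of the control flow, not the paper's procedure. The function `run_play` and the three callables it takes (`select_task`, `execute`, `judge_success`) are hypothetical placeholders for the VLM-based planner, the warping policy, and the VLM-based success detector.

```python
from typing import Callable, Dict, List, Sequence


def run_play(
    tasks: Sequence[str],
    select_task: Callable[[Sequence[str], Dict[str, List[object]]], str],
    execute: Callable[[str, List[object]], object],
    judge_success: Callable[[str, object], bool],
    num_rounds: int,
) -> Dict[str, List[object]]:
    """Collect successful trajectories per task over repeated play rounds.

    All three callables are assumed interfaces:
      select_task(tasks, data)  -> name of the next task to attempt
      execute(task, demos)      -> trajectory produced by the policy
      judge_success(task, traj) -> True if the attempt achieved the task
    """
    data: Dict[str, List[object]] = {task: [] for task in tasks}
    for _ in range(num_rounds):
        task = select_task(tasks, data)          # task selection
        trajectory = execute(task, data[task])   # execution
        if judge_success(task, trajectory):      # evaluation
            data[task].append(trajectory)        # improvement: grow the dataset
    return data
```

Keeping the three components as injected callables lets the loop be tested with simple stubs before any robot or model is attached.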