Instant Policy: In-Context Imitation Learning via Graph Diffusion

Vitalis Vosylius, Edward Johns

TL;DR

This work tackles rapid, data-efficient robot policy acquisition by reframing In-Context Imitation Learning as graph diffusion on a heterogeneous graph that fuses demonstrations, observations, and future actions. Instant Policy can be trained entirely on pseudo-demonstrations generated in simulation, which gives it a virtually unlimited supply of training data. The method decouples translation and rotation in its SE(3) action representation, uses a DDIM-inspired graph denoising process, and relies on a diffusion-based graph transformer to predict actions without updating weights at test time. Empirically, it achieves higher success rates than strong baselines on RLBench tasks, works on real-world tasks, and shows cross-embodiment and language-based zero-shot transfer, which suggests it could serve as a scalable foundation for robotic manipulation. The approach points toward instant, context-driven policy synthesis in robotics, with practical benefits for rapid adaptation and broader generalization across objects and modalities.
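To make the "DDIM-inspired denoising" idea concrete, here is a minimal sketch of the standard deterministic DDIM update (eta = 0) applied to an array of action-node positions. It is not the authors' implementation: the function names, the noise schedule, and the `predict_noise` callable (a stand-in for the graph transformer) are assumptions made for illustration, and the paper's actual parameterisation may differ.

```python
import numpy as np


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on noisy positions x_t."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the previous (less noisy) level.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred


def denoise(x_T, predict_noise, alpha_bars):
    """Run the reverse process from the noisiest level down to level 0.

    alpha_bars is a decreasing schedule with alpha_bars[0] == 1.0 (clean data).
    predict_noise(x, t) is a hypothetical stand-in for the learned model.
    """
    x = x_T
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = predict_noise(x, t)
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t - 1])
    return x
```

With a perfect noise predictor the loop recovers the clean positions exactly, which is a convenient sanity check before plugging in a learned model. In the setting described above, the weights stay fixed at test time and only the noisy action estimates are updated.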

Abstract

Following the impressive capabilities of in-context learning with large transformers, In-Context Imitation Learning (ICIL) is a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations - arbitrary trajectories generated in simulation - as a virtually infinite pool of training data. Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.

Paper Structure

This paper contains 23 sections, 6 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Instant Policy acquires skills instantly after providing demos at test time. We model in-context imitation learning as a graph-based diffusion process, trained using pseudo-demonstrations.
  • Figure 2: (Left) A local graph, ${\mathcal{G}}_l$, representing the robot's state (blue nodes) and local geometries of the objects (green nodes). (Right) A graph representing 2 demos (3 waypoints each), the current state, and 2 future actions. Edge colours represent different edge types in a heterogeneous graph.
  • Figure 3: (Left) High-level structure of the network used to train the graph-based diffusion model. (Right) Position of gripper action nodes during the denoising process for one of the predicted actions.
  • Figure 4: Examples of the simulated trajectories - 3 pseudo-demonstrations for 2 pseudo-tasks.
  • Figure 5: Attention weights visualised on sub-graph edges at two different timesteps in the phone-on-base task, showing the model's ability to track task progress and aggregate relevant information.
  • ...and 5 more figures
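
The graph layout in the Figure 2 caption (2 demos of 3 waypoints each, the current state, and 2 future actions) can be sketched as a small typed edge list. This is a simplified illustration, not the paper's construction: each node below stands for one whole local graph, and the edge-type names and connectivity pattern are hypothetical choices made for the example.

```python
def build_context_graph(num_demos=2, waypoints_per_demo=3, num_actions=2):
    """Build a coarse heterogeneous graph over demo, current, and action nodes.

    Returns (nodes, edges), where nodes are (kind, demo_index, step_index)
    tuples and edges are (source, target, edge_type) tuples.
    Edge-type names are hypothetical labels for illustration.
    """
    nodes, edges = [], []
    demo_ids = []

    for d in range(num_demos):
        ids = []
        for w in range(waypoints_per_demo):
            nodes.append(("demo", d, w))
            ids.append(len(nodes) - 1)
        # Link consecutive waypoints within the same demonstration.
        for src, dst in zip(ids, ids[1:]):
            edges.append((src, dst, "demo_temporal"))
        demo_ids.append(ids)

    nodes.append(("current", None, 0))
    current = len(nodes) - 1
    # Let every demo waypoint pass information to the current state.
    for ids in demo_ids:
        for i in ids:
            edges.append((i, current, "demo_to_current"))

    # Attach the future action nodes to the current state.
    for k in range(num_actions):
        nodes.append(("action", None, k))
        edges.append((current, len(nodes) - 1, "current_to_action"))

    return nodes, edges
```

Keeping the edge type on every edge is what makes the graph heterogeneous: a graph network can then apply different learned transformations to each relation.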