Instant Policy: In-Context Imitation Learning via Graph Diffusion
Vitalis Vosylius, Edward Johns
TL;DR
This work tackles rapid, data-efficient robot policy acquisition by reframing In-Context Imitation Learning (ICIL) as diffusion over a heterogeneous graph that jointly encodes demonstrations, observations, and future actions. Instant Policy can be trained entirely on pseudo-demonstrations generated in simulation, which provide a virtually unlimited supply of training data and make learning scalable. The method decouples translation and rotation in its SE(3) action representation, uses a DDIM-inspired graph denoising process, and relies on a diffusion-based graph transformer that predicts actions without any weight updates at test time. Empirically, it achieves higher success rates than strong baselines on RLBench tasks, works on a real robot, and supports cross-embodiment transfer and zero-shot transfer to language-defined tasks, highlighting its potential as a scalable foundation for robotic manipulation. More broadly, the approach points toward instant, context-driven policy synthesis in robotics: rapid adaptation to new tasks without retraining, and generalization across objects and modalities.
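The TL;DR mentions a DDIM-inspired denoising process and a separate treatment of translation and rotation. As a rough illustration only, the sketch below applies the standard deterministic DDIM update (eta = 0) to an action split into a translation vector and an axis-angle rotation vector. The function names, the axis-angle parameterisation, and the use of plain Python lists are assumptions made for this example; it is not the Instant Policy implementation, which denoises actions on a graph using a learned graph transformer.

```python
import math
from typing import List, Sequence


def ddim_step(x_t: Sequence[float], eps_pred: Sequence[float],
              alpha_bar_t: float, alpha_bar_prev: float) -> List[float]:
    """One deterministic DDIM update (eta = 0) on a flat vector.

    alpha_bar_t and alpha_bar_prev are cumulative noise-schedule products in
    (0, 1]; sampling moves towards less noise, so alpha_bar_prev >= alpha_bar_t.
    """
    if not 0.0 < alpha_bar_t <= 1.0 or not 0.0 < alpha_bar_prev <= 1.0:
        raise ValueError("alpha_bar values must lie in (0, 1]")
    # Estimate the clean sample x_0 implied by the predicted noise.
    x0_pred = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise that estimate to the previous (less noisy) level along the
    # same predicted noise direction.
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]


def denoise_action(translation, rotation, eps_trans, eps_rot,
                   alpha_bar_t, alpha_bar_prev):
    """Hypothetical decoupled step: translation (x, y, z) and rotation
    (axis-angle) are updated as two separate vectors, not one flattened pose."""
    new_t = ddim_step(translation, eps_trans, alpha_bar_t, alpha_bar_prev)
    new_r = ddim_step(rotation, eps_rot, alpha_bar_t, alpha_bar_prev)
    return new_t, new_r


if __name__ == "__main__":
    clean_t, clean_r = [0.1, -0.2, 0.3], [0.0, 0.0, 0.5]
    noise_t, noise_r = [0.5, 0.5, -1.0], [0.2, -0.1, 0.3]
    ab = 0.5
    # Forward-noise the clean action to the level alpha_bar = 0.5.
    noisy_t = [math.sqrt(ab) * c + math.sqrt(1 - ab) * n for c, n in zip(clean_t, noise_t)]
    noisy_r = [math.sqrt(ab) * c + math.sqrt(1 - ab) * n for c, n in zip(clean_r, noise_r)]
    # With a perfect noise prediction, stepping to alpha_bar = 1 recovers the clean action.
    print(denoise_action(noisy_t, noisy_r, noise_t, noise_r, ab, 1.0))
```

Because the update is deterministic, stepping to alpha_bar_prev = 1 with an exact noise prediction returns the clean action, which is a convenient sanity check for any implementation of this step.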
Abstract
Following the impressive capabilities of in-context learning with large transformers, In-Context Imitation Learning (ICIL) is a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations (arbitrary trajectories generated in simulation) as a virtually infinite pool of training data. Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.
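The abstract describes pseudo-demonstrations only as arbitrary trajectories generated in simulation. A minimal sketch of that idea, under stated assumptions: a trajectory is a list of 3D gripper positions paired with a binary open/closed state, obtained by linearly interpolating between randomly sampled waypoints in a cube-shaped workspace. `make_pseudo_demo`, the 0.25 m half-extent, and the one-gripper-state-per-segment rule are hypothetical choices for illustration. The paper's actual generator (object placement, how observations are produced, how several demonstrations of one pseudo-task are kept consistent) is not specified in this section.

```python
import random
from typing import List, Optional, Tuple

Position = Tuple[float, float, float]


def make_pseudo_demo(num_waypoints: int = 4, steps_per_segment: int = 10,
                     half_extent: float = 0.25, seed: Optional[int] = None
                     ) -> Tuple[List[Position], List[Tuple[Position, int]]]:
    """Hypothetical pseudo-demonstration: an arbitrary gripper trajectory made
    by linearly interpolating between random 3D waypoints."""
    if num_waypoints < 2 or steps_per_segment < 1:
        raise ValueError("need at least 2 waypoints and 1 step per segment")
    rng = random.Random(seed)
    # Sample waypoints uniformly in an assumed cube-shaped workspace (metres).
    waypoints = [tuple(rng.uniform(-half_extent, half_extent) for _ in range(3))
                 for _ in range(num_waypoints)]
    # Assumed: one random gripper state per segment (1 = open, 0 = closed).
    gripper = [rng.choice([0, 1]) for _ in range(num_waypoints - 1)]
    trajectory = []
    for i in range(num_waypoints - 1):
        start, end = waypoints[i], waypoints[i + 1]
        for k in range(steps_per_segment):
            s = k / steps_per_segment  # interpolation fraction in [0, 1)
            pos = tuple(a + s * (b - a) for a, b in zip(start, end))
            trajectory.append((pos, gripper[i]))
    # End the trajectory exactly on the final waypoint.
    trajectory.append((waypoints[-1], gripper[-1]))
    return waypoints, trajectory


if __name__ == "__main__":
    waypoints, demo = make_pseudo_demo(seed=0)
    print(len(waypoints), "waypoints,", len(demo), "timesteps")
```

A fixed `seed` makes a trajectory reproducible, while every new seed yields a different one, which is what makes a pool of such procedurally generated data effectively unbounded.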
