Amortizing Trajectory Diffusion with Keyed Drift Fields

Gokul Puthumanaillam, Melkior Ornik

Abstract

Diffusion-based trajectory planners can synthesize rich, multimodal action sequences for offline reinforcement learning, but their iterative denoising incurs substantial inference-time cost, making closed-loop planning slow under tight compute budgets. We study the problem of achieving diffusion-like trajectory planning behavior with one-step inference, while retaining the ability to sample diverse candidate plans and condition on the current state in a receding-horizon control loop. Our key observation is that conditional trajectory generation fails under naïve distribution-matching objectives when the similarity measure used to align generated trajectories with the dataset is dominated by unconstrained future dimensions. In practice, this causes attraction toward average trajectories, collapses action diversity, and yields near-static behavior. Our key insight is that conditional generative planning requires a conditioning-aware notion of neighborhood: trajectory updates should be computed using distances in a compact key space that reflects the condition, while still applying updates in the full trajectory space. Building on this, we introduce Keyed Drifting Policies (KDP), a one-step trajectory generator trained with a drift-field objective that attracts generated trajectories toward condition-matched dataset windows and repels them from nearby generated samples, using a stop-gradient drifted target to amortize iterative refinement into training. At inference, the resulting policy produces a full trajectory window in a single forward pass. Across standard RL benchmarks and real-time hardware deployments, KDP achieves strong performance with one-step inference and substantially lower planning latency than diffusion sampling. Project website, code and videos: https://keyed-drifting.github.io/

Paper Structure (27 sections, 22 equations, 4 figures, 7 tables, 2 algorithms)

Figures (4)

  • Figure 1: Given condition $c$, full-window distance selects neighbors dominated by unconstrained future dimensions (orange), encouraging mode-averaged trajectories (left bottom). Key distance selects condition-matched neighbors (magenta), preserving diverse conditional trajectories (right bottom).
  • Figure 2: Overview of KDP. A condition key retrieves condition-matched dataset windows, which train a one-step generator by attracting trajectories toward matched data and repelling them from nearby generated samples, enabling one-step closed-loop planning at inference.
  • Figure 3: D4RL simulation environments.
  • Figure 4: Qualitative real-time hardware results. Left: Crazyflie navigation trajectories for KDP (pink) and Diffuser (green). Right: SO-100 manipulation; the full behavior is best seen in the supplementary material.
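
The keyed drift idea described in the abstract and in Figures 1 and 2 can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, purely for illustration, that the key is the first `key_dim` entries of a flattened trajectory window, that neighbor weights come from a softmax over negative squared key distances with a temperature `tau`, and that repulsion also uses key-space weights. The names `keyed_drift` and `drifted_target` are hypothetical.

```python
import numpy as np


def softmax_rows(logits):
    """Numerically stable row-wise softmax."""
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def keyed_drift(gen, data, key_dim, tau=1.0):
    """Drift for generated windows `gen` (N, D) given dataset windows `data` (M, D).

    Assumption: the condition key is the first `key_dim` entries of each window.
    Weights are computed from key-space distances, but the resulting drift
    is applied in the full D-dimensional trajectory space. Requires N >= 2.
    """
    gk, dk = gen[:, :key_dim], data[:, :key_dim]

    # Squared distances measured in key space only.
    d_data = ((gk[:, None, :] - dk[None, :, :]) ** 2).sum(-1)  # (N, M)
    d_gen = ((gk[:, None, :] - gk[None, :, :]) ** 2).sum(-1)   # (N, N)
    np.fill_diagonal(d_gen, np.inf)  # a sample does not repel itself

    w_data = softmax_rows(-d_data / tau)
    w_gen = softmax_rows(-d_gen / tau)

    attract = w_data @ data - gen  # toward condition-matched dataset windows
    repel = w_gen @ gen - gen      # direction toward other generated samples
    return attract - repel         # attraction minus repulsion


def drifted_target(gen, data, key_dim, tau=1.0, step=1.0):
    """Regression target for a one-step generator.

    In an autodiff framework this value would be wrapped in a stop-gradient
    so that the generator is trained to match a fixed, already-drifted target.
    """
    return gen + step * keyed_drift(gen, data, key_dim, tau)
```

With two dataset windows whose keys differ strongly, a generated window is pulled almost entirely toward the window that shares its key, which is the behavior Figure 1 contrasts with full-window distance. The exact kernel, key encoder, and loss used by KDP are defined in the paper itself and may differ from this sketch.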