DynaGuide: Steering Diffusion Polices with Active Dynamic Guidance

Maximilian Du, Shuran Song

TL;DR

DynaGuide steers large, pretrained diffusion policies without retraining by coupling a separate latent dynamics model with the diffusion denoising process. It computes a differentiable guidance metric in the DINOv2 latent space from predicted future observations and a set of positive and negative objectives, then injects the metric's gradient into the action denoising step via DDIM. This enables robust, multi-objective steering that can amplify underrepresented behaviors and works with off-the-shelf policies. Across CALVIN simulations and real-robot experiments, DynaGuide yields up to 70–80% steering success and outperforms goal-conditioning by up to 5.4x under low-quality guidance, while also handling multiple objectives and novel behaviors. The approach offers a plug-and-play steering paradigm for deploying complex robotic policies in uncertain real-world settings. Its main limitation is the reliance on observation-based guidance, which leaves multimodal guidance and memory-enabled extensions to future work.

Abstract

Deploying large, complex policies in the real world requires the ability to steer them to fit the needs of a situation. Most common steering approaches, like goal-conditioning, require training the robot policy with a distribution of test-time objectives in mind. To overcome this limitation, we present DynaGuide, a steering method for diffusion policies using guidance from an external dynamics model during the diffusion denoising process. DynaGuide separates the dynamics model from the base policy, which gives it multiple advantages, including the ability to steer towards multiple objectives, enhance underrepresented base policy behaviors, and maintain robustness on low-quality objectives. The separate guidance signal also allows DynaGuide to work with off-the-shelf pretrained diffusion policies. We demonstrate the performance and features of DynaGuide against other steering approaches in a series of simulated and real experiments, showing an average steering success of 70% on a set of articulated CALVIN tasks and outperforming goal-conditioning by 5.4x when steered with low-quality objectives. We also successfully steer an off-the-shelf real robot policy to express preference for particular objects and even create novel behavior. Videos and more can be found on the project website: https://dynaguide.github.io

Paper Structure

This paper contains 28 sections, 8 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: DynaGuide steers pretrained diffusion policies by adding guidance from a dynamics model into the action denoising process. This dynamics-based guidance can take a base policy with diverse behaviors and steer it towards a single behavior (left), towards multiple behaviors (middle), or even away from a behavior (right), all without fine-tuning.
  • Figure 2: Achieving Dynamics Guidance. (A): DynaGuide combines action denoising gradients $\varepsilon_p$ from the pretrained policy with a guidance gradient $\nabla_{a^k_t}\mathbf{d}$ that increases the likelihood of accomplishing a set of guidance conditions $\mathcal{G}$. (B): Inside the guidance module, a dynamics model predicts future outcomes $\hat{z}_{t+H}$ and compares them to guidance conditions $\mathcal{G}$ (desired / undesired outcomes). We use the latent distances $d$ to define a guidance metric $\mathbf{d}$ (Equation \ref{eq:metric}) and take the gradient to get the guidance signal $\nabla_{a^k_t}\mathbf{d}$ used by DynaGuide. (C): An example of one denoising step. The pretrained policy seeks behavior modes in the data, while the guidance gradient selects a particular mode.
  • Figure 3: Experiment Setup. In the CALVIN simulator (Mees et al., 2022), we propose four experimental setups designed to showcase DynaGuide and its advantages over other steering approaches. First, we test performance with high quality outcome observations as guidance conditions (Fully-Specified Objective). Next, we reduce the guidance condition quality by randomizing robot states and other states not relevant to the target object (Underspecified Objective). Finally, we look at how we can guide the base policy in complex ways, including achieving multiple behaviors and avoiding behaviors (Multiple Objectives).
  • Figure 4: Steering Ability and Robustness in the CALVIN Environment. DynaGuide enhances the target behavior (horizontal axis) significantly across all experiments (Section \ref{sec:exp12}). The goal-conditioning baseline performs very well on a clean, fixed articulated setup, but its performance drops steeply with lower goal quality, while DynaGuide remains more robust (Section \ref{sec:exp3}). For more precise tasks with movable cube objects, the active guidance in DynaGuide outperforms a sampling-based approach that uses the same dynamics model (Section \ref{sec:exp12}).
  • Figure 5: Multiple Objectives and Underrepresented Behaviors. DynaGuide is able to steer the base policy towards multiple behaviors while minimizing other behaviors and failures (left). DynaGuide is also able to avoid undesired behaviors by performing other behaviors successfully (middle). On these complicated objectives and in lower data regimes (right), DynaGuide performs better than sampling approaches.
  • ...and 5 more figures
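
The Figure 2 caption describes one guided denoising step: a dynamics model predicts a future latent, a metric compares that prediction to desired and undesired outcomes, and the metric's gradient with respect to the noisy action is combined with the policy's noise prediction. The sketch below illustrates that flow on a toy problem. It is not the authors' implementation, and several pieces are assumptions made for illustration: `toy_dynamics` is a hypothetical stand-in for the learned latent dynamics model, the log-sum-exp form of `guidance_metric` and its `sigma` parameter are one plausible way to turn latent distances into a differentiable score, the gradient is taken by finite differences rather than backpropagation, and `guided_noise` uses the standard classifier-guidance correction for DDIM-style samplers with a hypothetical `scale` parameter.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def toy_dynamics(z, action):
    """Hypothetical stand-in for a learned latent dynamics model.

    Assumption: the predicted future latent is simply z + action.
    """
    return [zi + ai for zi, ai in zip(z, action)]


def guidance_metric(action, z, positives, negatives, sigma=1.0):
    """Score that rises near desired outcomes and falls near undesired ones.

    Assumed form: log-sum-exp of negative squared latent distances to the
    positive conditions, minus the same quantity for the negative conditions.
    """
    z_hat = toy_dynamics(z, action)
    score = logsumexp([-sq_dist(z_hat, g) / sigma for g in positives])
    if negatives:
        score -= logsumexp([-sq_dist(z_hat, g) / sigma for g in negatives])
    return score


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of f at x (stands in for autodiff)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def guided_noise(eps_p, grad, alpha_bar, scale=1.0):
    """Classifier-guidance-style correction of the policy's noise prediction.

    eps_guided = eps_p - scale * sqrt(1 - alpha_bar) * grad
    """
    c = scale * math.sqrt(1.0 - alpha_bar)
    return [e - c * g for e, g in zip(eps_p, grad)]


# Toy usage: one desired outcome at +x, one undesired outcome at -x.
z = [0.0, 0.0]
noisy_action = [0.0, 0.0]
positives = [[1.0, 0.0]]
negatives = [[-1.0, 0.0]]

grad = numerical_grad(
    lambda a: guidance_metric(a, z, positives, negatives), noisy_action
)
eps_guided = guided_noise([0.0, 0.0], grad, alpha_bar=0.5)
```

In this toy setup the metric reduces to a linear function of the first action coordinate, so the gradient points along +x, towards the desired outcome and away from the undesired one. Subtracting the scaled gradient from the noise prediction shifts the sampler's estimate of the clean action in that same direction, which mirrors panel (C) of Figure 2: the policy proposes behavior modes and the guidance term selects among them.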