Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models

Sigmund H. Høeg, Yilun Du, Olav Egeland

TL;DR

Streaming Diffusion Policy (SDP) tackles the slow action synthesis of diffusion-based robotic policies by generating partially denoised action trajectories with per-action noise levels, enabling fast, reactive control. It keeps the conditioning and likelihood properties of diffusion models while introducing a persistent action buffer and chunk-wise sampling with increasing noise levels. This cuts the denoising steps needed per new observation from $N$ to roughly $N/h$, where $N$ is the total number of diffusion steps and $h$ is the number of chunks in the buffer, so the speedup grows roughly in proportion to $h$. Training uses chunk-wise noise corruption schemes to align the denoising predictor with the streaming sampling process. In practice, SDP matches or exceeds baselines in simulated tasks and delivers superior real-world performance (e.g., Push-T). The approach offers a robust, distillation-free path to fast, closed-loop diffusion policies, with a tunable tradeoff between buffer length and reactivity that suits fast robotic manipulation tasks.

Abstract

Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex dexterous tasks. However, action synthesis is often slow, requiring many steps of iterative denoising, limiting the extent to which models can be used in tasks that require fast reactive policies. To sidestep this, recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis. However, distillation is computationally expensive and can hurt both the accuracy and diversity of synthesized actions. We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis, leveraging the insight that generating a partially denoised action trajectory is substantially faster than a full output action trajectory. At each observation, our approach outputs a partially denoised action trajectory with variable levels of noise corruption, where the immediate action to execute is noise-free, with subsequent actions having increasing levels of noise and uncertainty. The partially denoised action trajectory for a new observation can then be quickly generated by applying a few steps of denoising to the previously predicted noisy action trajectory (rolled over by one timestep). We illustrate the efficacy of this approach, dramatically speeding up policy synthesis while preserving performance across both simulated and real-world settings.
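The rolling buffer described above can be illustrated with a minimal sketch that tracks only the per-action noise levels, not the actions themselves. It assumes $N$ is divisible by $h$, that chunk $i$ (counting from 1) sits at noise level $i \cdot N/h$, and that a fresh, fully noisy chunk is appended after each roll. The function names and this exact schedule are illustrative assumptions, not the authors' implementation.

```python
def initial_noise_levels(num_steps, num_chunks, chunk_len):
    """Assumed schedule: chunk i (1-based) starts at noise level i * N / h."""
    if num_steps % num_chunks != 0:
        raise ValueError("this sketch assumes num_steps is divisible by num_chunks")
    per_chunk = num_steps // num_chunks
    return [(i + 1) * per_chunk for i in range(num_chunks) for _ in range(chunk_len)]


def stream_step(levels, num_steps, num_chunks, chunk_len):
    """One streaming update on the noise levels (hypothetical helper).

    Every action is denoised by N / h levels, the now noise-free first chunk
    is popped for execution, and a fully noisy chunk is appended at the end.
    Returns (executed_levels, new_buffer_levels, denoising_steps_used).
    """
    per_chunk = num_steps // num_chunks
    denoised = [level - per_chunk for level in levels]
    executed, remaining = denoised[:chunk_len], denoised[chunk_len:]
    rolled = remaining + [num_steps] * chunk_len
    return executed, rolled, per_chunk


if __name__ == "__main__":
    N, h, chunk_len = 100, 4, 2
    buffer_levels = initial_noise_levels(N, h, chunk_len)
    executed, buffer_levels, steps = stream_step(buffer_levels, N, h, chunk_len)
    print(executed, buffer_levels, steps)
```

Under these assumptions the buffer returns to the same noise-level profile after every update, so each new observation costs $N/h$ denoising steps (25 here) instead of the full $N$ (100).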

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Predicting Partially Denoised Trajectories. Previous applications of diffusion models to robotic control, such as Diffusion Policy [chiDiffusionPolicyVisuomotor2023b], plan over a longer horizon, execute the first few steps, and discard the rest after sampling. Streaming Diffusion Policy, in contrast, fully denoises only the first actions and keeps the future actions partly denoised for future predictions.
  • Figure 2: Action Buffer Visualization. SDP keeps a persistent action buffer, assigning low noise levels to actions at the beginning of the buffer and higher noise levels to future actions. This reduces the denoising iterations needed for future action synthesis.
  • Figure 3: Trajectory Noise Corruptions. Different per-action noise-level corruptions applied during training of the denoising function in SDP.
  • Figure 4: Fast Prediction Time with Longer Buffer. Streaming Diffusion Policy decreases its sampling time drastically with longer prediction horizons while not sacrificing performance. Here, the chunk length is set to $5$.
  • Figure 5: Qualitative Illustration of Simulated Experiments. SDP successfully solves complex tasks from image observations, such as Push-T [chiDiffusionPolicyVisuomotor2023b] and Robomimic tasks [pmlr-v164-mandlekar22a].
  • ...and 2 more figures