Efficient Diffusion Planning with Temporal Diffusion

Jiaming Guo, Rui Zhang, Zerun Li, Yunkai Gao, Shaohui Peng, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Ling Li

TL;DR

The paper tackles the efficiency bottleneck in diffusion-based offline RL planning by introducing Temporal Diffusion Planner (TDP), which distributes denoising steps across time to reuse and progressively refine plans. Key components include Triangular Initial Planning, Plan Refinement with a small per-step update, and an Automatic Replanning mechanism (with state-based and primarily value-based criteria) to prevent plan drift, plus a variant using an Inverse Dynamic Model (TDPInv). Empirical results on the D4RL benchmark show that TDP achieves 11–24.8x higher decision frequency than prior methods while maintaining or improving performance, with strong results on long-horizon Kitchen tasks. These findings suggest that temporal diffusion and adaptive replanning offer practical advantages for efficient, high-frequency decision making in offline RL settings, with potential extensions to multitask or hierarchical planning.

Abstract

Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid the impact of discrepancies between planning and reality on performance, previous works generate new plans at each time step. However, this incurs significant computational overhead and leads to lower decision frequencies, and frequent plan switching may also affect performance. In contrast, humans might create detailed short-term plans and more general, sometimes vague, long-term plans, and adjust them over time. Inspired by this, we propose the Temporal Diffusion Planner (TDP), which improves decision efficiency by distributing the denoising steps across the time dimension. TDP begins by generating an initial plan that becomes progressively more vague over time. At each subsequent time step, rather than generating an entirely new plan, TDP updates the previous one with a small number of denoising steps. This reduces the average number of denoising steps, improving decision efficiency. Additionally, we introduce an automated replanning mechanism to prevent significant deviations between the plan and reality. Experiments on D4RL show that, compared to previous works that generate new plans at every time step, TDP improves the decision-making frequency by 11–24.8 times while achieving higher or comparable performance.
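The claim that refining a plan lowers the average number of denoising steps can be illustrated with a small bookkeeping sketch. It is a toy under stated assumptions, not the authors' implementation: no diffusion model is called, and the constants `K`, `M`, `H` and the linear noise pattern are hypothetical choices made so that the arithmetic is easy to follow. Each plan position only tracks how many denoising steps it still needs (0 means fully denoised).

```python
# Toy bookkeeping only: no diffusion model is called. All constants are hypothetical.
K = 20      # assumed denoising steps needed to turn pure noise into a clean sample
M = 4       # assumed denoising steps applied per environment step
H = K // M  # assumed horizon, chosen so the noise pattern repeats exactly


def initial_levels(horizon, m):
    """Remaining-noise level per plan position: 0 for the next step, growing with distance."""
    return [i * m for i in range(horizon)]


def initial_plan_passes(levels, k):
    """Passes needed to bring every position from pure noise (level k) to its target level."""
    return max(k - target for target in levels)


def refine(levels, m, k):
    """Drop the executed step, append pure noise at the far end, then denoise m steps."""
    shifted = levels[1:] + [k]
    return [max(0, level - m) for level in shifted]


def total_passes_temporal(num_steps, k, m, horizon):
    """Denoising passes when one initial plan is refined at every later step."""
    levels = initial_levels(horizon, m)
    passes = initial_plan_passes(levels, k)
    for _ in range(num_steps - 1):
        levels = refine(levels, m, k)
        assert levels[0] == 0  # the step about to be executed is fully denoised
        passes += m
    return passes


def total_passes_replan_every_step(num_steps, k):
    """Denoising passes when a full plan is generated from scratch at every step."""
    return num_steps * k


if __name__ == "__main__":
    T = 100  # hypothetical episode length
    print(total_passes_temporal(T, K, M, H))        # 416
    print(total_passes_replan_every_step(T, K))     # 2000
```

With these assumed numbers, 100 environment steps cost 416 denoising passes instead of 2000. The figures are illustrative only and are not the paper's measured speedups. Each time a replanning mechanism fired, it would add roughly another `K` passes in this toy model.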

Paper Structure

This paper contains 30 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of the Temporal Diffusion Planner with diffusion planning methods that replan at each time step. The points in the figure show the x-y coordinates of the planned trajectories.
  • Figure 2: Overview of the Temporal Diffusion Planner. TDP starts with Triangular Initial Planning, creating detailed short-term and vague long-term plans. It then refines the previous time-evolving plan at each step via temporal diffusion, applying a very small number of denoising steps until automatic replanning is triggered.
  • Figure 3: Triangular Initial Planning when the planning horizon is set as 3. $s_{t+i}^K, a_{t+i}^K$ are sampled from Gaussian noise.
  • Figure 4: The average scores of previous diffusion planning methods on MuJoCo when replanning at different intervals.
  • Figure 5: The normalized scores of TDP when replanning at different fixed intervals. The dashed lines show the normalized scores with automatic replanning.
  • ...and 1 more figures