Tree-Guided Diffusion Planner

Hyeonseong Jeon, Cheolhong Min, Jaesik Park

TL;DR

The paper tackles the challenge of planning with pretrained diffusion models under non-convex, non-differentiable test-time objectives. It introduces Tree-guided Diffusion Planner (TDP), a zero-shot framework that builds a trajectory tree through bi-level sampling: diverse parent trajectories are generated with training-free particle guidance, and each parent is refined by fast, gradient-guided sub-trajectories conditioned on task objectives. Key contributions include a state-decomposition scheme that separates observation and control features, a bi-level sampling pipeline that unifies gradient and particle guidance, and extensive experiments on Maze2D gold-picking, KUKA robot arm manipulation, and AntMaze showing consistent advantages over prior zero-shot planners in non-convex and multi-goal settings. The approach enables flexible, task-aware planning without additional task-specific training, expanding the practical utility of diffusion-based planners for long-horizon, compositional control while acknowledging computational overhead from the broader search.

Abstract

Planning with pretrained diffusion models has emerged as a promising approach for solving test-time guided control problems. Standard gradient guidance typically performs optimally under convex, differentiable reward landscapes. However, it shows substantially reduced effectiveness in real-world scenarios with non-convex objectives, non-differentiable constraints, and multi-reward structures. Furthermore, recent supervised planning approaches require task-specific training or value estimators, which limits test-time flexibility and zero-shot generalization. We propose a Tree-guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. We frame test-time planning as a tree search problem using a bi-level sampling process: (1) diverse parent trajectories are produced via training-free particle guidance to encourage broad exploration, and (2) sub-trajectories are refined through fast conditional denoising guided by task objectives. TDP addresses the limitations of gradient guidance by exploring diverse trajectory regions and harnessing gradient information across this expanded solution space using only pretrained models and test-time reward signals. We evaluate TDP on three diverse tasks: maze gold-picking, robot arm block manipulation, and AntMaze multi-goal exploration. TDP consistently outperforms state-of-the-art approaches on all tasks. The project page can be found at: https://tree-diffusion-planner.github.io.
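
The bi-level sampling process described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `denoise_step` is a hypothetical stand-in for one reverse step of a pretrained diffusion planner, `repulsion` is a simple RBF-kernel force used here as a placeholder for particle guidance, and the reward, step counts, scales, and the final choice of the highest-reward leaf are all assumptions made for illustration.

```python
import numpy as np

def repulsion(X, bandwidth=3.0):
    """Placeholder for particle guidance: push samples away from each other.

    X has shape (N, H, D). The result has the same shape and equals the
    negative gradient of the summed RBF kernel between flattened samples.
    """
    flat = X.reshape(len(X), -1)                     # (N, H*D)
    diff = flat[:, None, :] - flat[None, :, :]       # (N, N, H*D)
    sq = (diff ** 2).sum(-1)                         # squared pairwise distances
    w = np.exp(-sq / (2.0 * bandwidth ** 2))         # RBF kernel weights
    force = (w[..., None] * diff).sum(1) / bandwidth ** 2
    return force.reshape(X.shape)

def denoise_step(X, noise_scale, rng):
    """Hypothetical stand-in for one reverse step of a pretrained planner."""
    return 0.95 * X + noise_scale * rng.standard_normal(X.shape)

def reward(X, goal):
    """Toy test-time objective: negative squared distance of the final state to a goal."""
    return -((X[..., -1, :] - goal) ** 2).sum(-1)

def reward_grad(X, goal):
    """Gradient of the toy reward with respect to the trajectory."""
    g = np.zeros_like(X)
    g[..., -1, :] = -2.0 * (X[..., -1, :] - goal)
    return g

def tree_plan(goal, n_parents=8, n_children=4, horizon=16, dim=2,
              parent_steps=20, child_steps=5, pg_scale=0.5,
              grad_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)

    # Level 1: parent trajectories, diversified by repulsion only
    # (no task gradient is used at this level).
    parents = rng.standard_normal((n_parents, horizon, dim))
    for k in range(parent_steps):
        noise = 0.1 * (1.0 - (k + 1) / parent_steps)
        parents = denoise_step(parents, noise, rng) + pg_scale * repulsion(parents)

    # Level 2: children start from perturbed copies of each parent and are
    # refined with a few gradient-guided denoising steps.
    children = np.repeat(parents, n_children, axis=0)
    children = children + 0.3 * rng.standard_normal(children.shape)
    for k in range(child_steps):
        noise = 0.05 * (1.0 - (k + 1) / child_steps)
        children = (denoise_step(children, noise, rng)
                    + grad_scale * reward_grad(children, goal))

    # Assumed selection rule: score every node of the tree, keep the best.
    leaves = np.concatenate([parents, children], axis=0)
    best = leaves[np.argmax(reward(leaves, goal))]
    return best, parents, children

if __name__ == "__main__":
    goal = np.array([1.0, -1.0])
    best, parents, children = tree_plan(goal)
    print(best.shape, parents.shape, children.shape)
```

Because the final selection only evaluates `reward`, it does not need gradients, so a non-differentiable score could replace it at that step in this sketch; only the child refinement loop uses `reward_grad`.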

Paper Structure

This paper contains 40 sections, 1 theorem, 7 equations, 12 figures, 14 tables, 3 algorithms.

Key Result

Proposition 1

Initialization problem in gradient guidance with a diffusion planner. Assume that the trajectory data $X\in \mathbb{R}^{H\times D}$ satisfies Assumption 1 of guo2024gradientguidancediffusionmodels, and consider the guide $\mathcal{J}(X) = \mathcal{J}_1(X) + \mathcal{J}_2(X)$ where $\mathcal{J}_1(X)=\exp\le

Figures (12)

  • Figure 1: Rollout Trajectories for Gold-Picking in Maze2D. In the Maze2D-Large environment fu2021d4rldatasetsdeepdatadriven, the agent must collect an additional gold objective (yellow) positioned off the shortest navigation path. Trajectories are generated by Diffuser janner2022planningdiffusionflexiblebehavior (with gradient guidance), Diffuser$\gamma$ feng2024resistingstochasticrisksdiffusion, and our method. See Sec. \ref{maze2d_gold_picking} for details.
  • Figure 2: Limitation of Gradient-based Guided Planning. a. In-distribution preference. b. Naïve gradient guidance is incompatible with non-differentiable rules.
  • Figure 3: Tree-guided Diffusion Planner (TDP). TDP constructs a trajectory tree combining diverse parent trajectories (navy) and guided sub-trajectories (red) in the 2D data space. The 3D surface represents the reward landscape, with peaks indicating high-reward regions.
  • Figure 4: Pick-and-Where-to-Place (PnWP). PnWP evaluates the agent's exploration capacity in the robot arm manipulation environment. The agent must infer suitable placement locations for each block based on the reward distribution and plan corresponding pick-and-place actions.
  • Figure 5: Diverse Trajectory Generation. a. Mean pairwise distance computed over 32 trajectories, averaged across 100 planning seeds in PnWP. Error bars indicate standard error. b. Visualization of the trajectory generation process and rollout results of MCSS and TDP.
  • ...and 7 more figures
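
The diversity measurement in the Figure 5a caption (mean pairwise distance over a batch of trajectories, averaged over planning seeds, with standard error) can be sketched as follows. The summary does not say which distance or which trajectory features the paper uses, so Euclidean distance between flattened trajectories is an assumption here, and `sample_fn` is a hypothetical callable standing in for a planner.

```python
import numpy as np

def mean_pairwise_distance(trajs):
    """Mean Euclidean distance over all unordered pairs of trajectories.

    trajs has shape (N, H, D); each trajectory is flattened to one vector.
    """
    flat = np.asarray(trajs, dtype=float).reshape(len(trajs), -1)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))              # (N, N) distance matrix
    upper = np.triu_indices(len(flat), k=1)          # each pair counted once
    return dist[upper].mean()

def diversity_over_seeds(sample_fn, n_seeds=100, n_trajs=32):
    """Average the metric over seeds; return (mean, standard error)."""
    vals = np.array([
        mean_pairwise_distance(sample_fn(np.random.default_rng(s), n_trajs))
        for s in range(n_seeds)
    ])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n_seeds)

if __name__ == "__main__":
    # Hypothetical sampler: random trajectories with horizon 16 in 2D.
    sampler = lambda rng, n: rng.standard_normal((n, 16, 2))
    print(diversity_over_seeds(sampler, n_seeds=10))
```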

Theorems & Definitions (1)

  • Proposition 1