Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors

Amelie Minji Kim, Anqi Wu, Ye Zhao

TL;DR

This work addresses robust, task-structured motion planning with a two-level hierarchical diffusion framework that embeds task information directly into the noise model. The upper level predicts task-relevant key states and timings; the lower level denoises trajectories under a GPMP-based prior conditioned on those keys. The result is a biased, non-isotropic diffusion process with closed-form forward and posterior expressions. The approach concentrates probability mass near feasible, smooth, and semantically meaningful trajectories, improving success rates and dynamic feasibility on Maze2D and KUKA block stacking without relying on hard constraints or reward shaping. Experiments show that combining task-conditioned guidance with a structured prior outperforms baselines that use isotropic noise or conditioning alone, which highlights the benefit of structuring the corruption process in addition to conditioning. The method offers a tractable, general framework for task-aware diffusion in robotics; a project page provides further details.

Abstract

We propose a novel hierarchical diffusion planner that embeds task and motion structure directly in the noise model. Unlike standard diffusion-based planners that use zero-mean, isotropic Gaussian noise, we employ a family of task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP): sparse, task-centric key states, their associated timings, or both are treated as noisy observations to produce a prior instance. We first generalize the standard diffusion process to biased, non-isotropic corruption with closed-form forward and posterior expressions. Building on this, our hierarchy separates prior instantiation from trajectory denoising: the upper level instantiates a task-conditioned structured Gaussian (mean and covariance), and the lower level denoises the full trajectory under that fixed prior. Experiments on Maze2D goal-reaching and KUKA block stacking show improved success rates, smoother trajectories, and stronger task alignment compared to isotropic baselines. Ablation studies indicate that explicitly structuring the corruption process offers benefits beyond simply conditioning the neural network. Overall, our method concentrates the prior's probability mass near feasible, smooth, and semantically meaningful trajectories while maintaining tractability. Our project page is available at https://hta-diffusion.github.io.
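
To make the phrase "biased, non-isotropic corruption with closed-form forward and posterior expressions" concrete, the following is one standard way such a process can be written. It is a sketch consistent with the abstract, not necessarily the paper's exact parameterization. The noise schedule $\beta_t$, $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ and the centered-update form are assumptions; $(\mu, \mathcal{K})$ denote the task-conditioned prior mean and covariance.

$$
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\big(\mu + \sqrt{\alpha_t}\,(x_{t-1} - \mu),\; (1 - \alpha_t)\,\mathcal{K}\big), \\
q(x_t \mid x_0) &= \mathcal{N}\!\big(\mu + \sqrt{\bar{\alpha}_t}\,(x_0 - \mu),\; (1 - \bar{\alpha}_t)\,\mathcal{K}\big), \\
q(x_{t-1} \mid x_t, x_0) &= \mathcal{N}\!\Big(\mu + \tfrac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\,(x_0 - \mu) + \tfrac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\,(x_t - \mu),\; \tfrac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t\,\mathcal{K}\Big).
\end{aligned}
$$

Under this form, $\bar{\alpha}_T \to 0$ gives $q(x_T \mid x_0) \to \mathcal{N}(\mu, \mathcal{K})$, so reverse diffusion starts from the task-conditioned prior rather than from $\mathcal{N}(0, I)$. Setting $\mu = 0$ and $\mathcal{K} = I$ recovers the standard isotropic process.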

Paper Structure

This paper contains 20 sections, 16 equations, 6 figures, 3 tables, and 3 algorithms.

Figures (6)

  • Figure 1: Overview of our hierarchical diffusion framework. Training (left): From dataset trajectories we extract task-relevant key states and timings $(Y, C)$. These supervise the upper-level diffusion (Alg. \ref{alg:train_upper}) and also define the structured prior for the lower level, which is trained under this task-aware noise model (Alg. \ref{alg:train_lower}). Test-time planning (right): For a new task specification, the upper level provides $(Y, C)$ via rule-based heuristics or the learned diffusion. These outputs instantiate the task-specific prior, from which the lower level initializes and denoises to generate a full trajectory (Alg. \ref{alg:test}).
  • Figure 2: Trajectory generation pipeline. Input: task specification (e.g., Maze2D goal reaching; KUKA grasp-and-place) and initial conditions. The upper level produces key states (stacked into $Y$) and their timings (equivalently, a selection matrix $C$) using a diffusion module, rules, or both. For Maze2D, these are waypoints with fixed timings. For KUKA, they are grasp/release states and contact flags, which implicitly determine timing. With $(Y,C)$, we construct a Gaussian prior by conditioning a GP over trajectories, yielding $(\mu,\mathcal{K})$ via Eqs. \ref{eqn:mu_goal_conditioned}--\ref{eqn:kappa_goal_conditioned}. The shaded regions illustrate the covariance structure $\mathcal{K}$ around the trajectory, showing tighter variance near key states and looser variance elsewhere. The lower level then generates the full trajectory by reverse diffusion from $\mathcal{N}(\mu,\mathcal{K})$.
  • Figure 3: Reverse diffusion in Maze2D. Columns show intermediate denoising steps (left to right). Top: isotropic Gaussian prior—samples start fully random and remain noisy until late steps. Bottom: our model—initialization is already task-aware and trajectories quickly converge to smooth, goal-reaching solutions.
  • Figure 4: Maze2D goal reaching. Top-left: task map with start (blue) and goal (red). Top-right: baseline (isotropic Gaussian noise) samples. Bottom: our method. The small inset on the left shows the waypoints predicted by the upper level (see Fig. \ref{fig:pipeline_overview}). Although waypoints provide a rough guide for the global trajectory, they can be imperfect or even infeasible (with collisions), so hard-constraining the lower level using these waypoints is brittle. Thus, we treat them as noisy observations and encode their uncertainty in the GPMP prior with the observation covariance $\mathcal{K}_y$, which guides denoising to yield smooth, collision-free trajectories aligned with the task.
  • Figure 5: Comparison of generated trajectories for KUKA block stacking. Each row shows a task with a random initial arm configuration and block positions. Columns: (left) initial condition, (middle) trajectories generated with an isotropic Gaussian prior, (right) trajectories generated with our model.
  • ...and 1 more figure
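
The prior construction described in the Figure 2 and Figure 4 captions (key states $Y$ at timings selected by $C$, treated as noisy observations with covariance $\mathcal{K}_y$) can be sketched with ordinary Gaussian conditioning. This is a minimal illustration, not the authors' implementation: the squared-exponential kernel stands in for the GPMP prior, and the function names `rbf_kernel`, `condition_prior`, and `corrupt`, the waypoint values, and the noise level are all hypothetical.

```python
import numpy as np

def rbf_kernel(times, length_scale=0.2, variance=1.0):
    # Assumption: a squared-exponential kernel as a stand-in for the GPMP prior.
    d = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def condition_prior(mu0, K0, C, Y, Ky):
    # Gaussian conditioning on noisy observations Y = C x + e, e ~ N(0, Ky).
    S = C @ K0 @ C.T + Ky                  # innovation covariance
    gain = np.linalg.solve(S, C @ K0).T    # K0 C^T S^{-1} (K0 and S are symmetric)
    mu = mu0 + gain @ (Y - C @ mu0)
    K = K0 - gain @ C @ K0
    return mu, 0.5 * (K + K.T)             # symmetrize against round-off

def corrupt(x0, mu, K, alpha_bar, rng, jitter=1e-6):
    # Biased, non-isotropic forward sample:
    # x_t = mu + sqrt(alpha_bar) (x0 - mu) + sqrt(1 - alpha_bar) eps, eps ~ N(0, K).
    L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    eps = L @ rng.standard_normal(x0.shape)
    return mu + np.sqrt(alpha_bar) * (x0 - mu) + np.sqrt(1.0 - alpha_bar) * eps

# Hypothetical 2D example: 16 time steps, three key states (start, waypoint, goal).
N = 16
times = np.linspace(0.0, 1.0, N)
mu0 = np.zeros((N, 2))
K0 = rbf_kernel(times)
key_idx = [0, 8, 15]
C = np.eye(N)[key_idx]                     # selection matrix encoding the timings
Y = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
Ky = 0.01 * np.eye(len(key_idx))           # observation noise on the key states
mu, K = condition_prior(mu0, K0, C, Y, Ky)

rng = np.random.default_rng(0)
x_T = corrupt(np.zeros((N, 2)), mu, K, alpha_bar=0.0, rng=rng)  # draw from N(mu, K)
```

In this sketch the diagonal of `K` is small at the indices in `key_idx` and larger between them, which mirrors the shaded regions described for Figure 2. Increasing `Ky` loosens the pull toward the key states, which is the softer alternative to hard-constraining waypoints noted in the Figure 4 caption.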

Theorems & Definitions (2)

  • Remark 1
  • Remark 2