Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors
Amelie Minji Kim, Anqi Wu, Ye Zhao
TL;DR
This work addresses robust motion planning under task structure by introducing a two-level hierarchical diffusion framework that embeds task information directly into the noise model. The upper level predicts task-relevant key states, their timings, or both, while the lower level denoises trajectories under a GPMP-based (Gaussian Process Motion Planning) prior conditioned on those keys, yielding a biased, non-isotropic diffusion process with closed-form forward and posterior expressions. The approach concentrates probability mass near feasible, smooth, and semantically meaningful trajectories, improving success rates and dynamic feasibility on Maze2D and KUKA block stacking without relying on hard constraints or reward shaping. Experiments show that combining task-conditioned guidance with a structured prior outperforms baselines that use isotropic noise or conditioning alone, highlighting the benefit of structuring the corruption process alongside conditioning. The method offers a tractable, general framework for task-aware diffusion in robotics; further details are available on the project page.
Abstract
We propose a novel hierarchical diffusion planner that embeds task and motion structure directly in the noise model. Unlike standard diffusion-based planners that use zero-mean, isotropic Gaussian noise, we employ a family of task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP): sparse, task-centric key states or their associated timings (or both) are treated as noisy observations to produce a prior instance. We first generalize the standard diffusion process to biased, non-isotropic corruption with closed-form forward and posterior expressions. Building on this, our hierarchy separates prior instantiation from trajectory denoising: the upper level instantiates a task-conditioned structured Gaussian (mean and covariance), and the lower level denoises the full trajectory under that fixed prior. Experiments on Maze2D goal-reaching and KUKA block stacking show improved success rates, smoother trajectories, and stronger task alignment compared to isotropic baselines. Ablation studies indicate that explicitly structuring the corruption process offers benefits beyond simply conditioning the neural network. Overall, our method concentrates the prior's probability mass near feasible, smooth, and semantically meaningful trajectories while maintaining tractability. Our project page is available at https://hta-diffusion.github.io.
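The two ideas in the abstract, a Gaussian prior obtained by treating key states as noisy observations and a forward process that corrupts toward that prior, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (se_kernel, conditioned_prior, forward_sample) are hypothetical, a squared-exponential kernel stands in for the GPMP prior, and the forward marginal x_t ~ N(sqrt(ab) x_0 + (1 - sqrt(ab)) mu, (1 - ab) K) is one natural way to generalize the standard isotropic process, which the paper's own parameterization may or may not match.

```python
import numpy as np


def se_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps.

    Assumption: a stand-in for the GPMP prior, chosen for brevity.
    """
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def conditioned_prior(times, key_times, key_states, obs_noise=1e-2, jitter=1e-6):
    """GP regression that treats key states as noisy observations.

    Returns a mean of shape (N, D) and a covariance of shape (N, N)
    that is shared across the D state dimensions.
    """
    k_ss = se_kernel(times, times)
    k_sk = se_kernel(times, key_times)
    k_kk = se_kernel(key_times, key_times) + obs_noise**2 * np.eye(len(key_times))
    gain = np.linalg.solve(k_kk, k_sk.T).T  # equals K_sk K_kk^{-1}
    mean = gain @ key_states
    cov = k_ss - gain @ k_sk.T
    # Symmetrize and add jitter so the Cholesky factorization is stable.
    cov = 0.5 * (cov + cov.T) + jitter * np.eye(len(times))
    return mean, cov


def forward_sample(x0, mean, cov, alpha_bar, rng):
    """Draw x_t from the assumed biased, non-isotropic forward marginal."""
    chol = np.linalg.cholesky(cov)
    eps = rng.standard_normal(x0.shape)
    drift = np.sqrt(alpha_bar) * x0 + (1.0 - np.sqrt(alpha_bar)) * mean
    return drift + np.sqrt(1.0 - alpha_bar) * (chol @ eps)


# Hypothetical 2-D example: three key states at the start, middle, and end.
times = np.linspace(0.0, 1.0, 32)
key_times = np.array([0.0, 0.5, 1.0])
key_states = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
mean, cov = conditioned_prior(times, key_times, key_states)

rng = np.random.default_rng(0)
x0 = mean + 0.05 * rng.standard_normal(mean.shape)  # a toy "clean" trajectory
x_mid = forward_sample(x0, mean, cov, alpha_bar=0.5, rng=rng)
x_end = forward_sample(x0, mean, cov, alpha_bar=0.0, rng=rng)
```

Under this parameterization, setting alpha_bar to 0 draws the sample directly from N(mean, cov), so the forward process terminates at the task-conditioned prior and not at a standard normal. The covariance is small near the key times and larger between them, which is the sense in which the corruption is non-isotropic.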
