Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals
Vivienne Huiling Wang, Tinghuai Wang, Joni Pajarinen
TL;DR
This paper addresses instability and non-stationarity in off-policy hierarchical RL by introducing HIDI, which generates subgoals with a state-conditioned diffusion model and regularizes that model with a Gaussian Process (GP) prior to quantify uncertainty. A hybrid subgoal selection strategy mixes diffusion-generated subgoals with the GP predictive mean, so the high-level policy can draw on both diverse samples and uncertainty-aware estimates. The approach, which uses a sparse GP for scalability, improves sample efficiency and performance on a suite of long-horizon MuJoCo tasks, and ablations show that the diffusion model, the GP regularization, and the subgoal selection strategy each contribute. Overall, HIDI advances uncertainty-aware subgoal generation in HRL and improves learning stability on complex continuous-control problems.
Abstract
Hierarchical reinforcement learning (HRL) learns to make decisions at multiple levels of temporal abstraction. A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. To address this issue, the high-level policy must capture a complex subgoal distribution while also accounting for uncertainty in its estimates. We propose an approach that trains a conditional diffusion model regularized by a Gaussian Process (GP) prior to generate a diverse range of subgoals while leveraging principled GP uncertainty quantification. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and the GP's predictive mean. Our approach outperforms prior HRL methods in both sample efficiency and performance on challenging continuous control benchmarks.
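The subgoal selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the exact (non-sparse) GP, the fixed hyperparameters, the function names, and the mixing probability `mix_prob` are all assumptions made for illustration, and the diffusion sample is treated as an externally supplied vector. The sketch computes a GP predictive mean and variance over subgoals given states, then picks either the GP mean or the diffusion sample at random.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between row vectors of a and b (assumed kernel choice)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(train_states, train_subgoals, query_state, noise_var=1e-2):
    """Exact GP regression: predictive mean and variance of the subgoal at query_state."""
    n = len(train_states)
    K = rbf_kernel(train_states, train_states) + noise_var * np.eye(n)
    k_star = rbf_kernel(query_state[None, :], train_states)  # shape (1, n)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} Y, computed via two triangular-system solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_subgoals))
    mean = (k_star @ alpha)[0]
    v = np.linalg.solve(L, k_star.T)
    prior_var = rbf_kernel(query_state[None, :], query_state[None, :])[0, 0]
    var = max(float(prior_var - np.sum(v**2)), 0.0)  # clamp round-off negatives
    return mean, var


def select_subgoal(diffusion_sample, gp_mean, mix_prob, rng):
    """Hypothetical mixing rule: use the GP mean with probability mix_prob,
    otherwise use the diffusion-generated subgoal."""
    return gp_mean if rng.random() < mix_prob else diffusion_sample
```

In this sketch the predictive variance grows as the query state moves away from previously visited states, which is the kind of signal a GP prior can provide about where subgoal estimates are unreliable. How the mixing probability is set, and how the GP term enters the diffusion training loss, are left open here.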
