PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
Utsav Singh, Vinay P. Namboodiri
TL;DR
PEAR introduces primitive enabled adaptive relabeling to address non-stationarity in off-policy HRL by generating a curriculum of achievable subgoals from a small set of expert demonstrations. It then jointly optimizes higher-level subgoal policies and lower-level primitives using RL with imitation learning regularization on a dynamically refreshed subgoal dataset $D_g$, yielding two concrete variants: PEAR-BC and PEAR-IRL. Theoretical sub-optimality bounds show how adaptive relabeling and IL regularization tighten performance guarantees, while empirical results on six MuJoCo tasks and real-world robot experiments demonstrate substantial improvements over baselines, including success rates of up to $80\%$ on sparse-reward, long-horizon tasks. PEAR is designed to be compatible with standard off-policy algorithms and requires only minimal assumptions about task structure, making it a practical approach to long-horizon HRL problems.
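The relabeling step can be illustrated with a small greedy sketch. It is not the authors' implementation: it assumes a hypothetical scalar `reachability(state, subgoal)` score (for instance, a value estimate from the current lower-level primitive) and a hypothetical `threshold`, and it scans a single demonstration, choosing as each subgoal the farthest state that still scores at or above the threshold from the previous subgoal.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def adaptive_relabel(
    demo: Sequence[State],
    reachability: Callable[[State, State], float],
    threshold: float,
) -> List[int]:
    """Return indices of demonstration states selected as subgoals.

    Hypothetical greedy scan: starting from the current anchor state, move
    forward while the assumed reachability score stays >= threshold; the
    last state that still passes becomes the next subgoal and new anchor.
    """
    subgoals: List[int] = []
    anchor = 0
    last = len(demo) - 1
    while anchor < last:
        chosen = anchor + 1  # always advance at least one step to avoid stalling
        for j in range(anchor + 1, last + 1):
            if reachability(demo[anchor], demo[j]) >= threshold:
                chosen = j  # still reachable from the anchor: extend the segment
            else:
                break  # first unreachable state ends the segment
        subgoals.append(chosen)
        anchor = chosen
    return subgoals
```

With a toy 1-D score such as `3 - abs(g - s)` and `threshold=0.0`, the demonstration `list(range(10))` is split at indices `[3, 6, 9]`. Because the score depends on the current primitive, rerunning such a scan as the primitive improves is one way to keep a subgoal dataset like $D_g$ refreshed.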
Abstract
Hierarchical reinforcement learning (HRL) has the potential to solve complex long-horizon tasks using temporal abstraction and improved exploration. However, hierarchical agents are difficult to train due to inherent non-stationarity. We present primitive enabled adaptive relabeling (PEAR), a two-phase approach in which we first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision, and then jointly optimize HRL agents using reinforcement learning (RL) and imitation learning (IL). We theoretically bound the sub-optimality of our approach and use this analysis to derive a joint optimization framework using RL and IL. Since PEAR uses only a few expert demonstrations and makes only minimal assumptions about task structure, it can be easily integrated with typical off-policy RL algorithms to produce a practical HRL approach. We perform extensive experiments on challenging environments and show that PEAR outperforms various hierarchical and non-hierarchical baselines, achieving success rates of up to $80\%$ in complex sparse-reward robotic control tasks where other baselines typically fail to make significant progress. We also perform ablations to thoroughly analyse the importance of our various design choices. Finally, we perform real-world robotic experiments on complex tasks and demonstrate that PEAR consistently outperforms the baselines.
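The joint RL and IL optimization of the higher level can be sketched as a single loss. This is a hedged illustration rather than PEAR's exact objective: it assumes a behavior-cloning-style squared-error regularizer toward subgoals drawn from $D_g$, a hypothetical weight `psi`, and plain Python callables standing in for the subgoal policy and its critic.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def regularized_policy_loss(
    states: Sequence[Vector],
    expert_subgoals: Sequence[Vector],
    policy: Callable[[Vector], Vector],
    critic: Callable[[Vector, Vector], float],
    psi: float,
) -> float:
    """Mean of an RL term (negated critic value) plus psi times a BC-style penalty.

    All names here are hypothetical; the squared-error penalty pulls the
    policy's proposed subgoal toward the relabeled expert subgoal.
    """
    if not states or len(states) != len(expert_subgoals):
        raise ValueError("need a non-empty batch with one expert subgoal per state")
    rl_term = 0.0
    bc_term = 0.0
    for s, g_star in zip(states, expert_subgoals):
        g = policy(s)  # subgoal proposed by the higher-level policy
        rl_term -= critic(s, g)  # maximizing value == minimizing its negation
        bc_term += sum((gi - ei) ** 2 for gi, ei in zip(g, g_star))
    n = len(states)
    return rl_term / n + psi * bc_term / n
```

Setting `psi` to zero recovers a pure actor-style RL objective, while larger values pull proposed subgoals toward the relabeled demonstration subgoals; in practice such a loss would be minimized with an autodiff framework rather than evaluated in pure Python.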
