On the benefits of pixel-based hierarchical policies for task generalization
Tudor Cristea-Platon, Bogdan Mazoure, Josh Susskind, Walter Talbott
TL;DR
This work investigates whether pixel-based hierarchical reinforcement learning (HRL) with task conditioning improves generalization across tasks. Building on the Director architecture, it analyzes how a high-level manager and a low-level worker can compose reusable skills to (1) shorten the effective decision horizon from $H$ to $H/k$, since the manager only acts every $k$ steps, (2) enable zero-shot generalization through compositionality, and (3) speed up adaptation by reusing low-level policies. The results show that HRL improves training performance in the multi-task setting, enhances reward and state-space generalization to similar tasks, and reduces the data required to solve novel tasks during fine-tuning. Overall, the findings advocate incorporating hierarchy into RL architectures to promote generalization in vision-based robotic control.
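The horizon reduction from $H$ to $H/k$ can be illustrated with a minimal two-level control loop. This is a hedged sketch, not Director's implementation: it assumes the manager emits a goal every $k$ environment steps while the worker acts at every step. In Director the goals live in a learned latent space rather than the raw state used here. The function `hierarchical_rollout` and the toy 1-D task are hypothetical.

```python
import math
from typing import Callable, Tuple


def hierarchical_rollout(
    manager: Callable[[float], float],
    worker: Callable[[float, float], float],
    step: Callable[[float, float], float],
    initial_state: float,
    horizon: int,
    k: int,
) -> Tuple[float, int]:
    """Run a two-level policy for `horizon` steps (hypothetical sketch).

    The manager picks a new goal only every `k` steps; the worker picks a
    primitive action at every step, conditioned on the current goal.
    Returns the final state and the number of manager decisions.
    """
    state = initial_state
    goal = 0.0  # overwritten at t = 0, which always triggers a manager decision
    manager_decisions = 0
    for t in range(horizon):
        if t % k == 0:  # the manager acts only on every k-th step
            goal = manager(state)
            manager_decisions += 1
        action = worker(state, goal)  # the worker acts on every step
        state = step(state, action)
    return state, manager_decisions


# Toy 1-D task (assumption): the manager proposes a goal K units ahead,
# and the worker moves at most one unit per step towards it.
K = 5


def manager(x: float) -> float:
    return x + K


def worker(x: float, g: float) -> float:
    return max(-1.0, min(1.0, g - x))


def step(x: float, a: float) -> float:
    return x + a


final_state, decisions = hierarchical_rollout(manager, worker, step, 0.0, horizon=20, k=K)
print(final_state, decisions)  # 20.0 4
assert decisions == math.ceil(20 / K)
```

Over 20 environment steps the manager makes only 4 decisions, so its credit-assignment problem spans $\lceil H/k \rceil$ steps instead of $H$, while the worker solves short, goal-conditioned sub-problems that could in principle be reused across tasks.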
Abstract
Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-task performance improvement over flat-policy counterparts does not justify the additional complexity of implementing a hierarchy. However, by introducing multiple decision-making levels, hierarchical policies can compose lower-level policies to generalize more effectively between tasks, which highlights the need for multi-task evaluations. We analyze the benefits of hierarchy through simulated multi-task robotic control experiments from pixels. Our results show that hierarchical policies trained with task conditioning can (1) increase performance on training tasks, (2) improve reward and state-space generalization to similar tasks, and (3) reduce the complexity of the fine-tuning required to solve novel tasks. Thus, we believe that hierarchical policies should be considered when building reinforcement learning architectures capable of generalizing between tasks.
