Boosting Hierarchical Reinforcement Learning with Meta-Learning for Complex Task Adaptation
Arash Khajooeinejad, Fatemeh Sadat Masoumi, Masoumeh Chapariniya
TL;DR
This work tackles exploration efficiency and rapid adaptation in hierarchical reinforcement learning (HRL) by integrating gradient-based meta-learning (MAML-style inner/outer updates) with intrinsic motivation and curriculum learning. The agent uses a high-level policy to select among low-level options, enabling temporal abstraction, while meta-learning optimizes both policy levels across a distribution of tasks, using $K$-step inner updates per task and outer meta-parameter updates over $\mathcal{T}$. An intrinsic reward $r^{\mathrm{int}}_t = \eta / \sqrt{N(s_t) + \epsilon}$ augments the extrinsic reward to promote exploration of novel states, and a curriculum progressively raises task difficulty via grid size, traps, and complexity thresholds. Experimental results in grid-world environments show faster convergence, higher cumulative rewards, and higher success rates than standard HRL baselines, illustrating the practical value of combining meta-learning, intrinsic motivation, and curriculum learning for complex, long-horizon tasks.
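To make the intrinsic reward concrete, the following is a minimal Python sketch of a count-based bonus of the form $r^{\mathrm{int}}_t = \eta / \sqrt{N(s_t) + \epsilon}$. The class name `CountBonus`, the default values of `eta` and `epsilon`, the choice to increment the visit count before computing the bonus, and the simple additive combination with the extrinsic reward are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import defaultdict


class CountBonus:
    """Count-based exploration bonus: eta / sqrt(N(s) + epsilon).

    Hypothetical helper for illustration; eta and epsilon defaults are assumed.
    """

    def __init__(self, eta=0.1, epsilon=1e-3):
        self.eta = eta
        self.epsilon = epsilon
        self.counts = defaultdict(int)  # N(s): visits per hashable state

    def reward(self, state):
        # Assumption: the current visit is counted before the bonus is computed.
        self.counts[state] += 1
        return self.eta / math.sqrt(self.counts[state] + self.epsilon)


def augmented_reward(extrinsic, bonus, state):
    """Add the intrinsic bonus to the extrinsic reward (assumed additive)."""
    return extrinsic + bonus.reward(state)
```

Because the bonus shrinks as $N(s_t)$ grows, repeated visits to the same grid cell earn less and less, which pushes the agent toward cells it has rarely seen.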
Abstract
Hierarchical Reinforcement Learning (HRL) is well suited to solving complex tasks by breaking them down into structured policies. However, HRL agents often struggle with efficient exploration and quick adaptation. To overcome these limitations, we propose integrating meta-learning into HRL to enable agents to learn and adapt hierarchical policies more effectively. Our method leverages meta-learning to facilitate rapid task adaptation using prior experience, while intrinsic motivation mechanisms drive efficient exploration by rewarding the discovery of novel states. Specifically, our agent employs a high-level policy to choose among multiple low-level policies within custom-designed grid environments. By incorporating gradient-based meta-learning with differentiable inner-loop updates, we optimize performance across a curriculum of progressively challenging tasks. Experimental results highlight that our meta-learning-enhanced hierarchical agent significantly outperforms standard HRL approaches lacking meta-learning and intrinsic motivation. The agent demonstrates faster learning, greater cumulative rewards, and higher success rates in complex grid-based scenarios. These findings underscore the effectiveness of combining meta-learning, curriculum learning, and intrinsic motivation to enhance the capability of HRL agents in tackling complex tasks.
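The idea of a differentiable inner loop can be illustrated with a toy problem that is much simpler than the hierarchical agent described above. The sketch below is an assumption-laden stand-in, not the authors' method: each "task" is a scalar quadratic loss $L_c(\theta) = \tfrac{1}{2}(\theta - c)^2$, the inner loop takes $K$ gradient steps, and the outer update follows the gradient of the post-adaptation loss with respect to the initial parameter, obtained by tracking the chain-rule factor through the inner steps by hand. All function names and hyperparameter values are hypothetical.

```python
def inner_adapt(theta, target, alpha, k_steps):
    """Take K gradient steps on L(theta) = 0.5 * (theta - target)**2.

    Also returns d(theta_K)/d(theta_0), so the outer loop can
    differentiate through the inner updates.
    """
    jac = 1.0
    for _ in range(k_steps):
        grad = theta - target          # dL/dtheta for the quadratic loss
        theta = theta - alpha * grad   # inner gradient step
        jac *= (1.0 - alpha)           # derivative of one step w.r.t. its input
    return theta, jac


def meta_gradient(theta, targets, alpha, k_steps):
    """Average gradient of the post-adaptation loss w.r.t. the initial theta."""
    total = 0.0
    for c in targets:
        adapted, jac = inner_adapt(theta, c, alpha, k_steps)
        total += (adapted - c) * jac   # chain rule: dL/dtheta_K * dtheta_K/dtheta_0
    return total / len(targets)


def meta_train(theta, targets, alpha=0.1, beta=0.5, k_steps=3, iters=200):
    """Outer loop: plain gradient descent on the meta-objective."""
    for _ in range(iters):
        theta -= beta * meta_gradient(theta, targets, alpha, k_steps)
    return theta
```

In this toy setting every task has the same curvature, so the learned initialization settles at the mean of the task targets. In the full method, the scalar parameter would be replaced by the parameters of the high-level and low-level policies, and the hand-derived chain-rule factor by automatic differentiation.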
