Boosting Hierarchical Reinforcement Learning with Meta-Learning for Complex Task Adaptation

Arash Khajooeinejad, Fatemeh Sadat Masoumi, Masoumeh Chapariniya

TL;DR

This work tackles exploration efficiency and rapid adaptation in hierarchical reinforcement learning (HRL) by integrating gradient-based meta-learning (MAML-style inner/outer updates) with intrinsic motivation and curriculum learning. The agent uses a high-level policy to select among low-level options, enabling temporal abstraction, while meta-learning optimizes both policy levels across a distribution of tasks, with $K$-step inner updates per task and outer meta-parameter updates over the tasks $\mathcal{T}$. An intrinsic reward $r^{\mathrm{int}}_t = \eta / \sqrt{N(s_t) + \epsilon}$, where $N(s_t)$ is the visitation count of state $s_t$, augments the extrinsic reward and promotes exploration of novel states. A curriculum progressively escalates task difficulty via grid size, traps, and complexity thresholds. Experimental results in grid-world environments show faster convergence, higher cumulative rewards, and higher success rates than standard HRL baselines, illustrating the practical value of combining meta-learning, intrinsic motivation, and curriculum learning for complex, long-horizon tasks.
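
The count-based bonus can be illustrated with a minimal sketch. It assumes hashable grid states, that the visit count is incremented before the bonus is computed, and that the intrinsic and extrinsic rewards are simply summed; the class name `CountBonus`, the helper `total_reward`, and the default values of `eta` and `eps` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


class CountBonus:
    """Intrinsic reward r_int = eta / sqrt(N(s) + eps) from visit counts."""

    def __init__(self, eta=0.1, eps=1e-8):
        self.eta = eta                   # bonus scale (assumed value)
        self.eps = eps                   # numerical-stability term (assumed value)
        self.counts = defaultdict(int)   # N(s), zero for unseen states

    def __call__(self, state):
        # Assumption: count the current visit first, so N(s) >= 1 here.
        self.counts[state] += 1
        return self.eta / math.sqrt(self.counts[state] + self.eps)


def total_reward(r_ext, r_int):
    """Assumed combination rule: plain sum of extrinsic and intrinsic reward."""
    return r_ext + r_int
```

Under these assumptions the bonus for a state shrinks as that state is revisited, so rarely visited cells contribute more to the combined reward than frequently visited ones.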

Abstract

Hierarchical Reinforcement Learning (HRL) is well suited for solving complex tasks by breaking them down into structured policies. However, HRL agents often struggle with efficient exploration and quick adaptation. To overcome these limitations, we propose integrating meta-learning into HRL to enable agents to learn and adapt hierarchical policies more effectively. Our method leverages meta-learning to facilitate rapid task adaptation using prior experience, while intrinsic motivation mechanisms drive efficient exploration by rewarding the discovery of novel states. Specifically, our agent employs a high-level policy to choose among multiple low-level policies within custom-designed grid environments. By incorporating gradient-based meta-learning with differentiable inner-loop updates, we optimize performance across a curriculum of progressively challenging tasks. Experimental results highlight that our meta-learning-enhanced hierarchical agent significantly outperforms standard HRL approaches lacking meta-learning and intrinsic motivation. The agent demonstrates faster learning, greater cumulative rewards, and higher success rates in complex grid-based scenarios. These findings underscore the effectiveness of combining meta-learning, curriculum learning, and intrinsic motivation to enhance the capability of HRL agents in tackling complex tasks.
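
To make "differentiable inner-loop updates" concrete, the sketch below runs a MAML-style inner and outer loop on a toy problem. It is not the paper's policy-gradient setup: each "task" is assumed to be a scalar quadratic loss `0.5 * (theta - target)**2`, which lets the gradient through the inner updates be written analytically. The function names and the step sizes `alpha` and `beta` are hypothetical.

```python
def inner_adapt(theta, target, alpha, k):
    """K gradient steps on the toy task loss 0.5 * (theta - target)**2."""
    for _ in range(k):
        theta = theta - alpha * (theta - target)
    return theta


def meta_gradient(theta, targets, alpha, k):
    """Gradient of the mean post-adaptation loss w.r.t. the meta-parameter."""
    # Each inner step multiplies d(theta_adapted)/d(theta) by (1 - alpha),
    # so differentiating through K steps gives (1 - alpha) ** k.
    jac = (1.0 - alpha) ** k
    grads = [(inner_adapt(theta, c, alpha, k) - c) * jac for c in targets]
    return sum(grads) / len(grads)


def meta_train(theta, targets, alpha=0.1, beta=0.5, k=3, iters=200):
    """Outer loop: gradient descent on the meta-loss across all toy tasks."""
    for _ in range(iters):
        theta = theta - beta * meta_gradient(theta, targets, alpha, k)
    return theta
```

On this toy family the meta-parameter settles at the mean of the task targets, the initialization from which `k` inner steps reduce the average loss the most. In a hierarchical agent the scalar would be replaced by the parameters of both policy levels, and the analytic Jacobian by automatic differentiation.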

Paper Structure

This paper contains 24 sections, 5 equations, 6 figures, 1 table, 2 algorithms.

Figures (6)

  • Figure 1: Architecture for Hierarchical Reinforcement Learning (HRL). After the high-level policy makes a choice, a low-level policy is triggered to engage with the environment. Based on input from the surroundings, the agent continuously modifies its state. The hierarchical flow of feedback and decision-making between high-level and low-level policies is shown by the dashed lines.
  • Figure 2: Intrinsic Motivation and Exploration Path: This flowchart illustrates how the agent interacts with the environment, calculates intrinsic rewards based on state visitation counts, and uses a combination of intrinsic and extrinsic rewards to guide exploration. The feedback loop ensures that new states are visited and counted, promoting efficient exploration.
  • Figure 3: Meta-Learning Process Flowchart. The outer loop initializes and updates meta-parameters across tasks, while the inner loop performs task-specific adaptations using gradient descent. The meta-loss is computed based on adapted parameters to optimize the meta-parameters.
  • Figure 4: Neural network architectures for the three key components of the hierarchical reinforcement learning system: (1) High-Level Policy Network (Options), (2) Low-Level Policy Network (Actions), and (3) Termination Function Network. Each network consists of an input layer, hidden layers (64 and 32 neurons), and an output layer tailored to the respective tasks.
  • Figure 5: The graph shows the progression of Meta-Loss, Average Reward, and Success Rate over 500 meta-iterations in the fixed complexity scenario. The red line represents Meta-Loss, the blue line indicates Average Reward, and the green line shows Success Rate.
  • ...and 1 more figure
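
The three networks described in the Figure 4 caption can be sketched in plain Python. Only the hidden-layer widths (64 and 32) come from the caption. The ReLU activations, the softmax and sigmoid output heads, the state dimension, the numbers of options and actions, and the choice not to condition the low-level network on the selected option are all assumptions made for illustration.

```python
import math
import random


def init_mlp(sizes, rng):
    """Return a list of (weights, biases) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU on hidden layers (assumed); raw outputs on the last layer."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


STATE_DIM, N_OPTIONS, N_ACTIONS = 4, 3, 4  # hypothetical sizes
rng = random.Random(0)
high_level = init_mlp([STATE_DIM, 64, 32, N_OPTIONS], rng)   # picks an option
low_level = init_mlp([STATE_DIM, 64, 32, N_ACTIONS], rng)    # picks an action
termination = init_mlp([STATE_DIM, 64, 32, 1], rng)          # stop the option?

state = [0.0, 1.0, 0.5, 0.25]
option_probs = softmax(forward(high_level, state))
action_probs = softmax(forward(low_level, state))
stop_prob = sigmoid(forward(termination, state)[0])
```

The three networks share the same body shape and differ only in the output head: a distribution over options, a distribution over primitive actions, and a single termination probability.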