Hierarchical Average-Reward Linearly-solvable Markov Decision Processes

Guillermo Infante, Anders Jonsson, Vicenç Gómez

TL;DR

The paper addresses hierarchical reinforcement learning in the average-reward setting for Linearly-solvable MDPs (LMDPs), showing how to learn low-level and high-level tasks jointly without restrictive assumptions. It develops a hierarchical ALMDP framework built on state-space partitioning, subtask equivalence, and compositional value representation, enabling exact construction of the high-level value function from base subtasks. Two algorithms are proposed: a two-stage eigenvector method and an online algorithm that learns gains, subtasks, and exit values from samples; both leverage the linear structure of LMDPs and compositionality. Empirical results on N-room and Taxi domains demonstrate substantial speedups over flat ALMDP learning, validating the approach's efficiency and scalability for hierarchical average-reward tasks.

Abstract

We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller subtasks that are easier to solve, and the equivalence between such partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.

Paper Structure

This paper contains 17 sections, 12 theorems, 52 equations, 3 figures, 2 algorithms.

Key Result

Theorem 3

Under mild assumptions, the differential soft TD-learning updates for $v$ and $\rho$ (eq:main_v_td_update and eq:main_rho_td_update) converge to the optimal values of $v$ and $\rho$ in $\mathcal{L}$.
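
To make the two quantities in Theorem 3 concrete, the following is a minimal sketch of how the gain $\rho$ and value function $v$ of a small, flat average-reward LMDP could be computed when the model is known. It is not the paper's hierarchical algorithm or its TD update. It assumes a temperature of 1 and the Bellman equation $v(s) = R(s) - \rho + \log \sum_{s'} P(s'|s) e^{v(s')}$, which this summary does not state explicitly. Under that assumption, $z = e^{v}$ is the principal eigenvector of $\mathrm{diag}(e^{R}) P$ and $e^{\rho}$ is its eigenvalue, so plain power iteration suffices. The function name `solve_almdp`, the 3-state passive dynamics `P`, and the rewards `R` are hypothetical and chosen only for illustration.

```python
import math

def solve_almdp(P, R, iters=10_000, tol=1e-13):
    """Power iteration on diag(exp(R)) @ P (assumed temperature 1).

    P: passive dynamics as a row-stochastic list of lists (assumed primitive).
    R: per-state rewards.
    Returns (rho, v, policy), with v normalized so that v[0] == 0.
    """
    n = len(R)
    G = [math.exp(r) for r in R]
    z = [1.0 / math.sqrt(n)] * n  # positive start vector with unit norm
    gamma = 1.0
    for _ in range(iters):
        # One application of the linear operator: z_new = G * (P z)
        z_new = [G[s] * sum(P[s][t] * z[t] for t in range(n)) for s in range(n)]
        # z has unit norm, so the norm of z_new estimates the eigenvalue
        gamma = math.sqrt(sum(x * x for x in z_new))
        z_new = [x / gamma for x in z_new]
        converged = max(abs(a - b) for a, b in zip(z_new, z)) < tol
        z = z_new
        if converged:
            break
    rho = math.log(gamma)                    # gain (average reward)
    v = [math.log(x / z[0]) for x in z]      # values relative to state 0
    # Optimal policy reweights the passive dynamics by z and renormalizes
    policy = []
    for s in range(n):
        row = [P[s][t] * z[t] for t in range(n)]
        total = sum(row)
        policy.append([p / total for p in row])
    return rho, v, policy

# Hypothetical 3-state chain; self-loops keep the dynamics aperiodic.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
R = [-1.0, -0.5, 0.0]
rho, v, policy = solve_almdp(P, R)
```

Because $v$ is only defined up to an additive constant in the average-reward setting, the sketch pins it by setting the value of state 0 to zero; any other reference state would work equally well.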

Figures (3)

  • Figure 1: a) An example $4$-room ALMDP; b) a single subtask with 5 terminal states $G,L,R,T,B$ that is equivalent to all 4 room subtasks. Rooms are numbered 1 through 4, left-to-right, then top-to-bottom, and exit state $1^B$ refers to the exit $B$ of room $1$, etc.
  • Figure 2: Results in N-room when varying the number of rooms and the size of the rooms.
  • Figure 3: Results for $5 \times 5$ (top) and $8 \times 8$ (bottom) grids of the Taxi domain.

Theorems & Definitions (23)

  • Definition 1
  • Theorem 3
  • proof : Proof sketch
  • Theorem 4
  • Lemma 5
  • proof
  • Lemma 5
  • proof
  • Corollary 6
  • proof
  • ...and 13 more