Hierarchical Average-Reward Linearly-solvable Markov Decision Processes
Guillermo Infante, Anders Jonsson, Vicenç Gómez
TL;DR
The paper addresses hierarchical reinforcement learning in the average-reward setting for Linearly-solvable MDPs (LMDPs), showing how to learn low-level and high-level tasks jointly without restrictive assumptions on the low-level tasks. It develops a hierarchical framework for average-reward LMDPs (ALMDPs) built on state-space partitioning, subtask equivalence, and a compositional value representation, which allows the high-level value function to be constructed exactly from the base subtasks. Two algorithms are proposed: a two-stage eigenvector method, and an online algorithm that learns gains, subtasks, and exit values from samples. Both exploit the linear structure of LMDPs and compositionality. Empirical results on the N-room and Taxi domains show substantial speedups over flat ALMDP learning, supporting the efficiency and scalability of the approach for hierarchical average-reward tasks.
Abstract
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller subtasks that are easier to solve, and on the equivalence between such partitions, which makes learning more efficient. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
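For readers unfamiliar with the linear structure referred to above, the following is a minimal sketch of a flat (non-hierarchical) average-reward LMDP solver, not the hierarchical algorithms of the paper. It assumes the standard eigenvector formulation with unit temperature, in which the exponentiated value function z = exp(v) satisfies exp(ρ) z = G P z, where P is the passive (uncontrolled) transition matrix, G = diag(exp(R)) and ρ is the gain. The function name `solve_flat_almdp`, the toy three-state chain, and its rewards are hypothetical and chosen only for illustration.

```python
import math


def solve_flat_almdp(P, R, iters=10_000, tol=1e-12):
    """Power iteration on G P, with G = diag(exp(R)).

    Assumes P is row-stochastic and that G P is primitive (irreducible and
    aperiodic), so that power iteration converges to the principal eigenpair.
    Returns (gain, z, policy), with z normalised so that max(z) == 1.
    """
    n = len(P)
    g = [math.exp(r) for r in R]
    z = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        # One application of the linear operator G P to the current z.
        new = [g[s] * sum(P[s][t] * z[t] for t in range(n)) for s in range(n)]
        lam_new = max(new)                 # eigenvalue estimate under max-normalisation
        new = [x / lam_new for x in new]
        converged = (abs(lam_new - lam) < tol and
                     max(abs(a - b) for a, b in zip(new, z)) < tol)
        z, lam = new, lam_new
        if converged:
            break
    gain = math.log(lam)                   # rho = log of the principal eigenvalue
    # Optimal controlled transitions reweight the passive dynamics by z.
    policy = []
    for s in range(n):
        norm = sum(P[s][u] * z[u] for u in range(n))
        policy.append([P[s][t] * z[t] / norm for t in range(n)])
    return gain, z, policy


# Hypothetical toy problem: a lazy random walk on three states,
# with a reward of -1 in states 0 and 1 and a reward of 0 in state 2.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
R = [-1.0, -1.0, 0.0]

gain, z, policy = solve_flat_almdp(P, R)
```

In this toy chain the resulting policy shifts probability mass towards state 2, where the reward is highest. The cost of the flat solve grows with the size of the full state space, which is the cost that the partition-based decomposition described in the abstract aims to reduce.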
