Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning

Roberto Cipollone, Luca Iocchi, Matteo Leonetti

TL;DR

This work introduces Realizable Abstractions, a formal framework that couples a finite high-level decision process (a 2-MDP) to a ground MDP through $\phi$-relative options, ensuring that abstract transitions and rewards can be realized in the ground task. By requiring realizability and admissibility, the authors show that abstract policies translate into near-optimal ground policies via composition of options, and they provide a learning mechanism (RARL) that is PAC and sample-efficient. Realizable Abstractions also enable horizon reduction and robustness to abstraction inaccuracies, with formal connections to MDP homomorphisms and stochastic bisimulation. The proposed RARL algorithm iteratively refines the abstraction, learns realizing options through CMDPs, and updates the abstract model to maintain optimism, achieving near-optimal performance with polynomial sample complexity in structured HRL settings. Overall, the paper provides a principled, theory-backed path for scalable, compositional RL with guarantees, even when the ground MDP is large or non-finite.

Abstract

The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be solved more efficiently when addressed in a modular way, by combining partial solutions computed for smaller subtasks. Despite their intuitive role in learning, most notions of MDP abstraction proposed in the HRL literature have limited expressive power or do not possess formal efficiency guarantees. This work addresses these fundamental issues by defining Realizable Abstractions, a new relation between generic low-level MDPs and their associated high-level decision processes. The notion we propose avoids non-Markovianity issues and has desirable near-optimality guarantees. Indeed, we show that any abstract policy for Realizable Abstractions can be translated into near-optimal policies for the low-level MDP, through a suitable composition of options. As demonstrated in the paper, these options can be expressed as solutions of specific constrained MDPs. Based on these findings, we propose RARL, a new HRL algorithm that returns compositional and near-optimal low-level policies, taking advantage of the Realizable Abstraction given as input. We show that RARL is Probably Approximately Correct, that it converges within a polynomial number of samples, and that it is robust to inaccuracies in the abstraction.
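
To make the idea of translating an abstract policy into a ground policy "through a suitable composition of options" more concrete, the following is a minimal sketch, not the paper's construction. Everything in it is a hypothetical illustration: the 1-D corridor, the map `phi`, the dynamics `step`, and the hand-written options. It also simplifies the setting by letting the abstract policy depend only on the current block, and it assumes that an option stops as soon as the ground state leaves the block in which it was started.

```python
from typing import Callable, Dict, List, Tuple

State = int          # ground state: a cell index in a 1-D corridor (hypothetical)
AbstractState = str  # abstract state: the name of a block of cells


def phi(s: State) -> AbstractState:
    """Hypothetical abstraction map: cells 0-2 -> 'A', 3-5 -> 'B', 6-8 -> 'C'."""
    return "A" if s < 3 else "B" if s < 6 else "C"


def step(s: State, a: int) -> State:
    """Hypothetical deterministic ground dynamics: move by a in {-1, +1}, clipped to [0, 8]."""
    return max(0, min(8, s + a))


# One option per (block, abstract action). Each option is a ground policy that is
# only executed while the ground state stays inside its block (an assumption here).
options: Dict[Tuple[AbstractState, AbstractState], Callable[[State], int]] = {
    ("A", "B"): lambda s: +1,
    ("B", "C"): lambda s: +1,
}

# Simplified abstract policy: the abstract action is the block to move to next.
abstract_policy: Dict[AbstractState, AbstractState] = {"A": "B", "B": "C"}


def run_composed(s: State, goal_block: AbstractState, max_steps: int = 50) -> List[State]:
    """Follow the abstract policy by chaining options until goal_block is reached."""
    trajectory = [s]
    while phi(s) != goal_block and len(trajectory) <= max_steps:
        block = phi(s)
        option = options[(block, abstract_policy[block])]
        # Run the selected option until the ground state leaves the current block.
        while phi(s) == block and len(trajectory) <= max_steps:
            s = step(s, option(s))
            trajectory.append(s)
    return trajectory


print(run_composed(0, "C"))  # [0, 1, 2, 3, 4, 5, 6]
```

In this toy setting the composed ground policy is simply the sequence of options selected by the abstract policy, each handing over control when its block is left. In the paper the options are not hand-written but obtained as solutions of constrained MDPs.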

Paper Structure

This paper contains 21 sections, 26 theorems, 66 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $\langle \bar{\mathbf{M}}, \phi \rangle$ be an $(\alpha, \beta)$-realizable abstraction of an MDP $\mathbf{M}$. Then, if $\Omega$ is the realization of some abstract policy $\bar{\pi}$, then, for any $\bar{s}_p \in \bar{\mathcal{S}}$, $s \in \mathcal{X}_{\bar{s}_p}$, $\bar{s} = \phi(s)$, Moreover, if the marginal initial distributions correspond, $\forall \bar{s}.\, \bar{\mu}(\bar{s}) = \sum_

Figures (2)

  • Figure 1: (left) The running example. The ground MDP is a grid world domain and $\bar{\mathcal{S}} = \{\bar{s}_1, \bar{s}_2, \bar{s}_3\}$. Each e is an entry in $\mathcal{E}_{\bar{s}_2 \bar{s}_1}$ and each x is an exit in $\mathcal{X}_{\bar{s}_1}$. (right) A different ground MDP.
  • Figure 2: The ground MDP (left) and the abstract MDP (right) used in the proof of \ref{th:realizable-homomorphisms-back}.
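
The entry and exit sets mentioned in the caption of Figure 1 can be illustrated on a small grid world. The sketch below is only a hedged reading, not the paper's definitions: it assumes that the exits $\mathcal{X}_{\bar{s}}$ of a block are the ground states outside the block that are reachable in one step from inside it, and that the entries from a block $\bar{s}_p$ into a block $\bar{s}$ are the exits of $\bar{s}_p$ that lie in $\bar{s}$. The 2x6 grid, the column-based map `phi`, and the four-connected moves are hypothetical and do not reproduce the layout of Figure 1.

```python
from itertools import product
from typing import Iterator, Set, Tuple

Cell = Tuple[int, int]

ROWS, COLS = 2, 6  # hypothetical grid size
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # four-connected moves (assumption)
CELLS = list(product(range(ROWS), range(COLS)))


def phi(cell: Cell) -> str:
    """Hypothetical abstraction: columns 0-1 -> 's1', 2-3 -> 's2', 4-5 -> 's3'."""
    _, c = cell
    return f"s{c // 2 + 1}"


def neighbors(cell: Cell) -> Iterator[Cell]:
    """Cells reachable in one move without leaving the grid."""
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            yield (nr, nc)


def exits(block: str) -> Set[Cell]:
    """Assumed reading: states outside `block` reachable in one step from inside it."""
    return {
        n
        for cell in CELLS
        if phi(cell) == block
        for n in neighbors(cell)
        if phi(n) != block
    }


def entries(prev_block: str, block: str) -> Set[Cell]:
    """Assumed reading: exits of `prev_block` that belong to `block`."""
    return {s for s in exits(prev_block) if phi(s) == block}


print(sorted(exits("s1")))          # [(0, 2), (1, 2)]
print(sorted(entries("s2", "s1")))  # [(0, 1), (1, 1)]
```

Under these assumptions, non-adjacent blocks share no entries, for example `entries("s1", "s3")` is empty.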

Theorems & Definitions (44)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Definition 3
  • Proposition 1
  • Corollary 1
  • Proposition 1
  • Proposition 1
  • Proposition 1
  • Proposition 1
  • ...and 34 more