Reinforcement Learning with Options and State Representation

Ayoub Ghriss; Masashi Sugiyama; Alessandro Lazaric

TL;DR

This work addresses learning in high-dimensional environments by combining hierarchical RL with spectral methods. It introduces a hierarchical policy structure with a gating mechanism and options to enable temporal abstraction, and it investigates eigenoptions derived from graph Laplacians to capture environment geometry. A Regularized Information Maximization framework is applied to drive informative gating, and a TRPO-based approach is extended to hierarchical policies (TRHPO). Separately, a spectral framework learns eigenvectors/eigenoptions via neural networks (Spectral Net, Spectral Inference Networks) and is evaluated on synthetic grids and clustering benchmarks (MNIST, Reuters). Overall, the paper demonstrates that state representation through spectral methods can yield invariant, scalable learning primitives for HRL, though large networks and optimization challenges in eigenoption learning are noted as ongoing hurdles.
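To make the graph-Laplacian idea concrete, here is a minimal sketch, not the authors' implementation. It builds the combinatorial Laplacian of a small open grid and takes its lowest eigenvectors by direct eigendecomposition rather than with the neural approximations mentioned above. The function names, the 4-connected grid without walls, and the intrinsic reward form (difference of eigenvector values between consecutive states, as commonly used in the eigenoption literature) are all illustrative assumptions.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Combinatorial Laplacian L = D - A of a 4-connected rows x cols grid (assumed layout)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # edge to the right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:  # edge to the neighbour below
                A[i, i + cols] = A[i + cols, i] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A

def laplacian_eigenvectors(L, k):
    """Return the k smallest eigenvalues and their eigenvectors (as columns)."""
    vals, vecs = np.linalg.eigh(L)  # eigh sorts eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

def eigen_reward(vec, s, s_next):
    """Hypothetical intrinsic reward for moving from state s to s_next along one eigenvector."""
    return vec[s_next] - vec[s]

L = grid_laplacian(3, 3)
vals, vecs = laplacian_eigenvectors(L, 3)
```

On a connected graph the smallest eigenvalue is zero with a constant eigenvector, so the informative directions start at the second eigenvector. Exact decomposition like this only scales to small state spaces, which is the motivation for learning the eigenvectors with a network instead.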

Abstract

This thesis explores reinforcement learning and builds on existing methods to tackle learning in high-dimensional, complex environments. It does so by decomposing learning tasks hierarchically, an approach known as Hierarchical Reinforcement Learning. The first chapter introduces the Markov Decision Process framework and presents the recent techniques that the following chapters rely on. We then build our Hierarchical Policy learning as an answer to the limitations of a single primitive policy. The hierarchy is composed of a manager agent at the top and employee agents at the lower level. In the last chapter, which is the core of this thesis, we attempt to learn the lower-level elements of the hierarchy independently of the manager level, in what is known as the "Eigenoption". Built from the graph structure of the environment, Eigenoptions allow us to build agents that are aware of its geometric and dynamic properties. Their decision-making has a special property: it is invariant to symmetric transformations of the environment, which greatly reduces the complexity of the learning task.
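The manager and employee structure can be illustrated with a small sketch. It assumes the manager is a softmax gate over options and each employee is a softmax policy over primitive actions, so that the overall policy is the mixture $\pi(a|s) = \sum_o g(o|s)\,\pi_o(a|s)$. The logits, shapes, and function names are hypothetical, and the thesis may select or terminate options differently.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_policy(gate_logits, option_logits):
    """Mixture policy for one state.

    gate_logits:   shape (n_options,), manager preferences (assumed parameterization)
    option_logits: shape (n_options, n_actions), employee preferences
    Returns pi(a|s) = sum_o g(o|s) * pi_o(a|s), shape (n_actions,).
    """
    gate = softmax(gate_logits)
    options = softmax(option_logits, axis=1)
    return gate @ options

def sample_action(rng, gate_logits, option_logits):
    """Sample an option from the manager, then an action from that employee."""
    gate = softmax(gate_logits)
    o = rng.choice(len(gate), p=gate)
    probs = softmax(option_logits[o])
    a = rng.choice(len(probs), p=probs)
    return o, a

rng = np.random.default_rng(0)
gate_logits = np.array([2.0, 0.0])
option_logits = np.array([[1.0, 0.0, -1.0],
                          [-1.0, 0.0, 1.0]])
pi = hierarchical_policy(gate_logits, option_logits)
o, a = sample_action(rng, gate_logits, option_logits)
```

Sampling the option first and the action second draws from the same distribution as sampling directly from the mixture, which is why the two views of the hierarchy can be used interchangeably in this simplified setting.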

Paper Structure (47 sections, 7 theorems, 65 equations, 10 figures, 5 algorithms)

Reinforcement Learning Framework
Markov Decision Process
Value Functions
MDP Properties
Dynamic Programming
Bellman Equations
Iterative Learning
Value Functions Iteration
Policy Iteration
Temporal Difference
Gradient Methods
Regularization functions
Convex optimization perspective
Trust Region Policy Optimization
recipe
...and 32 more sections

Key Result

Lemma 1.1

Let $\pi$ and $\tilde{\pi}$ be two policies over the state space $\mathcal{S}$. Then, using the density $\rho_{\pi}$, the following holds:

Figures (10)

Figure 1: MDP structure
Figure 2: 4-room environment for different sizes
Figure 3: TRPO in 4 Rooms
Figure 4: TRHPO vs TRPO in 4 Rooms
Figure 5: TRHPO exploited options
...and 5 more figures

Theorems & Definitions (8)

Lemma 1.1
Theorem 1.1: Contraction mapping theorem
Lemma 1.2
Lemma 2.1
Lemma 3.1
Lemma 3.2
Lemma 3
proof
