Reinforcement Learning with Options and State Representation
Ayoub Ghriss, Masashi Sugiyama, Alessandro Lazaric
TL;DR
This work addresses learning in high-dimensional environments by combining hierarchical RL with spectral methods. It introduces a hierarchical policy structure with a gating mechanism and options to enable temporal abstraction, and it investigates eigenoptions derived from graph Laplacians to capture environment geometry. A Regularized Information Maximization framework is applied to drive informative gating, and a TRPO-based approach is extended to hierarchical policies (TRHPO). Separately, a spectral framework learns eigenvectors/eigenoptions via neural networks (SpectralNet, Spectral Inference Networks) and is evaluated on synthetic grids and clustering benchmarks (MNIST, Reuters). Overall, the thesis demonstrates that state representation through spectral methods can yield invariant, scalable learning primitives for HRL, though large networks and optimization challenges in eigenoption learning are noted as ongoing hurdles.
Abstract
This thesis explores the field of reinforcement learning and builds on existing methods to tackle the problem of learning in high-dimensional and complex environments. It does so by decomposing learning tasks hierarchically, an approach known as Hierarchical Reinforcement Learning. The first chapter introduces the Markov Decision Process framework and presents some of the recent techniques that the following chapters use. We then build our Hierarchical Policy learning as an answer to the limitations of a single primitive policy. The hierarchy is composed of a manager agent at the top and employee agents at the lower level. In the last chapter, which is the core of this thesis, we attempt to learn the lower-level elements of the hierarchy independently of the manager level, in the form of what are known as "Eigenoptions". Based on the graph structure of the environment, Eigenoptions allow us to build agents that are aware of the geometric and dynamic properties of the environment. Their decision-making has a special property: it is invariant to symmetric transformations of the environment, which greatly reduces the complexity of the learning task.
