Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Joshua B. Evans, Özgür Şimşek

TL;DR

This work tackles the challenge of autonomously discovering useful, multi-level action hierarchies for reinforcement learning by representing agent–environment interactions as a state-transition graph and applying modularity maximisation to reveal hierarchical structure. It introduces the Louvain skill hierarchy, which automatically generates a multi-level set of options that operate across time scales, with higher-level skills composed from lower-level ones. Empirical evaluations across six discrete domains show substantial learning improvements over baselines and demonstrate scalability to larger state spaces, along with incremental update capabilities and a continuous-domain demonstration. The approach offers a principled, scalable framework for unsupervised skill discovery with broad implications for exploration, planning, and transfer in RL.

Abstract

What is a useful skill hierarchy for an autonomous agent? We propose an answer based on a graphical representation of how the interaction between an agent and its environment may unfold. Our approach uses modularity maximisation as a central organising principle to expose the structure of the interaction graph at multiple levels of abstraction. The result is a collection of skills that operate at varying time scales, organised into a hierarchy, where skills that operate over longer time scales are composed of skills that operate over shorter time scales. The entire skill hierarchy is generated automatically, with no human intervention, including the skills themselves (their behaviour, when they can be called, and when they terminate) as well as the hierarchical dependency structure between them. In a wide range of environments, this approach generates skill hierarchies that are intuitively appealing and that considerably improve the learning performance of the agent.
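To make "modularity maximisation" concrete, the sketch below scores two candidate partitions of a tiny state-transition graph. It assumes the standard Newman-Girvan modularity for an undirected, unweighted graph, with a resolution multiplier on the null-model term; the paper's exact formulation is not given in this excerpt and may differ. The two-room graph, the function name `modularity`, and the variable names are hypothetical. The code only evaluates given partitions; it does not run the Louvain algorithm, which searches for high-modularity partitions greedily and aggregates clusters level by level.

```python
from collections import defaultdict


def modularity(edges, cluster_of, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    Assumed form: Q = sum_c [ L_c / m - resolution * (d_c / 2m)^2 ],
    where m is the number of edges, L_c the number of edges inside
    cluster c, and d_c the summed degree of the nodes in cluster c.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)  # summed node degree per cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(
        intra[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Hypothetical toy graph: two 2x2 rooms (states 0-3 and 4-7)
# joined by a single doorway transition between states 1 and 4.
edges = [
    (0, 1), (0, 2), (1, 3), (2, 3),  # room A
    (4, 5), (4, 6), (5, 7), (6, 7),  # room B
    (1, 4),                          # doorway
]

by_room = {s: ("A" if s < 4 else "B") for s in range(8)}
mixed = {0: "X", 1: "X", 4: "X", 5: "X", 2: "Y", 3: "Y", 6: "Y", 7: "Y"}

q_room = modularity(edges, by_room)
q_mixed = modularity(edges, mixed)
print(f"rooms: {q_room:.4f}, mixed: {q_mixed:.4f}")
```

The partition that follows the rooms scores higher than the one that cuts across them, which is the sense in which densely connected regions of the interaction graph are natural candidates for clusters, and hence for skills that move between them.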

Paper Structure

This paper contains 15 sections, 1 equation, 10 figures, 3 algorithms.

Figures (10)

  • Figure 1: The environments.
  • Figure 1: Top two rows: Cluster hierarchies produced by the Louvain algorithm in Grid and Maze. Bottom row: The lowest level of the cluster hierarchy in Office.
  • Figure 2: The cluster hierarchies produced by the Louvain algorithm when applied to the state transition graphs representing Rooms, Office, Taxi, and Towers of Hanoi. For Taxi and Towers of Hanoi, the graph layout was determined by using a force-directed algorithm that models nodes as charged particles that repel each other and edges as springs that attract connected nodes.
  • Figure 2: Agent performance with Louvain skills generated using different settings of the resolution parameter $\rho$.
  • Figure 3: Learning performance. An epoch corresponds to 100 decision stages in Rooms and Towers of Hanoi, 300 in Taxi, 750 in Maze and Grid, and 1000 in Office.
  • ...and 5 more figures