Accelerating Task Generalisation with Multi-Level Skill Hierarchies

Thomas P Cannon, Özgür Simsek

TL;DR

This paper addresses the challenge of generalising reinforcement learning to unseen tasks by introducing Fracture Cluster Options (FraCOs), a multi-level hierarchical framework that discovers and composes reusable skills from patterns in past trajectories. These patterns, called fractures, are clustered into Fracture Clusters. Each cluster is scored by its expected usefulness, which is based on appearance probability, relative frequency, and usage entropy, and this score determines which clusters become FraCOs. The selected FraCOs are converted into options with initiation sets and termination criteria, enabling agents to execute sequences of actions or nested options across tasks. Empirical results show that FraCOs accelerate learning and improve both in-distribution (IID) and out-of-distribution (OOD) performance in tabular and deep settings, outperforming PPO, OC-PPO, and PPG on grid-world, MetaGrid, and Procgen environments, with FraCOs-SSR offering additional benefits in deep settings. The framework lays groundwork for scalable, generalisable hierarchical RL, while highlighting open challenges in clustering scalability and extension to continuous action spaces.

Abstract

Creating reinforcement learning agents that generalise effectively to new tasks is a key challenge in AI research. This paper introduces Fracture Cluster Options (FraCOs), a multi-level hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks. FraCOs identifies patterns in agent behaviour and forms options based on the expected future usefulness of those patterns, enabling rapid adaptation to new tasks. In tabular settings, FraCOs demonstrates effective transfer and improves performance as it grows in hierarchical depth. We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments. Our results show that FraCOs achieves higher in-distribution and out-of-distribution performance than competitors.

Paper Structure

This paper contains 41 sections, 29 equations, 21 figures, 9 tables, 1 algorithm.

Figures (21)

  • Figure 1: A two-dimensional representation of the fractures ($b=2$) derived from agents acting for 10,000 time-steps in Four Rooms.
  • Figure 2: Four examples of discovered fracture clusters ($b = 4$) from agents trained in Four Rooms. Each cluster is represented by a colour. In the four examples, the green circles represent possible starting states, blue arrows indicate actions, the width of the arrows shows the frequency of the state-action pair within the cluster, and the red circles indicate the final states of the fracture.
  • Figure 3: The eight fracture clusters with the highest expected usefulness in Nine Rooms. Expected usefulness decreases from left to right in the top row, then from left to right in the bottom row. Green points represent possible starting states, blue arrows indicate actions, the width of the arrows shows the frequency of the state-action pair within the cluster, and the red points indicate the final states of the corresponding fracture.
  • Figure 4: Episodic returns with a tabular FraCOs agent trained in the Four Rooms, Nine Rooms, and Ramesh Maze environments. Results show interquartile means of 10 independently seeded experiments, with shaded areas indicating the standard error.
  • Figure 5: Episodic returns for tabular FraCOs in unseen MetaGrid domains of varying sizes. Results are the interquartile means of 10 independently seeded experiments, with shaded areas indicating the standard error.
  • ...and 16 more figures
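
The captions for Figures 4 and 5 report interquartile means over 10 independently seeded experiments, with the standard error as the shaded band. As a minimal sketch of those two statistics at a single evaluation point, the snippet below assumes the common definition of the interquartile mean (drop the lowest and highest quarter of runs, average the rest) and the sample standard error over all runs. The paper summary does not state the exact convention the authors used, and the function names and return values here are hypothetical.

```python
import numpy as np

def interquartile_mean(values):
    """Mean of the middle 50% of values (assumed definition, not confirmed by the source)."""
    x = np.sort(np.asarray(values, dtype=float))
    k = int(0.25 * x.size)  # number of values trimmed from each end
    return x[k:x.size - k].mean()

def standard_error(values):
    """Sample standard deviation divided by sqrt(n), computed over all runs."""
    x = np.asarray(values, dtype=float)
    return x.std(ddof=1) / np.sqrt(x.size)

# Hypothetical episodic returns from 10 seeds at one point in training.
returns = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
iqm = interquartile_mean(returns)  # averages the six middle values
sem = standard_error(returns)
```

With 10 seeds this trims two runs from each end, so a single outlying seed (5.0 above) does not move the reported value, whereas it would pull the plain mean up noticeably.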