Accelerating Task Generalisation with Multi-Level Skill Hierarchies
Thomas P Cannon, Özgür Simsek
TL;DR
This paper addresses the challenge of generalising reinforcement learning to unseen tasks by introducing Fracture Cluster Options (FraCOs), a multi-level hierarchical framework that discovers and composes reusable skills from patterns in past trajectories. Fracture patterns (fractures) are clustered to form Fracture Clusters, and each cluster's expected usefulness, based on its appearance probability, relative frequency, and usage entropy, determines whether it becomes a FraCO. These FraCOs are converted into options with initiation sets and termination criteria, enabling agents to execute sequences of actions or nested options and to reuse them across tasks. Empirical results show that FraCOs accelerate learning and improve both in-distribution (IID) and out-of-distribution (OOD) performance in tabular and deep settings, outperforming PPO, OC-PPO, and PPG on grid-world, MetaGrid, and Procgen environments, with FraCOs-SSR offering additional benefits in deep settings. The framework lays groundwork for scalable, generalisable hierarchical RL, while highlighting open challenges in scaling the clustering step and extending the method to continuous action spaces.
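The TL;DR names three ingredients of expected usefulness (appearance probability, relative frequency, usage entropy) but not how they are computed or combined. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: a fracture is simplified to a window of `b` consecutive actions, every distinct window is treated as its own cluster (no real clustering step), usage entropy is measured over tasks and normalised by `log(n_tasks)`, and the three quantities are simply multiplied. The function names `extract_fractures`, `expected_usefulness` and `select_fracos` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Fracture = Tuple[int, ...]


def extract_fractures(actions: Sequence[int], b: int) -> List[Fracture]:
    """Hypothetical simplification: every window of b consecutive actions is a fracture."""
    return [tuple(actions[i:i + b]) for i in range(len(actions) - b + 1)]


def expected_usefulness(
    trajectories: Dict[str, List[Sequence[int]]], b: int
) -> Dict[Fracture, float]:
    """Score each fracture 'cluster' (identity clustering, an assumption) by
    appearance probability * relative frequency * normalised usage entropy.
    Combining the three by a product is an illustrative assumption."""
    n_traj = sum(len(trajs) for trajs in trajectories.values())
    n_tasks = len(trajectories)
    appears_in: Counter = Counter()    # trajectories containing the cluster
    total_count: Counter = Counter()   # total occurrences of the cluster
    per_task: Dict[Fracture, Counter] = defaultdict(Counter)  # occurrences per task
    total_fractures = 0

    for task, trajs in trajectories.items():
        for actions in trajs:
            fracs = extract_fractures(actions, b)
            total_fractures += len(fracs)
            counts = Counter(fracs)
            total_count.update(counts)
            appears_in.update(counts.keys())
            for frac, c in counts.items():
                per_task[frac][task] += c

    scores: Dict[Fracture, float] = {}
    for frac, count in total_count.items():
        p_appear = appears_in[frac] / n_traj        # P(cluster appears in a trajectory)
        rel_freq = count / total_fractures          # share of all observed fractures
        probs = [c / count for c in per_task[frac].values()]
        entropy = sum(-p * math.log(p) for p in probs)  # spread of use across tasks
        norm_entropy = entropy / math.log(n_tasks) if n_tasks > 1 else 1.0
        scores[frac] = p_appear * rel_freq * norm_entropy
    return scores


def select_fracos(scores: Dict[Fracture, float], k: int) -> List[Fracture]:
    """Keep the k highest-scoring clusters as candidate options."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    demo = {"task_a": [[0, 1, 0, 1, 2]], "task_b": [[0, 1, 3, 3]]}
    scores = expected_usefulness(demo, b=2)
    print(select_fracos(scores, k=1))
```

In this toy example only the pattern `(0, 1)` occurs in both tasks, so it is the only one with non-zero usage entropy and is the one selected. Under these assumptions, patterns confined to a single task score zero regardless of how often they recur, which mirrors the TL;DR's emphasis on reuse across tasks.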
Abstract
Creating reinforcement learning agents that generalise effectively to new tasks is a key challenge in AI research. This paper introduces Fracture Cluster Options (FraCOs), a multi-level hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks. FraCOs identifies patterns in agent behaviour and forms options based on the expected future usefulness of those patterns, enabling rapid adaptation to new tasks. In tabular settings, FraCOs demonstrates effective transfer and improves performance as it grows in hierarchical depth. We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments. Our results show that FraCOs achieves higher in-distribution and out-of-distribution performance than competitors.
