When Do Skills Help Reinforcement Learning? A Theoretical Analysis of Temporal Abstractions

Zhening Li, Gabriel Poesia, Armando Solar-Lezama

TL;DR

This paper characterizes theoretically when skills improve reinforcement learning in deterministic sparse-reward MDPs. It introduces two core difficulty metrics: $J_{\text{learn}}$ for learning from experience and $J_{\text{explore}}$ for exploration. It establishes a formal connection between these difficulties and trajectory incompressibility, measured by $p$-incompressibility. The results show that skills with limited expressivity yield smaller learning gains and that macroactions can even harm learning in highly incompressible environments, while macroactions tend to aid exploration by increasing solution density. The authors prove lower bounds that link skill expressivity, action-space size, and incompressibility to the achievable improvement, and they corroborate these results with experiments on four benchmark environments and multiple RL algorithms. They also discuss practical pathways for deriving skill-learning objectives from incompressibility, connecting to minimum description length (MDL) based approaches such as LOVE and LEMMA. Overall, the work offers a principled framework to guide automatic skill discovery and helps practitioners decide when and how to deploy skills for improved sample efficiency.

Abstract

Skills are temporal abstractions that are intended to improve reinforcement learning (RL) performance through hierarchical RL. Despite our intuition about the properties of an environment that make skills useful, a precise characterization has been absent. We provide the first such characterization, focusing on the utility of deterministic skills in deterministic sparse-reward environments with finite action spaces. We show theoretically and empirically that RL performance gain from skills is worse in environments where solutions to states are less compressible. Additional theoretical results suggest that skills benefit exploration more than they benefit learning from existing experience, and that using unexpressive skills such as macroactions may worsen RL performance. We hope our findings can guide research on automatic skill discovery and help RL practitioners better decide when and how to use skills.
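The trade-off between solution length and action-space size can be made concrete with a toy sketch. This is an illustration only, not the paper's $J_{\text{learn}}$ or $J_{\text{explore}}$ definitions. It assumes a macroaction is a fixed sequence of base actions, rewrites one fixed solution as compactly as possible using dynamic programming, and compares the chance that a uniformly random policy emits that exact sequence before and after augmentation. The action names, the macro `"RR"`, and both helper functions are hypothetical.

```python
def shortest_rewrite_length(solution: str, macros: list[str]) -> int:
    """Fewest actions needed to express `solution` when each macroaction
    (a string of base actions) may replace an exact matching substring."""
    n = len(solution)
    best = [0] + [n + 1] * n  # best[i]: fewest actions covering solution[:i]
    for i in range(n):
        # A single base action always advances one step.
        best[i + 1] = min(best[i + 1], best[i] + 1)
        # A macroaction advances len(body) steps if its body matches here.
        for body in macros:
            j = i + len(body)
            if j <= n and solution[i:j] == body:
                best[j] = min(best[j], best[i] + 1)
    return best[n]


def random_hit_probability(length: int, num_actions: int) -> float:
    """Chance that a uniformly random policy emits one specific sequence."""
    return num_actions ** -length


base_actions = ["U", "D", "L", "R"]  # hypothetical grid-world actions
macros = ["RR"]                      # hypothetical macroaction
augmented_size = len(base_actions) + len(macros)

# A repetitive (compressible) solution vs. one with no "RR" substring.
for sol in ["RRRRRRRRUU", "RURDRULURD"]:
    k = shortest_rewrite_length(sol, macros)
    print(sol, "base length:", len(sol), "augmented length:", k)
    print("  P(hit) base:", random_hit_probability(len(sol), len(base_actions)),
          "augmented:", random_hit_probability(k, augmented_size))
```

On the repetitive solution the macroaction shortens the sequence (10 to 6 actions) enough to outweigh the larger action space; on the solution without a repeated `RR`, it only enlarges the action space and lowers the hit probability. This loosely echoes the abstract's point that unexpressive skills such as macroactions can hurt when solutions are less compressible. The sketch ignores that the best solution in an augmented environment need not be a rewrite of the base solution.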

Paper Structure

This paper contains 27 sections, 15 theorems, 94 equations, 1 figure, and 5 tables.

Key Result

Lemma 3.1

Suppose we apply value iteration with discount rate $\gamma=1$ and learning rate $\alpha$ to a deterministic sparse-reward MDP (DSMDP) $\mathcal{M} = (S, A, T, g)$ with a finite action space. In particular, we initialize $V(s) \gets 0$ for $s \neq g$ and $V(g) \gets 1$, and at time $t$ we update the entire table using the lemma's update rule (equation not shown). If $\alpha = 1$, then the number of time steps until the value of a solvable state $s$ becomes its true value (i…
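Because the update rule is not shown above, the sketch below assumes the common synchronous update $V(s) \gets (1-\alpha)V(s) + \alpha \max_{a \in A} V(T(s,a))$ for $s \neq g$, with $V(g)$ held at 1; the paper's exact rule may differ. Using a hypothetical five-state DSMDP and $\alpha = 1$, it records the first time step at which each state's value reaches 1 and compares that with shortest-solution lengths from breadth-first search. The function names and the toy MDP are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def steps_until_true_value(transitions, goal, alpha=1.0, max_steps=10_000):
    """Synchronous value iteration with gamma = 1 on a deterministic
    sparse-reward MDP. ASSUMED update for s != goal:
        V(s) <- (1 - alpha) * V(s) + alpha * max_a V(T(s, a))
    Returns {state: first time step at which V(state) == 1}.
    Intended for alpha = 1; with alpha < 1 values only approach 1."""
    states = set(transitions)
    for succ in transitions.values():
        states.update(succ.values())
    V = {s: 0.0 for s in states}
    V[goal] = 1.0
    first_hit = {goal: 0}
    for t in range(1, max_steps + 1):
        new_V = dict(V)
        for s in states:
            succ = transitions.get(s, {})
            if s == goal or not succ:
                continue  # goal stays at 1; states with no actions stay at 0
            best_next = max(V[s2] for s2 in succ.values())
            new_V[s] = (1 - alpha) * V[s] + alpha * best_next
        if new_V == V:
            break  # fixed point reached
        V = new_V
        for s, v in V.items():
            if v == 1.0 and s not in first_hit:
                first_hit[s] = t
    return first_hit


def shortest_solution_lengths(transitions, goal):
    """Length of the shortest action sequence from each state to `goal`,
    via breadth-first search on the reversed transition graph."""
    reverse = {}
    for s, succ in transitions.items():
        for s2 in succ.values():
            reverse.setdefault(s2, set()).add(s)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for p in reverse.get(u, ()):
            if p not in dist:
                dist[p] = dist[u] + 1
                queue.append(p)
    return dist


# Hypothetical toy DSMDP: state 3 is the goal, state 4 is an unsolvable trap.
T = {
    0: {"a": 1, "b": 2},
    1: {"a": 2, "b": 4},
    2: {"a": 3, "b": 0},
    3: {},
    4: {"a": 4},
}
print(steps_until_true_value(T, goal=3))
print(shortest_solution_lengths(T, goal=3))
```

On this toy example the two dictionaries agree. State 4 cannot reach the goal, so its value never becomes 1 and it is absent from both.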

Figures (1)

  • Figure 1: For each of the 4 environments studied, we plot the point $(x, y)$ where $x$ is the unmerged $p$-incompressibility of the base environment and $y$ is the best complexity improvement ratio $\min C_{+} / C_{0}$ over the 31 macroaction augmentations of the base environment. Different colors represent different measures $C$ of complexity, and different panels correspond to sample complexities $N$ of different RL algorithms. The plots corresponding to $p$-learning difficulty ($J_{\text{learn}}$) and $p$-exploration difficulty ($J_{\text{explore}}$) have been repeated across panels for clearer comparison with the plots corresponding to the sample complexities ($N$) of the RL algorithms.

Theorems & Definitions (43)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Definition 2.7
  • Lemma 3.1
  • Definition 3.2
  • Definition 3.3
  • ...and 33 more