Focused Skill Discovery: Learning to Control Specific State Variables while Minimizing Side Effects

Jonathan Colaço Carr, Qinyi Sun, Cameron Allen

TL;DR

Unsupervised skill discovery methods typically ignore control over individual state variables, which leads to inefficient exploration and to side effects when goals are under-specified. The paper introduces focused skill discovery, a general method that modifies existing skill rewards to target specific state variables while penalizing side effects on non-target variables. The method is compatible with VIC, DIAYN, and LSD. Empirical results across three environments show that focused skills achieve up to about three times greater state coverage, enable faster downstream learning, and automatically avoid undesirable changes when goals are under-specified, matching or outperforming recent baselines such as DUSDi. The approach promises safer, more efficient pretraining of hierarchical policies and extends to a broad class of skill-discovery algorithms. Future work includes scaling to continuous domains and exploring theoretical guarantees about reward hacking and focus.

Abstract

Skills are essential for unlocking higher levels of problem solving. A common approach to discovering these skills is to learn ones that reliably reach different states, thus empowering the agent to control its environment. However, existing skill discovery algorithms often overlook the natural state variables present in many reinforcement learning problems, meaning that the discovered skills lack control of specific state variables. This can significantly hamper exploration efficiency, make skills more challenging to learn with, and lead to negative side effects in downstream tasks when the goal is under-specified. We introduce a general method that enables these skill discovery algorithms to learn focused skills -- skills that target and control specific state variables. Our approach improves state space coverage by a factor of three, unlocks new learning capabilities, and automatically avoids negative side effects in downstream tasks.
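The idea of a focused skill reward can be illustrated with a minimal sketch. The exact reward used in the paper is not reproduced in this summary, so everything below is an assumption: the function name `focused_reward`, the use of an absolute-difference penalty between a skill's start and end state, and the representation of a state as a flat list of variables are all hypothetical. The sketch only shows the general shape described above: a base skill reward for the target variable, minus a penalty weighted by $\lambda$ for changes to non-target variables.

```python
from typing import Sequence


def focused_reward(
    base_reward: float,
    start_state: Sequence[float],
    end_state: Sequence[float],
    target: int,
    lam: float = 1.0,
) -> float:
    """Hypothetical focused reward: base skill reward minus a side-effect penalty.

    base_reward: reward from an existing skill-discovery objective (e.g. a
        VIC-, DIAYN-, or LSD-style reward), assumed to be computed elsewhere
        for the target variable.
    start_state, end_state: state variables at the start and end of the skill.
    target: index of the state variable the skill is meant to control.
    lam: side-effect penalty strength (lam = 0 disables the penalty).
    """
    if len(start_state) != len(end_state):
        raise ValueError("start_state and end_state must have the same length")

    # Assumed penalty: total absolute change over all non-target variables.
    side_effect = sum(
        abs(end - start)
        for i, (start, end) in enumerate(zip(start_state, end_state))
        if i != target
    )
    return base_reward - lam * side_effect


# A skill that changes only its target variable keeps its full base reward.
clean = focused_reward(1.0, [0, 0, 0], [3, 0, 0], target=0, lam=0.5)

# A skill that also disturbs the other variables is penalized.
messy = focused_reward(1.0, [0, 0, 0], [3, 2, -1], target=0, lam=0.5)
```

Under this assumed penalty, a skill can only keep its full reward by leaving non-target variables where it found them, which is consistent with the behavior described for Figure 1, where focused skills return to the start state.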

Paper Structure

This paper contains 19 sections, 9 equations, 5 figures, 3 algorithms.

Figures (5)

  • Figure 1: Skill trajectories for VIC, DIAYN and LSD algorithms. Solid lines are trajectories from focused skills, dashed lines are from DUSDi skills and grey lines are from the baseline algorithms. Skills start in the blue square and terminate at the circles. Focused skills learn to collect objects and return to the start state in order to minimize side effects.
  • Figure 2: State coverage in the FourRooms environment. Focused skills explore three times more efficiently than unfocused skills, as measured by the Area Under the Curve (AUC).
  • Figure 3: Learning performance in downstream tasks. Focused skills (solid lines) lead to faster learning and are the only ones which can accomplish the task in MudWorld.
  • Figure 4: Task performance in ForageWorld and MudWorld when agents are trained with a proxy reward instead of the true reward. Focused skills (solid lines) are the only ones which maximize the true return when only given the proxy reward, meaning that they are the only ones that can automatically avoid making unwanted changes that are not explicit in the agent's goal.
  • Figure 5: Impact of the side effect penalty strength on focused skills in the MudWorld domain. Skills are not effective when there is no side effects penalty (i.e. when $\lambda=0$).
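The Area Under the Curve (AUC) comparison mentioned for Figure 2 can be sketched as follows. This is a minimal illustration, assuming the coverage curve is recorded as the number of distinct states visited at a series of training steps and that the area is computed with the trapezoidal rule; the function name `coverage_auc` and the numbers are made-up toy values, not data from the paper.

```python
from typing import Sequence


def coverage_auc(steps: Sequence[float], coverage: Sequence[float]) -> float:
    """Trapezoidal area under a state-coverage curve.

    steps: training steps at which coverage was measured (increasing).
    coverage: number of distinct states visited at each of those steps.
    """
    if len(steps) != len(coverage):
        raise ValueError("steps and coverage must have the same length")

    area = 0.0
    for i in range(1, len(steps)):
        width = steps[i] - steps[i - 1]
        mean_height = 0.5 * (coverage[i] + coverage[i - 1])
        area += width * mean_height
    return area


# Toy curves (hypothetical values, for illustration only).
steps = [0, 1, 2]
auc_a = coverage_auc(steps, [0, 2, 4])
auc_b = coverage_auc(steps, [0, 1, 2])
relative_efficiency = auc_a / auc_b
```

Comparing the two areas as a ratio gives a single number for how much faster one set of skills covers the state space than another over the same training budget.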

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6