Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín

TL;DR

Proposes Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks, where it significantly outperforms previous skill discovery methods.

Abstract

A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills where one skill variable simultaneously influences many entities in the environment, making downstream skill chaining extremely challenging. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills into disentangled components, where each skill component only affects one factor of the state space. Importantly, these skill components can be concurrently composed to generate low-level actions, and efficiently chained to tackle downstream tasks through hierarchical Reinforcement Learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and utilizes value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks. Code and skills visualization at jiahenghu.github.io/DUSDi-site/.
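
The abstract names a mutual-information-based objective and value factorization but does not spell either out. The following is a minimal sketch of one way such an objective could become a per-component intrinsic reward, assuming a variational skill predictor in the style of earlier skill discovery methods. The function names, the logit inputs, and the penalty weight `lam` are all hypothetical and are not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]

def factored_intrinsic_rewards(own_logits, other_logits, z, lam=0.1):
    """Assumed per-component reward: log q(z^i | s^i) - lam * log q(z^i | s^{-i}).

    own_logits[i]   : hypothetical predictor logits for z^i given state factor s^i
    other_logits[i] : hypothetical predictor logits for z^i given all other factors
    z[i]            : index of the sampled value of skill component i
    lam             : assumed weight of the entanglement penalty
    """
    rewards = []
    for own, other, zi in zip(own_logits, other_logits, z):
        association = log_softmax(own)[zi]     # high when s^i reveals z^i
        entanglement = log_softmax(other)[zi]  # high when other factors reveal z^i
        rewards.append(association - lam * entanglement)
    return rewards

# Two skill components, each taking one of 3 values.
own = [[4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]    # each factor clearly reveals its own component
other = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # the remaining factors are uninformative
r = factored_intrinsic_rewards(own, other, z=[0, 2])

# Under this reading of value factorization, a critic would fit one Q^i per r[i]
# and sum them, rather than regressing a single Q on the summed reward.
total_reward = sum(r)
```

In this sketch the reward for a component drops when the other state factors also reveal its value, which is the entanglement the objective is meant to discourage.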

Paper Structure

This paper contains 28 sections, 5 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Consider an agent practicing driving skills by learning to control a car's speed (length of orange arrow), steering (curvature of orange arrow), and headlights (blue symbol). (Left) Previous unsupervised skill discovery methods learn entangled skills, where a change in the skill variable can cause all three environment factors to change. (Right) DUSDi learns disentangled skills with concurrent components, where each skill component only affects one factor of the state space, enabling efficient downstream task learning with hierarchical RL.
  • Figure 2: Two learning stages of DUSDi: (a) In the disentangled skill learning stage, DUSDi creates a one-to-one mapping between state factors and skill components: each disentangled skill component $z^i$ only influences state factor $s^i$. DUSDi designs a novel mutual-information-based intrinsic reward to enforce disentanglement and utilizes $Q$-value decomposition to learn the skill policy $\pi_\theta$ efficiently. (b) In the task learning stage, the skill policy is used as a frozen low-level policy, and a high-level policy $\pi_\text{high}$ is learned to select a skill $z$ every $L$ steps by maximizing the task reward $r^\text{task}$.
  • Figure 3: Evaluation of the effect of Q-decomposition in skill learning. The plots depict the mean and standard deviation of accuracy ($\uparrow$) when predicting the skill component $z^i$ from the state factor $s^i$, computed across 3 training runs. Higher prediction accuracy indicates that the policy learns to control more state factors in more distinguishable ways, leading to more efficient downstream task learning.
  • Figure 4: Training curves of DUSDi and baselines on multiple downstream tasks (the reward-supervised second phase). The plots depict the mean and standard deviation of the return of each method over 3 random seeds. DUSDi outperforms all baselines that learn entangled skills, converging faster and to higher returns.
  • Figure 5: Performance of DUSDi with image observations on two multi-particle downstream tasks over three random seeds. With the help of disentangled representation learning, DUSDi effectively learns skills based only on image observations and leverages the skills to solve challenging downstream tasks where baseline methods fail.
  • ...and 2 more figures
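
The task learning stage described in the Figure 2 caption (a frozen skill policy driven by a high-level policy that picks a new skill $z$ every $L$ steps) can be sketched as a simple control loop. The toy environment, the goal, and both toy policies below are hypothetical stand-ins chosen so the sketch runs on its own. In the paper the skill policy is learned without supervision and the high-level policy is trained on the task reward.

```python
def run_hierarchical_episode(env_step, high_policy, skill_policy, state, horizon, L):
    """Roll out one episode, re-selecting the skill z every L low-level steps."""
    total_task_reward = 0.0
    skills_chosen = []
    z = None
    for t in range(horizon):
        if t % L == 0:
            z = high_policy(state)       # high-level decision, held for L steps
            skills_chosen.append(z)
        action = skill_policy(state, z)  # frozen low-level skill policy
        state, r_task = env_step(state, action)
        total_task_reward += r_task
    return total_task_reward, skills_chosen

GOAL = (3, -2)  # hypothetical target value for each of the two state factors

def toy_env_step(state, action):
    """Each action component shifts exactly one state factor."""
    next_state = tuple(s + a for s, a in zip(state, action))
    reward = -sum(abs(g - s) for g, s in zip(GOAL, next_state))
    return next_state, reward

def toy_skill_policy(state, z):
    """Stand-in for a disentangled skill policy: component z^i drives factor s^i."""
    return z

def toy_high_policy(state):
    """Hand-written stand-in: push each factor toward its goal (-1, 0, or +1)."""
    return tuple((g > s) - (g < s) for g, s in zip(GOAL, state))

ret, skills = run_hierarchical_episode(
    toy_env_step, toy_high_policy, toy_skill_policy,
    state=(0, 0), horizon=6, L=2,
)
```

Because each skill component moves only its own factor in this toy setup, the high-level policy can choose the components independently. A larger `L` means fewer high-level decisions but coarser control, since a chosen skill keeps running even after its factor has reached the goal.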