Periodic Skill Discovery

Jonghae Park, Daesol Cho, Jusuk Lee, Dongseok Shim, Inkyu Jang, H. Jin Kim

TL;DR

The paper tackles the lack of periodic structure in unsupervised skill discovery for RL. It introduces Periodic Skill Discovery (PSD), which maps states to a circular latent space parameterized by a period L, and trains a policy with a single-step intrinsic reward to learn multi-timescale periodic behaviors. PSD achieves diverse, temporally structured skills that transfer to downstream tasks (e.g., hurdling) and scales to pixel-based observations, with an adaptive sampling mechanism to cover feasible periods. It can also combine with METRA to enrich the behavioral repertoire, highlighting a scalable approach to uncover temporally organized behaviors without external rewards.

Abstract

Unsupervised skill discovery in reinforcement learning (RL) aims to learn diverse behaviors without relying on external rewards. However, current methods often overlook the periodic nature of learned skills, focusing instead on increasing the mutual dependence between states and skills or maximizing the distance traveled in latent space. Considering that many robotic tasks, particularly those involving locomotion, require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. The key idea of PSD is to train an encoder that maps states to a circular latent space, thereby naturally encoding periodicity in the latent representation. By capturing temporal distance, PSD can effectively learn skills with diverse periods in complex robotic tasks, even with pixel-based observations. We further show that these learned skills achieve high performance on downstream tasks such as hurdling. Moreover, integrating PSD with an existing skill discovery method offers more diverse behaviors, thus broadening the agent's repertoire. Our code and demos are available at https://jonghaepark.github.io/psd/

Paper Structure

This paper contains 44 sections, 1 theorem, 15 equations, 13 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

Given a positive integer $L$, $\phi_L$ is an optimal solution to $\mathcal{J}_{\text{PSD}}$ if and only if it forms a regular $2L$-gon of diameter $L$ centered at the origin.
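
To make the geometry in Theorem 1 concrete, the sketch below builds the vertices of a regular $2L$-gon and checks its basic distances. It is an illustration only, not the authors' code. It assumes that "diameter $L$" means opposite vertices are $L$ apart (circumradius $L/2$) and that consecutive timesteps land on consecutive vertices, which matches the $2L$-timestep period described for Figure 5. The names `regular_polygon_latents` and `dist` are hypothetical.

```python
import math


def regular_polygon_latents(L):
    """Vertices of a regular 2L-gon centered at the origin.

    Assumption: "diameter L" is read as the distance between opposite
    vertices, so the circumradius is L / 2.
    """
    if L < 1:
        raise ValueError("L must be a positive integer")
    radius = L / 2.0
    step = math.pi / L  # angle between neighbours: 2*pi / (2L)
    return [
        (radius * math.cos(k * step), radius * math.sin(k * step))
        for k in range(2 * L)
    ]


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


L = 4
verts = regular_polygon_latents(L)

# Vertices k and k + L are diametrically opposite, hence L apart.
opposite = dist(verts[0], verts[L])

# Neighbouring vertices are L * sin(pi / (2L)) apart (chord length).
adjacent = dist(verts[0], verts[1])

print(f"vertices: {len(verts)}")
print(f"opposite-vertex distance: {opposite:.6f}")
print(f"adjacent-vertex distance: {adjacent:.6f}")
```

Under these assumptions, a trajectory that advances one vertex per timestep reaches the antipodal point after $L$ steps and returns to its start after $2L$ steps, which is why a larger $L$ corresponds to both a larger circle and a longer period.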

Figures (13)

  • Figure 1: Visualization of the circular latent space for Walker2D and HalfCheetah. The core idea of PSD is to map the state space into a circular latent space, where temporal distance is encoded geometrically. The figure visualizes an actual policy learned by PSD, where following larger circular paths (blue$\rightarrow$magenta) corresponds to longer-period behaviors.
  • Figure 2: Latent space of PSD. Illustration of the circular structure induced by optimizing $\mathcal{J}_{\text{PSD}}$.
  • Figure 3: Comparison of skill trajectories in the frequency domain. We apply a Fourier transform to skill trajectories, where each skill is uniformly sampled from the skill prior of each method. The resulting spectrum illustrates the frequency ($x$-axis) and amplitude ($y$-axis), representing the temporal patterns of each skill. The accompanying bar chart visualizes the four most dominant frequencies, ranked by amplitude, and highlights the range of discovered periods.
  • Figure 4: MuJoCo locomotion environments.
  • Figure 5: Trajectories of the skill policy and corresponding latent representation. The figure shows the joint trajectories of Ant (top) and Walker2D (bottom) and a 2D PCA projection of their latent encodings learned by PSD. Within a single episode, we switch the period variable $L$ at fixed time intervals. The resulting behavior of the skill policy exhibits a period of $2L$ timesteps.
  • ...and 8 more figures
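
The frequency-domain comparison in Figure 3 can be sketched on synthetic data. The example below is a minimal illustration, not the authors' analysis pipeline: a pure sine wave with period $2L$ stands in for a one-dimensional joint trajectory, and a naive discrete Fourier transform written with the standard library recovers its dominant frequencies ranked by amplitude. The function names, the trajectory length `T`, and the choice to subtract the mean are all assumptions made for illustration. In practice a library FFT such as `numpy.fft.rfft` would be the usual choice.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """One-sided amplitude spectrum via a naive O(n^2) DFT.

    Returns (frequency in cycles per timestep, amplitude) pairs.
    The mean is removed first so the constant offset does not dominate.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((k / n, abs(coeff) / n))
    return spectrum


def dominant_frequencies(signal, top_k=4):
    """The top_k (frequency, amplitude) pairs, largest amplitude first."""
    spectrum = amplitude_spectrum(signal)
    return sorted(spectrum, key=lambda fa: fa[1], reverse=True)[:top_k]


# Synthetic stand-in for a skill trajectory with period 2L timesteps.
L = 5
T = 200  # chosen as a multiple of 2L so the peak falls on an exact bin
trajectory = [math.sin(2 * math.pi * t / (2 * L)) for t in range(T)]

top = dominant_frequencies(trajectory)
peak_freq = top[0][0]
print(f"peak frequency: {peak_freq:.3f} cycles/step")
print(f"implied period: {1 / peak_freq:.1f} steps")
```

For a trajectory that repeats every $2L$ timesteps, the strongest peak sits at frequency $1/(2L)$, so reading off the peak gives a direct estimate of the period a skill has learned.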

Theorems & Definitions (2)

  • Theorem 1
  • proof