Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment

Jinwoo Choi, Seung-Woo Seo

TL;DR

Dynamic Contrastive Skill Learning (DCSL) tackles long-horizon reinforcement learning by redefining skill representation around state transitions rather than fixed action sequences. It learns a semantic skill similarity function via contrastive learning and dynamically adjusts skill lengths to match the temporal extent of behaviors, enabling robust skill extraction from noisy offline data. Empirical results across Antmaze, Kitchen, and Pick-and-Place demonstrate competitive performance and improved exploration efficiency, while ablations show the value of the similarity function and length relabeling. Overall, DCSL advances scalable, adaptable skill discovery with practical implications for offline RL and long-horizon planning.

Abstract

Reinforcement learning (RL) has made significant progress in various domains, but scaling it to long-horizon tasks with complex decision-making remains challenging. Skill learning attempts to address this by abstracting actions into higher-level behaviors. However, current approaches often fail to recognize semantically similar behaviors as the same skill, and they rely on fixed skill lengths, which limits flexibility and generalization. To address this, we propose Dynamic Contrastive Skill Learning (DCSL), a novel framework that redefines skill representation and learning. DCSL introduces three key ideas: state-transition-based skill representation, skill similarity function learning, and dynamic skill length adjustment. By focusing on state transitions and leveraging contrastive learning, DCSL captures the semantic context of behaviors and adapts skill lengths to match the temporal extent of each behavior. Our approach enables more flexible and adaptive skill extraction, particularly from complex or noisy datasets, and achieves competitive task completion and efficiency compared to existing methods.
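The abstract does not spell out the contrastive loss. The following is a minimal sketch, assuming a standard InfoNCE objective with in-batch negatives and a hypothetical linear encoder of a state transition (s_start, s_end). It illustrates how skill embeddings can be scored against state transitions; `encode_transition`, `W`, and `temperature` are illustrative names, not the authors' implementation.

```python
import numpy as np


def encode_transition(s_start: np.ndarray, s_end: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: linear projection of [s_start, s_end - s_start]."""
    return np.concatenate([s_start, s_end - s_start], axis=1) @ W


def info_nce_loss(skill_emb: np.ndarray, transition_emb: np.ndarray,
                  temperature: float = 0.1) -> float:
    """InfoNCE loss: row i of each array is a positive pair, other rows are negatives."""
    # L2-normalise so the dot product becomes a cosine similarity
    z = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    t = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    logits = z @ t.T / temperature                 # (B, B) similarity matrix
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal
    return float(-np.mean(np.diag(log_probs)))


# Usage: skills aligned with their own transitions vs. mismatched pairs
rng = np.random.default_rng(0)
B, state_dim, emb_dim = 8, 4, 16
W = rng.normal(size=(2 * state_dim, emb_dim))
s_start = rng.normal(size=(B, state_dim))
s_end = s_start + rng.normal(size=(B, state_dim))
transition_emb = encode_transition(s_start, s_end, W)
skill_emb = transition_emb + 0.05 * rng.normal(size=transition_emb.shape)

aligned_loss = info_nce_loss(skill_emb, transition_emb)
shuffled_loss = info_nce_loss(skill_emb, np.roll(transition_emb, 1, axis=0))
print(aligned_loss, shuffled_loss)
```

Because the target depends only on where a transition starts and ends, not on the exact action sequence, two behaviors that reach similar end states can land close together in embedding space. This is the intuition behind clustering semantically similar behaviors into one skill.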

Paper Structure

This paper contains 41 sections, 2 theorems, 18 equations, 12 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Assuming an optimal discriminator, the contrastive learning objective maximizes the mutual information between skills and state transitions.
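Theorem 1 appears here without its proof. For background, the standard InfoNCE bound is the usual way to make statements of this kind precise. It is written below in our own notation, which is not taken from the paper: Z for skills, τ for a state transition such as (s_t, s_{t+h}) for a skill of length h, a positive critic f, and a batch of N positive pairs. Minimizing the contrastive loss raises a lower bound on the mutual information, and the loss-minimizing critic is proportional to the density ratio. Whether the paper's proof follows exactly this route is an assumption.

```latex
\mathcal{L}_{\mathrm{NCE}}
  = -\,\mathbb{E}\left[\log \frac{f(z_i, \tau_i)}{\sum_{j=1}^{N} f(z_i, \tau_j)}\right],
\qquad
I(Z; T) \;\ge\; \log N - \mathcal{L}_{\mathrm{NCE}},
\qquad
f^{*}(z, \tau) \;\propto\; \frac{p(\tau \mid z)}{p(\tau)}.
```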

Figures (12)

  • Figure 1: Illustration of DCSL's key ideas. Previous methods treat 'grab object' at different object positions as distinct skills, whereas DCSL clusters these into a single skill using the learned skill similarity. DCSL also overcomes the limitation of fixed-length skills by dynamically adjusting skill lengths, which lets it recognize shorter actions such as 'Pick up' as independent skills.
  • Figure 2: The overall architecture of our framework, including skill extraction, skill length relabeling, and the application of skills to downstream tasks. Our method samples four states to encode skills; red-bordered states are negative samples. In the relabeling process, green borders mark the skill's initial state and yellow borders mark the skill's termination state. The learned skills are then used in downstream RL tasks.
  • Figure 3: Performance comparison across the Antmaze-Medium, Antmaze-Large, Kitchen, and Pick-and-Place environments (5 random seeds). The results demonstrate the adaptability and effectiveness of DCSL on diverse, long-horizon tasks. Dark lines show average returns and shaded areas show standard deviations. One training step is performed after each episode rollout.
  • Figure 4: Ablation study results in the Pick-and-Place environment across the ME, MR, and RP datasets, comparing the impact of the similarity function and skill length relabeling in our framework. These results highlight the importance of both techniques for maintaining robust performance on suboptimal and noisy data.
  • Figure 5: Comparison of execution results for 50 randomly sampled skills in the Antmaze environment. Each colored line represents the trajectory of the Ant agent during the execution of a single skill. While SPiRL and SkiMo show similar behavior patterns repeating across multiple skills, DCSL demonstrates movements in various directions and patterns. This suggests that DCSL has learned a more efficient skill representation space by effectively clustering similar behaviors.
  • ...and 7 more figures
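This summary does not describe the exact skill length relabeling procedure. As a purely illustrative sketch, assume a hypothetical per-step embedding of each one-step transition s_t → s_{t+1} and a cosine-similarity threshold, both our own choices. Under those assumptions, variable-length segmentation could grow a skill while new steps stay similar to the skill's running prototype and stop when they diverge or a maximum length is reached. Each returned (start, end) pair loosely corresponds to a skill's initial state (green in Figure 2) and termination state (yellow). `relabel_skill_lengths`, `threshold`, and `max_len` are hypothetical names.

```python
import numpy as np


def relabel_skill_lengths(step_emb: np.ndarray, threshold: float = 0.8,
                          max_len: int = 10) -> list[tuple[int, int]]:
    """Split T one-step transition embeddings into variable-length skills.

    Row t of step_emb embeds the transition s_t -> s_{t+1}. Each returned
    (start, end) pair means the skill starts at state s_start and
    terminates at state s_end (covering steps start .. end-1).
    """
    unit = step_emb / np.linalg.norm(step_emb, axis=1, keepdims=True)
    segments = []
    start = 0
    for t in range(1, len(unit)):
        # Prototype = normalised mean direction of the current skill so far
        proto = unit[start:t].mean(axis=0)
        proto /= np.linalg.norm(proto)
        similar = float(unit[t] @ proto) >= threshold
        # Close the skill when behaviour changes or the length cap is hit
        if not similar or t - start >= max_len:
            segments.append((start, t))
            start = t
    segments.append((start, len(unit)))
    return segments


# Usage: move right for 4 steps, up for 3, then left for 5
step_emb = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 3 + [[-1.0, 0.0]] * 5)
print(relabel_skill_lengths(step_emb))  # [(0, 4), (4, 7), (7, 12)]
```

A fixed-length scheme with length 5 would cut across the right-to-up turn. The similarity-driven rule instead yields skills of lengths 4, 3, and 5 that each cover a single coherent behavior.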

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2