Robust Policy Learning via Offline Skill Diffusion

Woo Kyung Kim, Minjong Yoo, Honguk Woo

TL;DR

DuSkill addresses cross-domain generalization in offline skill learning through offline skill diffusion. A guided diffusion-based skill decoder is conditioned on embeddings from two spaces, $\mathcal{Z}_{\rho}$ (domain-invariant) and $\mathcal{Z}_{\sigma}$ (domain-variant), whose priors $p_{\rho}$ and $p_{\sigma}$ are learned jointly with the encoders. For a downstream task, a high-level policy $\pi(z_{\rho}, z_{\sigma} \mid s)$ selects the two embeddings, and the frozen diffusion decoder generates actions from them via a DDPM-based denoising process, yielding skills more diverse than those in the training datasets. Experiments on long-horizon tasks in Multi-stage Meta-World show improved few-shot imitation and online RL performance across domains, and ablations confirm that both the hierarchical encoding and the guided diffusion components are necessary. The results suggest that diffusion-based skill generation is a practical route to robust, data-efficient adaptation in cross-domain hierarchical RL.
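The action-generation step described above follows the standard DDPM reverse (ancestral sampling) process. The sketch below is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the linear $\beta$ schedule, the step count, the choice $\sigma_t^2 = \beta_t$, and the hypothetical `make_oracle_denoiser` stand-in (which plays the role of the frozen, trained decoder by returning the exact noise for a made-up target action) are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Noise predictor eps(a_t, t, s, z_rho, z_sigma); stands in for the frozen decoder.
Denoiser = Callable[[Vector, int, Sequence[float], Sequence[float], Sequence[float]], Vector]


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule beta_1..beta_T (assumed; a common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{i<=t} (1 - beta_i)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def ddpm_sample_action(denoiser: Denoiser, state: Sequence[float],
                       z_rho: Sequence[float], z_sigma: Sequence[float],
                       action_dim: int, num_steps: int = 50,
                       rng: Optional[random.Random] = None) -> Vector:
    """Generate one action by running the DDPM reverse chain from pure noise."""
    rng = rng or random.Random(0)
    betas = linear_betas(num_steps)
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # a_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, state, z_rho, z_sigma)
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (a_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma_t = math.sqrt(betas[t])  # the sigma_t^2 = beta_t variance choice
            x = [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def make_oracle_denoiser(num_steps: int) -> Denoiser:
    """Hypothetical stand-in for a trained decoder: returns the exact noise for
    the illustrative target action a* = z_rho + z_sigma (element-wise)."""
    alpha_bars = cumulative_alphas(linear_betas(num_steps))

    def denoiser(x, t, state, z_rho, z_sigma):
        target = [r + s for r, s in zip(z_rho, z_sigma)]
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * ai) / math.sqrt(1.0 - ab) for xi, ai in zip(x, target)]

    return denoiser
```

For example, `ddpm_sample_action(make_oracle_denoiser(50), [0.0], [0.5, -0.2], [0.1, 0.3], action_dim=2)` recovers the illustrative target `[0.6, 0.1]` up to floating-point error, because the stand-in predicts the noise exactly. A trained decoder would only approximate the noise, and the Gaussian noise injected at each step is what lets such a sampler return varied actions for the same inputs.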

Abstract

Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. Skills learned task-agnostically from offline datasets can accelerate policy learning for new tasks. Yet, applying these skills in different domains remains restricted by their inherent dependency on the datasets, which poses a challenge when learning a skill-based policy via RL for a target domain that differs from the datasets' domains. In this paper, we present DuSkill, a novel offline skill learning framework that employs a guided Diffusion model to generate versatile skills extended from the limited skills in the datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder, in conjunction with hierarchical encoding, that disentangles the skill embedding space into two distinct representations: one encapsulating domain-invariant behaviors and the other delineating the factors that induce domain variations in those behaviors. DuSkill enhances the diversity of skills learned offline, which accelerates the learning of high-level policies for different domains. Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms on several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.
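The abstract does not spell out how the guided decoder combines the two embeddings. One plausible reading, sketched here purely as an assumption in the style of classifier-free guidance, treats the domain-invariant noise estimate $\epsilon_{\rho}$ (conditioned on $z_{\rho}$ only) as the base prediction and moves it toward the domain-variant estimate $\epsilon_{\sigma}$ (conditioned on both $z_{\rho}$ and $z_{\sigma}$) with a guidance weight $w$: $\epsilon = \epsilon_{\rho} + w(\epsilon_{\sigma} - \epsilon_{\rho})$. The function name `guided_noise` and the weight `w` are hypothetical.

```python
from typing import List, Sequence


def guided_noise(eps_rho: Sequence[float], eps_sigma: Sequence[float], w: float) -> List[float]:
    """Classifier-free-guidance-style combination (assumed form, not confirmed
    by the paper): eps = eps_rho + w * (eps_sigma - eps_rho).

    w = 0 keeps only the domain-invariant estimate, w = 1 uses the
    domain-variant estimate as is, and w > 1 extrapolates beyond it.
    """
    if len(eps_rho) != len(eps_sigma):
        raise ValueError("noise estimates must have the same dimension")
    return [r + w * (s - r) for r, s in zip(eps_rho, eps_sigma)]
```

For example, `guided_noise([0.0, 1.0], [1.0, 1.0], 2.0)` returns `[2.0, 1.0]`: only the component where the two estimates disagree is pushed further. Under this reading, sweeping `w` for a fixed $z_{\rho}$ would be one way to obtain behavior variations around the same domain-invariant skill.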

Paper Structure (20 sections, 13 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: Concept of Offline Skill Diffusion: When a downstream task belongs to a domain different from those of the training datasets, conventional skill-based learning approaches struggle to learn and select suitable skills. In contrast, our offline skill diffusion expands skill diversity beyond the training datasets, enabling the execution of skills compatible with the downstream task. Skills are depicted discretely for visual illustration.
  • Figure 2: Offline Skill Diffusion and Downstream Policy Learning in DuSkill: (i) In the offline skill diffusion phase, a skill is decomposed into domain-invariant and domain-variant embeddings, which are then combined through the guided diffusion-based decoder to generate diverse skills. (ii) In the downstream policy learning phase, a high-level policy is learned for a task in a different domain, either through few-shot imitation or online RL.
  • Figure 3: DuSkill Framework: In (i-1), the hierarchical domain encoder disentangles the domain-invariant and domain-variant embeddings, while the domain-invariant and domain-variant priors are jointly learned with these encoders. For diverse skill generation, the encoders are trained in conjunction with the guided diffusion-based decoder in (i-2). Here, the domain-invariant decoder and the domain-variant decoder reconstruct actions from the domain-invariant and domain-variant embeddings, respectively. In (ii), the high-level policy is learned to solve the task in a different domain, either through few-shot imitation or online RL.
  • Figure 4: Visualization of Domain-invariant and Domain-variant Embeddings
  • Figure 5: Sample Efficiency of Downstream Policy Learning
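Figures 2 and 3 describe a two-level structure in which a high-level policy chooses embeddings and a frozen decoder turns them into actions. The sketch below is a hedged illustration of how such a policy could be rolled out, assuming (as in many skill-based RL setups, but not confirmed by the text above) a fixed skill horizon $H$ during which the chosen embeddings are held constant. `run_episode`, its callable interfaces, and all parameter names are hypothetical; the `decoder` argument could, for instance, wrap a DDPM sampler.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interfaces (illustrative only):
#   high_level_policy(s) -> (z_rho, z_sigma)
#   decoder(s, z_rho, z_sigma) -> action       (frozen after offline training)
#   env_step(s, action) -> (next_state, reward)
Policy = Callable[[object], Tuple[Vector, Vector]]
Decoder = Callable[[object, Sequence[float], Sequence[float]], Vector]
EnvStep = Callable[[object, Vector], Tuple[object, float]]


def run_episode(high_level_policy: Policy, decoder: Decoder, env_step: EnvStep,
                initial_state: object, episode_len: int,
                skill_horizon: int) -> Tuple[float, int]:
    """Roll out a skill-based policy and return (total reward, number of
    high-level decisions). Embeddings are held fixed for `skill_horizon`
    steps (an assumed design); the decoder emits one action per step."""
    if skill_horizon <= 0:
        raise ValueError("skill_horizon must be positive")
    state = initial_state
    z_rho: Vector = []
    z_sigma: Vector = []
    total_reward, decisions = 0.0, 0
    for t in range(episode_len):
        if t % skill_horizon == 0:
            # High-level step: pi(z_rho, z_sigma | s) picks a new skill.
            z_rho, z_sigma = high_level_policy(state)
            decisions += 1
        # Low-level step: the frozen decoder maps (s, z_rho, z_sigma) to an action.
        action = decoder(state, z_rho, z_sigma)
        state, reward = env_step(state, action)
        total_reward += reward
    return total_reward, decisions
```

With an episode length of 10 and $H = 4$, for example, the high-level policy is queried at steps 0, 4, and 8, so `run_episode` reports three decisions. Only the high-level policy would be trained downstream (by few-shot imitation or online RL); the decoder stays frozen.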