Robust Policy Learning via Offline Skill Diffusion
Woo Kyung Kim, Minjong Yoo, Honguk Woo
TL;DR
DuSkill addresses cross-domain generalization in offline skill learning through offline skill diffusion: a guided diffusion-based decoder is conditioned on two embedding spaces, $\mathcal{Z}_{\rho}$ (domain-invariant) and $\mathcal{Z}_{\sigma}$ (domain-variant), which are trained together with priors $p_{\rho}$ and $p_{\sigma}$. The downstream policy learns $\pi(z_{\rho},z_{\sigma}|s)$ and uses the frozen diffusion decoder to generate actions via a DDPM-based process, which yields diverse skills beyond those in the training datasets. Experiments on long-horizon tasks in Multi-stage Meta-World show improved few-shot imitation and online RL across domains, and ablations confirm that both the hierarchical encoding and the guided diffusion components are necessary. The work demonstrates the practical value of diffusion-based skill generation for cross-domain hierarchical RL and offers a pathway to robust, data-efficient adaptation.
Abstract
Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. These skills, learned task-agnostically from offline datasets, can accelerate policy learning for new tasks. Yet their use across domains remains restricted because the skills depend inherently on the datasets, which poses a challenge when learning a skill-based policy via RL for a target domain that differs from the datasets' domains. In this paper, we present a novel offline skill learning framework, DuSkill, which employs a guided Diffusion model to generate versatile skills extended from the limited skills in the datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder that works in conjunction with hierarchical encoding to disentangle the skill embedding space into two distinct representations: one encapsulating domain-invariant behaviors and the other delineating the factors that induce domain variations in those behaviors. Our DuSkill framework enhances the diversity of skills learned offline and thereby accelerates the learning of high-level policies for different domains. Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms on several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.
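To make the decoding step concrete, the following is a minimal sketch of DDPM-style reverse sampling of an action conditioned on a state and the two skill embeddings. It is not the authors' implementation: the function names, the linear noise schedule, the toy noise model, and in particular the additive rule that combines the domain-invariant and domain-variant noise estimates with a `guidance_weight` are all assumptions made for illustration. For simplicity, the state, both embeddings, and the action are assumed to share the same dimension.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumed, commonly used default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_noise_model(x, t, state, z_rho, z_sigma, guidance_weight):
    """Hypothetical stand-in for a trained, frozen noise-prediction network.

    A real decoder would be a neural network; this toy version only shows
    how two conditioning signals could enter the noise estimate. The timestep
    t is accepted for interface parity but unused here.
    """
    # Estimate driven by the state and the domain-invariant embedding.
    eps_invariant = [xi - (si + zi) for xi, si, zi in zip(x, state, z_rho)]
    # Guidance term driven by the domain-variant embedding (assumed form).
    eps_variant = [-zi for zi in z_sigma]
    return [a + guidance_weight * b for a, b in zip(eps_invariant, eps_variant)]


def ddpm_sample_action(state, z_rho, z_sigma, num_steps=50,
                       guidance_weight=0.5, seed=0):
    """Run the DDPM reverse process from Gaussian noise to an action."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alphas (alpha-bar in the DDPM notation).
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure noise with the same dimension as the state.
    x = [rng.gauss(0.0, 1.0) for _ in state]

    for t in reversed(range(num_steps)):
        eps = toy_noise_model(x, t, state, z_rho, z_sigma, guidance_weight)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            # Add noise at every step except the final one.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    action = ddpm_sample_action(
        state=[0.1, -0.2], z_rho=[0.5, 0.5], z_sigma=[1.0, 0.0])
    print(action)
```

In this sketch a high-level policy would supply `z_rho` and `z_sigma` for the current state, and the decoder would stay fixed. Changing only `z_sigma` changes the sampled action while the domain-invariant conditioning is held constant, which is the role the text assigns to the domain-variant embedding. The toy noise model is not trained, so the sampled values illustrate the mechanics of the loop only.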
