DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation

Jinxin Liu, Xinghong Guo, Zifeng Zhuang, Donglin Wang

TL;DR

DIDI addresses offline behavioral generation from reward-free, multimodal data. It trains a contextual policy with a diffusion prior as a regularizer to induce diverse skills, using a three-step pseudo-labeling loop to keep the learned behaviors within the support of the offline data. The method enables skill stitching and interpolation and supports reward-guided generation with extrinsic rewards, showing strong diversity and competitive performance across Push, Kitchen, Humanoid, and D4RL tasks. These results suggest that diffusion-guided diversity is a practical way to learn generalist skill spaces in offline settings and to reuse them for downstream tasks.

Abstract

In this paper, we propose a novel approach called DIffusion-guided DIversity (DIDI) for offline behavioral generation. The goal of DIDI is to learn a diverse set of skills from a mixture of label-free offline data. We achieve this by leveraging diffusion probabilistic models as priors to guide the learning process and regularize the policy. By optimizing a joint objective that incorporates diversity and diffusion-guided regularization, we encourage the emergence of diverse behaviors while maintaining the similarity to the offline data. Experimental results in four decision-making domains (Push, Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering diverse and discriminative skills. We also introduce skill stitching and skill interpolation, which highlight the generalist nature of the learned skill space. Further, by incorporating an extrinsic reward function, DIDI enables reward-guided behavior generation, facilitating the learning of diverse and optimal behaviors from sub-optimal data.

Paper Structure

This paper contains 22 sections, 19 equations, 9 figures, 3 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison between Diffuser (Janner et al., 2022) and our DIDI (with a single skill, i.e., assuming $p(\mathbf{z}) = \delta(\mathbf{z})$).
  • Figure 2: Discovered diverse skills in three domains. We can see that in the Push domain, blocks are pushed to different positions. In the Kitchen domain, the robotic arm executes distinct actions. In the Humanoid domain, the agent exhibits different movements and navigates in different directions (the color progression from light to dark indicates the movement progress of the humanoid).
  • Figure 3: Skill stitching: (1st row) "walking" $\to$ "crouching" $\to$ "walking", (2nd row) "walking" $\to$ "turn round" $\to$ "walking", and (3rd row) "walking" $\to$ "hands up" $\to$ "walking". Each row shows (left) a "walking forward" skill, (middle) a "crouching" / "turn round" / "hands up" skill, and (right) the stitched behavior: when the robot is walking forward and the skill is suddenly switched to "crouching" / "turn round" / "hands up", the robot switches behaviors naturally. When the skill is then switched back to "walking forward", the robot resumes walking forward. The color progression from light to dark indicates the movement progress of the humanoid.
  • Figure 4: Discovered diverse and optimal skills in the Push domain (with T-shape, F-shape, and 7-shape blocks). The green block represents the starting point, the gray block represents the target, and the red curve represents the motion trajectory. In all tasks, the green blocks reach their target positions while following different movement trajectories.
  • Figure 5: Visualization of skill interpolation. We visualize the top view of skill $z_{\text{A}}$ and skill $z_{\text{B}}$ (Humanoid domain). We can see that by interpolating the skill space, we can obtain the interpolated skills that lie between the movement directions of skills $z_{\text{A}}$ and $z_{\text{B}}$.
  • ...and 4 more figures
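
The skill interpolation described in the Figure 5 caption can be illustrated with a minimal sketch. It assumes that skills are continuous vectors and that interpolation is linear between $z_{\text{A}}$ and $z_{\text{B}}$. Neither assumption is confirmed by the text above, and the function name `interpolate_skills` and the 2-D example vectors are hypothetical.

```python
from typing import List, Sequence


def interpolate_skills(
    z_a: Sequence[float], z_b: Sequence[float], num_steps: int = 5
) -> List[List[float]]:
    """Return num_steps skill vectors evenly spaced on the segment z_a -> z_b.

    Assumption: linear interpolation in a continuous skill space; the paper
    may use a different scheme.
    """
    if len(z_a) != len(z_b):
        raise ValueError("skill vectors must have the same dimension")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    skills = []
    for i in range(num_steps):
        alpha = i / (num_steps - 1)  # 0.0 at z_a, 1.0 at z_b
        skills.append([(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)])
    return skills


# Hypothetical 2-D skill vectors standing in for z_A and z_B.
z_a = [1.0, 0.0]
z_b = [0.0, 1.0]
skills = interpolate_skills(z_a, z_b, num_steps=5)
```

Each interpolated vector would then be passed to the skill-conditioned policy in place of $z_{\text{A}}$ or $z_{\text{B}}$. Under the assumptions above, the intermediate vectors correspond to behaviors between the two endpoint skills.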