
DreamControl-v2: Simpler and Scalable Autonomous Humanoid Skills via Trainable Guided Diffusion Priors

Sudarshan Harithas, Sangkyung Kwak, Pushkal Katara, Srujan Deolasee, Dvij Kalaria, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, Jonathan Chung-Kuan Huang

Abstract

Developing robust autonomous loco-manipulation skills for humanoids remains an open problem in robotics. While RL has been applied successfully to legged locomotion, applying it to complex, interaction-rich manipulation tasks remains harder, largely due to the long-horizon planning challenges of manipulation. A recent approach along these lines is DreamControl, which addresses these issues by leveraging off-the-shelf human motion diffusion models as a generative prior to guide RL policies during training. In this paper, we investigate the impact of DreamControl's motion prior and propose an improved framework that trains a guided diffusion model directly in the humanoid robot's motion space, aggregating diverse human and robot datasets into a unified embodiment space. We demonstrate that our approach captures a wider range of skills due to the larger training data mixture and establishes a more automated pipeline by removing the need for manual filtering interventions. Furthermore, we show that scaling the generation of reference trajectories is important for achieving robust downstream RL policies. We validate our approach through extensive experiments in simulation and on a real Unitree G1.

Paper Structure

This paper contains 39 sections, 3 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 2: DreamControl-v2 Overview. Our four-stage pipeline enables humanoid whole-body manipulation: (1) large-scale human motion datasets are retargeted into a unified robot space; (2) a humanoid motion diffusion model is trained to generate reference trajectories conditioned on text and spatio-temporal control signals; (3) multiple reference trajectories are used to train a physics-based RL policy; (4) the learned policy is deployed in simulation and on a real robot.
  • Figure 3: Effect of Trajectory Scaling on RL Performance. More diffusion-generated trajectories consistently lead to higher policy success rates across 3 different tasks. The right panel shows the variance of the test distribution across multiple tasks.
  • Figure 4: Spatial Prompting Qualitative Results. The figure depicts successful samples (check marks) and rejected samples (cross marks) obtained from various spatial prompting techniques. The DreamControl-v2 model is directly prompted in robot space (blue lines), trial-and-error is used for the zero-shot model (red lines), and prompt calibration is indicated with green lines.
  • Figure 4: Key-point correspondences used for retargeting human data to the G1. A similar correspondence mapping is used for the diffusion trajectory representation (Sec. \ref{sec:traj_parameterize}).
  • Figure 4: Trajectory Generation Qualitative Results
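The four-stage pipeline described in the Figure 2 caption can be sketched end-to-end as follows. This is a minimal illustrative skeleton, not the authors' implementation: every function, class, and return value here is a hypothetical placeholder standing in for the actual retargeting, diffusion sampling, and RL components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Trajectory:
    # Placeholder: per-frame robot joint targets in the unified robot space.
    frames: List[List[float]]

def retarget(human_clips: List[List[List[float]]]) -> List[Trajectory]:
    # Stage 1 (hypothetical): map large-scale human motion clips into a
    # unified robot space via key-point correspondences (cf. Figure 4).
    return [Trajectory(frames=clip) for clip in human_clips]

def generate_references(prompt: str, control: Dict, n: int) -> List[Trajectory]:
    # Stage 2 (hypothetical): a trained motion diffusion model would sample
    # n reference trajectories conditioned on the text prompt and
    # spatio-temporal control signals. Stubbed with constant trajectories.
    return [Trajectory(frames=[[0.0]]) for _ in range(n)]

def train_policy(references: List[Trajectory]) -> Callable[[List[float]], List[float]]:
    # Stage 3 (hypothetical): physics-based RL tracks the references;
    # Figure 3 suggests more references yield higher success rates.
    # Stubbed with a trivial constant-action policy.
    return lambda obs: [0.0]

# Stage 4: the learned policy is deployed in simulation and on the robot.
refs = generate_references("pick up the box", control={}, n=8)
policy = train_policy(refs)
action = policy([0.0])
```

The sketch only fixes the data flow between stages (retargeted data trains the diffusion prior, the prior's samples supervise RL, and RL produces a deployable policy); each stub would be replaced by the corresponding real component.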