
The Role of Domain Randomization in Training Diffusion Policies for Whole-Body Humanoid Control

Oleg Kaidanov, Firas Al-Hafez, Yusuf Suvari, Boris Belousov, Jan Peters

TL;DR

Investigating how dataset diversity and size affect the performance of Diffusion Policies (DPs) for humanoid whole-body control shows that, although DPs can achieve stable walking behavior, successfully training locomotion policies requires significantly larger and more diverse datasets than manipulation tasks do.

Abstract

Humanoids have the potential to be the ideal embodiment in environments designed for humans. Thanks to their structural similarity to the human body, they can draw on rich sources of demonstration data, e.g., collected via teleoperation, motion capture, or even videos of humans performing tasks. However, distilling a policy from demonstrations is still a challenging problem. While Diffusion Policies (DPs) have shown impressive results in robotic manipulation, their applicability to locomotion and humanoid control remains underexplored. In this paper, we investigate how dataset diversity and size affect the performance of DPs for humanoid whole-body control. In a simulated IsaacGym environment, we generate synthetic demonstrations by training Adversarial Motion Prior (AMP) agents under various Domain Randomization (DR) conditions, and we compare DPs fitted to datasets of different size and diversity. Our findings show that, although DPs can achieve stable walking behavior, successfully training locomotion policies requires significantly larger and more diverse datasets than manipulation tasks do, even in simple scenarios.
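
To make the notion of a "DR condition" concrete, here is a minimal Python sketch, assuming that a condition is simply a set of uniform ranges from which a few physics parameters are redrawn at the start of each episode. The class DRConfig, the function sample_episode_params, the three parameters, and all numeric ranges are hypothetical illustrations, not the randomization setup used in the paper. A condition with degenerate ranges reproduces the non-randomized case, while wider ranges yield more diverse demonstrations.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DRConfig:
    """Hypothetical DR condition: one (low, high) uniform range per physics parameter.

    Parameter names and ranges are illustrative assumptions only.
    """
    friction: tuple[float, float] = (1.0, 1.0)
    added_base_mass_kg: tuple[float, float] = (0.0, 0.0)
    motor_strength_scale: tuple[float, float] = (1.0, 1.0)


def sample_episode_params(cfg: DRConfig, rng: random.Random) -> dict[str, float]:
    """Draw one set of physics parameters for a single episode, uniformly within each range."""
    return {
        "friction": rng.uniform(*cfg.friction),
        "added_base_mass_kg": rng.uniform(*cfg.added_base_mass_kg),
        "motor_strength_scale": rng.uniform(*cfg.motor_strength_scale),
    }


# Degenerate ranges: every episode uses the nominal values (no randomization).
NO_DR = DRConfig()
# Hypothetical wide ranges: demonstrations collected here cover more dynamics variation.
WIDE_DR = DRConfig(
    friction=(0.3, 1.5),
    added_base_mass_kg=(-2.0, 5.0),
    motor_strength_scale=(0.8, 1.2),
)

rng = random.Random(0)  # fixed seed for reproducible sampling
nominal = [sample_episode_params(NO_DR, rng) for _ in range(3)]
randomized = [sample_episode_params(WIDE_DR, rng) for _ in range(3)]
```

In a data-collection loop, each episode would first draw its parameters this way, apply them to the simulator, and then roll out the expert policy; that simulator-specific step is omitted here.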

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Proposed method. First, a robust and stable policy is trained using AMP under extensive Domain Randomization. This policy is then used to collect data for training Diffusion Policies. We generate several datasets, each with a different DR setting applied during data collection, and train a separate DP on each dataset. Finally, the performance of each DP is evaluated in two environments: with and without DR.
  • Figure 2: Evaluation of Diffusion Policies in a non-randomized target environment. Top: A plot of the normalized performance of all configurations, with tracking performance and smoothness inverted so that higher values indicate better performance for every metric. Bottom: A table of detailed results, including the success rate (higher is better), tracking performance (lower is better), and smoothness (lower is better).
  • Figure 3: Evaluation of Diffusion Policies in a randomized target environment. Top: A plot of the normalized performance of all configurations, with tracking performance and smoothness inverted so that higher values indicate better performance for every metric. Bottom: A table of detailed results, including the success rate (higher is better), tracking performance (lower is better), and smoothness (lower is better).
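
The captions do not state exactly how the plotted scores are normalized. One plausible scheme, shown here as a minimal Python sketch under that assumption, min-max scales each metric across configurations and inverts the lower-is-better metrics (tracking performance, smoothness) so that a higher value always means better. The function name normalize_scores and the example numbers are hypothetical and are not results from the paper.

```python
def normalize_scores(values: list[float], lower_is_better: bool) -> list[float]:
    """Min-max normalize one metric across configurations to [0, 1].

    Assumed scheme (not confirmed by the paper): if lower_is_better is True,
    the normalized value is inverted (1 - x) so that higher always means better.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All configurations tie on this metric: score each of them as best.
        return [1.0 for _ in values]
    normalized = [(v - lo) / span for v in values]
    if lower_is_better:
        normalized = [1.0 - x for x in normalized]
    return normalized


# Dummy numbers for three hypothetical DP configurations (not paper results).
success_rate = normalize_scores([0.2, 0.9, 0.6], lower_is_better=False)
tracking = normalize_scores([0.05, 0.10, 0.20], lower_is_better=True)
```

Inverting after scaling keeps every metric in the same [0, 1] range, so the lowest tracking error maps to 1.0 and can be compared on the same axis as the highest success rate.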