Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

Pranay Dugar, Aayam Shrestha, Fangzhou Yu, Bart van Marum, Alan Fern

TL;DR

The paper tackles multi-modal whole-body control for humanoids by proposing the Masked Humanoid Controller (MHC), a single learned policy that can stand, walk, and mimic both whole-body and partial-body motions specified through different input modalities. Trained in MuJoCo with a curriculum over masked directives, domain randomization, and a motion dataset spanning several sources, the MHC learns to generate PD setpoints that track directives while maintaining balance and robustness. Key contributions include a unified framework for multi-modal directives, a description of the data generation and architecture, a curriculum-driven training strategy, and a demonstration of sim-to-real transfer on the Digit V3 robot, supported by ablations and generalization analyses. The work moves toward practical, versatile humanoid control by integrating multiple input modalities in one controller and showing that learned whole-body control transfers to a real robot.

Abstract

The foundational capabilities of humanoid robots should include robustly standing, walking, and mimicry of whole and partial-body motions. This work introduces the Masked Humanoid Controller (MHC), which supports all of these capabilities by tracking target trajectories over selected subsets of humanoid state variables while ensuring balance and robustness against disturbances. The MHC is trained in simulation using a carefully designed curriculum that imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data. It also allows for combining joystick-based control with partial-body motion mimicry. We showcase simulation experiments validating the MHC's ability to execute a wide variety of behaviors from partially-specified target motions. Moreover, we demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot. To our knowledge, this is the first instance of a learned controller that can realize whole-body control of a real-world humanoid for such diverse multi-modal targets.
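
As a rough illustration of what tracking "selected subsets of humanoid state variables" can look like in code, the sketch below scores a state only against the entries that a directive actually specifies. The names `MaskedDirective`, `specified`, and `masked_tracking_error` are hypothetical, and the mean squared error is an assumed stand-in; this page does not give the paper's actual reward terms or state layout.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MaskedDirective:
    """Hypothetical container: one target per state variable plus a flag per entry."""
    target: List[float]    # desired value for each tracked state variable
    specified: List[bool]  # True where the target should be tracked


def masked_tracking_error(state: Sequence[float], directive: MaskedDirective) -> float:
    """Mean squared error over the specified entries only (assumed error measure)."""
    if not (len(state) == len(directive.target) == len(directive.specified)):
        raise ValueError("state, target, and specified must have equal length")
    # Unspecified entries are skipped, so the controller is free to choose them.
    errors = [
        (s - t) ** 2
        for s, t, keep in zip(state, directive.target, directive.specified)
        if keep
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    directive = MaskedDirective(target=[1.0, 5.0, 2.0], specified=[True, False, True])
    print(masked_tracking_error([0.0, 1.0, 2.0], directive))
```

In this sketch a directive with every entry unspecified contributes no tracking error at all, which leaves only whatever balance and robustness objectives the training setup imposes.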

Paper Structure

This paper contains 18 sections, 1 equation, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: The Masked Humanoid Controller (MHC) is learned from a dataset of re-targeted human motions paired with torso locomotion commands, including standing. During training and testing, masking can be applied to target motion trajectories to yield masked motion directives that are given to the MHC. The MHC then produces PD setpoints for the whole body in order to track the current motion directive. Training includes domain randomization and force perturbations to facilitate robustness and transfer from simulation to the real robot.
  • Figure 2: Real-world demonstrations of our approach. A) Locomotion directives given by joystick commands. B) A fully specified directive for a boxing jab, showing whole-body torso motion and foot coordination. C) A handcrafted sequence of masked directives that combines an upper-body motion trajectory with lower-body joystick commands, showing the ability to move to a location, pick up a box, move while holding it, and place it at a target location.
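
The two captions above can be made concrete with a small sketch: one helper builds a masked directive that pairs a joystick command with upper-body joint targets (as in Figure 2C), and another applies the textbook PD law to turn setpoints into joint torques. Everything here is an assumption for illustration: the function names, the joint ordering (lower body first), the dictionary layout, the gains, and the zero target velocity are hypothetical and are not taken from the paper or from the Digit V3 interface.

```python
from typing import Dict, List, Sequence, Tuple


def compose_directive(
    joystick_cmd: Tuple[float, float, float],
    upper_body_targets: Sequence[float],
    num_lower_joints: int,
) -> Dict[str, object]:
    """Combine a base command with upper-body targets; lower-body joints stay unspecified."""
    # Placeholder zeros fill the unspecified lower-body slots; the flags mark them as ignored.
    joint_targets = [0.0] * num_lower_joints + list(upper_body_targets)
    joint_specified = [False] * num_lower_joints + [True] * len(upper_body_targets)
    return {
        "base_command": tuple(joystick_cmd),  # assumed (forward, lateral, yaw-rate) layout
        "joint_targets": joint_targets,
        "joint_specified": joint_specified,
    }


def pd_torques(
    q_des: Sequence[float],
    q: Sequence[float],
    dq: Sequence[float],
    kp: Sequence[float],
    kd: Sequence[float],
) -> List[float]:
    """Standard PD law per joint: tau = kp * (q_des - q) - kd * dq (zero target velocity)."""
    return [p * (qd - qi) - d * dqi for qd, qi, dqi, p, d in zip(q_des, q, dq, kp, kd)]


if __name__ == "__main__":
    directive = compose_directive((0.5, 0.0, 0.1), [0.2, -0.3], num_lower_joints=3)
    print(directive["joint_specified"])
    print(pd_torques([1.0, 0.0], [0.5, 0.0], [0.0, 2.0], [10.0, 10.0], [1.0, 1.0]))
```

In the setup the captions describe, the learned policy, not a hand-written rule, would produce the setpoints passed as `q_des`; the sketch only shows how such setpoints and a partially specified directive might be represented.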