Multi-Domain Motion Embedding: Expressive Real-Time Mimicry for Legged Robots

Matthias Heyrman, Chenhao Li, Victor Klemm, Dongho Kang, Stelian Coros, Marco Hutter

TL;DR

This work tackles real-time imitation of expressive human and animal motions on legged robots by learning a dual-encoder motion representation that unifies structured (periodic) and unstructured (non-periodic) dynamics. The Multi-Domain Motion Embedding (MDME) architecture combines a variational encoder with a wavelet-based encoder and an action-space decoder, removing the need for explicit motion retargeting while generalizing to unseen morphologies. MDME improves reconstruction accuracy and cross-domain generalization, deploys zero-shot on humanoid and quadruped hardware, and outperforms prior methods (VMP, PAE) both in simulation and on real robots. The approach lays a practical foundation for scalable, real-time robot imitation across diverse styles and morphologies; its acknowledged limitations are reward tuning and a trade-off in in-distribution accuracy, both of which invite further refinement.

Abstract

Effective motion representation is crucial for enabling robots to imitate expressive behaviors in real time, yet existing motion controllers often ignore inherent patterns in motion. Previous efforts in representation learning do not attempt to jointly capture structured periodic patterns and irregular variations in human and animal movement. To address this, we present Multi-Domain Motion Embedding (MDME), a motion representation that unifies the embedding of structured and unstructured features using a wavelet-based encoder and a probabilistic embedding in parallel. This produces a rich representation of reference motions from a minimal input set, enabling improved generalization across diverse motion styles and morphologies. We evaluate MDME on retargeting-free real-time motion imitation by conditioning robot control policies on the learned embeddings, demonstrating accurate reproduction of complex trajectories on both humanoid and quadruped platforms. Our comparative studies confirm that MDME outperforms prior approaches in reconstruction fidelity and generalizability to unseen motions. Furthermore, we demonstrate that MDME can reproduce novel motion styles in real time through zero-shot deployment, eliminating the need for task-specific tuning or online retargeting. These results position MDME as a generalizable and structure-aware foundation for scalable real-time robot imitation.
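
To make the idea of a parallel wavelet-based and probabilistic embedding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed Haar wavelet, two decomposition levels, a linear Gaussian encoder with the reparameterization trick, and simple concatenation of the two outputs; the function names, the window shape, and the latent size are all hypothetical, and in MDME the encoders are learned networks whose details are not given in this summary.

```python
import numpy as np


def haar_dwt(signal):
    """One level of the orthonormal Haar transform along the time axis.

    signal has shape (T, D) with even T; returns (approx, detail),
    each of shape (T // 2, D).
    """
    even, odd = signal[0::2], signal[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def wavelet_features(window, levels=2):
    """Structured branch (assumed form): multi-level Haar coefficients."""
    coeffs = []
    approx = window
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.append(detail.ravel())
    coeffs.append(approx.ravel())
    return np.concatenate(coeffs)


def variational_embedding(window, w_mu, w_logvar, rng):
    """Unstructured branch (assumed form): linear Gaussian encoder.

    Samples z = mu + sigma * eps (reparameterization trick).
    """
    x = window.ravel()
    mu = w_mu @ x
    logvar = w_logvar @ x
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def embed(window, w_mu, w_logvar, rng, levels=2):
    """Run both branches in parallel and concatenate their outputs."""
    structured = wavelet_features(window, levels)
    unstructured = variational_embedding(window, w_mu, w_logvar, rng)
    return np.concatenate([structured, unstructured])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.standard_normal((8, 3))       # 8 frames, 3 motion features
    w_mu = 0.1 * rng.standard_normal((4, 24))  # hypothetical latent size 4
    w_logvar = 0.1 * rng.standard_normal((4, 24))
    print(embed(window, w_mu, w_logvar, rng).shape)
```

In this sketch the wavelet branch is deterministic and keeps multi-scale temporal structure, while the sampled latent stands in for the irregular variation; a control policy would then be conditioned on the concatenated vector.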

Paper Structure

This paper contains 26 sections, 16 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: A. Standard deployment pipeline used by previous methods using a handcrafted retargeting of reference motions to the target robot's morphology and reproducing the resulting joint angles. B. Our proposed deployment pipeline where the reference motion is embedded and an action representation of the input motion on the robot is learned.
  • Figure 2: Proposed Multi-Domain Motion Embedding training method and architecture. An ideal retargeting of an input reference motion is produced first. The reference is then fed to the MDME architecture, which is trained to reconstruct the retargeted motion on the robot. Training uses noisy inputs and terrain variation for additional robustness, so that the resulting policy can be deployed zero-shot on hardware.
  • Figure 3: Comparing our proposed method (MDME) with the prior VMP and PAE methods for motion mimicry.
  • Figure 4: Ablation studies comparing results removing or changing various components of the proposed MDME architecture.
  • Figure 5: Ablation studies comparing the architectures when trained with retargeted joint states (Equation \ref{eq:res:ret_input}) as inputs.
  • ...and 4 more figures