Multi-Domain Motion Embedding: Expressive Real-Time Mimicry for Legged Robots
Matthias Heyrman, Chenhao Li, Victor Klemm, Dongho Kang, Stelian Coros, Marco Hutter
TL;DR
This work tackles real-time imitation of expressive human and animal motions on legged robots by learning a dual-encoder motion representation that unifies structured (periodic) and unstructured (non-periodic) dynamics. The Multi-Domain Motion Embedding (MDME) architecture combines a variational encoder with a wavelet-based encoder and an action-space decoder, removing the need for explicit motion retargeting while generalizing to unseen morphologies. MDME improves reconstruction accuracy and cross-domain generalization, deploys zero-shot on humanoid and quadruped hardware, and outperforms prior methods (VMP, PAE) both in simulation and on real robots. The approach lays a practical foundation for scalable, real-time robot imitation across diverse styles and morphologies. Its noted limitations, reward tuning and a trade-off in in-distribution accuracy, invite further refinement.
Abstract
Effective motion representation is crucial for enabling robots to imitate expressive behaviors in real time, yet existing motion controllers often ignore inherent patterns in motion. Previous efforts in representation learning do not attempt to jointly capture structured periodic patterns and irregular variations in human and animal movement. To address this, we present Multi-Domain Motion Embedding (MDME), a motion representation that unifies the embedding of structured and unstructured features by running a wavelet-based encoder and a probabilistic encoder in parallel. This produces a rich representation of reference motions from a minimal input set, enabling improved generalization across diverse motion styles and morphologies. We evaluate MDME on retargeting-free real-time motion imitation by conditioning robot control policies on the learned embeddings, demonstrating accurate reproduction of complex trajectories on both humanoid and quadruped platforms. Our comparative studies confirm that MDME outperforms prior approaches in reconstruction fidelity and generalizability to unseen motions. Furthermore, we demonstrate that MDME can reproduce novel motion styles in real time through zero-shot deployment, eliminating the need for task-specific tuning or online retargeting. These results position MDME as a generalizable and structure-aware foundation for scalable real-time robot imitation.
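To make the idea of two parallel encoders concrete, the following is a minimal NumPy sketch, not the architecture used in this work. Everything in it is an assumption made for illustration: the Haar wavelet, the untrained linear Gaussian encoder, the window size, the latent size, and all function names (`haar_dwt`, `wavelet_embedding`, `variational_embedding`, `multi_domain_embedding`). It only shows how a structured (wavelet) branch and a probabilistic branch can encode the same motion window and be concatenated into one embedding.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Multi-level Haar wavelet transform along the time axis (axis 0).

    signal: (T, D) array with T divisible by 2**levels.
    Returns [coarsest approximation, details from coarse to fine].
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # approximation coefficients
    return [approx] + details[::-1]

def wavelet_embedding(window, levels=2):
    """Structured branch (assumed form): flattened wavelet coefficients."""
    return np.concatenate([c.ravel() for c in haar_dwt(window, levels)])

def variational_embedding(window, w_mu, w_logvar, rng):
    """Probabilistic branch (assumed form): linear Gaussian encoder
    sampled with the reparameterization trick."""
    x = window.ravel()
    mu, logvar = w_mu @ x, w_logvar @ x
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps, mu, logvar

def multi_domain_embedding(window, w_mu, w_logvar, rng, levels=2):
    """Run both branches on the same window and concatenate the results."""
    z_wave = wavelet_embedding(window, levels)
    z_var, _, _ = variational_embedding(window, w_mu, w_logvar, rng)
    return np.concatenate([z_wave, z_var])

# Hypothetical sizes: 8 time steps, 3 motion features, 4 latent dimensions.
rng = np.random.default_rng(0)
T, D, LATENT = 8, 3, 4
t = np.arange(T)[:, None]
window = np.sin(2 * np.pi * t / T + np.arange(D))      # toy periodic motion, shape (T, D)
w_mu = 0.1 * rng.standard_normal((LATENT, T * D))      # untrained placeholder weights
w_logvar = 0.1 * rng.standard_normal((LATENT, T * D))  # untrained placeholder weights
z = multi_domain_embedding(window, w_mu, w_logvar, rng)
print(z.shape)
```

In this sketch the wavelet branch is a fixed orthonormal transform, so it preserves the energy of the input window, while the probabilistic branch is where learned, sample-dependent variation would enter. In a trained system both branches would presumably be learned jointly with the decoder; the random weights here are placeholders only.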
