G-DReaM: Graph-conditioned Diffusion Retargeting across Multiple Embodiments
Zhefeng Cao, Ben Liu, Sen Li, Wei Zhang, Hua Chen
TL;DR
The paper tackles cross-embodiment motion retargeting across heterogeneous robot skeletons when ground-truth data is limited. It proposes G-DReaM, a graph-conditioned diffusion model guided by an energy term for kinematic constraints, implemented with a transformer-based denoiser that fuses motion, graph, and correspondence information through graph-aware attention. The approach enables retargeting across non-homeomorphic skeletons, supports adaptation to new embodiments with modest fine-tuning, and demonstrates zero-shot generalization to similar motions, offering a scalable path to transferring reference motions across diverse robots. This work advances practical cross-robot motion retargeting by eliminating the need for robot-specific motion data and enabling rapid deployment of complex behaviors.
Abstract
Retargeting motions from existing datasets to a specific robot is a critical step in transferring motion patterns from human behaviors to, and across, various robots. However, inconsistencies in topological structure, geometrical parameters, and joint correspondence make it difficult to handle diverse embodiments with a unified retargeting architecture. In this work, we propose a novel unified graph-conditioned, diffusion-based motion generation framework for retargeting reference motions across diverse embodiments. The intrinsic characteristics of heterogeneous embodiments are represented with a graph structure that effectively captures the topological and geometrical features of different robots. Such a graph-based encoding further allows knowledge to be exploited at the joint level through a customized attention mechanism developed in this work. Because ground-truth motions of the desired embodiment are lacking, we train the diffusion model with an energy-based guidance formulated as retargeting losses. As one of the first cross-embodiment motion retargeting methods in robotics, our experiments validate that the proposed model can retarget motions across heterogeneous embodiments in a unified manner. Moreover, it demonstrates a certain degree of generalization to both diverse skeletal structures and similar motion patterns.
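To make the graph representation concrete, the following is a minimal sketch of how a skeleton's topology and geometry might be encoded as a graph. It is an illustration only and not the paper's implementation: the function names `skeleton_graph` and `hop_distances`, the parent-index input format, the toy five-joint skeleton, and the use of hop counts as a joint-to-joint relation are all assumptions introduced here.

```python
from collections import deque


def skeleton_graph(parents, offsets):
    """Build an undirected joint graph from a parent-index list (hypothetical helper).

    parents[i] is the parent joint of joint i (-1 for the root);
    offsets[i] is the bone vector from the parent to joint i.
    Returns the adjacency matrix (topology) and per-joint bone lengths (geometry).
    """
    n = len(parents)
    adjacency = [[0] * n for _ in range(n)]
    bone_lengths = [0.0] * n
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint has no incoming bone
        adjacency[child][parent] = 1
        adjacency[parent][child] = 1
        bone_lengths[child] = sum(c * c for c in offsets[child]) ** 0.5
    return adjacency, bone_lengths


def hop_distances(adjacency):
    """All-pairs hop counts, computed with a breadth-first search from each joint."""
    n = len(adjacency)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


# Toy skeleton (assumed): root(0) -> spine(1) -> head(2); spine(1) -> l_arm(3), r_arm(4)
parents = [-1, 0, 1, 1, 1]
offsets = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.3, 0.0),
           (-0.4, 0.0, 0.0), (0.4, 0.0, 0.0)]

adjacency, bone_lengths = skeleton_graph(parents, offsets)
hops = hop_distances(adjacency)
```

Under these assumptions, `adjacency` carries the topology, `bone_lengths` carries the geometry, and a matrix such as `hops` is one plausible way to tell a joint-level attention mechanism how closely two joints are related. Because the encoding depends only on the parent list and offsets, skeletons with different joint counts can be described in the same form.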
