G-DReaM: Graph-conditioned Diffusion Retargeting across Multiple Embodiments

Zhefeng Cao, Ben Liu, Sen Li, Wei Zhang, Hua Chen

TL;DR

The paper tackles cross-embodiment motion retargeting across heterogeneous robot skeletons with limited ground-truth data. It proposes G-DReaM, a graph-conditioned diffusion model guided by an energy term for kinematic constraints, implemented with a transformer-based denoiser that fuses motion, graph, and correspondence information through graph-aware attention. The approach enables retargeting across non-homeomorphic skeletons, supports adaptation to new embodiments with modest fine-tuning, and demonstrates zero-shot generalization for similar motions, offering a scalable path to transferring reference motions across diverse robots. This work advances practical cross-robot motion retargeting by eliminating the need for robot-specific motion data and enabling rapid deployment of complex behaviors.

Abstract

Retargeting motions from existing motion datasets to a specific robot is a critical step in transferring motion patterns from human behaviors to, and across, various robots. However, inconsistencies in topological structure, geometrical parameters, and joint correspondence make it difficult to handle diverse embodiments with a unified retargeting architecture. In this work, we propose a novel unified graph-conditioned, diffusion-based motion generation framework for retargeting reference motions across diverse embodiments. The intrinsic characteristics of heterogeneous embodiments are represented with a graph structure that effectively captures the topological and geometrical features of different robots. Such a graph-based encoding further allows for knowledge exploitation at the joint level through customized attention mechanisms developed in this work. Because ground-truth motions of the desired embodiment are lacking, we utilize an energy-based guidance formulated as retargeting losses to train the diffusion model. Our experiments validate that the proposed model, one of the first cross-embodiment motion retargeting methods in robotics, can retarget motions across heterogeneous embodiments in a unified manner. Moreover, it demonstrates a certain degree of generalization to both diverse skeletal structures and similar motion patterns.
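
To make the graph-based encoding more concrete, the following is a minimal sketch of how a kinematic tree could be turned into topological and geometrical features. It is an illustration under stated assumptions, not the paper's implementation: the function name `skeleton_graph`, the parent-index input format, and the particular features (adjacency, link length, tree depth) are all hypothetical choices.

```python
import numpy as np

def skeleton_graph(parents, offsets):
    """Describe a kinematic tree as a graph (hypothetical feature choice).

    parents: parents[i] is the parent joint of joint i, or -1 for the base joint.
             Parents are assumed to be listed before their children.
    offsets: (J, 3) rest-pose position of each joint relative to its parent.
    Returns (adjacency, link_lengths, depth).
    """
    offsets = np.asarray(offsets, dtype=float)
    n = len(parents)
    adjacency = np.zeros((n, n), dtype=bool)   # topology: which joints share a link
    link_lengths = np.zeros(n)                 # geometry: length of the link to the parent
    depth = np.zeros(n, dtype=int)             # topology: distance from the base joint
    for j, p in enumerate(parents):
        if p < 0:
            continue                           # the base joint has no incoming link
        adjacency[j, p] = adjacency[p, j] = True
        link_lengths[j] = np.linalg.norm(offsets[j])
        depth[j] = depth[p] + 1
    return adjacency, link_lengths, depth

# Toy 4-joint skeleton: base -> joint 1 -> joint 2, and base -> joint 3.
parents = [-1, 0, 1, 0]
offsets = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.3, 0.0, 0.4], [0.0, 0.2, 0.0]]
adjacency, link_lengths, depth = skeleton_graph(parents, offsets)
```

Two skeletons with different joint counts or branching produce arrays of different shapes and contents, which is the kind of embodiment-specific information a graph-conditioned model can consume.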

Paper Structure

This paper contains 34 sections, 12 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Our G-DReaM (Graph-conditional Diffusion Retargeting across Multiple Embodiments) can be uniformly applied to heterogeneous embodiments without requiring their motion data, even when their skeletons are non-homeomorphic. All the motions here are the retargeted results from the same reference.
  • Figure 2: Overview of G-DReaM. The denoising network is based on a transformer decoder with a noisy motion input $X_t$ under the conditions of the reference motion $\mathcal{M}$ and graphs $g^{\mathcal{M},X} = \{ \phi_v^{\mathcal{M},X}, \phi_e^{\mathcal{M},X},\psi^{\mathcal{M},X}\}$. The input motion is tokenized at the joint level, where the base joint and the other joints are embedded by independent encoders. Spatial and temporal attention then extract the relationships between all joints and the chronological relationships along the time window. In addition, the spatial attention absorbs the joint connectivity $\psi$ to enrich the joint relationships. For the other graph conditions, we use a multi-conditional cross-attention to treat them individually. In particular, the reference motion condition $\mathcal{M}$ is encoded using a similar transformer decoder and incorporates the joint correspondence $\eta$ as an attention mask. Finally, the predicted motion $\hat{X}_0$ is produced by an output decoder.
  • Figure 3: Retargeted motions across multiple robotic embodiments.
  • Figure 4: Validation of skeleton generalization with respect to link length. The changed links are shown in red and are scaled by a factor of 2.
  • Figure 5: Validation of skeleton generalization with respect to joint correspondence. The red joints are the tested ones. The first and third subfigures require the specified red joint correspondence.
  • ...and 10 more figures
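
The Figure 2 caption mentions that the joint correspondence $\eta$ is used as an attention mask. A minimal sketch of that general idea is given below, assuming single-head scaled dot-product cross-attention and a boolean correspondence matrix. The function name `masked_cross_attention`, the tensor shapes, and the choice to give zero output to target joints with no corresponding reference joint are assumptions made for illustration; the paper's actual layer may differ.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, correspondence):
    """Single-head cross-attention restricted by a boolean mask (illustrative).

    queries:        (J_tgt, d) target-joint tokens
    keys, values:   (J_ref, d) reference-joint tokens
    correspondence: (J_tgt, J_ref) boolean; True where a target joint may
                    attend to a reference joint
    Returns (output, weights).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Disallowed pairs get -inf so they receive zero weight after the softmax.
    scores = np.where(correspondence, scores, -np.inf)
    # Assumption: a target joint with no allowed reference joint gets zero output.
    has_match = correspondence.any(axis=-1, keepdims=True)
    scores = np.where(has_match, scores, 0.0)
    # Numerically stable softmax over the reference joints.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    weights = np.where(has_match, weights, 0.0)
    return weights @ values, weights

# Toy example: 3 target joints, 2 reference joints, feature size 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(2, 4))
v = rng.normal(size=(2, 4))
eta = np.array([[True, False],    # target joint 0 corresponds to reference joint 0
                [False, True],    # target joint 1 corresponds to reference joint 1
                [False, False]])  # target joint 2 has no corresponding joint
out, w = masked_cross_attention(q, k, v, eta)
```

With a one-to-one mask like `eta`, each matched target joint simply copies the value of its reference joint; a mask that allows several reference joints per target joint would instead blend them according to the attention scores.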