Skinned Motion Retargeting with Dense Geometric Interaction Perception

Zijie Ye, Jia-Wei Liu, Jia Jia, Shikun Sun, Mike Zheng Shou

TL;DR

MeshRet is a geometry-aware, single-pass framework for skinned motion retargeting that directly models dense geometric interactions via a Dense Mesh Interaction (DMI) field and Semantically Consistent Sensors (SCS). By preserving both contact semantics and non-contact spatial relationships, MeshRet reduces the interpenetration and contact mismatches that plague prior approaches, which either ignore body geometry or correct it in a separate stage after skeletal retargeting. The method uses MAIT-inspired SCS for cross-topology dense correspondences and a PointNet-like DMI encoder with a transformer-based retargeting network, trained with unsupervised losses including DMI consistency and end-effector alignment. Evaluations on Mixamo and the newly collected ScanRet dataset show state-of-the-art performance in contact preservation, reduced jitter, and natural motion transfer across diverse body shapes, and human raters strongly prefer its results. ScanRet also serves as a benchmark for geometry-aware retargeting, with practical implications for animation pipelines, VR, and gaming.

Abstract

Capturing and maintaining geometric interactions among different body parts is crucial for successful motion retargeting in skinned characters. Existing approaches often overlook body geometries or add a geometry correction stage after skeletal motion retargeting. This results in conflicts between skeleton interaction and geometry correction, leading to issues such as jitter, interpenetration, and contact mismatches. To address these challenges, we introduce a new retargeting framework, MeshRet, which directly models the dense geometric interactions in motion retargeting. First, we establish dense mesh correspondences between characters using semantically consistent sensors (SCS), which are effective across diverse mesh topologies. Next, we develop a novel spatio-temporal representation called the dense mesh interaction (DMI) field. This field, a collection of interacting SCS feature vectors, captures both contact and non-contact interactions between body geometries. By aligning the DMI field during retargeting, MeshRet not only preserves motion semantics but also prevents self-interpenetration and ensures contact preservation. Extensive experiments on the public Mixamo dataset and our newly collected ScanRet dataset demonstrate that MeshRet achieves state-of-the-art performance. Code is available at https://github.com/abcyzj/MeshRet.

Paper Structure

This paper contains 43 sections, 12 equations, 17 figures, 6 tables, 1 algorithm.

Figures (17)

  • Figure 1: Comparison with the existing method. Contrary to the earlier retargeting-correction approach [zhang2023skinned], which suffers from internal contradictions leading to interpenetration, jitter, and contact mismatches, our pipeline leverages the DMI field to accurately model complex geometric interactions.
  • Figure 2: Overview of the proposed MeshRet. The pipeline begins with the extraction of the DMI field using sensor forward kinematics, denoted as $\mathcal{F}_k$, and pairwise interaction feature selection, represented by $\mathcal{F}_c$. This DMI field, in conjunction with geometric features derived from $\mathcal{F}_g$, is fed into an encoder-decoder network. The network predicts the target motion sequence, which is aligned with the target character's geometry and the original DMI field.
  • Figure 3: Left: Illustration of the method to derive a sensor feature $\mathbf{s}$ from the semantic coordinate $(b, l, \phi)$ across different characters. The red line represents the projected ray. The feature $\mathbf{s}$ encompasses the sensor's location and its tangent space matrix. Right: The DMI field effectively captures both contact and non-contact interactions. Red lines represent $\mathbf{d}^{t, i,j}$ in the DMI field. In the second example, the body sensors (yellow points) are located in the tangent plane of the hand sensors (blue points), signifying a contact interaction.
  • Figure 4: Qualitative comparison with baseline methods. Our method ensures precise contact preservation and minimal geometric interpenetration.
  • Figure 5: Qualitative comparison of ablation studies. A red circle highlights areas of interpenetration, while a red rectangle identifies errors in non-contact semantics.
  • ...and 12 more figures
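
The Figure 3 caption describes sensor features that carry a location and a tangent space matrix, pairwise vectors $\mathbf{d}^{t,i,j}$ between sensors, and contact indicated by one sensor lying in another's tangent plane. The following is a minimal sketch of that idea for a single frame, not the authors' implementation. It assumes that $\mathbf{d}^{i,j}$ is the offset from sensor $i$ to sensor $j$ expressed in sensor $i$'s tangent frame, and that the frame is stored as three orthonormal axes (tangent, bitangent, normal). The function names and the tolerance `tol` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Assumed layout of a tangent frame: (tangent, bitangent, normal), orthonormal.
Frame = Tuple[Vec3, Vec3, Vec3]


def relative_offset(p_i: Vec3, frame_i: Frame, p_j: Vec3) -> Vec3:
    """Express the offset p_j - p_i in sensor i's tangent frame (assumed form of d^{i,j})."""
    diff = tuple(b - a for a, b in zip(p_i, p_j))
    # Project the world-space offset onto each axis of the frame.
    return tuple(sum(axis[k] * diff[k] for k in range(3)) for axis in frame_i)


def pairwise_offsets(
    positions: Sequence[Vec3], frames: Sequence[Frame]
) -> List[List[Vec3]]:
    """Offsets for every ordered sensor pair (i, j) in one frame."""
    n = len(positions)
    return [
        [relative_offset(positions[i], frames[i], positions[j]) for j in range(n)]
        for i in range(n)
    ]


def in_tangent_plane(offset: Vec3, tol: float = 1e-3) -> bool:
    """True when the normal component is within tol (hypothetical threshold)."""
    return abs(offset[2]) <= tol


if __name__ == "__main__":
    identity: Frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    hand = (0.0, 0.0, 0.0)
    body_touching = (0.1, 0.0, 0.0)  # lies in the hand sensor's tangent plane
    body_apart = (0.0, 0.0, 0.5)     # displaced along the hand sensor's normal
    print(in_tangent_plane(relative_offset(hand, identity, body_touching)))
    print(in_tangent_plane(relative_offset(hand, identity, body_apart)))
```

Expressing offsets in a local frame rather than in world coordinates makes them invariant to the character's global position and orientation, which is one plausible reason to attach a tangent space matrix to each sensor. A non-zero normal component describes a non-contact relationship such as a hand hovering above the torso.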