Decoupling Contact for Fine-Grained Motion Style Transfer

Xiangjun Tang, Linjun Wu, He Wang, Yiqian Wu, Bo Hu, Songnan Li, Xu Gong, Yuchen Liao, Qilong Kou, Xiaogang Jin

TL;DR

This work tackles the challenge of decoupling contact from motion content and style in motion style transfer. It introduces a hip-velocity proxy $\mathbf{h}$ and a Transformer-based CVAE motion manifold that separately encodes style $z_s$, trajectory $z_{tj}$, and contact timing $z_{ct}$, enabling independent control over these factors. A novel loss framework and a dedicated metric, Contact Precision-Recall, quantify how well synthesized motions align hip velocity with contact changes, correlating with human perception of naturalness. Experiments show improved style expressivity and motion quality over state-of-the-art methods, and the manifold can generate motions directly or post-process existing transfers to enhance controllability and reduce artifacts like foot skating. The approach offers practical benefits for animation and gaming by giving artists precise, interpretable control over style, contact, and trajectory, while providing flexible post-processing capabilities.

Abstract

Motion style transfer changes the style of a motion while retaining its content and is useful in computer animations and games. Contact is an essential component of motion style transfer that should be controlled explicitly in order to express the style vividly while enhancing motion naturalness and quality. However, it is unknown how to decouple and control contact to achieve fine-grained control in motion style transfer. In this paper, we present a novel style transfer method for fine-grained control over contacts while achieving both motion naturalness and spatial-temporal variations of style. Based on our empirical evidence, we propose controlling contact indirectly through the hip velocity, which can be further decomposed into the trajectory and contact timing, respectively. To this end, we propose a new model that explicitly models the correlations between motions and trajectory/contact timing/style, allowing us to decouple and control each separately. Our approach is built around a motion manifold, where hip controls can be easily integrated into a Transformer-based decoder. It is versatile in that it can generate motions directly as well as be used as post-processing for existing methods to improve quality and contact controllability. In addition, we propose a new metric that measures a correlation pattern of motions based on our empirical evidence, aligning well with human perception in terms of motion naturalness. Based on extensive evaluation, our method outperforms existing methods in terms of style expressivity and motion quality.

Paper Structure (24 sections, 6 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Correlation between the contact pattern and the hip speed of a walking sequence from the STYLE100 Dataset. The "z" axis points in the character's forward-facing direction and the "x" axis points to the character's left, both in the local coordinate system at the current frame. The first row shows the "z" component of the hip velocity, which peaks when a foot makes contact with the ground. The middle row (contact timing) contains two rows of bars (light blue and blue): the top bars mark the left-foot contact duration and the bottom bars mark the right-foot contact duration. The third row shows the "x" component of the hip velocity, with orange and light blue rectangles marking the left- and right-foot contact durations, respectively. The "x" component decreases (that is, the velocity grows along the negative "x" axis) during right-leg contact and increases during left-leg contact.
  • Figure 2: Overview of our pipeline. Grey blocks represent data; all other blocks represent networks. Trapezium shapes indicate downsampling or upsampling in the network. The snowflake symbol denotes that the manifold decoder is frozen. Note that only the hip velocity of $M_h$ is used as input to the Trajectory CNNs (see Appendix for details).
  • Figure 3: Overview of our CVAE. The circled symbol represents the global positional embedding.
  • Figure 4: Our method allows users to control the trajectory, contact timing, and style of motions separately or jointly. To improve visibility, we intentionally increase the spatial distance between adjacent skeletons. Blue skeletons mark the frames in which the character makes contact with the ground, while orange skeletons mark the midpoint between two blue frames. The rows show separate interpolation of style (first row), contact timing (second row), and trajectory (third row). The interpolation parameter in the latent style/contact/trajectory space ranges from 0.0 (content sequence) to 1.0 (target sequence).
  • Figure 5: Our method allows changing the trajectory by scaling the magnitude of the hip velocity (Scale 0.5, Scale 1.0, and Scale 1.5), as well as changing the trajectory and contact timing at the same time by setting its hip velocity from another motion (change trajectory + contact timing). In this setting, all results are transferred to the "high knees" style.
  • ...and 5 more figures
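
The controls described in the captions of Figures 4 and 5 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 30 fps frame rate, the use of plain linear blending for latent interpolation, and the simplification that hip velocities are expressed in a fixed world frame (the paper uses a per-frame local frame) are all assumptions made for illustration.

```python
import numpy as np


def scale_hip_velocity(hip_vel, scale):
    """Scale the magnitude of every per-frame hip velocity (direction unchanged).

    hip_vel: array of shape (frames, dims). Hypothetical helper, not from the paper.
    """
    return scale * np.asarray(hip_vel, dtype=float)


def integrate_trajectory(hip_vel, dt=1.0 / 30.0):
    """Integrate hip velocity into a hip trajectory that starts at the origin.

    Assumes velocities are given in a fixed world frame; local-frame velocities
    would first have to be rotated into the world frame.
    """
    hip_vel = np.asarray(hip_vel, dtype=float)
    steps = np.cumsum(hip_vel * dt, axis=0)
    origin = np.zeros((1, hip_vel.shape[1]))
    return np.vstack([origin, steps])


def interpolate_latent(z_content, z_target, alpha):
    """Blend two latent codes: alpha=0.0 gives the content code, 1.0 the target.

    Linear blending is an assumption; the caption only states the parameter range.
    """
    z_content = np.asarray(z_content, dtype=float)
    z_target = np.asarray(z_target, dtype=float)
    return (1.0 - alpha) * z_content + alpha * z_target


if __name__ == "__main__":
    # Toy (x, z) hip velocities for three frames.
    velocity = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    for scale in (0.5, 1.0, 1.5):  # the scales named in the Figure 5 caption
        path = integrate_trajectory(scale_hip_velocity(velocity, scale))
        print(scale, path[-1])
    print(interpolate_latent(np.zeros(4), np.ones(4), 0.25))
```

Scaling the velocity before integration shortens or lengthens the path without changing its shape, which matches the "Scale 0.5 / 1.0 / 1.5" setting in spirit. In the paper the scaled hip velocity is fed to the learned decoder rather than integrated directly.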