Motion In-Betweening for Densely Interacting Characters

Xiaotang Zhang, Ziyi Chang, Qianhui Men, Hubert P. H. Shum

TL;DR

This work tackles the challenge of motion in-betweening for densely interacting two-character scenes, where precise spatio-temporal alignment to user-defined keyposes is essential. It introduces Cross-Space In-betweening, a two-stage framework that performs per-character in-betweening and then conditions the results on the partner via FiLM across different coordinate spaces, enabling stable, interactive transitions toward predefined keyposes. To address the constrained solution space and long-horizon degradation, it adds an adversarial interaction periodicity module based on a Periodic Autoencoder over pairwise joint distances and a Motion Refiner to correct drift in the latent space, improving long-range fidelity. Evaluations on Boxing, ReMoCap, and InterHuman demonstrate accurate, controllable, and robust long-horizon in-betweening with real-time performance and favorable user-study results.
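The FiLM step described above scales and shifts one character's hidden features using a condition derived from its partner. The following is a minimal NumPy sketch under stated assumptions. The per-feature scale $\gamma$ and shift $\beta$ are produced here by single linear projections of the condition vector. The names `film`, `w_gamma`, `b_gamma`, `w_beta`, and `b_beta` are hypothetical. The paper's actual conditioning network may be deeper or structured differently.

```python
import numpy as np


def film(features: np.ndarray, condition: np.ndarray,
         w_gamma: np.ndarray, b_gamma: np.ndarray,
         w_beta: np.ndarray, b_beta: np.ndarray) -> np.ndarray:
    """Feature-wise Linear Modulation: out = gamma(c) * h + beta(c).

    features:  (batch, d_h) hidden features of one character.
    condition: (batch, d_c) partner-derived condition (assumed precomputed).
    w_gamma, w_beta: (d_c, d_h) hypothetical linear projection weights.
    b_gamma, b_beta: (d_h,) hypothetical biases.
    """
    gamma = condition @ w_gamma + b_gamma  # per-feature scale from the partner
    beta = condition @ w_beta + b_beta     # per-feature shift from the partner
    return gamma * features + beta         # element-wise modulation
```

Setting the scale weights to zero with a bias of one, and the shift weights and bias to zero, turns the layer into an identity map. This is a common initialization choice that lets the partner condition be blended in gradually during training.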

Abstract

Motion in-betweening is the problem of synthesizing movement between keyposes. Traditional research has focused primarily on single characters. Extending these methods to densely interacting characters is highly challenging, as it demands precise spatio-temporal correspondence between the characters to maintain the interaction while creating natural transitions towards predefined keyposes. In this research, we present a method for long-horizon interaction in-betweening that enables two characters to engage and respond to one another naturally. To effectively represent and synthesize interactions, we propose a novel solution called Cross-Space In-Betweening, which models the interactions of each character across different conditioning representation spaces. We further observe that the significantly increased constraints between interacting characters heavily limit the solution space, leading to degraded motion quality and diminished interaction over time. To enable long-horizon synthesis, we present two solutions that maintain long-term interaction and motion quality, thereby keeping synthesis in the stable region of the solution space. We first sustain interaction quality by identifying periodic interaction patterns through adversarial learning. We then maintain motion quality by learning to refine the drifted latent space and prevent pose error accumulation. We demonstrate that our approach produces realistic, controllable, and long-horizon in-between motions of two characters performing dynamic boxing and dancing actions across multiple keyposes, supported by extensive quantitative evaluations and user studies.
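One plausible way to build a relative pose representation is to express the partner's joints in a character's own root frame, and vice versa. This makes the same interaction visible from each character's coordinate space. The sketch below is an illustrative assumption, not the paper's confirmed definition. It assumes a y-up world and a root frame given by a ground-projected position and a yaw angle about the vertical axis. The helpers `to_local_frame` and `cross_space_conditions` are hypothetical.

```python
import numpy as np


def to_local_frame(joints: np.ndarray, root_pos: np.ndarray, root_yaw: float) -> np.ndarray:
    """Express world-space joint positions (J, 3) in a root frame.

    root_pos: (3,) root position projected onto the ground plane (assumed y-up).
    root_yaw: heading angle in radians about the vertical y axis.
    """
    c, s = np.cos(root_yaw), np.sin(root_yaw)
    # Rotation about y by root_yaw maps local -> world; its transpose maps world -> local.
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    # For row vectors, (R^T v)^T == v^T R, so right-multiplying by R applies R^T.
    return (joints - root_pos) @ rot


def cross_space_conditions(joints_a, root_a, yaw_a, joints_b, root_b, yaw_b):
    """Hypothetical helper: each character's view of its partner in its own root frame."""
    b_in_a = to_local_frame(joints_b, root_a, yaw_a)  # partner B seen from A
    a_in_b = to_local_frame(joints_a, root_b, yaw_b)  # partner A seen from B
    return b_in_a, a_in_b
```

Working in each character's root frame keeps the condition invariant to where the pair stands and which way it faces in the world, so the model only has to learn the relative configuration.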

Paper Structure

This paper contains 38 sections, 13 equations, 14 figures, and 1 table.

Figures (14)

  • Figure 1: An overview of our framework. The system first generates an initial prediction for each individual character that minimizes the distance to the keypose. Then, it extracts relative pose representations as conditions to refine the initial predictions and generate interactive motions. Pairwise joint distances and the outputs of the main network are fed into an interaction discriminator and a motion refiner to model interaction periodicity and to reduce pose error, respectively.
  • Figure 2: Details of the interaction periodicity modeling. We first extract the pairwise joint distances $\mathcal{D}^{t}$ from generated motions (green lines between pairs of joints). We then use a Periodic Autoencoder to encode the dynamics as periodic latent frequencies $h^t$, visualized in the middle as three principal components obtained via principal component analysis. Our discriminator then learns to identify the interactions from the periodic patterns between characters.
  • Figure 3: Qualitative results on the ReMoCap and InterHuman datasets. Our method produces smooth and seamless turning motions (light blue and pink) in between keyposes (blue and red).
  • Figure 4: Qualitative results compared with baseline methods. Cross-Interaction Attention exhibits severe pose error accumulation.
  • Figure 5: Keypose alignment performance compared with CondMDI.
  • ...and 9 more figures
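The interaction periodicity pipeline in Figure 2 starts from pairwise joint distances $\mathcal{D}^{t}$ and summarizes their dynamics with periodic parameters. The sketch below is illustrative under several assumptions. First, it assumes $\mathcal{D}^{t}$ holds the distance from every joint of one character to every joint of the other. Second, it uses the FFT-based frequency, amplitude, and offset formulas associated with Periodic Autoencoders. Third, it applies them directly to a single distance curve. In the paper, the PAE first encodes $\mathcal{D}^{t}$ into learned latent curves, and phase comes from a learned layer that is not reproduced here. Both function names are hypothetical.

```python
import numpy as np


def pairwise_joint_distances(joints_a: np.ndarray, joints_b: np.ndarray) -> np.ndarray:
    """joints_a, joints_b: (T, J, 3) joint positions of the two characters.

    Returns D with shape (T, J, J), where D[t, i, j] = ||a_i(t) - b_j(t)||.
    """
    diff = joints_a[:, :, None, :] - joints_b[:, None, :, :]  # (T, J, J, 3)
    return np.linalg.norm(diff, axis=-1)


def periodic_parameters(signal: np.ndarray, fps: float):
    """FFT parametrization of a 1-D curve of length N sampled at `fps`.

    Returns (frequency in Hz, amplitude, offset); phase is not estimated here.
    """
    n = signal.shape[0]
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum[1:]) ** 2              # power spectrum without the DC term
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)[1:]    # matching bin frequencies in Hz
    frequency = np.sum(freqs * power) / np.sum(power)  # power-weighted mean frequency
    amplitude = 2.0 * np.sqrt(np.sum(power)) / n
    offset = spectrum[0].real / n                  # mean value of the curve
    return frequency, amplitude, offset
```

For example, `periodic_parameters(D[:, i, j], fps)` summarizes how the distance between joint `i` of one character and joint `j` of the other oscillates over a window. This is the kind of periodic pattern the interaction discriminator is described as learning from.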