MikuDance: Animating Character Art with Mixed Motion Dynamics

Jiaxu Zhang, Xianfang Zeng, Xin Chen, Wei Zuo, Gang Yu, Zhigang Tu

TL;DR

MikuDance combines two key techniques, Mixed Motion Modeling and Mixed-Control Diffusion, to address the challenges of highly dynamic motion and reference-guidance misalignment in character art animation.

Abstract

We propose MikuDance, a diffusion-based pipeline incorporating mixed motion dynamics to animate stylized character art. MikuDance combines two key techniques, Mixed Motion Modeling and Mixed-Control Diffusion, to address the challenges of highly dynamic motion and reference-guidance misalignment in character art animation. Specifically, a Scene Motion Tracking strategy is presented to explicitly model the dynamic camera in pixel-wise space, enabling unified character-scene motion modeling. Building on this, Mixed-Control Diffusion implicitly aligns the scale and body shape of diverse characters with the motion guidance, allowing flexible control of local character motion. A Motion-Adaptive Normalization module is then incorporated to effectively inject global scene motion, paving the way for comprehensive character art animation. Through extensive experiments, we demonstrate the effectiveness and generalizability of MikuDance across diverse character art and motion guidance, consistently producing high-quality animations with remarkable motion dynamics.
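The abstract says that Motion-Adaptive Normalization (MAN) injects global scene motion but does not describe its internals. One common way to inject a spatial conditioning map through a normalization layer is SPADE-style modulation. In that scheme, the features are normalized and then scaled and shifted by per-pixel gamma and beta maps, which are predicted from the condition. The NumPy sketch below assumes that design and is not the authors' module. The function names, the parameter dictionary, the group count, and the assumption that the motion map already matches the feature resolution are all hypothetical.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize features x of shape (C, H, W) over channel groups (no learned affine)."""
    c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)

def conv1x1(x, weight, bias):
    """1x1 convolution: x (C_in, H, W), weight (C_out, C_in), bias (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def motion_adaptive_norm(features, motion, params, num_groups=4):
    """Hypothetical MAN-style block (SPADE-like, an assumption): per-pixel
    scale/shift predicted from a scene-motion map of shape (M, H, W)."""
    hidden = np.maximum(conv1x1(motion, params["w_h"], params["b_h"]), 0.0)  # ReLU
    gamma = conv1x1(hidden, params["w_g"], params["b_g"])  # per-pixel scale
    beta = conv1x1(hidden, params["w_b"], params["b_b"])   # per-pixel shift
    return group_norm(features, num_groups) * (1.0 + gamma) + beta

# Usage: 8 feature channels, a 2-channel motion map, hidden width 16, 4x4 grid.
rng = np.random.default_rng(0)
C, M, D, H, W = 8, 2, 16, 4, 4
params = {
    "w_h": rng.normal(size=(D, M)), "b_h": np.zeros(D),
    # Zero-initialized output projections: the block starts as plain GroupNorm.
    "w_g": np.zeros((C, D)), "b_g": np.zeros(C),
    "w_b": np.zeros((C, D)), "b_b": np.zeros(C),
}
feats = rng.normal(size=(C, H, W))
motion = rng.normal(size=(M, H, W))
out = motion_adaptive_norm(feats, motion, params)
```

The gamma and beta projections are zero-initialized, so the block initially reduces to plain group normalization. This is a frequent choice when a new conditioning path is added to a pretrained diffusion backbone, because training starts from the backbone's original behavior. Whether MikuDance does this is not stated here.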

Paper Structure

This paper contains 13 sections, 4 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: We propose MikuDance, a diffusion-based pipeline for animating complex and stylized character art with highly dynamic motion guidance. The core insight of MikuDance lies in its Mixed Motion Modeling and Mixed-Control Diffusion capabilities.
  • Figure 2: Illustration of our MikuDance pipeline. Given a reference character art and a driving video, the pixel-wise scene motion is predicted using the Scene Motion Tracking (SMT) strategy, which is combined with the character poses to form the character-scene mixed motion guidance. The Mixed-Control Diffusion subsequently generates the animation in a latent space, guided by the character poses and the scene motion injected through the Motion-Adaptive Normalization (MAN) module.
  • Figure 3: Illustration of the Scene Motion Tracking strategy. To effectively guide global background motion, 3D camera poses extracted from the driving video are transformed into a pixel-wise 2D space through the projection of the scene's point cloud (PC).
  • Figure 4: The mixed-source training approach. We utilize synthetic stylized video frames and non-character videos in the two training stages, respectively, to enhance generalizability.
  • Figure 5: Comparison with the baselines. AniAny* is the fine-tuned version of the AniAny model, trained on our MMD video dataset.
  • ...and 9 more figures
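The Figure 3 caption describes how Scene Motion Tracking turns 3D camera poses into a pixel-wise 2D signal by projecting the scene's point cloud. The sketch below is a minimal version of that idea, not the authors' implementation. It rests on several assumptions: a pinhole camera with known intrinsics K, world-to-camera extrinsics (R, t) for each frame, a static point cloud, and motion encoded as each point's 2D displacement relative to the first frame. The helpers `project_points`, `scene_motion`, and `splat_to_map` are hypothetical names.

```python
import numpy as np

def project_points(points, K, R, t):
    """Pinhole projection of world points (N, 3) to pixel coords (N, 2), given
    intrinsics K (3, 3) and world-to-camera rotation R (3, 3) and translation t (3,)."""
    cam = points @ R.T + t            # world -> camera coordinates
    uvw = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def scene_motion(points, K, poses):
    """2D displacement of static scene points relative to frame 0 for a list of
    (R, t) camera poses; returns an array of shape (T, N, 2)."""
    ref = project_points(points, K, *poses[0])
    return np.stack([project_points(points, K, R, t) - ref for R, t in poses])

def splat_to_map(uv, values, height, width):
    """Nearest-pixel splatting of per-point values (N, C) into a dense
    (height, width, C) map. Points outside the image are dropped; colliding
    points overwrite each other (no z-buffer in this sketch)."""
    out = np.zeros((height, width, values.shape[1]))
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    ok = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    out[rows[ok], cols[ok]] = values[ok]
    return out

# Usage: one point 10 units in front of a camera that shifts +1 along x.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0]])
poses = [(np.eye(3), np.zeros(3)),                 # frame 0: camera at origin
         (np.eye(3), np.array([-1.0, 0.0, 0.0]))]  # frame 1: center at x=+1, so t = -R @ c
motion = scene_motion(points, K, poses)
dense = splat_to_map(project_points(points, K, *poses[0]), motion[1], 100, 100)
```

When the camera moves right by one unit, the point projects 10 pixels further left (from u = 50 to u = 40), so its displacement is (-10, 0). A real pipeline would also need occlusion handling and a denser point cloud to fill the image. The caption does not specify how MikuDance handles either.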