PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation

Hongsong Wang, Yin Zhu, Qiuxia Lai, Yang Zhang, Guo-Sen Xie, Xin Geng

TL;DR

PAMD tackles long-form, music-to-dance generation with physical plausibility by embedding three physics-aware modules into a diffusion model: Plausible Motion Constraint (PMC) using Neural Distance Fields to constrain poses to a plausible manifold, Prior Motion Guidance (PMG) using a standing pose as a lightweight auxiliary condition, and Motion Refinement with Foot-Ground Contact (MRFC) to reduce foot-skating artifacts. Conditioning on music features and a fixed prior, PAMD denoises sequences from time $t=T$ to $t=0$ with losses including ${\mathcal{L}}_{\text{recon}}$, ${\mathcal{L}}_{\text{joint}}$, ${\mathcal{L}}_{\text{vel}}$, ${\mathcal{L}}_{\text{foot}}$, and ${\mathcal{L}}_{\text{PMC}}$, and leverages classifier-free guidance to amplify conditioning. The approach enables parallel long-dance generation and achieves superior Beat Alignment Score (BAS), physical realism (PFC), and geometry-based diversity (FID$_g$, Div$_g$) on the AIST++ dataset, with user studies indicating strong perceptual preference. Ablation experiments validate the complementary roles of PMC, PMG, and MRFC, showing notable improvements when all three components are combined. This work advances practical, music-driven human motion generation by enforcing explicit physical plausibility and efficient long-horizon generation, with potential applications in automatic dance creation and editing.
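The summary above says classifier-free guidance is used to amplify conditioning but does not state the formula. A minimal sketch of the standard blend follows, assuming the model produces one prediction with the music and prior conditions and one without them. The function name, the plain-list representation, and the guidance weight `w` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def classifier_free_guidance(
    cond: Sequence[float], uncond: Sequence[float], w: float
) -> List[float]:
    """Blend conditional and unconditional predictions element-wise.

    Standard form: uncond + w * (cond - uncond).
    w = 1 returns the conditional prediction unchanged; w > 1 pushes the
    result further in the direction the condition suggests.
    The names and the weight are hypothetical, not PAMD's actual code.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


# Toy usage with made-up numbers: w = 2 extrapolates past the conditional output.
guided = classifier_free_guidance([1.0, 2.0], [0.5, 1.0], w=2.0)
```

In a diffusion sampler this blend would be applied to the model output at every denoising step; larger `w` typically trades diversity for stronger adherence to the condition.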

Abstract

Computational dance generation is crucial in many areas, such as art, human-computer interaction, virtual reality, and digital entertainment, particularly for generating coherent and expressive long dance sequences. Diffusion-based music-to-dance generation has made significant progress, yet existing methods still struggle to produce physically plausible motions. To address this, we propose Plausibility-Aware Motion Diffusion (PAMD), a framework for generating dances that are both musically aligned and physically realistic. The core of PAMD lies in the Plausible Motion Constraint (PMC), which leverages Neural Distance Fields (NDFs) to model the manifold of real poses and guide generated motions toward this physically valid manifold. To provide more effective guidance during generation, we incorporate Prior Motion Guidance (PMG), which uses standing poses as auxiliary conditions alongside music features. To further enhance realism for complex movements, we introduce the Motion Refinement with Foot-Ground Contact (MRFC) module, which addresses foot-skating artifacts by bridging the gap between the optimization objective in linear joint position space and the data representation in nonlinear rotation space. Extensive experiments show that PAMD significantly improves musical alignment and enhances the physical plausibility of generated motions. The project page is available at: https://mucunzhuzhu.github.io/PAMD-page/.

Paper Structure

This paper contains 16 sections, 10 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Motivation of our approach for diffusion-based music-to-dance generation. To generate plausible and correct motion sequences for music-to-dance, we introduce prior motion and plausible motion constraints during the training of the generative diffusion model.
  • Figure 2: Our PAMD model generates long dances that are better synchronized and visually coherent. The black dots on the pink music waveform indicate music beats, while the grey and blue motions denote dance beats generated by PAMD (ours) and EDGE, respectively. The underlined dance beats indicate close alignment, which falls within five frames of the nearest music beat. PAMD produces eight closely aligned dance beats compared to only four from EDGE. Moreover, PAMD generates more natural and fluid movements. For example, EDGE lacks hand motion in frame 233, while PAMD maintains consistent and expressive hand movements in frame 205.
  • Figure 3: PAMD Pipeline Overview: Conditioned on music and prior motion, PAMD learns to denoise dance sequences from time $t=T$ to $t=0$. Music features are extracted by Jukebox and then passed through the Transformer Music Encoder. The prior motion, timestep, and music features are concatenated and undergo cross-attention with the noise. The noisy sequence $\hat{x}_t$ is processed by a transformer-based dance decoder, which generates the raw dance. The Motion Refinement Module takes the raw dance as input, extracts $foot_{l}$, $foot_{s}$, $foot_{p}$, and $foot_{v}$, applies cross-attention, and outputs the final refined dance sequence. During training, the generated dance is passed through the Plausible Motion Constraint to produce an auxiliary loss.
  • Figure 4: Implausible poses of dance generation: The score indicates the implausibility of dance poses, with higher scores indicating less plausible poses. It is predicted by the trained auxiliary network in the PMC module.
  • Figure 5: Prior Motion Guidance: $x_{\text{prior}}$ is the chosen prior motion; $m$ and $t$ are the music features and timestep token; $\hat{x}_t$ is the input noisy dance sequence; $\tilde{x}_0$ denotes the output raw dance.
  • ...and 2 more figures
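
The "closely aligned" criterion in the Figure 2 caption (a dance beat within five frames of the nearest music beat) can be sketched as a simple count. This is only an illustration of that criterion, not the Beat Alignment Score itself. The function name, the inclusive tolerance, and the frame indices in the usage lines are assumptions.

```python
from bisect import bisect_left
from typing import Sequence


def count_aligned_beats(
    dance_beats: Sequence[int], music_beats: Sequence[int], tolerance: int = 5
) -> int:
    """Count dance beats lying within `tolerance` frames of the nearest music beat.

    Beats are frame indices. The tolerance is treated as inclusive (<=),
    which is an assumption; the caption only says "within five frames".
    """
    music = sorted(music_beats)
    aligned = 0
    for d in dance_beats:
        i = bisect_left(music, d)
        # The nearest music beat is either just before or just after position i.
        neighbors = music[max(i - 1, 0): i + 1]
        if neighbors and min(abs(d - m) for m in neighbors) <= tolerance:
            aligned += 1
    return aligned


# Hypothetical frame indices, not data from the paper.
music = [0, 30, 60, 90]
dance = [3, 36, 58, 95, 120]
n_aligned = count_aligned_beats(dance, music)
```

With these made-up frames, the beats at 3, 58, and 95 fall within five frames of a music beat, while 36 and 120 do not.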