POPDG: Popular 3D Dance Generation with PopDanceSet

Zhenye Luo; Min Ren; Xuecai Hu; Yongzhen Huang; Li Yao

TL;DR

This work tackles the challenge of music-driven 3D dance generation by introducing PopDanceSet, a diverse, aesthetically informed dataset, and POPDG, an iDDPM-based framework. POPDG employs Space Augmentation in DS-Attention and a lightweight Alignment Module to strengthen spatial joint connectivity and rhythmical synchronization with music, achieving state-of-the-art results on both PopDanceSet and AIST++. The authors propose extended evaluation metrics, including PBC and Beat Alignment, and validate their approach through ablations and a user study showing a strong preference for dances from PopDanceSet. The study highlights practical significance for efficient, visually appealing dance generation and provides data/code for further research, while noting training cost and the need for improved objective quality metrics.

Abstract

Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of dance movements. Moreover, the proposed POPDG model, built on the iDDPM framework, enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on both PopDanceSet and AIST++. The paper also extends current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.

Paper Structure (29 sections, 15 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Music-Dance Dataset
Human dance generation
Diffusion Models
PopDanceSet
Popularity Function and Dataset Construction
Dataset Description
Method
Improved-DDPM and DDIM
Music and Dance Spatiotemporal block
Dance Decoder Block
Music Encoder Block
Alignment Module
Loss Function
...and 14 more sections

Figures (7)

Figure 1: POPDG, in combination with PopDanceSet, could generate a variety of aesthetically driven popular dances.
Figure 2: POPDG Pipeline Overview. POPDG, utilizing the iDDPM framework, learns to denoise dance sequences from time $t = T$ to $t = 0$. The audio feature sequence serves as the input to the Music Encoder Block, while the noisy sequence is input to the Dance Decoder Block, with the output being the generated dance sequence. $N$ denotes the number of stacked blocks. Beginning with a noisy sequence $z_{T} \sim \mathcal{N}(0, I)$, POPDG generates an estimate of the dance sequence. It then noises this estimate back to $\hat{z}_{T-1}$, repeating the process until $t = 0$.
Figure 3: Analysis of Joint Error Distribution in SMPL Human Body Model. (a) SMPL Joint Labeling: Marks human body joints from the hip (level $0$ joint) outward, color-coded by different levels. (b) Joint Error Proportions: Shows that upper body joints experience increasing error the further they are from the hip. (c) Upper Body Joint Error Levels: Displays average errors across upper body joint levels.
Figure 4: Overview of Dance Spatial Attention. The key distinction between dance spatial attention and standard multi-head attention is the incorporation of the Space Augmentation Algorithm when calculating the Attention Map between Query and Key. This algorithm is tailored to emphasize the upper body joints in relation to the hip, enhancing their spatial interconnectivity.
Figure 5: Overview of the Alignment Module. Once the music and dance features have been processed through the temporal and spatial Transformers, temporal feature processing is applied to both.
...and 2 more figures
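
The sampling loop described in the Figure 2 caption (start from Gaussian noise, estimate the dance sequence, noise the estimate back one step, repeat until $t = 0$) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `make_alpha_bar`, `sample`, and `toy_denoiser` are hypothetical, the linear noise schedule is a placeholder rather than the schedule used in the paper, and the dance sequence is reduced to a flat list of floats.

```python
import math
import random


def make_alpha_bar(T):
    """Cumulative signal level per step for a placeholder linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for t in range(T):
        beta = 1e-4 + (0.02 - 1e-4) * t / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)  # alpha_bar[t - 1] belongs to timestep t
    return alpha_bar


def sample(denoiser, music, length, T=50, seed=0):
    """Estimate the clean sequence, re-noise it to step t-1, repeat until t = 0.

    `denoiser(z, music, t)` stands in for the network (hypothetical signature):
    it takes the noisy sequence, the music features, and the timestep, and
    returns an estimate of the clean dance sequence.
    """
    rng = random.Random(seed)
    alpha_bar = make_alpha_bar(T)
    z = [rng.gauss(0.0, 1.0) for _ in range(length)]  # z_T ~ N(0, I)
    x0_hat = z
    for t in range(T, 0, -1):
        x0_hat = denoiser(z, music, t)  # estimated dance sequence
        if t == 1:
            break  # the final estimate is the output; no re-noising
        a = alpha_bar[t - 2]  # noise level of timestep t-1
        z = [
            math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0_hat
        ]
    return x0_hat


def toy_denoiser(z, music, t):
    """Toy stand-in: pretends the clean 'dance' simply equals the music features."""
    return list(music)


dance = sample(toy_denoiser, music=[0.1, 0.2, 0.3], length=3)
```

With the toy denoiser the loop trivially returns the music features; a trained network conditioned on audio features would take its place in practice.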
