Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

Qiaochu Huang; Xu He; Boshi Tang; Haolin Zhuang; Liyang Chen; Shuochen Gao; Zhiyong Wu; Haozhi Huang; Helen Meng

TL;DR

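This paper proposes ExpressiveBailando, a dance generation method that improves expressiveness by jointly addressing genre matching, beat alignment, and dance dynamics. It incorporates frequency information into the VQ-VAE to mitigate speed homogenization, and it conditions generation on genre- and beat-related features extracted with a pre-trained music model.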

Abstract

Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works have attempted to enhance dance expressiveness, which comprises genre matching, beat alignment, and dance dynamics, but only from certain aspects. The resulting improvement is limited because these works do not consider all three factors together. In this paper, we propose ExpressiveBailando, a novel dance generation method designed to generate expressive dances by taking all three factors into account concurrently. Specifically, we mitigate the issue of speed homogenization by incorporating frequency information into the VQ-VAE, thus improving dance dynamics. Additionally, we integrate music style information by extracting genre- and beat-related features with a pre-trained music model, thereby improving genre matching and beat alignment. Extensive experimental results demonstrate that our proposed method generates highly expressive dances and outperforms existing methods both qualitatively and quantitatively.

Paper Structure (14 sections, 4 equations, 3 figures, 2 tables)

Introduction
Method
Frequency Complemented VQ-VAE (FreqVQ-VAE)
Music Features
Experiments
Dataset
Experiment Setup
Results
Quantitative Evaluation
Qualitative Evaluation
User Study
Reconstruction Experiments
Ablation Study
Conclusion

Figures (3)

Figure 1: The overall architecture of ExpressiveBailando. MERT features are extracted, downsampled, and concatenated with handcrafted music features to form the music conditional input to the cross-conditional GPT. Future dances are generated by decoding the upper and lower body pose codes predicted by the GPT with the FreqVQ-VAE decoders.
Figure 2: Architecture of the proposed FreqVQ-VAE.
Figure 3: Dances of genre Ballet Jazz generated by different methods. Green background indicates genre matching between dance and music, while red background indicates mismatching.
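
The music conditioning step described in the Figure 1 caption (downsample the learned features, then concatenate them with handcrafted features) can be illustrated with a minimal sketch. This is not the authors' implementation: the page does not say how the downsampling is done, so average pooling over time is an assumption here, and the names `downsample_mean` and `build_music_condition`, as well as the toy feature sizes, are hypothetical.

```python
from typing import List

Frames = List[List[float]]  # one feature vector per time step


def downsample_mean(frames: Frames, target_len: int) -> Frames:
    """Average-pool `frames` along time into `target_len` steps (assumed scheme)."""
    if target_len <= 0 or target_len > len(frames):
        raise ValueError("target_len must be in 1..len(frames)")
    n, dim = len(frames), len(frames[0])
    pooled = []
    for i in range(target_len):
        # Integer bin edges that cover the whole sequence without gaps.
        start = i * n // target_len
        stop = (i + 1) * n // target_len
        window = frames[start:stop]
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled


def build_music_condition(learned: Frames, handcrafted: Frames) -> Frames:
    """Align learned features to the handcrafted time axis, then concatenate per step."""
    aligned = downsample_mean(learned, len(handcrafted))
    return [a + h for a, h in zip(aligned, handcrafted)]


# Toy data: 6 learned steps (dim 2) aligned to 3 handcrafted steps (dim 1).
learned = [[float(t), 1.0] for t in range(6)]
handcrafted = [[0.5], [0.25], [0.75]]
cond = build_music_condition(learned, handcrafted)
```

In this toy run each conditioning vector has 3 dimensions (2 learned plus 1 handcrafted), and the sequence length follows the handcrafted features. In a real pipeline the learned features would come from the pre-trained music model and the target length would match whatever time resolution the GPT expects.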
