Bidirectional Autoregressive Diffusion Model for Dance Generation

Canyu Zhang; Youbao Tang; Ning Zhang; Ruei-Sung Lin; Mei Han; Jing Xiao; Song Wang

TL;DR

This paper proposes a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, in which a bidirectional encoder enforces that the generated dance is harmonious in both the forward and backward directions.

Abstract

Dance serves as a powerful medium for expressing human emotions, but generating lifelike dance remains a considerable challenge. Recently, diffusion models have shown remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, without local or bidirectional enhancement of the motion. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. The proposed framework generates new motions based on the input conditions and nearby motions: it predicts individual motion slices iteratively and consolidates all predictions. To further improve the synchronization between the generated dance and the beat, beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.
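
As a rough illustration of the slice-wise, bidirectional idea described in the abstract, the Python sketch below splits a noisy sequence into K slices, runs one forward and one backward autoregressive pass over them, and merges the two results. Everything in it is an assumption made for illustration and is not the authors' implementation: `toy_denoise_slice` is a hypothetical stand-in for a learned network, the averaging rule used to consolidate the two passes is not taken from the paper, and the condition array is assumed to share the motion's feature dimension.

```python
import numpy as np


def split_into_slices(seq, num_slices):
    """Split a (frames, features) array into contiguous slices along time."""
    return np.array_split(seq, num_slices, axis=0)


def toy_denoise_slice(z_k, context, cond_k):
    """Hypothetical stand-in for a learned slice denoiser (assumption).

    Averages the noisy slice with the per-feature mean of its condition
    and, when available, the per-feature mean of a neighbouring slice.
    """
    parts = [z_k, np.broadcast_to(cond_k.mean(axis=0), z_k.shape)]
    if context is not None:
        parts.append(np.broadcast_to(context.mean(axis=0), z_k.shape))
    return np.mean(parts, axis=0)


def bidirectional_pass(z, cond, num_slices):
    """One slice-wise pass in both directions, merged by averaging."""
    z_slices = split_into_slices(z, num_slices)
    c_slices = split_into_slices(cond, num_slices)

    # Forward pass: each slice is conditioned on the previous output.
    forward = []
    for k in range(num_slices):
        prev_out = forward[k - 1] if k > 0 else None
        forward.append(toy_denoise_slice(z_slices[k], prev_out, c_slices[k]))

    # Backward pass: each slice is conditioned on the next output.
    backward = [None] * num_slices
    for k in reversed(range(num_slices)):
        next_out = backward[k + 1] if k < num_slices - 1 else None
        backward[k] = toy_denoise_slice(z_slices[k], next_out, c_slices[k])

    # Consolidate both directions (plain averaging is an assumption).
    merged = [(f + b) / 2.0 for f, b in zip(forward, backward)]
    return np.concatenate(merged, axis=0)


# Example usage with random data: 12 frames, 4 features, K = 3 slices.
rng = np.random.default_rng(0)
estimate = bidirectional_pass(rng.normal(size=(12, 4)),
                              rng.normal(size=(12, 4)), num_slices=3)
print(estimate.shape)
```

In a real diffusion sampler a pass of this kind would be repeated at every denoising step, with a trained network in place of the toy function.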

Paper Structure (37 sections, 8 equations, 4 figures, 3 tables)

Introduction
Related Work
Human Motion Generation
Music-to-dance Generation
Diffusion Model
Autoregressive Model
Method
Pose Representation
Diffusion Framework
Geometric Losses
Model
Sampling
Long-form Sampling
Editing
Implement Details
...and 22 more sections

Figures (4)

Figure 1: The proposed Bidirectional Autoregressive Diffusion Model (BADM) generates harmonious, physically plausible dance based on music and beat conditions.
Figure 2: Our Bidirectional Autoregressive Diffusion Model (BADM) employs a denoising mechanism to enhance dance sequences from time $t = T$ to $t = 0$. BADM begins with a noisy sequence $z_T$ at time $T$ and proceeds to generate an estimated dance sequence $\hat{x}$. The denoising procedure is applied iteratively until $t = 0$. Our autoregressive (AR) model treats the whole noise sequence as $K$ slices. On the left, we show our model processing the $k$-th slice at diffusion timestep $t$. The model is applied $K$ times, once per slice, within each denoising step.
Figure 3: BADM processes each noise slice $z_k$ bidirectionally. The generated dance slices are concatenated and sent to the local information decoder. We show this process at each timestep $t$.
Figure 4: Yellow parts represent fixed motion inputs and blue parts are the generated motion parts. For motion in-betweening (Top), the first and last frames are fixed. For specific body part editing (Bottom), the lower body joints are fixed to the input motion while the upper body is altered to fit the input.
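
The two editing modes in Figure 4 can both be read as fixing part of the motion and generating the rest. The Python sketch below shows one common way to express such constraints, a binary mask that copies fixed entries from a reference motion and keeps generated values elsewhere. This is an assumption for illustration only: the text above does not state how BADM applies its constraints, and the function names and the choice of feature indices for the "lower body" are hypothetical.

```python
import numpy as np


def apply_constraint(generated, reference, mask):
    """Keep reference values where mask == 1, generated values elsewhere."""
    return mask * reference + (1.0 - mask) * generated


def inbetween_mask(num_frames, num_features):
    """Mask for motion in-betweening: first and last frames are fixed."""
    mask = np.zeros((num_frames, num_features))
    mask[0] = 1.0
    mask[-1] = 1.0
    return mask


def feature_mask(num_frames, num_features, fixed_features):
    """Mask for body-part editing: selected feature columns are fixed
    in every frame (for example, hypothetical lower-body joint columns)."""
    mask = np.zeros((num_frames, num_features))
    mask[:, fixed_features] = 1.0
    return mask


# Example usage: 5 frames, 6 features, toy "generated" and "reference" data.
generated = np.zeros((5, 6))
reference = np.ones((5, 6))

inbetween = apply_constraint(generated, reference, inbetween_mask(5, 6))
edited = apply_constraint(generated, reference,
                          feature_mask(5, 6, fixed_features=[0, 1, 2]))
```

In a diffusion sampler, a blend like this is typically re-applied at each denoising step so that the fixed region stays pinned to the input motion while the remainder is generated around it.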
