Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

TL;DR

Beat-It tackles beat synchronization and controllability in music-to-dance generation by disentangling beats from music and conditioning on sparse key poses. It introduces a nearest-beat-distance representation of beats, a hierarchical multi-condition fusion mechanism that integrates beats, key poses, and music, and a beat alignment loss that supervises timing at beat frames. On the AIST++ dataset, Beat-It outperforms state-of-the-art methods in beat alignment and motion controllability, while supporting arbitrary beat designation and flexible keyframe placement. The approach makes choreography more practical by enabling beat-aware, key pose-guided dance generation under multiple conditions.

Abstract

Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike prior approaches, Beat-It uniquely integrates explicit beat awareness and key pose guidance, effectively resolving two main issues: the misalignment of generated dance motions with musical beats, and the inability to map key poses to specific beats, critical for practical choreography. Our approach disentangles beat conditions from music using a nearest beat distance representation and employs a hierarchical multi-condition fusion mechanism. This mechanism seamlessly integrates key poses, beats, and music features, mitigating condition conflicts and offering rich, multi-conditioned guidance for dance generation. Additionally, a specially designed beat alignment loss ensures the generated dance movements remain in sync with the designated beats. Extensive experiments confirm Beat-It's superiority over existing state-of-the-art methods in terms of beat alignment and motion controllability.
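
The nearest beat distance representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the following assumes the simplest reading: each frame is labeled with its distance, in frames, to the closest designated beat frame. The function name `nearest_beat_distance` and its parameters are hypothetical, and any normalization or clipping used in the paper is not reproduced here.

```python
from bisect import bisect_left


def nearest_beat_distance(num_frames, beat_frames):
    """Return, for each frame index, the distance in frames to the closest beat.

    Illustrative assumption only: the paper may normalize, clip, or sign
    this quantity differently.
    """
    beats = sorted(set(beat_frames))
    if not beats:
        raise ValueError("at least one beat frame is required")

    distances = []
    for t in range(num_frames):
        # Index of the first beat at or after frame t.
        i = bisect_left(beats, t)
        candidates = []
        if i < len(beats):
            candidates.append(beats[i] - t)      # next beat (or t itself)
        if i > 0:
            candidates.append(t - beats[i - 1])  # previous beat
        distances.append(min(candidates))
    return distances


# Beats designated at frames 2 and 6 of an 8-frame clip.
print(nearest_beat_distance(8, [2, 6]))  # [2, 1, 0, 1, 2, 1, 0, 1]
```

Under this reading, beat frames are exactly the frames with value 0, and editing the list of beat frames changes the condition without touching the music features, which is one way to picture how beats could be designated independently of the audio.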

Paper Structure

This paper contains 17 sections, 11 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: We introduce Beat-It, a novel method for generating 3D dance motions with beat alignment and motion controllability. Our approach explicitly injects beat awareness and seamlessly integrates multiple conditions to guide the generation process, leading to beat-synchronized, key pose-guided dance generation.
  • Figure 2: Overview of our proposed method, Beat-It. We generate a beat-synchronized dance sequence utilizing music, keyframes, and beat conditions. Conditional embeddings are derived and subsequently fused in a two-stage process: initially integrating sparse keyframe condition with other dense conditions, followed by the fusion of these dense conditions. The final fused condition is then processed by the conditional diffusion module. To ensure precise beat control, a beat alignment loss is employed to explicitly supervise the generated motions at the beat level.
  • Figure 3: Illustration of the beat-aware mask dilation scheme. The first row visualizes the keyframe mask, with deep green indicating valid control constraints and gray indicating invalid ones. The second row presents the dilation step curve with red lines marking beat frames. The third row is a heatmap of the dilation step. The fourth row shows the neighborhood range of keyframes, with light green indicating the expanded valid region from beat-aware mask dilation. The final row displays the beat-aware dilated keyframe mask.
  • Figure 4: Visualization comparison on beat alignment among different methods. The motion generated by our method shows precise beat alignment with the given beat condition, demonstrating the superiority of our method in beat control.