MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation

Kaixing Yang, Xulong Tang, Ziqiao Peng, Yuxuan Hu, Jun He, Hongyan Liu

TL;DR

MEGADance tackles the challenge of genre-aware music-to-dance generation by introducing a two-stage framework that separates choreographic generality from genre-specific styling. It combines Finite Scalar Quantization with kinematic-dynamic constraints to obtain high-fidelity latent dance representations and a Mixture-of-Experts with Universal and Specialized components, powered by a Mamba-Transformer backbone, to map music to the latent space. The approach achieves state-of-the-art performance on FineDance and AIST++ across objective metrics and human studies, while enabling robust genre controllability and efficient inference. This work advances practical, genre-consistent, music-driven 3D dance generation with strong potential for interactive choreography and virtual performance applications.

Abstract

Music-driven 3D dance generation has attracted increasing attention in recent years, with promising applications in choreography, virtual reality, and creative content creation. Previous research has generated realistic dance movements from audio signals. However, existing methods underutilize genre conditioning, often treating it as an auxiliary modifier rather than a core semantic driver. This oversight compromises music-motion synchronization and disrupts dance genre continuity, particularly during complex rhythmic transitions, leading to visually unsatisfactory results. To address this challenge, we propose MEGADance, a novel architecture for music-driven 3D dance generation. By decoupling choreographic consistency into dance generality and genre specificity, MEGADance achieves high dance quality and strong genre controllability. It consists of two stages: (1) the High-Fidelity Dance Quantization Stage (HFDQ), which encodes dance motions into a latent representation using Finite Scalar Quantization (FSQ) and reconstructs them under kinematic-dynamic constraints, and (2) the Genre-Aware Dance Generation Stage (GADG), which maps music into the latent representation by combining a Mixture-of-Experts (MoE) mechanism with a Mamba-Transformer hybrid backbone. Extensive experiments on the FineDance and AIST++ datasets demonstrate the state-of-the-art performance of MEGADance both qualitatively and quantitatively. Code will be released upon acceptance.
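To make the quantization step in HFDQ more concrete, the following is a minimal sketch of generic Finite Scalar Quantization, not the authors' implementation. It assumes odd level counts per latent dimension, bounds each dimension with `tanh`, and rounds to the nearest integer level. The function names `fsq_quantize` and `fsq_code_index` and the level setting `[5, 5, 5]` are hypothetical choices for illustration; the abstract does not state the configuration used in MEGADance, and the straight-through gradient used during training is omitted here.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Quantize each latent dimension to one of `levels[d]` integer values.

    Assumption of this sketch: every entry of `levels` is odd, so the
    quantized values are symmetric around zero.
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only handles odd level counts")
    half = (levels - 1) / 2.0
    # tanh bounds each dimension to (-half, half) before rounding.
    bounded = np.tanh(z) * half
    return np.round(bounded).astype(np.int64)

def fsq_code_index(q, levels):
    """Map a quantized vector to a single integer index (mixed-radix code)."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    shifted = q + half  # each dimension now lies in [0, levels[d] - 1]
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (shifted * basis).sum(axis=-1)

# Hypothetical setting: 3 latent dimensions with 5 levels each,
# giving an implicit codebook of 5 * 5 * 5 = 125 entries.
levels = [5, 5, 5]
z = np.array([[0.0, 10.0, -10.0],
              [0.3, -0.3, 2.0]])
q = fsq_quantize(z, levels)
idx = fsq_code_index(q, levels)
print(q)
print(idx)
```

Because the codebook is implied by the rounding grid rather than learned, there are no codebook vectors to update; in the second stage, a music-to-dance model would only need to predict indices such as `idx`.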

Paper Structure

This paper contains 20 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: MEGADance enhances choreography consistency by decoupling it into dance generality and genre specificity via the Mixture-of-Experts design. Compared to previous methods, it produces synchronized dance with genre continuity, even under complex music conditions.
  • Figure 2: Overview of MEGADance. MEGADance employs FSQs with kinematic-dynamic constraints for body-part reconstruction in HFDQ, coupled with a MoE-based Mamba-Transformer architecture that generates music-aligned latent representations in GADG.
  • Figure 3: Qualitative Analysis on a typical Breaking Battle music clip.
  • Figure 4: Visualization of Genre Controllability on a representative Chinese music clip.
  • Figure 5: Qualitative Analysis for Ablation Study. MEGADance generates visually expressive dance motions, outperforming others in terms of stylistic consistency and movement diversity.
  • ...and 1 more figures