InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions

Ronghui Li, Youliang Zhang, Yachao Zhang, Yuxiang Zhang, Mingyang Su, Jie Guo, Ziwei Liu, Yebin Liu, Xiu Li

TL;DR

InterDance tackles reactive duet dance generation by creating a high-fidelity duet dataset and a diffusion-based framework that explicitly models inter-dancer interaction. The dataset provides 3.93 hours of music-paired motion across 15 genres with body and finger data captured via MoCap and represented in SMPL-X format, enabling realistic contact and foot-ground interactions. A novel canonical-space motion representation with body surface vertices and contact labels, combined with an Interaction Refine Guidance and SDF-based penetration control, yields improved motion quality and interactivity over baselines. Experiments and user studies demonstrate superior motion realism, rhythmic alignment, and interaction fidelity, highlighting practical potential for animation and interactive media, while acknowledging scalability and societal considerations as future work.

Abstract

Humans perform a variety of interactive motions, among which duet dance is one of the most challenging. However, existing human motion generative models still cannot generate high-quality interactive motions, especially for duet dance. On the one hand, this is due to the lack of large-scale, high-quality datasets. On the other hand, it arises from the incomplete representation of interactive motion and the lack of fine-grained optimization of interactions. To address these challenges, we propose InterDance, a large-scale duet dance dataset that significantly enhances motion quality, data scale, and the variety of dance genres. Built upon this dataset, we propose a new motion representation that can accurately and comprehensively describe interactive motion. We further introduce a diffusion-based framework with an interaction refinement guidance strategy to optimize the realism of interactions progressively. Extensive experiments demonstrate the effectiveness of our dataset and algorithm.

Paper Structure

This paper contains 22 sections, 10 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Example of reactive 3D dance generation. Green represents the leader and blue represents the follower (also positioned with a red marker). Given the music and the leader's dance, the goal of reactive dance generation is to generate a follower's dance that coordinates with both the music and the leader.
  • Figure 2: Visualizations of InterDance samples. The dataset contains high-quality duet dances with accurate body and finger motion, covering a total of 15 fine-grained, diverse genres.
  • Figure 3: An overview of different motion representations.
  • Figure 4: The right part shows our entire network, while the left part details the Denoise Network.
  • Figure 5: Qualitative comparisons of reactive dance generation; blue is the generated follower.
  • ...and 5 more figures