Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li

TL;DR

This paper proposes Lodge, a network that generates extremely long dance sequences conditioned on given music, together with a Foot Refine Block that optimizes the contact between the feet and the ground to enhance the physical realism of the motion.

Abstract

We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse-to-fine diffusion architecture and propose characteristic dance primitives, which possess significant expressiveness, as the intermediate representation between the two diffusion models. The first stage is global diffusion, which focuses on comprehending the coarse-level music-dance correlation and producing characteristic dance primitives. The second stage is local diffusion, which generates detailed motion sequences in parallel under the guidance of the dance primitives and choreographic rules. In addition, we propose a Foot Refine Block to optimize the contact between the feet and the ground, enhancing the physical realism of the motion. Our approach can generate extremely long dance sequences in parallel, striking a balance between global choreographic patterns and local motion quality and expressiveness. Extensive experiments validate the efficacy of our method.
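
The coarse-to-fine data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `global_stage` and `local_stage` are hypothetical stand-ins for the two diffusion models (random noise and linear interpolation replace actual denoising), and `SEGMENT_LEN`, `POSE_DIM`, and the boundary-splitting scheme are assumptions chosen for illustration. Only the 8-frame primitive length comes from the Figure 1 caption. The point is the structure: a first pass produces sparse primitives, and each segment is then filled in using only its own two neighboring primitives, so segments are independent of one another.

```python
import numpy as np

PRIMITIVE_LEN = 8   # Figure 1 describes the primitives as 8-frame movements
SEGMENT_LEN = 64    # assumed spacing between primitives, in frames
POSE_DIM = 6        # assumed size of one pose vector

def global_stage(music, rng):
    """Hypothetical stand-in for global diffusion: sparse primitives."""
    n_segments = music.shape[0] // SEGMENT_LEN
    # One primitive per segment boundary, so each segment has two anchors.
    return rng.standard_normal((n_segments + 1, PRIMITIVE_LEN, POSE_DIM))

def local_stage(music_seg, start_prim, end_prim):
    """Hypothetical stand-in for local diffusion on a single segment.

    A real model would also condition on music_seg; here it is unused.
    """
    half = PRIMITIVE_LEN // 2
    seg = np.empty((SEGMENT_LEN, POSE_DIM))
    seg[:half] = start_prim[half:]    # segment opens with the tail of one primitive
    seg[-half:] = end_prim[:half]     # and closes with the head of the next
    # Fill the interior by linear interpolation (placeholder for denoising).
    n_mid = SEGMENT_LEN - 2 * half
    t = np.linspace(0.0, 1.0, n_mid + 2)[1:-1, None]
    seg[half:-half] = (1.0 - t) * seg[half - 1] + t * seg[-half]
    return seg

def generate_long_dance(music, seed=0):
    rng = np.random.default_rng(seed)
    prims = global_stage(music, rng)
    n_segments = prims.shape[0] - 1
    # Each call depends only on its own inputs, so this loop could be batched.
    segments = [
        local_stage(music[i * SEGMENT_LEN:(i + 1) * SEGMENT_LEN],
                    prims[i], prims[i + 1])
        for i in range(n_segments)
    ]
    return np.concatenate(segments, axis=0), prims

music = np.zeros((256, 4))            # placeholder music features
dance, prims = generate_long_dance(music)
print(dance.shape)
```

Because adjacent segments share a primitive at their boundary, the concatenated sequence reproduces each interior primitive intact even though the segments were produced independently.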

Paper Structure (22 sections, 10 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Lodge can generate extremely long dances in parallel. The sections highlighted in green represent the characteristic dance primitives. These are expressive 8-frame movements that not only support parallel generation but also contain choreographic patterns. They guide the diffusion network to generate long, expressive dances in parallel while adhering to choreographic rules.
  • Figure 2: An overview of our framework. "TE" is Transformer Encoder, "G" is the genre of dance, "LD" is the Local Diffusion Model.
  • Figure 3: Training process of Local Diffusion.
  • Figure 4: The Training process of Lodge.
  • Figure 5: The inference process of Lodge.
  • ...and 1 more figure