Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Li Siyao; Tianpei Gu; Zhitao Yang; Zhengyu Lin; Ziwei Liu; Henghui Ding; Lei Yang; Chen Change Loy

TL;DR

This work proposes the new task of dance accompaniment, where a follower must react in real time to a leader's motion and accompanying music. It introduces the DD100 dataset of high-quality duet MoCap data and a GPT-based baseline, Duolando, that uses multi-part motion quantization (VQ-VAE) and an Interaction Coordinate GPT with look-ahead conditioning. To address out-of-distribution scenarios, the authors implement off-policy reinforcement learning with a Q-based objective and step-wise rewards to align lower-body velocity and reduce skating artifacts. The results show substantial gains in motion realism and leader–follower interaction over solo-dance baselines, and the approach establishes a benchmark and methodology for future multi-agent, music-conditioned motion synthesis in VR/AR contexts.

Abstract

We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.

Paper Structure (25 sections, 17 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
DD100: A Large-scale Duet Dance Mocap Dataset
Our Approach
Quantizing Motion and Relative Translation
Interaction Coordinate GPT
Off-Policy GPT Reinforcement Learning
Experiments
Conclusion
Related Work
Music-to-Dance
Human-human Interaction
More details
Speed Statistics
Network Structures
Training Hyper-parameters
...and 10 more sections

Figures (8)

Figure 1: Example of Duolando's results. The female avatar (red arrow) is driven by the proposed method to accompany a real human's (white) dancing.
Figure 2: Samples of DD100 dataset. The leader and the follower are colored in green and red, respectively. DD100 contains 10 dance genres, featuring a diverse range of poses and interactions, with intricate hand gestures.
Figure 3: Structures of (a) the Motion VQ-VAEs and (b) the Relative Translation VQ-VAE. Quantization replaces an encoded feature $f_i$ with the most similar entry $z_k$ in the codebook $\mathcal{Z}$, such that $z_k = \arg\min_{z\in \mathcal{Z}}\|f_i-z\|$.
Figure 4: Structure of the follower GPT. The GPT takes ten inputs and autoregressively predicts the subsequent tokens of the follower's motion and the relative translation. Preconditions (music signals and the leader's motion) are integrated with Look-Ahead Transformers (LAT).
Figure 5: GPT's supervised training (a) vs. off-policy RL stage (b). For supervised training (ST), the supervising labels are sequences $\{\bm z\}$ quantized from the in-domain data in the training set. In the RL stage, the network weights are optimized based on human-defined rewards (scores) on the GPT's generated samples $\{\hat{\bm z}\}$ under OOD conditions.
...and 3 more figures
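
The codebook lookup described in the Figure 3 caption, $z_k = \arg\min_{z\in \mathcal{Z}}\|f_i-z\|$, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `quantize`, the toy 2-D codebook, and the choice of the Euclidean norm are assumptions made here for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(features: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Replace each feature f_i with its nearest codebook entry z_k.

    Assumption: "most similar" means smallest Euclidean distance.
    Returns the chosen indices k and the corresponding codebook vectors.
    """
    indices = []
    for f in features:
        # Distance from this feature to every codebook entry.
        dists = [math.dist(f, z) for z in codebook]
        # Index of the closest entry (argmin over the codebook).
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    quantized = [list(codebook[k]) for k in indices]
    return indices, quantized


# Hypothetical toy data: three codebook entries and three encoded features.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
features = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices, quantized = quantize(features, codebook)
print(indices)    # [0, 1, 2]
print(quantized)  # the matching codebook rows
```

The integer indices are the tokens a sequence model would then predict; the quantized vectors are what a decoder would consume to reconstruct motion.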
