May the Dance be with You: Dance Generation Framework for Non-Humanoids

Hyemin Ahn

TL;DR

The paper teaches non-humanoid agents to dance from human dance videos using a two-stage learning framework. First, a reward model is trained with contrastive learning on the AIST dataset to capture the relationship between music and visual rhythm, which is encoded as optical flow. Second, reinforcement learning optimizes non-humanoid dancers (e.g., CartPole, UR5) to maximize this reward. Quantitative metrics (beat alignment, F1 scores) and a user study show that the resulting dances align with musical structure and are preferred by humans over baselines such as BPM-based control and reward-model-free variants. The work demonstrates a data-efficient path for cross-domain motion generation in which non-humanoid agents imitate how humans synchronize motion with music, with potential extensions to 3D visual-rhythm representations and structured choreography.

Abstract

We hypothesize that dance is a motion that forms a visual rhythm from music, where the visual rhythm can be perceived from optical flow. If an agent can recognize the relationship between visual rhythm and music, it will be able to dance by generating a motion that creates a visual rhythm matching the music. Based on this, we propose a framework for any kind of non-humanoid agent to learn how to dance from human videos. Our framework works in two stages: (1) training a reward model that perceives the relationship between optical flow (visual rhythm) and music from human dance videos, and (2) training the non-humanoid dancer with reinforcement learning based on that reward model. Our reward model consists of two feature encoders, one for optical flow and one for music. They are trained with contrastive learning, which increases the similarity between concurrent optical flow and music features. With this reward model, the agent learns to dance by receiving a higher reward when its action creates an optical flow whose feature has a higher similarity to the given music feature. Experimental results show that the generated dance motion aligns properly with the music beat, and the user study results indicate that humans prefer our framework over the baselines. To the best of our knowledge, our work on non-humanoid agents that learn to dance from human videos is unprecedented. An example video can be found at https://youtu.be/dOUPvo-O3QY.
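
The two ingredients described in the abstract, a contrastive objective over concurrent optical-flow and music features and a similarity-based reward, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the symmetric InfoNCE-style form of the loss, and the `temperature` value are not taken from the paper, and the feature vectors stand in for the outputs of the two encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cosine_similarity_reward(flow_feat, music_feat):
    """Reward = cosine similarity between one flow feature and one music feature."""
    f = l2_normalize(np.asarray(flow_feat, dtype=np.float64))
    m = l2_normalize(np.asarray(music_feat, dtype=np.float64))
    return float(np.dot(f, m))


def contrastive_loss(flow_feats, music_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch (assumed form, not the paper's).

    Row i of flow_feats and row i of music_feats are treated as a concurrent
    (positive) pair; every other pairing in the batch is a negative.
    """
    f = l2_normalize(np.asarray(flow_feats, dtype=np.float64))
    m = l2_normalize(np.asarray(music_feats, dtype=np.float64))
    logits = f @ m.T / temperature  # (batch, batch) scaled cosine similarities

    def cross_entropy_on_diagonal(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the flow-to-music and music-to-flow directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

In this sketch, minimizing `contrastive_loss` pulls concurrent flow and music features together, and `cosine_similarity_reward` is the scalar an RL agent would then try to maximize at each step.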

Paper Structure (34 sections, 1 equation, 6 figures, 2 tables)

Introduction
Related Work
Dance Generation for Human-like Agents
Dance Generation for Non-Humanoid Robots
Methodology
Training Reward Model
Training Non-Humanoid Dancer
Implementation Details
Optical Flow Extraction
Music Feature Extraction
Reward Model Training
Non-Humanoid Dancer Training
Experiment
Agents and Simulator
Qualitative Result
...and 19 more sections

Figures (6)

Figure 1: Illustration of how the proposed reward model, consisting of two encoders and two projection heads, is trained based on contrastive learning.
Figure 2: Illustration of how the proposed framework teaches the non-humanoid dancer based on reinforcement learning (RL). sim($\cdot, \cdot$) denotes the cosine similarity between two input vectors.
Figure 3: Generated dance of the CartPole agent to the song "We will rock you" by Queen.
Figure 4: Example of how the kinematic beat aligns with the music beat when the UR5 agent dances.
Figure 5: Illustration of how the proposed framework without the reward model can teach the non-humanoid dancer. Instead of a pre-trained reward model, it is also possible to set the reward as an L1-norm distance between optical flows from the agent and human.
...and 1 more figures
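
The reward-model-free variant in the Figure 5 caption can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the two optical-flow arrays are assumed to have the same shape, and the distance is negated so that a closer match yields a higher reward (the caption gives only the L1-norm distance, not the sign convention).

```python
import numpy as np


def l1_flow_reward(agent_flow, human_flow):
    """Negative L1-norm distance between two optical-flow arrays.

    Assumption: both arrays share one shape, e.g. (height, width, 2).
    The negation is our choice so that smaller distance = larger reward.
    """
    a = np.asarray(agent_flow, dtype=np.float64)
    h = np.asarray(human_flow, dtype=np.float64)
    if a.shape != h.shape:
        raise ValueError("agent and human optical flows must have the same shape")
    return -float(np.sum(np.abs(a - h)))
```

Unlike the learned reward, this baseline compares the agent's flow directly against a human's flow, so it needs no pre-trained encoders.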
