Generative human motion mimicking through feature extraction in denoising diffusion settings

Alexander Okupnik; Johannes Schneider; Kyriakos Flouris

TL;DR

Problem: enabling embodied, interactive AI dance using single-person motion data. Approach: a diffusion-based pipeline (EDGE) augmented with motion inpainting and ILVR for style-conditioned, time-coherent imitation of a reference sequence while preserving improvisation. Contributions: a duet-free interaction framework, real-time-capable style-guided editing, and analysis of mimicry as a tunable follow strength that balances fidelity and diversity. Findings: longer ILVR refinement pulls generated motion closer to the reference and improves alignment while maintaining diversity within a practical operating range, demonstrated on the AIST++ dataset. Significance: advances embodied human–AI collaboration in dance, enabling expressive co-creation with an AI partner trained on solo motion data, with potential applications in performance and therapy.

Abstract

Recent success with large language models has sparked a new wave of verbal human-AI interaction. While such models support users in a variety of creative tasks, they lack the embodied nature of human interaction. Dance, as a primal form of human expression, is well suited to complement this experience. To explore creative human-AI interaction through dance, we build an interactive model based on motion capture (MoCap) data. It generates an artificial other by partially mimicking and also "creatively" enhancing an incoming sequence of movement data. It is the first model that leverages single-person motion data and high-level features to do so; thus, it does not rely on low-level human-human interaction data. It combines ideas from two diffusion models, motion inpainting, and motion style transfer to generate movement representations that are both temporally coherent and responsive to a chosen movement reference. We demonstrate the model's success by quantitatively assessing how the feature distribution of the generated samples converges to that of the test set, which simulates the human performer. We show that our generations are a first step toward creative dancing with AI: they are diverse, showing various deviations from the human partner, while still appearing realistic.
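To make the idea of guiding generation by high-level features of a reference more concrete, the following is a minimal sketch of an ILVR-style refinement step (the technique named in the TL;DR), written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the names `low_pass` and `ilvr_refine`, the block-averaging filter over frames, and the toy one-feature motion are all hypothetical, and the denoising network and noise schedule are omitted entirely. The sketch assumes the commonly stated ILVR update, in which the low-frequency content of the model's proposal is swapped for that of the (noised) reference while the proposal's fine detail is kept.

```python
from typing import List

Motion = List[List[float]]  # frames x features (hypothetical toy representation)


def low_pass(motion: Motion, factor: int) -> Motion:
    """Assumed low-pass filter: average each block of `factor` frames,
    then repeat the block mean so the output keeps the input length."""
    if factor < 1 or len(motion) % factor != 0:
        raise ValueError("frame count must be a positive multiple of factor")
    dims = len(motion[0])
    smoothed: Motion = []
    for start in range(0, len(motion), factor):
        window = motion[start:start + factor]
        mean = [sum(frame[d] for frame in window) / factor for d in range(dims)]
        smoothed.extend(list(mean) for _ in range(factor))
    return smoothed


def ilvr_refine(proposal: Motion, noised_reference: Motion, factor: int) -> Motion:
    """One ILVR-style refinement: proposal + low_pass(reference) - low_pass(proposal).

    The coarse trajectory follows the reference; the residual detail of the
    proposal (its "improvisation") is left untouched.
    """
    low_ref = low_pass(noised_reference, factor)
    low_prop = low_pass(proposal, factor)
    return [
        [p + r - q for p, r, q in zip(p_frame, r_frame, q_frame)]
        for p_frame, r_frame, q_frame in zip(proposal, low_ref, low_prop)
    ]


# Toy example: 4 frames, 1 feature.
proposal = [[0.0], [2.0], [4.0], [6.0]]
reference = [[10.0], [10.0], [20.0], [20.0]]
refined = ilvr_refine(proposal, reference, factor=2)
```

In this sketch, a larger `factor` constrains only very coarse movement and leaves more room for deviation, while `factor=1` reproduces the reference exactly. Applying such a step over more of the denoising iterations would correspond to the stronger "follow strength" described in the TL;DR.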
