Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition

Zhuodi Cai, Ziyu Xu, Juan Pampin

TL;DR

This work addresses the challenge of integrating AI into dance as a respectful observer that remembers and responds to movement rather than generates new content. It presents a real-time, dancer-specific motion-recognition pipeline using wearable IMUs and MiniRocket, coupled with memory-based sound mapping to produce responsive multimedia. The approach achieves high classification accuracy (mean ~96%) with latency under 50 ms, enabling seamless live interaction and a replicable framework for dance-literate machines in performance and education. By foregrounding embodiment and somatics, the paper contributes a practical paradigm for human-machine co-performance that preserves expressive depth while leveraging efficient time-series classification for attentive observation.

Abstract

We introduce a lightweight, real-time motion recognition system that enables synergic human-machine performance through wearable IMU sensor data, MiniRocket time-series classification, and responsive multimedia control. By mapping dancer-specific movement to sound through somatic memory and association, we propose an alternative approach to human-machine collaboration, one that preserves the expressive depth of the performing body while leveraging machine learning for attentive observation and responsiveness. We demonstrate that this human-centered design reliably supports high-accuracy classification at low latency (<50 ms), offering a replicable framework to integrate dance-literate machines into creative, educational, and live performance contexts.
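
The last step of the pipeline, turning a classified movement into a sound, can be illustrated with a minimal sketch. It is not the authors' implementation: the cue file names, the `CueTrigger` class, and the 0.6 probability threshold are all hypothetical, chosen only to show how a label and its predicted probability might gate a multimedia trigger. Label 0 is treated as the negative class and labels 1 to 6 as the trained movements.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label-to-cue table. Label 0 (negative class) has no cue;
# labels 1-6 stand for the six trained movements. File names are invented.
SOUND_CUES = {label: f"cue_{label}.wav" for label in range(1, 7)}


@dataclass
class CueTrigger:
    """Fire a cue once when a new movement is recognized with confidence."""

    min_probability: float = 0.6  # assumed threshold, not from the paper
    last_label: int = 0           # start in the negative (no-movement) class

    def update(self, label: int, probability: float) -> Optional[str]:
        # Low-confidence predictions (e.g. during transitions) are ignored,
        # so the current state is held until the classifier is sure again.
        if probability < self.min_probability:
            return None
        # The same movement continuing should not retrigger its cue.
        if label == self.last_label:
            return None
        self.last_label = label
        # Returns None for label 0, which only resets the state.
        return SOUND_CUES.get(label)


trigger = CueTrigger()
for label, prob in [(1, 0.9), (1, 0.95), (2, 0.4), (2, 0.8), (0, 0.9)]:
    cue = trigger.update(label, prob)
    if cue is not None:
        print(f"label {label} (p={prob:.2f}) -> play {cue}")
```

Holding the state on low-confidence predictions is one possible design choice; a live system might instead fade the current sound or fall back to the negative class.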

Paper Structure

This paper contains 11 sections and 5 figures.

Figures (5)

  • Figure 1: System pipeline for training and live application. During the training stage (pre-performance), the dancer reacts to sound while wearing IMU sensors, which capture multivariate time-series motion data. This data is preprocessed and used to train a linear classifier with MiniRocket on a GPU server. During application (in performance), real-time data is streamed to the server, classified, and used to trigger corresponding multimedia elements, completing the loop from embodied memory to audiovisual output.
  • Figure 2: Sensor placement and movement sequences documentation. IMU sensors are affixed to both wrists and ankles (left panel). The right panel presents six frames from the recorded training session, each demonstrating distinct motion patterns corresponding to different sound stimuli. Movement class labels used in training are also marked in the images.
  • Figure 3: Real-time motion classification during a 10‑second mock performance. The dancer performs different movements, with the model inferring labels based on the dominant motion within each 2-second segment. During transitions, predicted probabilities tend to decrease, reflecting temporal ambiguity. Labels and probabilities are visualized alongside a time-aligned video strip and IMU signal plot.
  • Figure 4: Model performance evaluation. A confusion matrix summed over 10-fold cross-validation (left panel) shows high classification accuracy across all motion classes. Class label 0 represents the negative class, and labels 1 to 6 refer to dance movements in Figure 2. The average multiclass ROC curve (right panel) for all seven labels is calculated, with all AUC scores above 0.99, indicating strong discriminability for each class.
  • Figure 5: Real-time human-machine interaction during rehearsal. A dance artist performs in front of a projection controlled by our system. The laptop screen shows the multimedia interface, while wearable IMU sensors and a smartphone are used to stream and monitor real-time data.
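
The 2-second segments mentioned in the Figure 3 caption suggest a sliding-window step between the IMU stream and the classifier. The sketch below shows one way such segmentation could work. It is an assumption-laden illustration, not the paper's code: the function name `sliding_windows`, the 0.5-second hop, the 10 Hz sample rate, and the 4-channel samples in the usage lines are hypothetical. Only the 2-second window length comes from the captions.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence


def sliding_windows(
    samples: Iterable[Sequence[float]],
    sample_rate_hz: float,
    window_s: float = 2.0,  # segment length taken from the Figure 3 caption
    hop_s: float = 0.5,     # assumed hop between windows, not from the paper
) -> Iterator[List[List[float]]]:
    """Yield overlapping fixed-length windows from a stream of IMU samples.

    Each sample is one time step across all sensor channels; each yielded
    window is a list of `window_len` samples, oldest first.
    """
    window_len = int(round(sample_rate_hz * window_s))
    hop_len = int(round(sample_rate_hz * hop_s))
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window and hop must span at least one sample")

    buffer: Deque[List[float]] = deque(maxlen=window_len)
    since_last = 0  # samples received since the last emitted window
    for sample in samples:
        buffer.append(list(sample))
        since_last += 1
        # Emit once the buffer is full and a full hop has elapsed.
        if len(buffer) == window_len and since_last >= hop_len:
            since_last = 0
            yield list(buffer)


# Usage with a synthetic 4-second, 10 Hz, 4-channel stream (all hypothetical).
stream = [[float(i)] * 4 for i in range(40)]
windows = list(sliding_windows(stream, sample_rate_hz=10))
print(len(windows), "windows of", len(windows[0]), "samples each")
```

Each window would then be passed to the trained classifier. A shorter hop gives more frequent predictions at the cost of more inference calls per second.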