ReactionMamba: Generating Short & Long Human Reaction Sequences
Hajra Anwar Beg, Baptiste Chopin, Hao Tang, Mohamed Daoudi
TL;DR
ReactionMamba combines a motion VAE with Mamba selective state-space models to generate long-horizon reaction motions for two-person interactions, conditioned on the actor's motion. The encoder maps the reactor's pose sequence to latent variables; the decoder combines these latent codes with the actor's action sequence and the reactor's initial pose to reconstruct coherent reaction sequences, enabling scalable, real-time generation. On Lindy Hop, NTU120-AS, and InterX, the method delivers competitive realism and diversity with orders of magnitude faster inference than transformer-based baselines, and it remains robust in long-horizon scenarios. Ablation studies confirm the value of conditioning directly on the initial pose and the action, while the reported limitations point to adaptive conditioning and more realistic foot-ground contact as future directions.
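The data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-frame latent codes, the layer sizes, the random (untrained) weights, and all function names are hypothetical, and the gated linear recurrence in `decode` is only a simplified stand-in for a Mamba selective state-space block.

```python
import math
import random


def rand_matrix(rng, rows, cols, scale=0.1):
    """Random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Project vector x (one entry per row of w) to one value per column of w."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def make_params(rng, pose_dim, latent_dim, state_dim):
    """Hypothetical parameter layout; all sizes are assumptions."""
    in_dim = latent_dim + 2 * pose_dim  # latent code + actor pose + initial pose
    return {
        "w_mu": rand_matrix(rng, pose_dim, latent_dim),
        "w_logvar": rand_matrix(rng, pose_dim, latent_dim),
        "w_in": rand_matrix(rng, in_dim, state_dim),
        "w_gate": rand_matrix(rng, in_dim, state_dim),
        "w_out": rand_matrix(rng, state_dim, pose_dim),
    }


def encode(reactor_poses, params):
    """Encoder: map each reactor pose to a latent mean and log-variance."""
    mus = [linear(p, params["w_mu"]) for p in reactor_poses]
    logvars = [linear(p, params["w_logvar"]) for p in reactor_poses]
    return mus, logvars


def reparameterize(rng, mus, logvars):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), the usual VAE trick."""
    return [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, lv_row)]
            for mu, lv_row in zip(mus, logvars)]


def decode(latents, actor_poses, initial_pose, params):
    """Decoder: input-dependent gated recurrence (a stand-in for Mamba).

    Every step sees the latent code, the actor's pose at that frame, and the
    reactor's initial pose; cost grows linearly with sequence length.
    """
    state = [0.0] * len(params["w_in"][0])
    reaction = []
    for z, actor_pose in zip(latents, actor_poses):
        x = z + actor_pose + initial_pose  # list concatenation of the conditions
        update = linear(x, params["w_in"])
        gate = [1.0 / (1.0 + math.exp(-g)) for g in linear(x, params["w_gate"])]
        # The gate depends on the input, so the state decides per frame how
        # much history to keep ("selective" in a loose sense).
        state = [g * h + (1.0 - g) * u for g, h, u in zip(gate, state, update)]
        reaction.append(linear(state, params["w_out"]))
    return reaction


def generate(rng, params, actor_poses, initial_pose, latent_dim):
    """Generation: sample latents from the N(0, 1) prior, then decode."""
    latents = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in actor_poses]
    return decode(latents, actor_poses, initial_pose, params)


rng = random.Random(0)
POSE_DIM, LATENT_DIM, STATE_DIM, FRAMES = 6, 4, 8, 500
params = make_params(rng, POSE_DIM, LATENT_DIM, STATE_DIM)

# Toy periodic "motions" standing in for real actor and reactor pose sequences.
actor = [[math.sin(0.1 * t + j) for j in range(POSE_DIM)] for t in range(FRAMES)]
reactor = [[math.cos(0.1 * t + j) for j in range(POSE_DIM)] for t in range(FRAMES)]

# Training-style pass: encode the reactor, sample latents, reconstruct.
mus, logvars = encode(reactor, params)
recon = decode(reparameterize(rng, mus, logvars), actor, reactor[0], params)

# Generation pass: no reactor sequence is needed beyond its first pose.
generated = generate(rng, params, actor, reactor[0], LATENT_DIM)
print(len(generated), len(generated[0]))
```

At generation time only the actor sequence and the reactor's first pose are required, and the recurrence processes one frame per step, which is the property that makes long sequences cheap relative to attention over the full history.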
Abstract
We present ReactionMamba, a novel framework for generating long 3D human reaction motions. ReactionMamba integrates a motion VAE for efficient motion encoding with Mamba-based state-space models to decode temporally consistent reactions. This design enables ReactionMamba to generate both short sequences of simple motions and long sequences of complex motions, such as dance and martial arts. We evaluate ReactionMamba on three datasets (NTU120-AS, Lindy Hop, and InterX) and demonstrate competitive performance in terms of realism, diversity, and long-sequence generation compared to previous methods, including InterFormer, ReMoS, and Ready-to-React, while achieving substantial improvements in inference speed.
