InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
TL;DR
InfiniMotion presents a memory-augmented autoregressive framework for long-text-to-motion generation. The Motion Memory Transformer, enhanced by Bidirectional Mamba Memory, enables coherent generation across extremely long sequences by preserving global semantics and local transitions. Through Residual VQ-VAE, Mask Transformer, and Residual Transformer components, the method discretizes motion, aligns text with tokens, and models multi-layer representations, respectively. Evaluations on the BABEL dataset show over a 30% improvement in FID and the ability to produce motions six times longer than prior work, highlighting strong potential for film, games, and robotics applications.
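To make the discretization step concrete, here is a minimal sketch of residual vector quantization, the general idea behind a Residual VQ-VAE: each layer quantizes whatever the previous layers left unexplained, so a single latent vector becomes one token per layer. The codebooks, the 2-D vectors, and the function names below are hypothetical and chosen only for illustration; they are not taken from the InfiniMotion implementation, where codebooks are learned and operate on encoded motion features.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Quantize x layer by layer; each layer encodes the remaining residual."""
    residual = list(x)
    recon = [0.0] * len(x)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Add the chosen code to the reconstruction, subtract it from the residual.
        recon = [r + c for r, c in zip(recon, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, recon


# Hypothetical codebooks: a coarse base layer followed by a finer residual layer.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]
tokens, recon = residual_quantize([1.2, 0.8], codebooks)
print(tokens, recon)
```

In this toy setting the first token carries the coarse approximation and the second token refines it, which mirrors the split the TL;DR describes between a model for the token sequence and a model for the additional residual layers.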
Abstract
Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods mostly target short motions and struggle to produce long sequences effectively, for two reasons: (1) processing a long motion sequence as a single input incurs prohibitively high computational cost; (2) splitting long-sequence generation into shorter segments can produce inconsistent transitions and requires interpolation or inpainting, neither of which models the sequence as a whole. To address these challenges, we propose InfiniMotion, a method that generates continuous motion sequences of arbitrary length within an autoregressive framework. We highlight its groundbreaking capability by generating a continuous 1-hour human motion with around 80,000 frames. Specifically, we introduce the Motion Memory Transformer with Bidirectional Mamba Memory, which enhances the transformer's memory so that it can process long motion sequences effectively without overwhelming computational resources. Notably, our method achieves over 30% improvement in FID and a demonstration 6 times longer than previous state-of-the-art methods, showcasing significant advancements in long motion generation. See project webpage: https://steve-zeyu-zhang.github.io/InfiniMotion/
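The following toy sketch illustrates why an autoregressive loop with a fixed-size memory can, in principle, run for an arbitrary number of segments: the output grows with every segment, while the state carried forward does not. Everything here is an assumption made for illustration. The "generator" is simple linear interpolation toward a scalar prompt value and the "memory" is a sliding window of recent frames; neither is the Motion Memory Transformer or the Bidirectional Mamba Memory, which are learned components.

```python
from typing import List, Tuple


def generate_segment(prompt: float, memory: List[float], length: int) -> List[float]:
    """Stand-in generator: move from the last remembered frame toward the prompt."""
    start = memory[-1]
    step = (prompt - start) / length
    return [start + step * (i + 1) for i in range(length)]


def update_memory(memory: List[float], segment: List[float], size: int) -> List[float]:
    """Keep only the most recent `size` frames, so the carried state stays bounded."""
    return (memory + segment)[-size:]


def generate_long(
    prompts: List[float], segment_len: int = 4, memory_size: int = 2
) -> Tuple[List[float], List[float]]:
    """Generate one segment per prompt, conditioning each on the carried memory."""
    memory = [0.0] * memory_size
    motion: List[float] = []
    for prompt in prompts:
        segment = generate_segment(prompt, memory, segment_len)
        motion.extend(segment)
        memory = update_memory(memory, segment, memory_size)
    return motion, memory


motion, memory = generate_long([1.0, 0.0, 2.0])
print(len(motion), len(memory))
```

Because each segment starts from the last remembered frame, consecutive segments join without a jump, and the per-segment cost depends on the segment length and memory size rather than on how much motion has already been generated.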
