MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion
Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
TL;DR
MoRAG broadens language generalization in text-to-human-motion generation through a multi-part retrieval strategy: LLM-generated, part-specific descriptions are used to retrieve torso, hand, and leg motions. The retrieved part motions are spatially composed into full-body sequences, which serve as additional conditioning for diffusion-based motion generation via a Semantics-Modulated Transformer backbone, improving semantic alignment, diversity, and zero-shot capability. The approach is instantiated as MoRAG and MoRAG-Diffuse, evaluated on HumanML3D with GPT-3.5-turbo-instruct prompts, and reported to outperform prior text-to-motion retrieval and diffusion baselines. It offers plug-and-play augmentation of diffusion models for more robust and varied motion generation, including for unseen text, with practical relevance to human motion synthesis in animation and robotics. The authors acknowledge its dependence on GPT prompts and dataset scale.
Abstract
We introduce MoRAG, a novel multi-part-fusion-based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models, and sample videos are available at: https://motion-rag.github.io/
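To make the multi-part retrieval and spatial composition steps concrete, the following is a minimal illustrative sketch, not the released implementation. It assumes that each body part has its own database of (embedding, motion) pairs, that part-specific query embeddings have already been computed from the LLM-generated descriptions by some text encoder, and that a motion is a list of frames with 22 joint entries each. The joint grouping in `PART_JOINTS`, the helper names `retrieve` and `compose`, and the choice to truncate to the shortest retrieved sequence are all hypothetical.

```python
import math

# Hypothetical joint grouping for a 22-joint skeleton. The indices are
# illustrative assumptions, not the part definition used by the authors.
PART_JOINTS = {
    "torso": [0, 3, 6, 9, 12, 15],
    "hands": [13, 14, 16, 17, 18, 19, 20, 21],
    "legs": [1, 2, 4, 5, 7, 8, 10, 11],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_emb, database):
    """Return the motion whose embedding is most similar to the query.

    `database` is a list of (embedding, motion) pairs for one body part.
    """
    return max(database, key=lambda item: cosine(query_emb, item[0]))[1]


def compose(part_motions):
    """Spatially compose per-part motions into one full-body motion.

    For every frame, the joints belonging to a part are copied from the
    motion retrieved for that part. Sequences are truncated to the
    shortest retrieved motion (an assumption made for this sketch).
    """
    length = min(len(m) for m in part_motions.values())
    n_joints = sum(len(joints) for joints in PART_JOINTS.values())
    full = []
    for t in range(length):
        frame = [None] * n_joints
        for part, joints in PART_JOINTS.items():
            for j in joints:
                frame[j] = part_motions[part][t][j]
        full.append(frame)
    return full
```

In this sketch, one would call `retrieve` once per part with that part's query embedding and database, collect the results in a dictionary keyed by `"torso"`, `"hands"`, and `"legs"`, and pass it to `compose`. Drawing a different retrieved candidate for each part is one way such a scheme could yield diverse composed samples.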
