RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse
Zhouyingcheng Liao, Mingyuan Zhang, Wenjia Wang, Lei Yang, Taku Komura
TL;DR
The paper addresses the generalization gap in text-to-human-motion generation with RMD, a training-free retrieval-augmented baseline. RMD decomposes the prompt with an LLM, retrieves motions at multiple granularities from an external database, composes them into a coherent sequence, and refines the result with a pretrained motion diffusion model. Its hierarchical retrieval strategy and SDEdit-style diffusion refinement balance retrieved guidance against the diffusion prior, yielding superior performance on both in-domain and out-of-domain (OOD) data without extra training, as demonstrated on the HumanML3D and Mixamo benchmarks. Key results show improvements in R-Precision and MM Dist, along with favorable user-study feedback on OOD prompts, supporting the claims of better semantic alignment and motion naturalness. The practical takeaway is that external data combined with a strong diffusion prior at inference can deliver generalizable motion generation with minimal design complexity and training overhead. Automatic selection of the refinement starting step $t_0$ is left to future work.
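The SDEdit-style refinement mentioned above can be illustrated with a minimal sketch: the composed retrieved motion is noised forward to an intermediate step $t_0$, then denoised back to step 0 by the pretrained model. Everything named here is an assumption for illustration, not the authors' code: the linear DDPM-style noise schedule, the function names, and in particular `denoise_step`, a hypothetical callable standing in for one reverse step of whatever pretrained motion diffusion model is used.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_to_t0(x_ret, t0, alpha_bar, rng):
    """Forward-diffuse the composed retrieved motion to step t0 (1-indexed)."""
    a = alpha_bar[t0 - 1]
    eps = rng.standard_normal(x_ret.shape)
    return np.sqrt(a) * x_ret + np.sqrt(1.0 - a) * eps

def sdedit_refine(x_ret, t0, denoise_step, alpha_bar, rng):
    """Noise x_ret to step t0, then run the reverse process from t0 down to 1.

    denoise_step(x, t) is a hypothetical stand-in for one reverse step of a
    pretrained motion diffusion model; it must return an array shaped like x.
    """
    if t0 == 0:
        # No noise added: the retrieved motion is returned unchanged.
        return x_ret.copy()
    x = noise_to_t0(x_ret, t0, alpha_bar, rng)
    for t in range(t0, 0, -1):
        x = denoise_step(x, t)
    return x
```

In this formulation $t_0$ controls the trade-off: a small $t_0$ keeps the output close to the retrieved motion, while a large $t_0$ lets the diffusion prior dominate. That sensitivity is why choosing $t_0$ automatically is a natural follow-up problem.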
Abstract
While motion generation has made substantial progress, its practical application remains constrained by dataset diversity and scale, limiting its ability to handle out-of-distribution scenarios. To address this, we propose a simple and effective baseline, RMD, which enhances the generalization of motion generation through retrieval-augmented techniques. Unlike previous retrieval-based methods, RMD requires no additional training and offers three key advantages: (1) the external retrieval database can be flexibly replaced; (2) body parts from the motion database can be reused, with an LLM facilitating splitting and recombination; and (3) a pre-trained motion diffusion model serves as a prior to improve the quality of motions obtained through retrieval and direct combination. Without any training, RMD achieves state-of-the-art performance, with notable advantages on out-of-distribution data.
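To make advantage (2) concrete, the following is a minimal sketch of body-part reuse under stated assumptions: motions are stored as arrays of shape (frames, joints, 3), the joint grouping follows a 22-joint SMPL-style skeleton, and the choice of which clip supplies which part is passed in by hand rather than produced by an LLM. The `BODY_PARTS` table and the `compose` function are hypothetical names for illustration and are not taken from the paper.

```python
import numpy as np

# Assumed joint grouping for a 22-joint SMPL-style skeleton (illustrative only;
# the real indices depend on the dataset and skeleton definition).
BODY_PARTS = {
    "torso": [0, 3, 6, 9, 12, 15],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
    "left_arm": [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
}

def compose(base, part_motions):
    """Overwrite selected body parts of `base` with joints from other clips.

    base:         array of shape (frames, joints, 3)
    part_motions: dict mapping a body-part name to a retrieved motion array
                  with the same joint layout; clip lengths may differ.
    """
    out = base.copy()
    for part, motion in part_motions.items():
        n = min(len(out), len(motion))  # only overwrite the overlapping frames
        idx = BODY_PARTS[part]
        out[:n, idx] = motion[:n, idx]
    return out
```

For example, `compose(walk, {"right_arm": wave})` would keep the legs and torso of a retrieved walking clip and take the right arm from a waving clip. Copying joints directly like this tends to leave seams between parts and across clip boundaries, which is the kind of artifact the pre-trained diffusion prior in advantage (3) is meant to smooth out.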
