Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan; Shunlin Lu; Minyue Dai; Runyi Yu; Lixing Xiao; Zhiyang Dou; Junting Dong; Lizhuang Ma; Jingbo Wang

TL;DR

The paper tackles zero-shot text-to-motion generation by scaling both data and model capacity. It introduces MotionMillion, a dataset of over 2,000 hours and 2 million motion sequences with rich text captions, and MotionMillion-Eval, a standardized zero-shot benchmark. The model is a decoder-only transformer built on LLaMA and T5-XL, with FSQ-based motion tokens and wavelet preprocessing; scaled up to 7B parameters, it shows strong zero-shot generalization to out-of-domain and complex compositional motions. The work also gives future methods a common evaluation framework for comparison.

Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research area in computer vision, graphics, and robotics. Despite significant advances in this field, current methods often struggle with zero-shot generalization, largely because of the limited size of training datasets. Moreover, the lack of a comprehensive evaluation framework impedes progress on this task by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, that is, to achieve zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation. The code is available at https://github.com/VankouF/MotionMillion-Codes.

