OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data

Bin Cao; Sipeng Zheng; Hao Luo; Boyuan Li; Jing Liu; Zongqing Lu

Abstract

Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions because existing motion datasets are small and lack diversity. To address this problem, we introduce OpenT2M, a million-scale, high-quality, open-source motion dataset containing over 2800 hours of human motion. Each sequence undergoes rigorous quality control through physical feasibility validation and multi-granularity filtering, and is paired with detailed second-wise text annotations. We also develop an automated pipeline for creating long-horizon sequences, enabling complex motion generation. Building upon OpenT2M, we introduce MonoFrill, a pretrained motion model that achieves compelling T2M results without complicated designs or technical tricks ("frills"). Its core component is 2D-PRQ, a novel motion tokenizer that captures spatiotemporal dependencies by dividing the human body into biological parts. Experiments show that OpenT2M significantly improves the generalization of existing T2M models, while 2D-PRQ achieves superior reconstruction and strong zero-shot performance. We expect OpenT2M and MonoFrill to advance the T2M field by addressing longstanding data quality and benchmarking challenges.
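To make the idea of part-level tokenization concrete, the following is a minimal sketch, not the authors' 2D-PRQ implementation. It assumes a five-part split (the Figure 3 caption states five parts, but the joint grouping in `PARTS` is hypothetical), one scalar feature per joint, and plain nearest-neighbor vector quantization with random codebooks. The learned encoder, any temporal downsampling, and the actual quantization scheme of 2D-PRQ are not described in this summary and are omitted.

```python
import math
import random

# Hypothetical five-part grouping of joint indices; the source states that the
# body is divided into five parts but does not list the joints of each part.
PARTS = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4],
    "right_arm": [5, 6],
    "left_leg": [7, 8],
    "right_leg": [9, 10],
}


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def tokenize(motion, codebooks):
    """Map motion[frame][joint] to a grid tokens[frame][part] of code indices."""
    tokens = []
    for frame in motion:
        row = []
        for name, joints in PARTS.items():
            part_vec = [frame[j] for j in joints]
            row.append(nearest_code(part_vec, codebooks[name]))
        tokens.append(row)
    return tokens


def flatten(tokens):
    """Flatten the time-by-part grid, frame by frame, into one token sequence."""
    return [tok for row in tokens for tok in row]


# Toy data: random codebooks (8 codes per part) and a random 4-frame motion.
rng = random.Random(0)
codebooks = {
    name: [[rng.uniform(-1, 1) for _ in joints] for _ in range(8)]
    for name, joints in PARTS.items()
}
motion = [[rng.uniform(-1, 1) for _ in range(11)] for _ in range(4)]
tokens = tokenize(motion, codebooks)
sequence = flatten(tokens)
```

Here `tokens` is a grid with one row per frame and one column per body part, and `sequence` is the flattened form that an autoregressive model could predict token by token. The frame-major flattening order is an illustrative choice, not one confirmed by the source.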

Paper Structure (16 sections, 2 equations, 8 figures, 8 tables)

Introduction
Related Work
The OpenT2M Dataset
The MonoFrill Model
Experiments
Experimental Setup
Effectiveness of OpenT2M Dataset
Effectiveness of 2D-PRQ
Conclusion
Additional Analysis of OpenT2M
Data Distribution
Comparison of Long-horizon Datasets
Second-wise Text Annotation
Additional Details of 2D-PRQ
Evaluation Metrics
...and 1 more sections

Figures (8)

Figure 1: (Left) Visualization of text embeddings for the training and validation sets of HumanML3D and Motion-X. A substantial overlap between the splits indicates data leakage. To avoid this risk, we remove the overlap via data repartition (version denoted as $*$). (Right) However, we observe a drastic performance drop when experimenting on this repartitioned benchmark, which reveals the limited generalization capability of current methods when faced with out-of-domain data.
Figure 2: Data curation pipeline. (a) We adopt a two-stage pipeline comprising physical feasibility validation and multi-granularity filtering. (b) We adapt an interpolation-based method for motion curation and introduce an RL policy for refinement. (c) For text annotation, we generate temporally aligned labels for each second of video and use them to synthesize a precise, semantically rich description.
Figure 3: Model overview. We propose an extendable, autoregressive (AR), discrete T2M model with no frills. (Left) Our core design, 2D-PRQ, divides the entire body into five parts, encoding and quantizing motion into a sequence of discrete part-level tokens. (Right) The AR model takes text as input and predicts part-level motion tokens. We call this model "MonoFrill" to show its simplicity.
Figure 4: Visualization of generated long-horizon motions. The results show that the model can generate long-horizon motion sequences that align accurately with complex texts.
Figure 5: Statistics of the OpenT2M dataset. (a) Motion sequence distribution (log scale). (b) Average motion length distribution.
...and 3 more figures
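
The split-overlap check described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' repartitioning procedure, which is not specified here. It assumes that text embeddings are already available as plain lists of floats, and it flags a validation caption as overlapping when its cosine similarity to any training caption reaches a hypothetical threshold of 0.95.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlapping_indices(train_emb, val_emb, threshold=0.95):
    """Indices of validation embeddings that are near-duplicates of a training one.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    flagged = []
    for i, v in enumerate(val_emb):
        if any(cosine(v, t) >= threshold for t in train_emb):
            flagged.append(i)
    return flagged


# Toy 2-D embeddings: only the first validation item is close to a training item.
train = [[1.0, 0.0], [0.0, 1.0]]
val = [[0.99, 0.05], [-1.0, 0.0], [0.7, 0.7]]
flagged = overlapping_indices(train, val)
```

Validation items whose indices appear in `flagged` would be candidates for removal or reassignment when building a leakage-free split. The brute-force pairwise comparison is only practical for small sets.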
