From Diffusion To Flow: Efficient Motion Generation In MotionGPT3

Jaymin Ban, JiHong Jeon, SangYeop Jeong

Abstract

Recent text-driven motion generation methods span both discrete token-based approaches and continuous-latent formulations. MotionGPT3 exemplifies the latter paradigm, combining a learned continuous motion latent space with a diffusion-based prior for text-conditioned synthesis. While rectified flow objectives have recently demonstrated favorable convergence and inference-time properties relative to diffusion in image and audio generation, it remains unclear whether these advantages transfer cleanly to the motion generation setting. In this work, we conduct a controlled empirical study comparing diffusion and rectified flow objectives within the MotionGPT3 framework. By holding the model architecture, training protocol, and evaluation setup fixed, we isolate the effect of the generative objective on training dynamics, final performance, and inference efficiency. Experiments on the HumanML3D dataset show that rectified flow converges in fewer training epochs, reaches strong test performance earlier, and matches or exceeds diffusion-based motion quality under identical conditions. Moreover, flow-based priors exhibit stable behavior across a wide range of inference step counts and achieve competitive quality with fewer sampling steps, yielding improved efficiency–quality trade-offs. Overall, our results suggest that several known benefits of rectified flow objectives do extend to continuous-latent text-to-motion generation, highlighting the importance of the training objective choice in motion priors.
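The rectified flow objective contrasted with diffusion above can be summarized in a few lines: the model regresses the constant velocity of a straight-line path between a clean latent and Gaussian noise, and sampling integrates that velocity field with a simple ODE solver. The sketch below is a minimal NumPy illustration of this idea under assumed shapes and a generic `velocity_model` callable; it is not the MotionGPT3 implementation.

```python
import numpy as np

def rectified_flow_loss(velocity_model, x0, rng):
    """Rectified-flow training loss (illustrative sketch).

    x0: batch of clean motion latents, shape (B, D).
    velocity_model(xt, t): predicts the velocity field at interpolant xt, time t.
    """
    x1 = rng.standard_normal(x0.shape)        # Gaussian noise endpoint
    t = rng.uniform(size=(x0.shape[0], 1))    # per-sample time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1              # straight-line interpolation
    v_target = x1 - x0                        # constant target velocity along the path
    v_pred = velocity_model(xt, t)
    return np.mean((v_pred - v_target) ** 2)  # simple MSE regression

def euler_sample(velocity_model, noise, num_steps):
    """Integrate dx/dt = v(x, t) from t = 1 (noise) back to t = 0 (data)."""
    x = noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_model(x, t)
    return x
```

Because the target velocity is constant along each path, the learned field is comparatively straight, which is the intuition behind the stability at low step counts reported in the abstract: a few Euler steps already track the trajectory well.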


Figures (4)

  • Figure 1: Overview of the MotionGPT3 architecture with alternative motion priors. The dual-stream GPT-2 backbone produces text-conditioned motion latents, which are fed into either a diffusion-based or flow-based motion prior. Frozen and trainable modules are marked with the corresponding symbols.
  • Figure 2: Motion-prior-only inference Pareto comparison between diffusion and flow variants, illustrating the trade-off between inference time and generation quality.
  • Figure 3: End-to-end Pareto comparison between diffusion and flow variants, illustrating the trade-off between end-to-end inference time and generation quality.
  • Figure 4: Validation metrics across training epochs for diffusion- and flow-based variants. We report FID, Matching Score, and R-Precision (R@3). Faded curves indicate raw validation measurements, while solid curves show exponential moving averages computed with a span of five epochs to highlight overall training trends.