PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning

Yingjie Xi, Jian Jun Zhang, Xiaosong Yang

TL;DR

The paper tackles the challenge of generating controllable, high-fidelity human motion that adheres to both global trajectories and fine-grained postures. It introduces ProMoGen, a diffusion-based framework that decouples trajectory guidance from local anchor poses via a Trajectory Encoder and an Anchor Motion Encoder, coupled through an Initial Motion Generator and a Refinement Module. To stabilize learning with sparse anchor guidance, it proposes SAP-CL, a curriculum that progressively reduces anchor density across stages and employs a Filtering Module to sample anchors. Experiments on HumanML3D and CombatMotion demonstrate state-of-the-art performance across metrics such as MPJPE and FID, confirming improved controllability, fidelity, and efficiency over baseline methods.
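The curriculum idea summarized above can be pictured with a small sketch. It is not the authors' SAP-CL implementation: the linear schedule, the starting count of 20 anchors, the four stages, and the uniform random sampling that stands in for the Filtering Module are all assumptions made for illustration. Only the final count of five anchors echoes a value mentioned in the figure captions.

```python
import random


def anchor_schedule(start_anchors, end_anchors, num_stages):
    """Return one anchor count per stage, decreasing linearly.

    Hypothetical schedule: the source only states that anchor density
    is reduced progressively across stages, not how.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if num_stages == 1:
        return [end_anchors]
    step = (start_anchors - end_anchors) / (num_stages - 1)
    return [round(start_anchors - i * step) for i in range(num_stages)]


def sample_anchor_frames(num_frames, num_anchors, rng):
    """Pick distinct frame indices to act as anchors, in temporal order.

    Uniform sampling is an assumption standing in for the Filtering Module.
    """
    num_anchors = min(num_anchors, num_frames)
    return sorted(rng.sample(range(num_frames), num_anchors))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
schedule = anchor_schedule(start_anchors=20, end_anchors=5, num_stages=4)

for stage, count in enumerate(schedule):
    frames = sample_anchor_frames(num_frames=60, num_anchors=count, rng=rng)
    print(f"stage {stage}: {count} anchors at frames {frames}")
```

Early stages give the model dense guidance, and later stages approach the sparse setting used at inference, which is the intuition behind training on progressively fewer anchors.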

Abstract

In computer animation, game design, and human-computer interaction, synthesizing human motion that aligns with user intent remains a significant challenge. Existing methods have notable limitations: textual approaches offer high-level semantic guidance but struggle to describe complex actions accurately; trajectory-based techniques provide intuitive global motion direction yet often fall short in generating precise or customized character movements; and anchor-pose-guided methods are typically confined to synthesizing only simple motion patterns. To generate more controllable and precise human motions, we propose ProMoGen (Progressive Motion Generation), a novel framework that integrates trajectory guidance with sparse anchor motion control. Global trajectories ensure consistency in spatial direction and displacement, while sparse anchor motions deliver only precise action guidance, without displacement. This decoupling enables independent refinement of both aspects, resulting in more controllable, high-fidelity, and sophisticated motion synthesis. ProMoGen supports both dual and single control paradigms within a unified training process. Moreover, because direct learning from sparse motions is inherently unstable, we introduce SAP-CL (Sparse Anchor Posture Curriculum Learning), a curriculum learning strategy that progressively adjusts the number of anchors used for guidance, thereby enabling more precise and stable convergence. Extensive experiments demonstrate that ProMoGen excels in synthesizing vivid and diverse motions guided by a predefined trajectory and arbitrary anchor frames. Our approach seamlessly integrates personalized motion with structured guidance, significantly outperforming state-of-the-art methods across multiple control scenarios.
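The decoupling of global displacement from local posture can be illustrated with a minimal sketch. The data layout is an assumption, not the paper's representation: each frame is a list of `(x, y, z)` joint positions, the root joint sits at index 0, the y-axis points up, and displacement is taken in the horizontal xz-plane. The helper names `decouple_motion` and `recombine` are hypothetical.

```python
def decouple_motion(joints, root_index=0):
    """Split a motion into a root trajectory and displacement-free poses.

    joints: list of frames, each a list of (x, y, z) joint positions.
    Assumes y is up, so horizontal displacement lives in x and z.
    """
    trajectory = []
    local_poses = []
    for frame in joints:
        root_x, _, root_z = frame[root_index]
        trajectory.append((root_x, root_z))
        # Subtract the horizontal root position so the pose carries no displacement.
        local_poses.append([(x - root_x, y, z - root_z) for (x, y, z) in frame])
    return trajectory, local_poses


def recombine(trajectory, local_poses):
    """Place each displacement-free pose back on its trajectory point."""
    return [
        [(x + root_x, y, z + root_z) for (x, y, z) in pose]
        for (root_x, root_z), pose in zip(trajectory, local_poses)
    ]


# Two frames, two joints each (root and one other joint).
motion = [
    [(0.0, 1.0, 0.0), (0.0, 1.5, 0.0)],
    [(1.0, 1.0, 2.0), (1.5, 1.5, 2.0)],
]
trajectory, local_poses = decouple_motion(motion)
restored = recombine(trajectory, local_poses)
print(trajectory)
print(local_poses)
```

Under these assumptions the trajectory and the poses can be edited separately and recombined afterwards, which mirrors the independent refinement of the two control signals described in the abstract.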

Paper Structure

This paper contains 16 sections, 13 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The SAP-CL training strategy (left panel) and the structure of the ProMoGen method (right panel).
  • Figure 2: In this figure, the horizontal axis represents FID, while the vertical axis corresponds to MPJPE (to ensure comparability, the values of all models have been transformed to a uniform scale). Notably, data points that approach the upper right region along the diagonal indicate superior model performance. The v1 model achieves the most favorable results, thereby substantiating the robustness of our overall structure design. Additionally, the placement of v2 in the upper right corner further demonstrates the exceptional performance of our complete model with carefully designed components.
  • Figure 3: This figure illustrates the trajectory-based, sparse pose-guided motion generation process of our ProMoGen. In the top row, the reconstructed anchor motion (displayed as a mesh on the right) is compared with the control pose (depicted as a skeleton on the left), demonstrating a high degree of correspondence. In each of the subsequent sub-figures, the primary visualization represents the reconstructed motion, while the lines on the left indicate the trajectory. Color coding is applied such that both the character and the trajectory transition from purple (starting point) to yellow (ending point). The upper left corner of each sub-figure shows the sparse poses used as guidance. All visualizations in this figure are derived from inference with the number of sparse anchor poses $f_n$ fixed at five.
  • Figure 4: The figure compares the generation results of various modules. Under identical trajectory and anchor postures, the motions synthesized by our model are notably smoother and exhibit richer dynamic variations. Furthermore, the motions generated by ProMoGen adhere most accurately to the sparse pose guidance, highlighting the superior performance of our approach.
  • Figure 5: The two figures above show how the FID and MPJPE metrics change, comparing regular training with curriculum learning. Curriculum learning yields significant improvements in model performance across varying numbers of anchor poses. The subfigures below further demonstrate that model performance improves progressively as the curriculum learning stages advance.
  • ...and 1 more figures