Leader and Follower: Interactive Motion Generation under Trajectory Constraints

Runqi Wang, Caoyuan Ma, Jian Zhao, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang, Xuelong Li

TL;DR

The paper tackles interactive multi-person motion generation conditioned on both text and explicit trajectory constraints, a setting in which prior methods struggle with trajectory precision and inter-person dynamics. It introduces a training-free Leader-Follower paradigm that decouples two-agent interactions within a diffusion-based framework: a unidirectional Pace Controller enforces the leader's trajectory fidelity, and a Kinematic Synchronization Adapter maintains the follower's plausibility via SMPL-based collision handling. The paper's analysis of the Motion Range Refinement Process identifies the mid-stage of diffusion as critical for trajectory adjustment; the method guides the leader and then aligns the follower with the leader while preserving physical plausibility. Experiments on the InterHuman dataset show that the proposed method yields higher realism and trajectory adherence than state-of-the-art approaches, with only modest increases in inference time, making it practical for game and film production pipelines.

Abstract

With the rapid advancement of game and film production, generating interactive motion from text has garnered significant attention due to its potential to revolutionize content creation processes. In many practical applications, there is a need to impose strict constraints on the motion range or trajectory of virtual characters. However, existing methods that rely solely on textual input face substantial challenges in accurately capturing the user's intent, particularly in specifying the desired trajectory. As a result, the generated motions often lack plausibility and accuracy. Moreover, existing trajectory-based methods for customized motion generation rely on retraining for single-actor scenarios, which limits flexibility and adaptability to different datasets, as well as interactivity in two-actor motions. To generate interactive motion following specified trajectories, this paper decouples complex motion into a Leader-Follower dynamic, inspired by role allocation in partner dancing. Based on this framework, this paper explores the motion range refinement process in interactive motion generation and proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter. The framework enhances the ability of existing models to generate motion that adheres to a specified trajectory by controlling the leader's movement and correcting the follower's motion to align with the leader. Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
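The Leader-Follower idea can be illustrated with a toy sketch. This is not the paper's implementation: root positions are reduced to 2D points per frame, the pretrained diffusion denoiser is omitted, and the SMPL-based collision handling is replaced by a simple minimum-distance check. The function names, the `mid` step window, `strength`, and `min_dist` are all hypothetical choices made for illustration.

```python
import math


def pace_control(leader, target, strength):
    """Pull each leader root position a fraction `strength` toward the target."""
    return [
        (lx + strength * (tx - lx), ly + strength * (ty - ly))
        for (lx, ly), (tx, ty) in zip(leader, target)
    ]


def sync_follower(follower, leader, min_dist):
    """Push the follower out of a circle of radius `min_dist` around the leader.

    Assumption: a distance check stands in for SMPL-based collision handling.
    """
    adjusted = []
    for (fx, fy), (lx, ly) in zip(follower, leader):
        dx, dy = fx - lx, fy - ly
        dist = math.hypot(dx, dy)
        if dist < min_dist:
            if dist == 0.0:  # coincident roots: pick an arbitrary direction
                dx, dy, dist = 1.0, 0.0, 1.0
            scale = min_dist / dist
            fx, fy = lx + dx * scale, ly + dy * scale
        adjusted.append((fx, fy))
    return adjusted


def guided_sampling(leader, follower, target, num_steps=10, mid=(3, 7),
                    strength=0.5, min_dist=0.5):
    """Apply guidance only during the (assumed) mid-stage step window."""
    for step in range(num_steps):
        # A real system would run one denoising step of a pretrained model here.
        if mid[0] <= step < mid[1]:
            # Unidirectional: the leader is corrected first, then the follower
            # is adapted to the leader; the follower never alters the leader.
            leader = pace_control(leader, target, strength)
            follower = sync_follower(follower, leader, min_dist)
    return leader, follower


leader, follower = guided_sampling(
    leader=[(0.0, 0.0), (0.0, 0.0)],
    follower=[(0.0, 0.0), (2.0, 0.1)],
    target=[(1.0, 0.0), (2.0, 0.0)],
)
```

The ordering inside the loop is the point of the sketch: the leader is pulled toward the trajectory first, and the follower is then corrected relative to the updated leader, which mirrors the one-way dependence described in the abstract.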

Paper Structure

This paper contains 14 sections, 7 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Comparison of Our Task with Previous Works. (a) Interaction methods that describe the trajectory through textual input alone produce trajectory deviations and interaction errors (as indicated by the red circle); (b) Some methods for single-actor motion generation use 3D trajectories but require retraining and fail to account for inter-person interactions; (c) Our approach leverages precise 3D trajectory and textual input to guide interactive motion generation, achieving consistent trajectory generation without additional retraining.
  • Figure 2: Motion Range Refinement Process. We visualize the trajectories and human poses at different time steps during the denoising process. The process of interactive motion generation is divided into three stages: early diffusion characterized by high noise and overlapping trajectories, mid-stage stabilization of movement direction and range, and final-stage refinement of motion details with stable motion range.
  • Figure 3: Motion Generation Pipeline with Text and Trajectory input. Inspired by partner dance leadership, we first use a Controller to define the leader's trajectory, and then employ an Adapter to guide the follower's motion to align with the leader.
  • Figure 4: Visual Comparison with Other Methods. In complex scenarios requiring close contact and interaction, baseline models often produce unnatural interpenetration (as indicated by the red circles). Our approach controls the leader's trajectory and guides the follower's actions to align with the leader, thereby effectively addressing these issues.
  • Figure 5: Demonstration of Trajectory Guidance Effect. For the same input text, "Two people are dancing together," we provide different trajectory conditions. All generated sequences align with both the trajectory and textual features, resulting in realistic and natural motions.
  • ...and 3 more figures
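
The three stages described in the Figure 2 caption can be expressed as a small helper that labels each denoising step. This is only a sketch: the caption gives no numeric stage boundaries, so the `early_end` and `mid_end` fractions below are assumed values, and the function name is hypothetical.

```python
def denoising_stage(step, num_steps, early_end=0.3, mid_end=0.7):
    """Label a denoising step (0 = noisiest) as 'early', 'mid', or 'final'.

    The boundary fractions are illustrative assumptions, not values from the paper.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must lie in [0, num_steps)")
    progress = step / num_steps  # fraction of the denoising process completed
    if progress < early_end:
        return "early"  # high noise, overlapping trajectories
    if progress < mid_end:
        return "mid"    # movement direction and range stabilize
    return "final"      # motion details are refined, range is stable


# Steps at which trajectory guidance would be applied under these assumptions.
guided_steps = [s for s in range(10) if denoising_stage(s, 10) == "mid"]
```

Under this framing, trajectory adjustment is restricted to the steps labeled "mid", while the early and final steps are left to the base model.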