FineXtrol: Controllable Motion Generation via Fine-Grained Text
Keming Shen, Bizhu Wu, Junliang Chen, Xiaoqin Wang, Linlin Shen
TL;DR
FineXtrol addresses the challenge of controllable text-driven motion generation by introducing fine-grained, temporally explicit textual controls for body-part movements. It employs a dual-branch diffusion framework with residual guidance from fine-grained text and a hierarchical contrastive learning module to produce discriminative embeddings for these signals. Empirical results on HumanML3D show strong controllability across multiple body parts and temporal intervals, with improved efficiency and reduced parameter count compared to coordinate-based methods. The approach yields higher realism and precision in motion generation and offers a user-friendly alternative to spatial control signals, enabling scalable, fine-grained manipulation of human motions in practical applications.
Abstract
Recent works have sought to enhance the controllability and precision of text-driven motion generation. Some approaches leverage large language models (LLMs) to produce more detailed texts, while others incorporate global 3D coordinate sequences as additional control signals. However, the former often introduces misaligned details and lacks explicit temporal cues, and the latter incurs significant computational cost when converting coordinates to standard motion representations. To address these issues, we propose FineXtrol, a novel control framework for efficient motion generation guided by temporally aware, precise, user-friendly, and fine-grained textual control signals that describe specific body part movements over time. In support of this framework, we design a hierarchical contrastive learning module that encourages the text encoder to produce more discriminative embeddings for our novel control signals, thereby improving motion controllability. Quantitative results show that FineXtrol achieves strong performance in controllable motion generation, while qualitative analysis demonstrates its flexibility in directing specific body part movements.
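To make the idea of "more discriminative embeddings" concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss, written in plain Python. It is not the hierarchical module proposed in the paper, whose details are not given in this abstract. The function names, the anchor/positive pairing scheme, and the temperature value are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss, assuming anchors[i] is paired with positives[i].

    Every other entry of `positives` acts as a negative for anchor i.
    The pairing scheme and the temperature are illustrative assumptions,
    not values taken from the paper.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_denom = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching pair as the target class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: matched pairs should give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls each embedding toward its paired counterpart and pushes it away from the others in the batch, which is one common way to make embeddings of similar-looking control texts easier to tell apart.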
