MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
Haopeng Fang, Di Qiu, Binjie Mao, He Tang
TL;DR
MotionCharacter tackles identity preservation and fine-grained motion control in text-to-video generation. It introduces an ID-Preserving Adapter and a Motion Control Module, augmented by Region-Aware and ID-Consistency losses, and leverages the Human-Motion dataset with optical-flow-derived motion intensity to guide training. The approach enables identity-consistent videos that accurately follow nuanced actions and allows intuitive motion scaling without per-identity retraining. Experimental results and user studies show improved identity fidelity, motion adherence, and visual quality over baseline methods.
Abstract
Recent advancements in personalized Text-to-Video (T2V) generation highlight the importance of integrating character-specific identities and actions. However, previous T2V models struggle with identity consistency and controllable motion dynamics, mainly due to limited fine-grained facial and action-based textual prompts, and datasets that overlook key human attributes and actions. To address these challenges, we propose MotionCharacter, an efficient and high-fidelity human video generation framework designed for identity preservation and fine-grained motion control. We introduce an ID-preserving module to maintain identity fidelity while allowing flexible attribute modifications, and further integrate ID-consistency and region-aware loss mechanisms, significantly enhancing identity consistency and detail fidelity. Additionally, our approach incorporates a motion control module that prioritizes action-related text while maintaining subject consistency, along with a dataset, Human-Motion, which utilizes large language models to generate detailed motion descriptions. To simplify user control during inference, we parameterize motion intensity through a single coefficient, allowing for easy adjustments. Extensive experiments highlight the effectiveness of MotionCharacter, demonstrating significant improvements in ID-preserving, high-quality video generation.
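As a rough illustration of how a single motion-intensity coefficient could be derived from optical flow, the sketch below averages the flow magnitude over every pixel and frame pair of a clip. This section does not give the exact definition used for Human-Motion, so the mean-magnitude formula, the function name `motion_intensity`, and the nested-list input layout are all assumptions made for the example, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

# One optical-flow field: H rows, each holding W (dx, dy) displacement pairs.
# This layout is a hypothetical convention chosen for the sketch.
FlowField = Sequence[Sequence[Tuple[float, float]]]


def motion_intensity(flows: Sequence[FlowField]) -> float:
    """Return the mean flow magnitude over all pixels and all frame pairs.

    Assumption: motion intensity is the average Euclidean length of the
    per-pixel displacement vectors. The paper may normalise or weight
    the flow differently.
    """
    total = 0.0
    count = 0
    for field in flows:          # one field per consecutive frame pair
        for row in field:        # rows of the image
            for dx, dy in row:   # per-pixel displacement
                total += math.hypot(dx, dy)
                count += 1
    if count == 0:
        raise ValueError("no flow vectors supplied")
    return total / count
```

In practice the flow fields would come from an optical-flow estimator applied to consecutive frames; under this assumed definition, a larger value indicates a clip with more motion, which is the kind of scalar a user could scale at inference time.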
