Zero-shot High-fidelity and Pose-controllable Character Animation
Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang
TL;DR
This work tackles zero-shot image-to-video character animation from a single image, addressing the dual challenges of high visual fidelity and strict pose control without additional model training. It introduces PoseAnimate, a reconstruction-based framework with four key innovations: a Pose-Aware Control Module (PACM) that produces pose-aware embeddings, a Dual Consistency Attention Module (DCAM) that maintains identity and temporal coherence, a Mask-Guided Decoupling Module (MGDM) that decouples the character from the background, and a Pose Alignment Transition Algorithm (PATA) that smooths pose transitions. In the reported experiments, PoseAnimate surpasses state-of-the-art training-based methods in character consistency and detail fidelity while maintaining temporal coherence. By pairing existing diffusion models with these targeted modules, the approach enables high-quality, pose-controllable animation without requiring additional training data.
Abstract
Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistent character appearance and poor preservation of fine details. Moreover, they require a large amount of video data for training, which makes them computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings to preserve character-independent content and maintain precise alignment of actions; 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details; and 3) a Mask-Guided Decoupling Module (MGDM) that refines the perception of distinct features, improving animation fidelity by decoupling the character and background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transitions. Extensive experimental results demonstrate that our approach outperforms state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations.
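
As a rough illustration of what decoupling the character and background with a mask could look like, the sketch below blends two feature maps using a character mask. It is not the MGDM implementation: the function name `mask_guided_blend`, the 2D list layout, and the linear blending rule are all assumptions made for illustration, since the abstract does not specify how the mask is applied.

```python
def mask_guided_blend(char_feat, bg_feat, mask):
    """Blend two same-shaped 2D feature maps with a character mask.

    Hypothetical helper (not from the paper): where mask is 1 the output
    takes the character feature, where it is 0 the background feature,
    and values in between interpolate linearly.
    """
    if not (len(char_feat) == len(bg_feat) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    blended = []
    for c_row, b_row, m_row in zip(char_feat, bg_feat, mask):
        if not (len(c_row) == len(b_row) == len(m_row)):
            raise ValueError("inputs must have the same number of columns")
        # Per-position linear interpolation between the two branches.
        blended.append([m * c + (1.0 - m) * b
                        for c, b, m in zip(c_row, b_row, m_row)])
    return blended


# Toy 2x2 example: character features are all 1.0, background all 0.0.
char_feat = [[1.0, 1.0], [1.0, 1.0]]
bg_feat = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]  # assumed binary character mask
blended = mask_guided_blend(char_feat, bg_feat, mask)
```

With a binary mask, each output position comes entirely from one branch, so edits to the character region leave background features untouched; a soft mask would instead mix the two near the boundary.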
