Identity-Preserving Pose-Guided Character Animation via Facial Landmarks Transformation
Lianrui Mu, Xingze Zhou, Wenjie Zheng, Jiangnan Ye, Haoji Hu
TL;DR
The paper tackles identity preservation in pose-guided character animation when the driving landmarks are misaligned with the reference facial geometry. It introduces Facial Landmarks Transformation (FLT), a training-free, plug-and-play pipeline based on a 3D Morphable Model. FLT lifts 2D landmarks to a 3D face, enforces the reference identity by combining the reference shape with the driving expressions, and re-renders the result to produce transformed landmarks for generation. Key contributions are the FLT framework, its use as a drop-in tool for existing generation models, and an open-source release. FLT is validated on two models (AnimateAnyone and ControlNeXt) and two datasets (TikTok and UBC Fashion), where it improves identity preservation and temporal coherence. The approach enables more faithful and consistent pose-guided animations in challenging scenarios, with potential impact on virtual character production and personalized video synthesis. The authors note limitations in landmark detection under rapid motion and occlusion, and see prospects for end-to-end and full-body extensions.
Abstract
Creating realistic pose-guided image-to-video character animations while preserving facial identity remains challenging, especially in complex and dynamic scenarios such as dancing, where precise identity consistency is crucial. Existing methods often fail to maintain facial coherence because the facial landmarks extracted from driving videos, which provide head pose and expression cues, are misaligned with the facial geometry of the reference images. To address this limitation, we introduce the Facial Landmarks Transformation (FLT) method, which leverages a 3D Morphable Model. FLT converts 2D landmarks into a 3D face model, adjusts the 3D face model to align with the reference identity, and then projects it back into 2D landmarks to guide the image-to-video generation process. This approach aligns the guiding landmarks with the reference facial geometry, enhancing the consistency between generated videos and reference images. Experimental results demonstrate that FLT effectively preserves facial identity, significantly improving pose-guided character animation models.
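The three steps described above (lift 2D landmarks to 3D, swap in the reference identity, project back to 2D) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear 3D Morphable Model with separate identity and expression bases, an orthographic frontal camera, and a least-squares fit; head pose estimation (rotation, scale, translation) is omitted for brevity. All function names and the random toy bases from `make_toy_model` are hypothetical and stand in for a real 3DMM.

```python
import numpy as np


def face_3d(mean, B_id, B_exp, alpha, beta):
    """Linear 3DMM: mean shape plus identity and expression offsets, as (N, 3)."""
    return (mean + B_id @ alpha + B_exp @ beta).reshape(-1, 3)


def project(points_3d):
    """Orthographic projection (assumed camera): drop the depth coordinate."""
    return points_3d[:, :2]


def fit_coeffs(lm_2d, mean, B_id, B_exp):
    """Least-squares fit of identity and expression coefficients to 2D landmarks."""
    n = lm_2d.shape[0]
    # Rows of the flattened (x, y, z) layout that correspond to x and y only.
    xy_rows = np.arange(3 * n).reshape(n, 3)[:, :2].ravel()
    A = np.hstack([B_id[xy_rows], B_exp[xy_rows]])
    b = lm_2d.ravel() - mean[xy_rows]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    k_id = B_id.shape[1]
    return coeffs[:k_id], coeffs[k_id:]


def transform_landmarks(ref_lm, drv_lm, mean, B_id, B_exp):
    """Combine the reference identity with the driving expression, then re-project."""
    alpha_ref, _ = fit_coeffs(ref_lm, mean, B_id, B_exp)   # identity from reference
    _, beta_drv = fit_coeffs(drv_lm, mean, B_id, B_exp)    # expression from driving
    return project(face_3d(mean, B_id, B_exp, alpha_ref, beta_drv))


def make_toy_model(n_landmarks=68, k_id=5, k_exp=3, seed=0):
    """Hypothetical random bases used only to exercise the sketch."""
    rng = np.random.default_rng(seed)
    mean = rng.normal(size=3 * n_landmarks)
    B_id = rng.normal(size=(3 * n_landmarks, k_id))
    B_exp = rng.normal(size=(3 * n_landmarks, k_exp))
    return mean, B_id, B_exp
```

In this toy setting, the transformed landmarks carry the reference face shape and the driving expression. A practical pipeline would additionally estimate the driving head pose per frame and apply it before projection, and would need a landmark detector that remains reliable under fast motion and occlusion.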
