FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Fei Yin, Mallikarjun B R, Chun-Han Yao, Rafał Mantiuk, Varun Jampani
TL;DR
FaceCraft4D generates an animatable 4D avatar from a single image by combining three priors: a shape prior from 3D-GAN inversion, an image prior that uses depth-guided cross-view warping and diffusion-based texture refinement, and a video prior that produces synchronized multi-view expressions. It introduces COIN training, which learns a consistent base representation (GaussianAvatar) while a lightweight MLP absorbs view-specific details and inconsistencies in the data. The two-stage pipeline first synthesizes personalized multiview data and then optimizes a 4D representation that can be animated via FLAME parameters. The authors report better shape fidelity, texture quality, and cross-view identity preservation than prior methods. The method renders a 360-degree avatar from a single image with practical training time and real-time rendering, making it more accessible for applications in gaming, education, and film.
Abstract
We present a novel framework for generating high-quality, animatable 4D avatars from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multiview data or struggle with shape accuracy and identity consistency. To address these limitations, we propose a comprehensive system that leverages shape, image, and video priors to create full-view, animatable avatars. Our approach first obtains an initial coarse shape through 3D-GAN inversion. It then enhances multiview textures with an image diffusion model, using depth-guided warping signals to enforce cross-view consistency. To handle expression animation, we incorporate a video prior with synchronized driving signals across viewpoints. We further introduce a Consistent-Inconsistent (COIN) training strategy to effectively handle data inconsistencies during 4D reconstruction. Experimental results demonstrate that our method achieves superior quality compared to prior art while maintaining consistency across different viewpoints and expressions.
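The abstract does not spell out how depth-guided warping is computed. As a rough illustration only, and not the authors' implementation, the standard geometric operation behind the term can be sketched as follows: a source pixel is lifted to 3D using its depth, moved into the target camera's frame, and projected back to the image plane. The function name `warp_pixel`, the pinhole camera model, and the assumption that both views share the same intrinsics are hypothetical choices made for this sketch.

```python
def warp_pixel(u, v, depth, intrinsics, rotation, translation):
    """Reproject one source pixel into a target view using its depth.

    Hypothetical helper for illustration; assumes a pinhole camera whose
    intrinsics (fx, fy, cx, cy) are shared by the source and target views.

    u, v        : pixel coordinates in the source image (column, row)
    depth       : depth of the pixel along the source camera z-axis (> 0)
    rotation    : 3x3 nested list, source-camera to target-camera rotation
    translation : length-3 list, source-camera to target-camera translation
    Returns (u_target, v_target, depth_target), or None if the point
    lands behind the target camera.
    """
    fx, fy, cx, cy = intrinsics

    # 1. Unproject: lift the pixel to a 3D point in the source camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]

    # 2. Rigid transform: express the point in the target camera frame.
    xt, yt, zt = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

    # Points at or behind the target camera plane cannot be projected.
    if zt <= 0:
        return None

    # 3. Project: map the 3D point onto the target image plane.
    return (fx * xt / zt + cx, fy * yt / zt + cy, zt)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Shift the point sideways relative to the target camera by 0.1 units.
    print(warp_pixel(3.0, 1.0, 2.0, (100.0, 100.0, 2.0, 2.0),
                     identity, [0.1, 0.0, 0.0]))
```

Applying this mapping to every pixel of a rendered view gives a correspondence field between two viewpoints. A pipeline of the kind described above could plausibly use such a field to carry texture from one view into another as a conditioning signal, though the exact way the paper does this is not stated in this section.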
