FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses
Hao Liang, Zhixuan Ge, Soumendu Majee, Ashish Tiwari, G. M. Dilshan Godaliyadda, Ashok Veeraraghavan, Guha Balakrishnan
TL;DR
FastAvatar tackles single-image 3D face reconstruction under unconstrained poses by introducing a template-based 3D Gaussian Splatting (3DGS) representation and a two-stage pipeline: a pose-invariant encoder-decoder predicts Gaussian residuals that deform a FLAME-aligned template, and a lightweight refinement stage then optimizes appearance. This approach combines the stability of a strong geometric prior with targeted optimization, reporting 24.01 dB PSNR and 0.91 SSIM in roughly 3 seconds on a single NVIDIA A100 GPU. The method also enables photorealistic novel-view synthesis and FLAME-guided expression animation, with demonstrated generalization to unseen identities and out-of-distribution subjects. Overall, FastAvatar offers a practical middle ground between fast feed-forward prediction and per-subject optimization, expanding the applicability of 3DGS-based facial avatars to interactive applications.
Abstract
We present FastAvatar, a fast and robust algorithm for single-image 3D face reconstruction using 3D Gaussian Splatting (3DGS). Given a single input image from an arbitrary pose, FastAvatar recovers a high-quality, full-head 3DGS avatar in approximately 3 seconds on a single NVIDIA A100 GPU. We use a two-stage design: a feed-forward encoder-decoder predicts coarse face geometry by regressing Gaussian structure from a pose-invariant identity embedding, and a lightweight test-time refinement stage then optimizes the appearance parameters for photorealistic rendering. This hybrid strategy combines the speed and stability of direct prediction with the accuracy of optimization, enabling strong identity preservation even under extreme input poses. FastAvatar achieves state-of-the-art reconstruction quality (24.01 dB PSNR, 0.91 SSIM) while running over 600x faster than existing per-subject optimization methods (e.g., FlashAvatar, GaussianAvatars, GASP). Once reconstructed, our avatars support photorealistic novel-view synthesis and FLAME-guided expression animation, enabling controllable reenactment from a single image. By jointly offering high fidelity, robustness to pose, and rapid reconstruction, FastAvatar significantly broadens the applicability of 3DGS-based facial avatars.
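The two-stage design described above (feed-forward geometry prediction, then test-time appearance refinement with geometry frozen) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the linear "decoder", the orthographic toy renderer, the synthetic target, and all names and sizes (`predict_geometry`, `blend_weights`, `refine_appearance`, `N_GAUSSIANS`, `EMBED_DIM`, `GRID`) are assumptions made purely for illustration. It only shows the pattern of predicting residuals on a shared template and then optimizing color alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GAUSSIANS, EMBED_DIM, GRID = 64, 16, 8  # toy sizes, chosen for illustration

# Shared template: positions and RGB colors stand in for the full 3DGS
# parameter set (position, scale, rotation, opacity, color).
template_xyz = rng.normal(scale=0.5, size=(N_GAUSSIANS, 3))
template_rgb = np.full((N_GAUSSIANS, 3), 0.5)

# Hypothetical decoder: a fixed linear map from embedding to position residuals.
decoder_weights = 0.01 * rng.normal(size=(EMBED_DIM, N_GAUSSIANS * 3))

# Pixel centers of a GRID x GRID image on [-1, 1]^2.
axis = np.linspace(-1.0, 1.0, GRID)
pixels_xy = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)


def predict_geometry(identity_embedding):
    """Stage 1: template positions plus residuals regressed from the embedding."""
    residual = (identity_embedding @ decoder_weights).reshape(N_GAUSSIANS, 3)
    return template_xyz + residual


def blend_weights(xyz, sigma=0.3):
    """Toy renderer weights: orthographic projection (drop z), then a
    normalized isotropic Gaussian falloff from each pixel to each Gaussian."""
    sq_dist = ((pixels_xy[:, None, :] - xyz[None, :, :2]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights / weights.sum(axis=1, keepdims=True)


def refine_appearance(rgb, weights, target, steps=300, lr=0.5):
    """Stage 2: gradient descent on colors only; geometry (weights) is frozen."""
    rgb = rgb.copy()
    for _ in range(steps):
        error = weights @ rgb - target
        # Gradient of the squared error, summed over channels, averaged over pixels.
        grad = 2.0 * (weights.T @ error) / len(target)
        rgb -= lr * grad
    return rgb


def psnr(image, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((image - reference) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)


embedding = rng.normal(size=EMBED_DIM)        # stand-in identity embedding
xyz = predict_geometry(embedding)             # stage 1
weights = blend_weights(xyz)
target = weights @ rng.uniform(size=(N_GAUSSIANS, 3))  # synthetic "input image"

psnr_before = psnr(weights @ template_rgb, target)
refined_rgb = refine_appearance(template_rgb, weights, target)  # stage 2
psnr_after = psnr(weights @ refined_rgb, target)
print(f"PSNR before refinement: {psnr_before:.2f} dB, after: {psnr_after:.2f} dB")
```

In this toy setting the refinement loop touches only `rgb`, mirroring the abstract's statement that the test-time stage optimizes appearance parameters while the predicted geometry supplies a stable starting point. The PSNR values it prints are properties of the synthetic example and are unrelated to the paper's reported 24.01 dB.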
