GaussianStyle: Gaussian Head Avatar via StyleGAN
Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
TL;DR
GaussianStyle produces high-fidelity, editable head avatars from monocular video. It targets the over-smoothing that arises when dynamic heads are modeled in fixed canonical coordinates. The method fuses 3D Gaussian Splatting with StyleGAN through a temporal-aware Triplane-Gaussian representation, attention-based deformation, and a multi-stage training pipeline that maps volumetric features into the StyleGAN latent space. It reports state-of-the-art results in portrait reenactment, novel-view synthesis, and 3D editing, while maintaining inference speeds above 30 FPS. The combination yields detailed rendering with controllable expressions and poses directly from monocular input, making it practical for real-time avatar applications.
Abstract
Existing methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control, including facial animation and component editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling. To address these limitations, we propose GaussianStyle, a novel framework that integrates the volumetric strengths of 3DGS with the powerful implicit representation of StyleGAN. GaussianStyle preserves structural information, such as expressions and poses, using Gaussian points, while projecting the implicit volumetric representation into StyleGAN to capture high-frequency details and mitigate the over-smoothing commonly observed in neural texture rendering. Experimental results show that our method achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.
