GaussianStyle: Gaussian Head Avatar via StyleGAN

Pinxin Liu; Luchuan Song; Daoan Zhang; Hang Hua; Yunlong Tang; Huaijin Tu; Jiebo Luo; Chenliang Xu

TL;DR

GaussianStyle addresses the challenge of producing high-fidelity, editable head avatars from monocular video. It overcomes the limitations of fixed canonical coordinates, which cause over-smoothing in dynamic head modeling, by fusing 3D Gaussian Splatting with StyleGAN through a temporal-aware Triplane-Gaussian representation, attention-based deformation, and a multi-stage training pipeline that maps volumetric features into the StyleGAN latent space. The approach achieves state-of-the-art results in portrait reenactment, novel-view synthesis, and 3D editing while running inference at over 30 FPS. This combination enables detailed rendering with controllable expressions and poses directly from monocular input, making it practical for real-time avatar applications.
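The TL;DR mentions a Triplane-Gaussian representation in which Gaussian points read features from a triplane. The sketch below illustrates only the generic triplane lookup commonly used for such representations: each 3D point is projected onto three axis-aligned feature planes (XY, XZ, YZ), each plane is sampled bilinearly, and the three samples are summed. The plane resolution, the summation (rather than concatenation), the [-1, 1] coordinate convention, and the names `bilinear_sample` and `query_triplane` are assumptions for illustration; the temporal conditioning of GaussianStyle's triplane is not modeled here.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a (C, H, W) feature plane at N points uv in [-1, 1]^2.

    uv[:, 0] runs along the width axis and uv[:, 1] along the height axis;
    -1 and 1 map to the first and last pixel centres. Returns an (N, C) array.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel coordinates and clamp to the grid.
    x = np.clip((uv[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((uv[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]  # horizontal interpolation weight, shape (N, 1)
    wy = (y - y0)[:, None]  # vertical interpolation weight, shape (N, 1)
    # plane[:, rows, cols] has shape (C, N); transpose to (N, C).
    f00, f01 = plane[:, y0, x0].T, plane[:, y0, x1].T
    f10, f11 = plane[:, y1, x0].T, plane[:, y1, x1].T
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return top * (1 - wy) + bottom * wy

def query_triplane(planes, xyz):
    """Sum bilinear samples from the XY, XZ and YZ planes for points xyz in [-1, 1]^3."""
    plane_xy, plane_xz, plane_yz = planes
    return (bilinear_sample(plane_xy, xyz[:, [0, 1]])
            + bilinear_sample(plane_xz, xyz[:, [0, 2]])
            + bilinear_sample(plane_yz, xyz[:, [1, 2]]))

# Example: 100 Gaussian centres reading 8-channel features from 16x16 planes.
rng = np.random.default_rng(0)
planes = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
centres = rng.uniform(-1, 1, size=(100, 3))
print(query_triplane(planes, centres).shape)  # -> (100, 8)
```

Summation keeps the feature width independent of the number of planes; concatenation, which yields a 3C-dimensional feature, is an equally common choice.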

Abstract

Existing methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control, such as facial animation and component editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling. To address these limitations, we propose GaussianStyle, a novel framework that integrates the volumetric strengths of 3DGS with the powerful implicit representation of StyleGAN. GaussianStyle preserves structural information, such as expressions and poses, using Gaussian points, while projecting the implicit volumetric representation into StyleGAN to capture high-frequency details and mitigate the over-smoothing commonly observed in neural texture rendering. Experimental results indicate that our method achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.
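The abstract describes projecting the volumetric representation of the Gaussians into StyleGAN. One standard way to turn Gaussian points into an image-space feature map is the front-to-back alpha compositing used by 3DGS, applied to feature vectors instead of RGB colors: each pixel receives sum_i f_i * a_i * prod_{j<i} (1 - a_j) over the depth-sorted Gaussians covering it. The minimal sketch below composites a single pixel, assuming each Gaussian carries a feature vector and its per-pixel opacity has already been evaluated. Whether GaussianStyle forms its StyleGAN input exactly this way is an assumption, and `composite_features` is a hypothetical name.

```python
import numpy as np

def composite_features(features, alphas, depths):
    """Front-to-back alpha compositing of per-Gaussian features for one pixel.

    features: (N, C) feature vectors of the Gaussians covering the pixel
    alphas:   (N,) opacities already evaluated at this pixel, in [0, 1]
    depths:   (N,) view-space depths used to sort Gaussians front to back
    Returns the (C,) composited feature sum_i f_i * a_i * prod_{j<i} (1 - a_j).
    """
    if features.shape[0] == 0:
        return np.zeros(features.shape[1])  # no Gaussian covers this pixel
    order = np.argsort(depths)  # nearest Gaussian first
    f, a = features[order], alphas[order]
    # Transmittance T_i = prod_{j<i} (1 - a_j), with T_0 = 1.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))
    weights = a * transmittance
    return weights @ f

# Example: a half-transparent Gaussian in front of an opaque one.
feats = np.array([[0.0, 1.0], [1.0, 0.0]])
print(composite_features(feats, np.array([1.0, 0.5]), np.array([2.0, 1.0])))  # -> [0.5 0.5]
```

Sorting by depth matters: the transmittance term lets nearer Gaussians occlude farther ones, so compositing the same Gaussians in a different order yields a different feature.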

Paper Structure (16 sections, 8 equations, 7 figures, 2 tables)

Introduction
Related Work
Method
Deformable Triplane-Gaussian
Extended StyleGAN based on Gaussian
Training strategy of Volumetric Rendering
Experiment
Implementation Details
Dataset
Baseline Methods
Quantitative Evaluation
Evaluation Metrics
Qualitative Results
Ablation Studies
Applications: 3D Editing and Novel View
...and 1 more section

Figures (7)

Figure 1: We present GaussianStyle, a novel method designed for high-fidelity volumetric avatar reconstruction from a short monocular video. Our pipeline can be utilized for portrait reenactment, high-fidelity editing, and novel view synthesis.
Figure 2: The proposed Tri-Stage training strategy, which culminates in StyleGAN-based volumetric rendering. In Stage 1, we construct static, coarse canonical Gaussians. In Stage 2, Gaussians are queried from a temporal-aware triplane for attention-based deformation. In Stage 3, we initialize StyleGAN with multi-view PTI and project the dynamic Gaussian prior into StyleGAN for volumetric rendering.
Figure 3: Left: Four regions within a single StyleGAN block for feature manipulation. Mid: Integrating into R3 performs best; R3+R4 brings no further improvement. Right: Blocks 1 to 5 are effective for volumetric projection; the upper panel shows the block-pruning results.
Figure 4: Our model outperforms other monocular avatar rendering methods in detail such as eyes and teeth.
Figure 5: Other methods are not robust to novel views, expressions, or head poses and thus exhibit noisy point clouds and blurred results.
...and 2 more figures
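Figure 2 describes Stage 2 as querying Gaussians from a temporal-aware triplane for attention-based deformation. The exact attention design is not given in this summary, so the following is a hypothetical sketch of one way such a deformation could work: each Gaussian's feature acts as a query that attends over a small set of condition tokens (for example, tokens encoding the current expression and head pose), and the attended context is mapped to a 3D offset added to the canonical Gaussian center. The single-head scaled dot-product attention, the condition tokens, the weight matrices, and the function name `attention_deform` are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_deform(xyz, gauss_feats, cond_tokens, w_q, w_k, w_v, w_out):
    """Hypothetical attention-based deformation of canonical Gaussian centres.

    xyz:         (N, 3) canonical Gaussian centres
    gauss_feats: (N, C) per-Gaussian features, e.g. read from a triplane
    cond_tokens: (T, D) tokens describing the current expression and pose
    w_q (C, K), w_k (D, K), w_v (D, K), w_out (K, 3): illustrative learned weights
    Returns the deformed (N, 3) centres.
    """
    q = gauss_feats @ w_q  # (N, K), one query per Gaussian
    k = cond_tokens @ w_k  # (T, K)
    v = cond_tokens @ w_v  # (T, K)
    # Scaled dot-product attention: each Gaussian weights the T condition tokens.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, T), rows sum to 1
    offsets = (attn @ v) @ w_out  # (N, 3) predicted displacement
    return xyz + offsets

# Example: 1,000 Gaussians attending over 4 condition tokens.
rng = np.random.default_rng(0)
N, C, T, D, K = 1000, 32, 4, 16, 8
xyz = rng.uniform(-1, 1, size=(N, 3))
feats = rng.standard_normal((N, C))
tokens = rng.standard_normal((T, D))
w_q, w_k, w_v = rng.standard_normal((C, K)), rng.standard_normal((D, K)), rng.standard_normal((D, K))
w_out = np.zeros((K, 3))  # zero-initialized head: deformation starts as the identity
print(np.allclose(attention_deform(xyz, feats, tokens, w_q, w_k, w_v, w_out), xyz))  # -> True
```

Initializing `w_out` to zeros makes the deformation start as the identity, so training could begin from the static canonical Gaussians of Stage 1; this is a common design choice in residual deformation heads, not one confirmed by the paper.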
