FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang; Xuan Gao; Yudong Guo; Juyong Zhang

TL;DR

A uniform 3D Gaussian field is embedded in the surface of a parametric face model, and an extra spatial offset is learned to model non-surface regions and subtle facial details, enabling super-fast rendering.

Abstract

We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that can reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300 FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn an extra spatial offset to model non-surface regions and subtle facial details. Full use of geometric priors captures high-frequency facial details and preserves exaggerated expressions, while proper initialization helps reduce the number of Gaussians, thus enabling super-fast rendering. Extensive experimental results demonstrate that FlashAvatar outperforms existing works in visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Project page: https://ustc3dv.github.io/FlashAvatar/
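As a rough illustration of what "embedded in the surface" plus "extra spatial offset" could mean in practice, the sketch below places each Gaussian center on a triangle of the face mesh by barycentric interpolation and then adds a per-Gaussian position offset. All function names, the data layout, and the use of barycentric weights are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Hypothetical sketch: names and data layout are illustrative assumptions,
# not the FlashAvatar code.

def embed_on_surface(vertices, faces, face_ids, barys):
    """Place one Gaussian center per sample on the mesh surface.

    vertices: list of (x, y, z) for the current (posed, expressive) mesh
    faces:    list of (i, j, k) vertex indices
    face_ids: for each sample, the index of the triangle it lies in
    barys:    for each sample, barycentric weights (w0, w1, w2) summing to 1
    """
    centers = []
    for f, (w0, w1, w2) in zip(face_ids, barys):
        a, b, c = (vertices[i] for i in faces[f])
        centers.append(tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3)))
    return centers


def apply_offsets(centers, offsets):
    """Add a learned per-Gaussian position offset to each surface point."""
    return [tuple(p[d] + o[d] for d in range(3)) for p, o in zip(centers, offsets)]
```

Because the triangle index and weights are fixed per Gaussian, the centers follow the mesh automatically when its vertices move, and the offset only has to account for what the surface itself cannot express.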

Paper Structure

This paper contains 25 sections, 9 equations, 16 figures, and 2 tables.

Introduction
Related Work
Digital Head Model
Scene representations with 3D-GS
Radiance field acceleration
Background
Methods
Surface-embedded Gaussian Initialization
Gaussian Offset
Training Scheme
Implementation Details
Experiments
Dataset
Comparison with Representative Methods
Comparison with C + D strategy
...and 10 more sections

Figures (16)

Figure 1: Given a monocular video sequence, our proposed FlashAvatar can reconstruct a high-fidelity digital avatar in minutes, which can be animated and rendered at over 300 FPS at a resolution of $512\times 512$ on an Nvidia RTX 3090.
Figure 2: Initialization in UV space yields a more uniform distribution of Gaussian positions, which models full-head details better. We only sample points in the head region, including the neck, so the number of sampled vertices is smaller than the FLAME vertex count of 5023.
Figure 3: Overview. We initially maintain the 3D Gaussian field in 2D UV space and embed it into dynamic FLAME mesh surfaces through mesh rasterization. For every surface-embedded 3D Gaussian, the offset network takes the tracked expression code and the corresponding position of the Gaussian center on the canonical mesh as input and outputs the spatial offset, including position, rotation, and scaling deformation. The deformed Gaussians are then splatted to render the image with a given pose.
Figure 4: To model the mouth interior well, we close the mouth cavity of the FLAME mesh with additional faces and enlarge the corresponding area on the UV map.
Figure 5: Qualitative comparisons with state-of-the-art head avatar reconstruction methods. Our model reconstructs facial details, thin structures, and subtle expressions well while achieving a remarkable rendering speed of over 300 FPS.
...and 11 more figures
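
The offset network described in the Figure 3 caption can be pictured as a small function from (canonical position, expression code) to a position, rotation, and scaling offset. The sketch below is a minimal stand-in under stated assumptions: a plain ReLU MLP with random weights, a 4-value rotation offset (quaternion-style), and a 10-value output split of 3 + 4 + 3. The layer sizes, the output layout, and all names are hypothetical and are not confirmed by the text above.

```python
import random

# Hypothetical sketch of an offset network; architecture and output layout
# are assumptions for illustration, not the authors' design.

def make_mlp(sizes, seed=0):
    """Build a list of (weights, biases) layers with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def mlp_forward(layers, x):
    """Apply each linear layer, with ReLU on all but the last."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def gaussian_offsets(layers, canonical_pos, expression_code):
    """Map one Gaussian's canonical position and the expression code to offsets."""
    out = mlp_forward(layers, list(canonical_pos) + list(expression_code))
    # Assumed split: 3 position, 4 rotation, 3 scaling values.
    return {"d_position": out[0:3], "d_rotation": out[3:7], "d_scale": out[7:10]}
```

For example, `gaussian_offsets(make_mlp([3 + 4, 16, 10]), (0.1, 0.2, 0.3), (0.0, 0.5, 0.0, 0.0))` would evaluate the network for one Gaussian with a hypothetical 4-dimensional expression code; the same network is queried once per Gaussian.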
