
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Jiawei Zhang, Zijian Wu, Zhiyang Liang, Yicheng Gong, Dongfang Hu, Yao Yao, Xun Cao, Hao Zhu

TL;DR

FATE enables animatable, 360° full-head reconstruction from monocular video by integrating sampling-based densification of Gaussian splats in UV space, neural baking to convert discrete Gaussian attributes into continuous texture maps, and a universal completion framework that leverages SphereHead priors for rear-side appearance. The approach achieves state-of-the-art qualitative and quantitative performance while enabling intuitive texture editing and efficient rendering. Key contributions include (i) a sampling-based densification strategy that provides balanced Gaussian distributions, (ii) BakeNet-based neural baking for texture-level editing, and (iii) a universal completion pipeline that yields complete rear and side views from frontal monocular input. This work advances practical monocular head avatars with editable textures and robust 360° renderability, enabling broader applications in AR/VR, film, and interactive media, albeit with limitations under inconsistent lighting and potential identity shifts in extreme cases.
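
This summary does not spell out FATE's exact densification procedure. As a rough illustration of what "sampling-based densification in UV space" can mean, the sketch below places one candidate Gaussian center at every texel center of a regular UV grid that falls inside a mesh's UV layout, then lifts it to 3D by barycentric interpolation. The function names, the mesh inputs (`verts`, `uvs`, `faces`), and the choice of a regular grid as the "balanced" distribution are all hypothetical assumptions, not the paper's method.

```python
import numpy as np


def barycentric_2d(p, a, b, c):
    """Barycentric weights of 2D points p (N, 2) w.r.t. triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.stack([1.0 - v - w, v, w], axis=1)


def sample_uv_points(verts, uvs, faces, res, eps=1e-9):
    """Hypothetical UV-space sampler: one point per texel center inside the UV layout.

    verts: (V, 3) positions, uvs: (V, 2) per-vertex UVs in [0, 1], faces: (F, 3) indices.
    Returns 3D points (M, 3), their face ids (M,), and barycentric weights (M, 3).
    Brute force: every face is tested against every texel center.
    """
    t = (np.arange(res) + 0.5) / res  # texel-center coordinates along one axis
    uu, vv = np.meshgrid(t, t)
    grid = np.stack([uu.ravel(), vv.ravel()], axis=1)
    taken = np.zeros(len(grid), dtype=bool)  # each texel is assigned to one face only
    points, face_ids, weights = [], [], []
    for f_idx, (i, j, k) in enumerate(faces):
        a, b, c = uvs[i], uvs[j], uvs[k]
        # Skip triangles that are degenerate in UV space.
        area2 = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if abs(area2) < eps:
            continue
        bary = barycentric_2d(grid, a, b, c)
        inside = np.all(bary >= -eps, axis=1) & ~taken
        if not inside.any():
            continue
        taken |= inside
        w = bary[inside]
        # Lift UV samples to 3D by interpolating the triangle's vertex positions.
        points.append(w[:, :1] * verts[i] + w[:, 1:2] * verts[j] + w[:, 2:] * verts[k])
        face_ids.append(np.full(len(w), f_idx))
        weights.append(w)
    if not points:
        return np.zeros((0, 3)), np.zeros(0, dtype=int), np.zeros((0, 3))
    return np.concatenate(points), np.concatenate(face_ids), np.concatenate(weights)
```

Because the samples are uniform in UV space rather than driven by gradients, raising `res` adds points evenly across the layout; note that 3D density then follows the mesh's UV parameterization, so a stretched UV layout yields uneven spacing on the surface.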

Abstract

Reconstructing high-fidelity, animatable 3D head avatars from effortlessly captured monocular videos is a pivotal yet formidable challenge. Although significant progress has been made in rendering performance and manipulation capabilities, notable challenges remain, including incomplete reconstruction and inefficient Gaussian representation. To address these challenges, we introduce FATE, a novel method for reconstructing an editable full-head avatar from a single monocular video. FATE integrates a sampling-based densification strategy that yields a well-balanced positional distribution of points, improving rendering efficiency. A neural baking technique converts discrete Gaussian representations into continuous attribute maps, facilitating intuitive appearance editing. Furthermore, we propose a universal completion framework to recover non-frontal appearance, culminating in a 360°-renderable 3D head avatar. FATE outperforms previous approaches in both qualitative and quantitative evaluations, achieving state-of-the-art performance. To the best of our knowledge, FATE is the first method for animatable, 360° full-head avatar reconstruction from monocular video.
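
For context on the "inefficient Gaussian representation" mentioned above: the original 3D Gaussian Splatting (3DGS) method densifies adaptively, cloning or splitting a Gaussian when its average view-space positional gradient reaches a threshold $\tau_{\mathrm{pos}}$. The sketch below reproduces only that selection test, assuming gradient norms have already been accumulated per Gaussian. The defaults (0.0002 and 0.01) are the original 3DGS values, not necessarily FATE's, and the function name is hypothetical.

```python
import numpy as np


def select_for_densification(grad_accum, denom, max_scales, scene_extent,
                             tau_pos=0.0002, percent_dense=0.01):
    """Clone/split masks following the original 3DGS density-control rule.

    grad_accum: (N,) summed norms of view-space positional gradients per Gaussian
    denom: (N,) number of times each Gaussian was visible while accumulating
    max_scales: (N,) largest scale axis of each Gaussian
    """
    # Average gradient; Gaussians never seen get 0 instead of dividing by zero.
    avg_grad = np.where(denom > 0, grad_accum / np.maximum(denom, 1), 0.0)
    high_grad = avg_grad >= tau_pos
    large = max_scales > percent_dense * scene_extent
    clone_mask = high_grad & ~large  # small Gaussians are duplicated
    split_mask = high_grad & large   # large Gaussians are split into smaller ones
    return clone_mask, split_mask
```

Per the Figure 2 caption, gradients over most of the facial region exceed $\tau_{\mathrm{pos}}$ in the monocular setting, so under this rule `high_grad` would flag most of the face for cloning or splitting.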

Paper Structure

This paper contains 36 sections, 22 equations, 19 figures, 9 tables.

Figures (19)

  • Figure 1: Pipeline. In Stage I, we perform sampling-based densification in UV space (see the densification section) and train a Gaussian head avatar on the preprocessed monocular video. The resulting avatar can optionally undergo full-head completion (see the completion section) to recover non-frontal regions. In Stage II, given the learned head avatar, we construct a continuous function $f(\mathbf{p})$ in UV space using a U-Net $\mathcal{H}$ and a bilinear kernel $\mathcal{B}$, and bake the Gaussian attributes into several maps (see the neural baking section).
  • Figure 2: 3DGS in Monocular Video. (a) In monocular reconstruction, the sides of the head avatar are rarely supervised, so Gaussians tend to grow toward the rendering camera. (b) This can inflate position gradients: visualized during training, the gradients over most of the facial region exceed the threshold $\tau_{\mathrm{pos}}$.
  • Figure 3: Texture Map Visualization. (a) Directly optimizing texture maps often yields low-quality results with visible holes and artifacts. (b) In contrast, our neural baking method produces a much smoother and more plausible texture map.
  • Figure 4: Baked Results Visualization. We visualize the color texture map produced by neural baking on different subjects.
  • Figure 5: Completion Framework. A universal framework is proposed to complete the side and rear appearance under monocular settings.
  • ...and 14 more figures
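
Figure 1 describes the Stage II continuous function $f(\mathbf{p})$ as a bilinear kernel $\mathcal{B}$ applied to maps produced by the U-Net $\mathcal{H}$. The NumPy sketch below covers only the sampling step. It assumes the U-Net output is already available as a `(height, width, C)` array, that UV coordinates lie in [0, 1], that texel centers sit at half-integer pixel positions, and that lookups clamp at the border. These conventions and the function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def bilinear_sample(tex, uv):
    """Bilinearly sample an attribute map at continuous UV coordinates.

    tex: (height, width, C) map (e.g. a U-Net output); uv: (N, 2) in [0, 1].
    Returns (N, C). Assumes texel centers at half-integer pixel coordinates
    and clamps lookups at the border.
    """
    height, width, _ = tex.shape
    # Continuous pixel coordinates, with texel centers at integer positions.
    x = uv[:, 0] * width - 0.5
    y = uv[:, 1] * height - 0.5
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx = (x - x0)[:, None]  # fractional offsets become the bilinear weights
    fy = (y - y0)[:, None]
    x0c, x1c = np.clip(x0, 0, width - 1), np.clip(x0 + 1, 0, width - 1)
    y0c, y1c = np.clip(y0, 0, height - 1), np.clip(y0 + 1, 0, height - 1)
    c00, c10 = tex[y0c, x0c], tex[y0c, x1c]
    c01, c11 = tex[y1c, x0c], tex[y1c, x1c]
    return ((1 - fx) * (1 - fy) * c00 + fx * (1 - fy) * c10
            + (1 - fx) * fy * c01 + fx * fy * c11)
```

Each output is a fixed linear combination of four neighboring texels, so a loss evaluated on $f(\mathbf{p})$ at the Gaussians' UV locations is differentiable with respect to the map. That property is what makes it possible, in principle, to fit discrete per-Gaussian attributes with a continuous texture.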