Generalizable and Animatable Gaussian Head Avatar

Xuangeng Chu, Tatsuya Harada

TL;DR

The key innovation of this work is a dual-lifting method that produces high-fidelity 3D Gaussians capturing identity and facial details. Once trained, the model reconstructs unseen identities without identity-specific optimization and renders reenactments at real-time speeds.

Abstract

In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to high rendering cost and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without identity-specific optimization and perform reenactment rendering at real-time speeds. Experiments show that our method outperforms previous methods in reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars. Code and demos are available at https://github.com/xg-chu/GAGAvatar.
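
The abstract does not spell out how dual-lifting works. As a minimal sketch, assume that each pixel of an image-aligned plane is lifted into two 3D points, one in front of the plane and one behind it, using predicted non-negative offsets. The function name `dual_lift`, the plane at z = 0, and the normalized coordinate convention are all hypothetical and not taken from the paper. In the actual model the offsets and the remaining Gaussian parameters would presumably be predicted by a network from image features.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def dual_lift(
    forward: List[List[float]], backward: List[List[float]]
) -> Tuple[List[Point], List[Point]]:
    """Lift each pixel of an H x W plane at z = 0 into two 3D points.

    forward[i][j] and backward[i][j] are hypothetical non-negative offsets
    for pixel (i, j); they stand in for values a network would predict.
    """
    h = len(forward)
    w = len(forward[0])
    front: List[Point] = []
    back: List[Point] = []
    for i in range(h):
        for j in range(w):
            # Pixel centers mapped to [-1, 1] (an assumed convention).
            x = 2.0 * (j + 0.5) / w - 1.0
            y = 1.0 - 2.0 * (i + 0.5) / h
            front.append((x, y, forward[i][j]))    # lifted toward +z
            back.append((x, y, -backward[i][j]))   # lifted toward -z
    return front, back


front, back = dual_lift(
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
)
```

Under this assumption a plane of H x W pixels yields 2 x H x W Gaussian centers in one pass, with no per-identity optimization loop.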

Paper Structure

This paper contains 23 sections, 5 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Our method can reconstruct animatable avatars from a single image, offering strong generalization and controllability with real-time reenactment speeds.
  • Figure 2: Our method consists of two branches: a reconstruction branch (Sec. 3.1) and an expression branch (Sec. 3.2). We render the dual-lifting and expressed Gaussians to get coarse results, and then use a neural renderer to get fine results. Only a small driving part needs to be run repeatedly to drive the expression, while the rest is executed only once.
  • Figure 3: Cross-identity qualitative results on the VFHQ [vfhq2022] dataset. Compared with baseline methods, our method produces accurate expressions and rich details.
  • Figure 4: Ablation results on the VFHQ [vfhq2022] dataset. Our full method performs best, especially on facial edges, such as glasses, at large view angles.
  • Figure 5: Lifting results for an in-the-wild image, including the front view and the top view. Points are filtered by Gaussian opacity. The two parts of the dual-lifting are colored separately, and the black points mark the image plane. Without $\mathcal{L}_{lifting}$, the lifted 3D structure is relatively flat.
  • ...and 12 more figures
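
The Figure 2 caption notes that only a small driving part runs per frame while the rest is executed once. A minimal sketch of that split is given below, assuming the identity-dependent Gaussians can be cached and concatenated with freshly computed expression Gaussians. The class `AvatarSession` and the three stub stages are hypothetical placeholders rather than the paper's API, and `render_stub` stands in for both the coarse rendering and the neural renderer.

```python
from typing import Callable, Dict, List

Gaussian = Dict[str, object]  # placeholder for one Gaussian's parameters


class AvatarSession:
    """Cache identity Gaussians; recompute only expression Gaussians per frame."""

    def __init__(
        self,
        source_image: object,
        reconstruct: Callable[[object], List[Gaussian]],
        express: Callable[[object], List[Gaussian]],
        render: Callable[[List[Gaussian]], object],
    ) -> None:
        self._express = express
        self._render = render
        # Executed once per source image.
        self._identity = reconstruct(source_image)

    def drive(self, expression_params: object) -> object:
        # Executed for every driving frame.
        expressed = self._express(expression_params)
        return self._render(self._identity + expressed)


# Stub stages that only count how often they run.
calls = {"reconstruct": 0, "express": 0}


def reconstruct_stub(image: object) -> List[Gaussian]:
    calls["reconstruct"] += 1
    return [{"source": "dual-lifting"}]


def express_stub(params: object) -> List[Gaussian]:
    calls["express"] += 1
    return [{"source": "expression", "params": params}]


def render_stub(gaussians: List[Gaussian]) -> int:
    return len(gaussians)  # a real renderer would return an image


session = AvatarSession("source.png", reconstruct_stub, express_stub, render_stub)
frames = [session.drive(p) for p in ("neutral", "smile", "frown")]
```

After three driving frames the reconstruction stub has run once and the expression stub three times, which is the cost profile the caption describes.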