HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors

Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu

TL;DR

HeadGAP is a two-phase framework for few-shot 3D head avatar creation. It learns generalizable 3D Gaussian priors from large-scale multi-view data with a Gaussian Prior Network (GAPNet) that uses part-based dynamic modeling. Personalization from limited inputs is achieved via inversion and targeted fine-tuning on a 3D Gaussian Splatting representation, followed by CNN-based refinement for photo-realistic rendering and stable animation. The approach is evaluated on NeRSemble and in-the-wild data, where it outperforms prior methods in both fidelity and view consistency, and it supports intuitive head editing operations. By reducing data requirements while maintaining realism and robust animation, the work has implications for AR/VR, content creation, and telepresence.

Abstract

In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high fidelity and robust animation. Given the underconstrained nature of this problem, incorporating prior knowledge is essential. Therefore, we propose a framework comprising prior learning and avatar creation phases. The prior learning phase leverages 3D head priors derived from a large-scale multi-view dynamic dataset, and the avatar creation phase applies these priors for few-shot personalization. Our approach captures these priors with a Gaussian Splatting-based auto-decoder network with part-based dynamic modeling. Our method employs identity-shared encoding with personalized latent codes for individual identities to learn the attributes of Gaussian primitives. During the avatar creation phase, we achieve fast head avatar personalization by leveraging inversion and fine-tuning strategies. Extensive experiments demonstrate that our model effectively exploits head priors and successfully generalizes them to few-shot personalization, achieving photo-realistic rendering quality, multi-view consistency, and stable animation.
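
The two-stage personalization described in the abstract (inversion, then fine-tuning) can be illustrated with a toy auto-decoder. This is a minimal sketch under loud assumptions, not the authors' implementation: the "prior network" is a single linear map, the "observation" is a random vector rather than rendered images, and the names `invert`, `fine_tune`, and `prior_decoder` are hypothetical. It only shows the order of operations: first optimize the identity code with the network frozen, then also update the network weights.

```python
import numpy as np

def reconstruction_loss(decoder, code, target):
    """Mean squared error between the decoded output and the target."""
    residual = decoder @ code - target
    return float(np.mean(residual ** 2))

def invert(decoder, target, steps=500, lr=0.05):
    """Stage 1 (inversion): optimize only the identity code; decoder is frozen."""
    code = np.zeros(decoder.shape[1])
    n = target.size
    for _ in range(steps):
        residual = decoder @ code - target
        code -= lr * (2.0 / n) * (decoder.T @ residual)
    return code

def fine_tune(decoder, code, target, steps=200, lr=0.05):
    """Stage 2 (fine-tuning): start from the inverted code, update decoder too."""
    decoder, code = decoder.copy(), code.copy()
    n = target.size
    for _ in range(steps):
        residual = decoder @ code - target
        grad_decoder = (2.0 / n) * np.outer(residual, code)
        grad_code = (2.0 / n) * (decoder.T @ residual)
        decoder -= lr * grad_decoder
        code -= lr * grad_code
    return decoder, code

rng = np.random.default_rng(0)
# Hypothetical stand-in for a pretrained prior network (assumption).
prior_decoder = rng.normal(size=(32, 4))
# A "new identity" that the prior explains only partially.
target = prior_decoder @ rng.normal(size=4) + 0.3 * rng.normal(size=32)

code = invert(prior_decoder, target)
loss_inverted = reconstruction_loss(prior_decoder, code, target)
tuned_decoder, tuned_code = fine_tune(prior_decoder, code, target)
loss_tuned = reconstruction_loss(tuned_decoder, tuned_code, target)
print(f"after inversion: {loss_inverted:.4f}, after fine-tuning: {loss_tuned:.4f}")
```

In this toy setting, inversion can only reach what the frozen prior is able to express; fine-tuning then reduces the remaining error by adapting the weights to the target.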

Paper Structure (26 sections, 10 equations, 21 figures, 3 tables)

Figures (21)

  • Figure 1: We present HeadGAP to create photo-realistic, animatable 3D head avatars from only a few images, or even a single image, of the target person. First, we use large-scale 3D data to learn a 3D head prior with our 3D Gaussian head prior model. Second, we use few-shot data to create animatable 3D avatars. Finally, we animate the few-shot avatars with novel expressions.
  • Figure 2: HeadGAP framework. The prior learning phase uses data from different identities to embed head priors into the GAPNet. The personalization phase first optimizes identity codes to obtain the inverted avatar, then updates the GAPNet to obtain the fine-tuned avatar.
  • Figure 3: Illustration of the GAPNet. Given the tracked meshes of the input images, GAPNet binds part-based Gaussian primitives with initialized features to the mesh. Then, it employs part-specific modules to predict the local attributes of each primitive. The local attributes are transformed into global ones for 3DGS rendering. Finally, the renderings are fed into the CNN to obtain the final rendered images.
  • Figure 4: Qualitative comparisons of our approach against state-of-the-art methods using a single image as input.
  • Figure 5: Qualitative comparisons of our approach against state-of-the-art methods using few-shot input.
  • ...and 16 more figures
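
The Figure 3 caption mentions that local attributes of mesh-bound Gaussian primitives are transformed into global ones before 3DGS rendering. The sketch below shows one common convention for such a transform, assuming each primitive is attached to a single triangle whose frame supplies a rotation, an origin, and an isotropic scale. The exact formulation used by GAPNet is not given here, so `triangle_frame` and `local_to_global` are hypothetical helpers, and rotations are kept as 3x3 matrices rather than quaternions for brevity.

```python
import numpy as np

def triangle_frame(v0, v1, v2):
    """Orthonormal frame (columns are axes), origin, and scale of one triangle."""
    e1 = v1 - v0
    normal = np.cross(e1, v2 - v0)
    x_axis = e1 / np.linalg.norm(e1)
    z_axis = normal / np.linalg.norm(normal)
    y_axis = np.cross(z_axis, x_axis)
    rotation = np.stack([x_axis, y_axis, z_axis], axis=1)
    origin = (v0 + v1 + v2) / 3.0        # triangle centroid
    scale = float(np.linalg.norm(e1))    # assumed scale: length of the first edge
    return rotation, origin, scale

def local_to_global(local_position, local_rotation, local_scale, frame):
    """Map a primitive's local attributes into world space (assumed convention)."""
    rotation, origin, scale = frame
    global_position = scale * (rotation @ local_position) + origin
    global_rotation = rotation @ local_rotation
    global_scale = scale * local_scale
    return global_position, global_rotation, global_scale

# A triangle in the z = 0 plane; the primitive floats slightly above it.
frame = triangle_frame(
    np.array([0.0, 0.0, 0.0]),
    np.array([2.0, 0.0, 0.0]),
    np.array([0.0, 2.0, 0.0]),
)
position, rotation, scale = local_to_global(
    np.array([0.0, 0.0, 0.1]),   # local offset along the triangle normal
    np.eye(3),                   # local orientation
    np.array([0.5, 0.5, 0.1]),   # local per-axis extent
    frame,
)
print(position, scale)
```

Because the frame is recomputed from the tracked mesh for every expression, a primitive defined once in local coordinates follows its triangle as the mesh deforms.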