Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities

Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du

TL;DR

The “Gaussian Déjà-vu” framework first obtains a generalized model of the head avatar and then personalizes the result. It outperforms state-of-the-art 3D Gaussian head avatars in photorealistic quality and cuts training time to a quarter or less of that required by existing methods.

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, offering greater flexibility than mesh-based methods and more efficient rendering than NeRF-based approaches. Despite these advances, creating controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this process, we introduce the “Gaussian Déjà-vu” framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. It provides a well-initialized 3D Gaussian head that is further refined using a monocular video to obtain the personalized head avatar. For personalization, we propose learnable expression-aware rectification blendmaps that correct the initial 3D Gaussians, ensuring rapid convergence without relying on neural networks. Experiments demonstrate that the proposed method meets its objectives: it outperforms state-of-the-art 3D Gaussian head avatars in photorealistic quality and cuts training time to a quarter or less of that required by existing methods, producing the avatar in minutes.
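
To make the idea of expression-aware rectification blendmaps more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the Gaussian attributes are stored as a UV-space map, that there is one learnable blendmap per expression coefficient, and that the correction is a linear combination of blendmaps weighted by the expression vector. The names rectify_gaussians and fit_blendmaps are hypothetical, and the toy per-attribute target stands in for the image-based loss a real pipeline would compute from rendered frames.

```python
import numpy as np


def rectify_gaussians(initial_attrs, blendmaps, expression):
    """Correct initial Gaussian attributes with expression-weighted blendmaps.

    initial_attrs: (H, W, C) UV map of Gaussian attributes (assumed layout).
    blendmaps:     (K, H, W, C) learnable correction maps, one per coefficient.
    expression:    (K,) expression coefficients for the current frame.
    """
    # Weighted sum over the K blendmaps gives one (H, W, C) offset map.
    offsets = np.tensordot(expression, blendmaps, axes=1)
    return initial_attrs + offsets


def fit_blendmaps(initial_attrs, targets, expressions, num_steps=200, lr=0.5):
    """Fit blendmaps by plain gradient descent on a mean squared error.

    targets:     (N, H, W, C) toy per-frame target attributes (an assumption;
                 stands in for supervision from video frames).
    expressions: (N, K) expression coefficients, one row per frame.
    """
    num_coeffs = expressions.shape[1]
    blendmaps = np.zeros((num_coeffs,) + initial_attrs.shape)
    for _ in range(num_steps):
        grad = np.zeros_like(blendmaps)
        for expr, target in zip(expressions, targets):
            residual = rectify_gaussians(initial_attrs, blendmaps, expr) - target
            # d(0.5 * ||residual||^2) / d(blendmaps[k]) = expr[k] * residual
            grad += expr[:, None, None, None] * residual[None]
        blendmaps -= lr * grad / len(expressions)
    return blendmaps
```

Because the blendmaps are optimized directly as parameters, no network forward pass is involved in this sketch, which is one plausible reading of why the personalization stage can converge quickly.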

Paper Structure

This paper contains 17 sections, 11 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Gaussian Déjà-vu first trains a reconstruction model on large face image datasets to serve as a generalized base. This model initializes the 3D Gaussian head, which is then optimized to personalize the avatar to match the person in the video.
  • Figure 2: Detailed flowcharts for our single-image-based reconstruction (a) and monocular-video-based further optimization (b) processes.
  • Figure 3: Qualitative comparison results on single-image-based reconstruction.
  • Figure 4: Qualitative comparison with HeadNeRF across varying viewing angles. Our method works even at extreme angles.
  • Figure 5: Expression reconstruction results.
  • ...and 4 more figures