GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction

Yuelang Xu, Zhaoqi Su, Qingyao Wu, Yebin Liu

TL;DR

The 3D Gaussian Parametric Head Model is introduced, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression, and achieves high-quality, photo-realistic rendering with real-time efficiency.

Abstract

Creating high-fidelity 3D human head avatars is crucial for applications in VR/AR, digital humans, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. However, existing methods often struggle with modeling complex appearance details, e.g., hairstyles, and suffer from low rendering quality and efficiency. In this paper, we introduce a novel approach, the 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. The Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, we present a well-designed training framework to ensure smooth convergence, providing a robust guarantee for learning the rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models. Finally, we apply the 3D Gaussian Parametric Head Model to monocular video or few-shot head avatar reconstruction tasks, which enables instant reconstruction of high-quality 3D head avatars even when input data is extremely limited, surpassing previous methods in terms of reconstruction quality and training speed.

Paper Structure

This paper contains 14 sections, 12 equations, 14 figures, and 3 tables.

Figures (14)

  • Figure 1: We utilize hybrid datasets comprising captured multi-view video data and rendered image data from 3D scans for training our model. The trained model can be manipulated using decoupled identity and expression codes to produce a diverse array of high-fidelity head models. When presented with an image, our model can be adjusted to reconstruct the portrait in the image and edit its expression to match any other desired expression.
  • Figure 2: Overview of our GPHM model. Our training strategy is divided into a Guiding Geometry Model for initialization and a final 3D Gaussian Parametric Head Model. Deformations of each model are further decoupled into identity-related, expression-related, and non-face deformations. For the expression condition images, we input cropped ground-truth face images or images synthesized via LivePortrait [guo2024liveportrait]. For the non-face motion condition, we input ground-truth images with the face area masked. The renderer includes a convolutional refinement network $\boldsymbol{\Psi}$, which converts the feature maps from the mesh/Gaussian renderer into fine portrait images. During inference, our output comes exclusively from the Gaussian model.
  • Figure 3: We generate additional expression condition images via LivePortrait [guo2024liveportrait] for training the appearance-decoupled expression encoder.
  • Figure 4: The pipeline of head avatar reconstruction from monocular videos. First, we optimize the identity code $\boldsymbol{z}_{id}$ to coarsely fit the GPHM model to the input video. Then we directly fine-tune the 3D Gaussian attributes and the motion-related networks in the GPHM to obtain a fine-grained head avatar. The flame chart in the figure marks the parameters that need to be optimized.
  • Figure 5: We generate the head models with randomly sampled identity codes and expression codes as conditions. Each row corresponds to the same identity code, and each column corresponds to the same expression code.
  • ...and 9 more figures
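
The coarse-to-fine fitting described in the Figure 4 caption can be illustrated with a toy sketch. This is not the authors' implementation: the frozen linear `decoder`, the synthetic `target`, the squared-error loss, and all sizes and step counts are assumptions chosen only to keep the example small and runnable. Stage 1 optimizes just the identity code against a fixed prior; stage 2 then updates the decoded attributes directly. The paper's second stage also fine-tunes the motion-related networks, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained parametric model: a frozen linear
# decoder mapping an identity code to per-point attributes.
NUM_POINTS, CODE_DIM = 64, 8
decoder = rng.normal(size=(NUM_POINTS, CODE_DIM))

# Synthetic observation: a part the decoder can express plus fine detail it cannot.
target = decoder @ rng.normal(size=CODE_DIM) + 0.1 * rng.normal(size=NUM_POINTS)


def loss(pred):
    """Half the sum of squared errors against the observation."""
    return 0.5 * float(np.sum((pred - target) ** 2))


# Stage 1 (coarse): optimize only the identity code; the decoder stays frozen.
z_id = np.zeros(CODE_DIM)
step = 1.0 / np.linalg.norm(decoder, 2) ** 2  # 1 / largest eigenvalue of the Hessian
for _ in range(500):
    z_id -= step * decoder.T @ (decoder @ z_id - target)
coarse_loss = loss(decoder @ z_id)

# Stage 2 (fine): start from the decoded attributes and update them directly.
attrs = decoder @ z_id
for _ in range(50):
    attrs -= 0.5 * (attrs - target)
fine_loss = loss(attrs)

print(f"coarse fit: {coarse_loss:.4f}, after fine-tuning: {fine_loss:.2e}")
```

In this toy setting the coarse stage can only reach the best fit expressible by the prior, so a residual remains; the fine stage removes it because the attributes are free parameters. Starting the second stage from the coarse result, rather than from scratch, mirrors the ordering given in the caption.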