PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu

TL;DR

PSAvatar is a novel framework for animatable head avatar creation that uses discrete geometric primitives to build a parametric morphable shape model and employs 3D Gaussians for fine-detail representation and high-fidelity rendering.

Abstract

Despite much progress, achieving real-time, high-fidelity head avatar animation remains difficult, and existing methods must trade off speed against quality. 3DMM-based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussians have shown promising capability for geometry representation and radiance field reconstruction, applying them to head avatar creation remains a major challenge, since it is difficult for 3D Gaussians to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that uses discrete geometric primitives to create a parametric morphable shape model and employs 3D Gaussians for fine-detail representation and high-fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM), which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the mesh surfaces as well as off the meshes, enabling the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to use 3D Gaussians for fine-detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and that the avatars can be animated in real time ($\ge$ 25 fps at a resolution of 512 $\times$ 512).

Paper Structure (13 sections, 15 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: PSAvatar learns the shape with pose and expression variations based on a point-based shape model, and employs 3D Gaussian for fine detail representation and efficient rendering. Given monocular portrait videos, PSAvatar can create head avatars that enable real-time ($\ge$ 25 fps at 512 $\times$ 512 resolution) and high-fidelity rendering.
  • Figure 2: Overview. Given a monocular portrait video, we conduct FLAME tracking to obtain the parameters. The point-based shape model (PSM) first converts the FLAME mesh to points. It performs sampling on the surfaces (blue points) and additionally generates samples off the meshes by offsetting the samples on the meshes along their normal directions (black points). These points are then aligned with the head shape in an analysis-by-synthesis manner. The inclusion of points on meshes and off meshes enables the PSM to reconstruct not only surface-like structures but also complex geometries that are beyond the capability of 3DMMs. Combining the PSM with 3D Gaussian allows the reconstruction of the radiance field for efficient rendering.
  • Figure 3: Shape variations for given poses and expressions. The reference images on the left (taken from subject 2) provide the pose and expression parameters, and the Point-based Shape Model (PSM) warps the points consistently with the reference; e.g., when the reference person turns his head, the points follow the movement. Blue and black represent points on and off the mesh, respectively. To better visualize the shape variation, points sampled from the eye, nose, and mouth regions are colored pink, red, and green, respectively.
  • Figure 4: Visualization of each component in PSAvatar. (a) shows the learned point-based shape model. (b) visualizes the 3D Gaussians, which show improved representation flexibility over the PSM. (c) and (d) are the rendered and ground-truth images, respectively.
  • Figure 5: Qualitative comparison on subjects 1-6 (from top to bottom). PSAvatar shows improved performance over strong baselines in capturing fine details such as hair strands, teeth, and glasses.
  • ...and 4 more figures
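
The sampling step described in the Figure 2 caption (points on the mesh surface, plus extra points obtained by offsetting those samples along their normals) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `sample_on_and_off_mesh`, the area-weighted uniform sampling, the use of face normals, and the `offsets` distances are all assumptions made for illustration, and a unit square stands in for the FLAME mesh.

```python
import numpy as np


def sample_on_and_off_mesh(vertices, faces, n_samples, offsets, seed=0):
    """Sample points on a triangle mesh, then offset copies along face normals.

    Hypothetical helper for illustration only (not from the paper).
    vertices: (V, 3) floats; faces: (F, 3) ints; assumes no degenerate faces.
    offsets:  signed distances along the normal (assumed hyperparameter).
    Returns on_points (n, 3) and off_points (len(offsets) * n, 3).
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3): the three corners of every face

    # Face normals and areas from the cross product of two edges.
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    length = np.linalg.norm(cross, axis=1)
    normals = cross / length[:, None]
    areas = 0.5 * length

    # Choose faces with probability proportional to their area.
    face_idx = rng.choice(len(faces), size=n_samples, p=areas / areas.sum())

    # Uniform barycentric coordinates inside each chosen triangle.
    r1 = np.sqrt(rng.random(n_samples))
    r2 = rng.random(n_samples)
    bary = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)

    # Points on the surface: barycentric blend of the triangle corners.
    on_points = np.einsum("nk,nkd->nd", bary, tri[face_idx])

    # Points off the surface: shift each sample along its face normal.
    offsets = np.asarray(offsets, dtype=float)
    shifted = on_points[None] + offsets[:, None, None] * normals[face_idx][None]
    off_points = shifted.reshape(-1, 3)
    return on_points, off_points


# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
on_pts, off_pts = sample_on_and_off_mesh(vertices, faces, 100, [0.05, 0.1])
print(on_pts.shape, off_pts.shape)
```

In this toy case the on-mesh samples lie in the plane z = 0 and the off-mesh samples form two layers above it, one per offset distance. The subsequent alignment of the points with the head shape is a learned, analysis-by-synthesis step that this sketch does not cover.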