
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting

Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero

TL;DR

HeadGaS addresses real-time, photorealistic 3D head animation from monocular video. It extends 3D Gaussian Splatting with a per-Gaussian basis of latent features that is blended with expression weights to produce frame-specific color and opacity, so motion is modelled without directly deforming the geometry. On the INSTA, NeRFBlendShape, and PointAvatar datasets, HeadGaS improves PSNR by up to about 2 dB and renders more than 10× faster than the baselines, while supporting same-subject and cross-subject expression transfer as well as novel view synthesis. Ablation studies confirm that both latent-feature blending and the color/opacity modulation are necessary. Limitations include dependence on the head tracker and the memory cost of the feature bases, which point to future work on efficiency and robustness.
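The "about 2 dB" figure is easier to interpret as an error ratio. Because PSNR = 10 log10(MAX² / MSE), a gain of Δ dB corresponds to dividing the mean squared error by 10^(Δ/10), so 2 dB means roughly 1.58× lower MSE. The sketch below checks this arithmetic; the baseline MSE is a made-up value chosen for illustration, not a number from the paper, and the helper names are hypothetical.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_val]."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which the MSE must shrink to raise PSNR by delta_db decibels."""
    return 10.0 ** (delta_db / 10.0)


# Hypothetical baseline error, used only to illustrate the conversion.
baseline_mse = 1e-3
improved_mse = baseline_mse / mse_ratio_for_gain(2.0)

print(round(mse_ratio_for_gain(2.0), 3))                        # 1.585
print(round(psnr(baseline_mse), 2), round(psnr(improved_mse), 2))  # 30.0 32.0
```

The ratio is independent of the baseline: any method that is 2 dB better has about 37% lower MSE on the same images.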

Abstract

3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, a model that uses 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper, we introduce a hybrid model that extends the explicit 3DGS representation with a basis of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results at real-time inference frame rates, surpassing baselines by up to 2 dB while accelerating rendering by over ×10.
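Written with the symbols from the Figure 3 caption, the blending step for one Gaussian in frame $i$ can be read as the following pair of equations. The basis size $B$, the feature width $D$ and the row notation $\bm{F}_k$ are introduced here for illustration only; whether the paper adds a constant neutral term or encodes $\bm{\mu}$ before the MLP is not stated in this summary, so this is an assumed formulation.

```latex
\begin{aligned}
\bm{f}_i &= \bm{e}_i^{\top}\bm{F} = \sum_{k=1}^{B} e_{i,k}\,\bm{F}_k,
  \qquad \bm{F}\in\mathbb{R}^{B\times D},\ \bm{e}_i\in\mathbb{R}^{B},\\
(\bm{c}_i,\ \bm{\alpha}_i) &= \phi(\bm{f}_i,\ \bm{\mu}).
\end{aligned}
```

Because $\bm{f}_i$ is linear in the expression vector $\bm{e}_i$, all frame-to-frame variation enters through the coefficients, while the basis $\bm{F}$ and the position $\bm{\mu}$ are shared across frames.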

Paper Structure

This paper contains 32 sections, 11 equations, 14 figures, and 3 tables.

Figures (14)

  • Figure 1: Overview of HeadGaS. We reconstruct a 3D head based on an expression-aware 3D Gaussian cloud representation, which results in real-time rendering and high image quality. Left: The model is trained with a monocular video of a moving head. At inference, we query the model with a novel sequence of poses and expression parameters to render a real-time video. Right: Rendering speed (fps in logarithmic scale) vs PSNR plot comparing different methods. The circle radius indicates training time.
  • Figure 2: Motion modelling via opacity change. Left: two example frames $i$ and $j$ rendered by HeadGaS. Right: rendering of the opacity difference $\alpha_j - \alpha_i$ (blue: Gaussians whose opacity decreases; red: Gaussians whose opacity increases; near-white: little change, i.e. static regions). Opacity increases strongly in dynamic areas; for example, Gaussians on the lower chin become opaque as the jaw opens fully.
  • Figure 3: HeadGaS pipeline. We represent 3D space as a set of feature-enhanced 3D Gaussians. Each Gaussian holds a feature basis $\bm{F}$ that is blended via the expression vector to obtain a frame-specific feature $\bm{f}_i$. This feature is fed, together with the position $\bm{\mu}$, to an MLP $\phi(\cdot)$ that outputs the expression-dependent color $\bm{c}_i$ and opacity $\bm{\alpha}_i$. Finally, $\bm{c}_i$ and $\bm{\alpha}_i$ are passed to the rasterizer together with the remaining Gaussian parameters (rotation $R$, scale $S$ and position $\bm{\mu}$) to render the image.
  • Figure 4: Qualitative comparison of the proposed model against the INSTA [INSTA:CVPR2023], PointAvatar [Zheng2023pointavatar] and NeRFBlendShape [Gao2022nerfblendshape] baselines on a) the INSTA data, b) the NBS data and c) the PointAvatar data. The close-ups to the right of each example highlight our method's ability to capture details such as teeth, wrinkles and reflections.
  • Figure 5: Qualitative ablation on the INSTA dataset
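The pipeline in Figure 3, together with the opacity-difference visualization in Figure 2, can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the authors' implementation: the sizes `N`, `B`, `D`, `H`, the single-hidden-layer MLP with sigmoid outputs, and the helpers `decode_frame` and `opacity_change_colors` are all hypothetical, and the parameters are random rather than trained. Only color and opacity change between frames; `mu` (and, in the full model, rotation and scale) stays fixed, which is the sense in which motion is modelled without deforming geometry. Rasterization itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N Gaussians, B expression coefficients,
# D latent feature channels, H hidden units in the MLP.
N, B, D, H = 1000, 50, 32, 64

# Per-Gaussian parameters (random stand-ins for optimized values).
mu = rng.normal(size=(N, 3))          # Gaussian centers, not deformed per frame
F = 0.1 * rng.normal(size=(N, B, D))  # per-Gaussian latent feature basis

# Shared MLP phi(f, mu) -> (RGB, opacity); one ReLU hidden layer is assumed.
W1, b1 = 0.1 * rng.normal(size=(D + 3, H)), np.zeros(H)
W2, b2 = 0.1 * rng.normal(size=(H, 4)), np.zeros(4)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def decode_frame(e: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Expression vector e of shape (B,) -> colors (N, 3) and opacities (N,)."""
    f = np.einsum("b,nbd->nd", e, F)     # linear blend f = e^T F for every Gaussian
    x = np.concatenate([f, mu], axis=1)  # MLP input, shape (N, D + 3)
    h = np.maximum(x @ W1 + b1, 0.0)     # ReLU hidden layer
    out = sigmoid(h @ W2 + b2)           # squash all four outputs to (0, 1)
    return out[:, :3], out[:, 3]


def opacity_change_colors(alpha_i: np.ndarray, alpha_j: np.ndarray) -> np.ndarray:
    """Color each Gaussian by alpha_j - alpha_i: red = increase, blue = decrease, white = none."""
    delta = alpha_j - alpha_i             # lies in [-1, 1] because opacities lie in [0, 1]
    colors = np.ones((delta.shape[0], 3))  # start from white
    up, down = delta > 0, delta < 0
    colors[up, 1] -= delta[up]            # fade green and blue -> red
    colors[up, 2] -= delta[up]
    colors[down, 0] += delta[down]        # fade red and green -> blue
    colors[down, 1] += delta[down]
    return colors


# Two frames driven by different (random stand-in) expression parameters.
c_i, alpha_i = decode_frame(rng.normal(size=B))
c_j, alpha_j = decode_frame(rng.normal(size=B))
vis = opacity_change_colors(alpha_i, alpha_j)
print(c_i.shape, alpha_i.shape, vis.shape)  # (1000, 3) (1000,) (1000, 3)
```

Storing `F` takes N × B × D floats (1.6 million in this toy setting), which illustrates why the memory cost of the feature bases is listed as a limitation: it grows linearly with the number of Gaussians.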