HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars

Gent Serifi, Marcel C. Buehler

Abstract

We introduce HyperGaussians, a novel extension of 3D Gaussian Splatting for high-quality animatable face avatars. Creating such detailed face avatars from videos is a challenging problem with numerous applications in augmented and virtual reality. While tremendous success has been achieved for static faces, animatable avatars from monocular videos still fall into the uncanny valley. The de facto standard, 3D Gaussian Splatting (3DGS), represents a face through a collection of 3D Gaussian primitives. 3DGS excels at rendering static faces, but state-of-the-art methods still struggle with nonlinear deformations, complex lighting effects, and fine details. While most related works focus on predicting better Gaussian parameters from expression codes, we rethink the 3D Gaussian representation itself and how to make it more expressive. Our insights lead to a novel extension of 3D Gaussians to high-dimensional multivariate Gaussians, dubbed 'HyperGaussians'. The higher dimensionality increases expressivity through conditioning on a learnable local embedding. However, splatting HyperGaussians is computationally expensive because it requires inverting a high-dimensional covariance matrix. We solve this by reparameterizing the covariance matrix, a technique we dub the 'inverse covariance trick'. This trick makes HyperGaussians efficient enough to be integrated seamlessly into existing models. To demonstrate this, we plug HyperGaussians into FlashAvatar, the state-of-the-art method for fast monocular face avatars. Our evaluation on 19 subjects from 4 face datasets shows that HyperGaussians outperform 3DGS numerically and visually, particularly for high-frequency details like eyeglass frames, teeth, complex facial movements, and specular reflections.
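The abstract says that conditioning a HyperGaussian naively requires inverting a high-dimensional covariance matrix, and that reparameterizing the covariance avoids this. The NumPy sketch below illustrates one standard way such a reparameterization can work; we assume, without confirming it against the paper's equations, that it corresponds to the 'inverse covariance trick'. Split a joint Gaussian over an m-dimensional attribute $x$ and an n-dimensional latent $z$. From the covariance, the conditional mean is $\mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z)$, which needs an $n \times n$ inverse. If the precision matrix $\Lambda = \Sigma^{-1}$ is stored instead, the textbook identities $\mu_{x|z} = \mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z - \mu_z)$ and $\Sigma_{x|z} = \Lambda_{xx}^{-1}$ need only an $m \times m$ inverse. The function names and random test matrices are illustrative, not the authors' code.

```python
import numpy as np


def random_spd(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical demo helper: a well-conditioned symmetric positive-definite matrix."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T + dim * np.eye(dim)


def condition_via_covariance(mu, cov, z, m):
    """x | z from the joint covariance: requires inverting the n x n block Sigma_zz."""
    mu_x, mu_z = mu[:m], mu[m:]
    s_xx, s_xz = cov[:m, :m], cov[:m, m:]
    s_zx, s_zz = cov[m:, :m], cov[m:, m:]
    gain = s_xz @ np.linalg.inv(s_zz)  # (m, n); cost grows with the latent dimension n
    mean = mu_x + gain @ (z - mu_z)
    cond_cov = s_xx - gain @ s_zx      # Schur complement
    return mean, cond_cov


def condition_via_precision(mu, prec, z, m):
    """x | z from the joint precision (inverse covariance): only an m x m inverse."""
    mu_x, mu_z = mu[:m], mu[m:]
    l_xx, l_xz = prec[:m, :m], prec[:m, m:]
    l_xx_inv = np.linalg.inv(l_xx)     # (m, m); m = 3 for positions, independent of n
    mean = mu_x - l_xx_inv @ (l_xz @ (z - mu_z))
    cond_cov = l_xx_inv
    return mean, cond_cov


rng = np.random.default_rng(0)
m, n = 3, 8                      # attribute dim (e.g. position) and latent dim
mu = rng.standard_normal(m + n)  # joint mean over [x, z]
prec = random_spd(m + n, rng)    # parameter stored in precision form
cov = np.linalg.inv(prec)        # equivalent covariance, built only for the comparison
z = rng.standard_normal(n)       # stand-in for a latent embedding

mean_a, cov_a = condition_via_covariance(mu, cov, z, m)
mean_b, cov_b = condition_via_precision(mu, prec, z, m)
print(np.allclose(mean_a, mean_b), np.allclose(cov_a, cov_b))  # True True
```

Both routes give the same conditional distribution. The precision form is attractive here because the attribute dimension m stays small while the latent dimension n can grow.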

Paper Structure

This paper contains 40 sections, 15 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: We propose an expressive extension to 3D Gaussians, dubbed HyperGaussians, and plug them into existing methods for face avatars: FlashAvatar xiang2024flashavatar and GaussianHeadAvatar xu2024gaussianheadavatar. FlashAvatar modulates 3D Gaussian primitives with expression-dependent offsets $\Delta$. We make a single modification to the pipeline: we insert HyperGaussians (see the section on multivariate Gaussians) between the MLP output and the rasterizer, so that the offsets $\Delta$ are modeled in a higher-dimensional space. Instead of directly predicting the offsets $\Delta$, we predict a latent $z_\psi$ that conditions the HyperGaussians. Without any other modifications or hyperparameter tuning, this simple change improves the rendering of high-frequency details in the final avatar (see the quantitative comparison table and Figure 3). This figure has been adapted from FlashAvatar xiang2024flashavatar.
  • Figure 2: Benchmark results for conditioning ${\sim}15$k HyperGaussians with attribute dimension $m = 3$ (e.g., position) and varying latent dimension $n$. Measurements are averaged over 1000 runs on an NVIDIA GeForce RTX 2080 Ti after an initial warm-up; each run performs one forward and one backward pass.
  • Figure 3: Qualitative Comparison with FlashAvatar xiang2024flashavatar, MonoGaussianAvatar chen2024monogaussianavatar, and SplattingAvatar shao2024splattingavatar. Ours achieves high-quality details for thin structures (eyeglass frames and teeth in the top row) and specular reflections (eyes in the third row), and gracefully handles complex deformations (mouth in the second and fourth rows).
  • Figure 4: Cross-reenactment Comparison with FlashAvatar xiang2024flashavatar, MonoGaussianAvatar chen2024monogaussianavatar, and SplattingAvatar shao2024splattingavatar. Ours preserves fine details in the teeth and the overall shape of the subject. Please see the supplementary HTML page for more cross-reenactment results.
  • Figure 5: Enhancing GaussianHeadAvatar xu2024gaussianheadavatar with HyperGaussians improves high-frequency details such as wrinkles, reflections in the eyes, glasses, and teeth. See the quantitative comparison table for metrics.
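Figure 1 describes predicting a latent $z_\psi$ that conditions the HyperGaussians instead of predicting offsets $\Delta$ directly. Figure 2 benchmarks that conditioning for roughly 15k primitives with $m = 3$. The batched NumPy sketch below shows this step under our own assumptions. Each primitive stores a joint mean and a precision matrix, kept positive definite through a hypothetical factor parameterization $\Lambda = F F^\top + \varepsilon I$. We also assume that the conditional mean of the $m$ position dimensions is used directly as $\Delta$. The names `batched_precisions` and `hypergaussian_offsets`, the factorization, and $\varepsilon$ are illustrative and not taken from the paper. A training implementation would run on the GPU with automatic differentiation, which this sketch omits.

```python
import numpy as np


def batched_precisions(factors: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical parameterization: Lambda = F F^T + eps * I is positive definite."""
    dim = factors.shape[-1]
    return factors @ np.swapaxes(factors, -1, -2) + eps * np.eye(dim)


def hypergaussian_offsets(mu: np.ndarray, prec: np.ndarray, z: np.ndarray, m: int) -> np.ndarray:
    """Conditional means of the first m dims given per-primitive latents z.

    mu: (N, m + n), prec: (N, m + n, m + n), z: (N, n) -> returns (N, m)
    """
    mu_x, mu_z = mu[:, :m], mu[:, m:]
    l_xx = prec[:, :m, :m]                                # (N, m, m)
    l_xz = prec[:, :m, m:]                                # (N, m, n)
    rhs = np.einsum("bij,bj->bi", l_xz, z - mu_z)         # (N, m)
    # N small m x m solves; no n x n matrix is ever inverted.
    return mu_x - np.linalg.solve(l_xx, rhs[..., None])[..., 0]


rng = np.random.default_rng(0)
num_prims, m, n = 15_000, 3, 8               # sizes echoing the Figure 2 setting
mu = 0.01 * rng.standard_normal((num_prims, m + n))
prec = batched_precisions(rng.standard_normal((num_prims, m + n, m + n)))
z_psi = rng.standard_normal((num_prims, n))  # stand-in for the MLP's latent output

delta = hypergaussian_offsets(mu, prec, z_psi, m)
print(delta.shape)  # (15000, 3)
```

Because only $\Lambda_{xx}$ is solved against, the per-primitive cost of this step depends on $m$ for the solve and only linearly on $n$ for the matrix-vector product. This is consistent with the motivation for benchmarking a varying latent dimension $n$ at fixed $m = 3$.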