Gaussian Eigen Models for Human Heads

Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies

TL;DR

The paper addresses the trade-off between rendering realism and computational efficiency in personalized head avatars. It introduces Gaussian Eigen Models (GEM), a mesh-free, linear appearance model that distills high-quality CNN-based Gaussian avatars into subject-specific ensembles of PCA bases for Gaussian primitives, enabling compact coefficients to drive appearance. An image-based regressor maps a single RGB image to GEM coefficients, and rendering is performed via 3D Gaussian Splatting, allowing real-time self- and cross-person reenactment with improved visual fidelity. Extensive experiments on NeRSemble data show GEM surpasses baselines in novel-view and novel-expression quality while maintaining low memory and fast rendering, with ablations illustrating favorable quality/size tradeoffs. The work offers practical implications for efficient, high-quality digital humans on commodity devices and enables future extensions to localized control and cross-subject modeling.

Abstract

Current personalized neural head avatars face a trade-off: lightweight models lack detail and realism, while high-quality, animatable avatars require significant computational resources, making them unsuitable for commodity devices. To address this gap, we introduce Gaussian Eigen Models (GEM), which provide high-quality, lightweight, and easily controllable head avatars. GEM utilizes 3D Gaussian primitives for representing the appearance combined with Gaussian splatting for rendering. Building on the success of mesh-based 3D morphable face models (3DMM), we define GEM as an ensemble of linear eigenbases for representing the head appearance of a specific subject. In particular, we construct linear bases to represent the position, scale, rotation, and opacity of the 3D Gaussians. This allows us to efficiently generate Gaussian primitives of a specific head shape by a linear combination of the basis vectors, only requiring a low-dimensional parameter vector that contains the respective coefficients. We propose to construct these linear bases (GEM) by distilling high-quality compute-intense CNN-based Gaussian avatar models that can generate expression-dependent appearance changes like wrinkles. These high-quality models are trained on multi-view videos of a subject and are distilled using a series of principal component analyses. Once we have obtained the bases that represent the animatable appearance space of a specific human, we learn a regressor that takes a single RGB image as input and predicts the low-dimensional parameter vector that corresponds to the shown facial expression. In a series of experiments, we compare GEM's self-reenactment and cross-person reenactment results to state-of-the-art 3D avatar methods, demonstrating GEM's higher visual quality and better generalization to new expressions.
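
The linear-basis idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the per-frame Gaussian attributes are synthetic random data standing in for the output of a CNN-based avatar, the channel counts (3 for position and scale, 4 for a quaternion rotation, 1 for opacity) are assumptions, and the helper names `build_eigenbasis`, `project`, and `reconstruct` are hypothetical. Each modality gets its own PCA basis, and a frame is regenerated as the mean plus a linear combination of basis vectors.

```python
import numpy as np


def build_eigenbasis(samples: np.ndarray, num_components: int):
    """PCA over flattened per-frame attributes of shape (frames, gaussians * channels)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_components]


def project(sample: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Low-dimensional coefficients of one frame in the eigenbasis."""
    return basis @ (sample - mean)


def reconstruct(coeffs: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Linear combination of the basis vectors, added to the mean."""
    return mean + coeffs @ basis


rng = np.random.default_rng(0)
num_frames, num_gaussians, num_components = 200, 500, 5

# Assumed channel counts per Gaussian modality (hypothetical layout).
channels = {"position": 3, "scale": 3, "rotation": 4, "opacity": 1}

# Synthetic stand-in data: every modality is driven by the same 5 latent factors.
latent = rng.normal(size=(num_frames, num_components))
dataset = {
    name: latent @ rng.normal(size=(num_components, num_gaussians * c))
    + rng.normal(size=(1, num_gaussians * c))  # constant per-modality offset
    for name, c in channels.items()
}

# The "ensemble of eigenbases": one (mean, basis) pair per modality.
ensemble = {name: build_eigenbasis(data, num_components) for name, data in dataset.items()}

# Encode frame 0 of every modality, then decode it again from the coefficients.
max_error = 0.0
for name, (mean, basis) in ensemble.items():
    coeffs = project(dataset[name][0], mean, basis)
    frame = reconstruct(coeffs, mean, basis)
    max_error = max(max_error, float(np.abs(frame - dataset[name][0]).max()))
```

Because the synthetic data has only five degrees of freedom per modality, five components reproduce it up to floating-point error. Real avatar data would not be exactly low-rank, so the number of components would trade reconstruction quality against model size.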

Paper Structure

This paper contains 27 sections, 8 equations, 23 figures, and 7 tables.

Figures (23)

  • Figure 1: Given a multi-view video of a subject and mesh tracking, we create a dataset of 3D Gaussian point clouds for each frame in the sequence. Using this data, we distill a high-quality Gaussian Eigen Model (GEM). GEM is an ensemble of linear bases for each Gaussian primitive modality: position, opacity, scale, and rotation. Based on these bases, facial appearances are generated by a linear combination.
  • Figure 2: Samples of a GEM. We display samples for the first three components of the position $\mathbf{k}_\phi$ eigenbasis of a GEM, showing diverse expressions. Note that GEM requires no parametric 3D face model like FLAME [FLAME:SiggraphAsia2017].
  • Figure 3: Image-based animation. One of the applications of our GEM is real-time (cross-)reenactment. For that, we utilize generalized features from EMOCA [Danek2022EMOCAED] and build a pipeline to regress the PCA coefficients of our model from an input image/video.
  • Figure 4: Novel view synthesis. Both our CNN and GEM perform better on novel views, especially in the mouth interior and in wrinkle regions. In this experiment, we follow the evaluation of Gaussian Avatars [Qian2024gaussianavatars] and demonstrate novel viewpoint generation. GEM is obtained through analysis-by-synthesis fitting [Blanz1999AMM, Thies2016Face2FaceRF]. Note that the expressions are seen during training.
  • Figure 5: Novel view and expression synthesis. Our Gaussian Eigen Model (GEM) shows better results in regions like teeth, wrinkles, and self-shadows than other methods, which struggle with artifacts.
  • ...and 18 more figures
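
The image-based animation step described for Figure 3 maps per-image features to PCA coefficients. The sketch below shows only the shape of that mapping under loud assumptions: the features are random vectors standing in for EMOCA features, the target coefficients are synthetic, and a closed-form ridge regression is swapped in for whatever regressor the paper actually trains. The names `fit_ridge` and `predict` are hypothetical.

```python
import numpy as np


def fit_ridge(features: np.ndarray, coeffs: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression from features to coefficients.

    A bias column is appended, and the penalty is applied to all weights,
    including the bias, to keep the sketch short.
    """
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    gram = x.T @ x + reg * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ coeffs)


def predict(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Predict coefficient vectors for a batch of feature vectors."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return x @ weights


rng = np.random.default_rng(0)
num_samples, feature_dim, num_components = 300, 16, 5

# Stand-in data: random "image features" and coefficients that depend on them
# linearly plus a little noise (an assumption made so the toy problem is solvable).
features = rng.normal(size=(num_samples, feature_dim))
true_map = rng.normal(size=(feature_dim, num_components))
coeffs = features @ true_map + 0.01 * rng.normal(size=(num_samples, num_components))

# Fit on the first 250 samples, evaluate on the remaining 50.
weights = fit_ridge(features[:250], coeffs[:250])
predicted = predict(weights, features[250:])
mean_abs_error = float(np.abs(predicted - coeffs[250:]).mean())
```

In a full pipeline the predicted coefficients would be combined with the eigenbases to produce Gaussian primitives, which are then rendered with Gaussian splatting. That rendering step is outside the scope of this sketch.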