Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies
TL;DR
The paper addresses the trade-off between rendering realism and computational efficiency in personalized head avatars. It introduces Gaussian Eigen Models (GEM), a mesh-free, linear appearance model that distills high-quality CNN-based Gaussian avatars into subject-specific ensembles of PCA bases for Gaussian primitives, enabling compact coefficients to drive appearance. An image-based regressor maps a single RGB image to GEM coefficients, and rendering is performed via 3D Gaussian Splatting, allowing real-time self- and cross-person reenactment with improved visual fidelity. Extensive experiments on NeRSemble data show GEM surpasses baselines in novel-view and novel-expression quality while maintaining low memory and fast rendering, with ablations illustrating favorable quality/size trade-offs. The work offers practical implications for efficient, high-quality digital humans on commodity devices and enables future extensions to localized control and cross-subject modeling.
Abstract
Current personalized neural head avatars face a trade-off: lightweight models lack detail and realism, while high-quality, animatable avatars require significant computational resources, making them unsuitable for commodity devices. To address this gap, we introduce Gaussian Eigen Models (GEM), which provide high-quality, lightweight, and easily controllable head avatars. GEM utilizes 3D Gaussian primitives to represent appearance, combined with Gaussian splatting for rendering. Building on the success of mesh-based 3D morphable face models (3DMM), we define GEM as an ensemble of linear eigenbases for representing the head appearance of a specific subject. In particular, we construct linear bases to represent the position, scale, rotation, and opacity of the 3D Gaussians. This allows us to efficiently generate Gaussian primitives of a specific head shape by a linear combination of the basis vectors, only requiring a low-dimensional parameter vector that contains the respective coefficients. We propose to construct these linear bases (GEM) by distilling high-quality, compute-intensive CNN-based Gaussian avatar models that can generate expression-dependent appearance changes like wrinkles. These high-quality models are trained on multi-view videos of a subject and are distilled using a series of principal component analyses. Once we have obtained the bases that represent the animatable appearance space of a specific human, we learn a regressor that takes a single RGB image as input and predicts the low-dimensional parameter vector that corresponds to the shown facial expression. In a series of experiments, we compare GEM's self-reenactment and cross-person reenactment results to state-of-the-art 3D avatar methods, demonstrating GEM's higher visual quality and better generalization to new expressions.
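To make the linear-basis idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It fits one PCA basis per Gaussian attribute (position, scale, rotation, opacity) and regenerates the primitives from a low-dimensional coefficient vector. Everything specific here is an assumption made for illustration: the helper names (`fit_eigenbasis`, `encode`, `decode`, `make_synthetic_frames`), the sizes, the choice of 10 components per attribute, and the synthetic low-rank data that stands in for the per-frame outputs of the CNN-based avatar. The image regressor and the splatting renderer are omitted.

```python
import numpy as np

# Assumed per-Gaussian attribute sizes: position (3), scale (3),
# rotation as a quaternion (4), opacity (1).
ATTRIBUTE_DIMS = {"position": 3, "scale": 3, "rotation": 4, "opacity": 1}


def fit_eigenbasis(samples, num_components):
    """PCA via SVD. samples has shape (num_frames, num_gaussians * dim)."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:num_components]


def encode(sample, mean, basis):
    """Project one flattened frame onto the basis to get its coefficients."""
    return basis @ (sample - mean)


def decode(coeffs, mean, basis):
    """Linear combination of basis vectors added to the mean."""
    return mean + coeffs @ basis


def make_synthetic_frames(rng, num_frames, num_gaussians, dim, rank):
    """Hypothetical stand-in for the distilled avatar's per-frame output."""
    rest = rng.normal(size=num_gaussians * dim)
    latent = rng.normal(size=(num_frames, rank))
    mixing = rng.normal(size=(rank, num_gaussians * dim))
    return rest + latent @ mixing


rng = np.random.default_rng(0)
num_frames, num_gaussians, num_components = 200, 500, 10

frames = {
    name: make_synthetic_frames(rng, num_frames, num_gaussians, dim, num_components)
    for name, dim in ATTRIBUTE_DIMS.items()
}

# The "ensemble of eigenbases": one (mean, basis) pair per attribute.
model = {name: fit_eigenbasis(data, num_components) for name, data in frames.items()}

# Encode frame 0 and concatenate the per-attribute coefficients
# into a single low-dimensional parameter vector.
coeffs = {name: encode(frames[name][0], *model[name]) for name in ATTRIBUTE_DIMS}
params = np.concatenate([coeffs[name] for name in ATTRIBUTE_DIMS])

# Decode back to per-Gaussian attributes. A real pipeline would likely
# need to renormalize the quaternions before rendering (assumption).
gaussians = {
    name: decode(coeffs[name], *model[name]).reshape(num_gaussians, dim)
    for name, dim in ATTRIBUTE_DIMS.items()
}

max_error = max(
    np.abs(gaussians[name].ravel() - frames[name][0]).max() for name in ATTRIBUTE_DIMS
)
print(params.shape, max_error)
```

In this toy setup 40 coefficients stand in for 5,500 raw attribute values per frame (500 Gaussians with 11 values each). Reconstruction is near exact only because the synthetic data is low-rank by construction; on real avatar data the number of components would trade quality against model size.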
