
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations

Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma

TL;DR

This work tackles the challenge of producing photorealistic, animatable 3D head avatars that are both rich in texture detail and computationally efficient. It introduces a compact tensorial representation: static appearance is stored in a tri-plane in canonical space, while expression-driven textures are captured with lightweight 1D feature lines that decode to opacity offsets, enabling dynamic detail with minimal storage. Two training strategies, an adaptive truncated opacity penalty and class-balanced sampling, improve generalization to unseen expressions. On NeRSemble data, the approach achieves high-fidelity rendering at around 300 FPS with about 10 MB of storage per subject, outperforming state-of-the-art baselines in both novel view synthesis and self-reenactment. The method offers practical, real-time head-avatar rendering with broad applicability in streaming, AR/VR, and mobile video conferencing.
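Static appearance is said to live in a tri-plane in canonical space. The NumPy sketch below shows how such a tri-plane is commonly queried. It assumes three (C, H, W) feature planes over the xy, xz, and yz axes, canonical coordinates normalized to [-1, 1], and summation of the three plane samples; the resolution, channel count, and aggregation the authors actually use are not stated in this summary, and the names bilinear_sample and triplane_features are hypothetical.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1].

    Returns an (N, C) array for N query points. Requires H, W >= 2.
    """
    C, H, W = plane.shape
    # Map [-1, 1] to continuous pixel coordinates in [0, W - 1] and [0, H - 1].
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (W - 1)
    y = (np.clip(v, -1.0, 1.0) + 1.0) * 0.5 * (H - 1)
    # Lower-left neighbor, clamped so the +1 neighbor stays in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    # plane[:, rows, cols] has shape (C, N); transpose to (N, C).
    f00 = plane[:, y0, x0].T
    f01 = plane[:, y0, x0 + 1].T
    f10 = plane[:, y0 + 1, x0].T
    f11 = plane[:, y0 + 1, x0 + 1].T
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return top * (1 - wy) + bottom * wy

def triplane_features(planes, xyz):
    """Canonical features for (N, 3) points: sum of the three plane samples (assumed aggregation)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))

# Usage with random planes: 8 channels at 32 x 32 resolution (illustrative sizes only).
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((8, 32, 32)) for k in ("xy", "xz", "yz")}
pts = rng.uniform(-1.0, 1.0, size=(5, 3))
feats = triplane_features(planes, pts)  # shape (5, 8)
```

Summation is one common way to merge the three samples; concatenation is another. In a full pipeline these features would feed a small decoder for view-dependent color, which is omitted here.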

Abstract

Recent studies have combined 3D Gaussians and 3D Morphable Models (3DMM) to construct high-quality 3D head avatars. In this line of research, existing methods either fail to capture dynamic textures or incur significant overhead in runtime speed or storage space. To this end, we propose a novel method that addresses all of these demands. Specifically, we introduce an expressive and compact representation that encodes the texture-related attributes of the 3D Gaussians in a tensorial format. We store the appearance of the neutral expression in static tri-planes and represent dynamic texture details for different expressions using lightweight 1D feature lines, which are then decoded into opacity offsets relative to the neutral face. We further propose an adaptive truncated opacity penalty and class-balanced sampling to improve generalization across different expressions. Experiments show that this design captures dynamic facial details accurately while maintaining real-time rendering and significantly reducing storage costs, thus broadening its applicability to more scenarios.
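Class-balanced sampling is named here without saying how classes are defined. The sketch below shows the generic technique, assuming each training frame carries a discrete expression label (for instance, the capture sequence it came from). Frames are drawn with probability inversely proportional to their class size, so rare expressions are seen as often as common ones. The labels and function names are hypothetical, and this is not presented as the authors' exact scheme.

```python
import numpy as np
from collections import Counter

def class_balanced_probs(labels):
    """Per-frame sampling probabilities that give every class equal total mass.

    A frame in class c gets 1 / (num_classes * count[c]), so each class sums
    to 1 / num_classes and all probabilities sum to 1.
    """
    counts = Counter(labels)
    num_classes = len(counts)
    return np.array([1.0 / (num_classes * counts[c]) for c in labels])

def sample_frames(labels, batch_size, rng):
    """Draw frame indices with replacement according to class-balanced probabilities."""
    p = class_balanced_probs(labels)
    return rng.choice(len(labels), size=batch_size, replace=True, p=p)

# Usage: an imbalanced toy dataset dominated by near-neutral frames (hypothetical labels).
labels = ["neutral"] * 80 + ["smile"] * 15 + ["jaw_open"] * 5
rng = np.random.default_rng(0)
idx = sample_frames(labels, 3000, rng)
drawn = Counter(labels[i] for i in idx)  # each class drawn roughly 1000 times
```

Under uniform sampling the "jaw_open" frames would make up only about 5% of the draws; with balanced probabilities each of the three classes gets roughly a third, which is the effect the abstract relies on for generalizing across expressions.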

Paper Structure

This paper contains 21 sections, 11 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Our method improves rendering quality while ensuring real-time performance and minimal storage. The radius of each point is proportional to the square root of the corresponding method's storage.
  • Figure 2: Our goal is to reconstruct a 3DGS head avatar with dynamic details while ensuring real-time rendering and minimal storage. We use a parametric face mesh to describe large-scale geometric motion, moving the bound Gaussian splats accordingly. A tri-plane stores view-dependent appearance in canonical space, while 1D feature lines store dynamic details per blendshape, allowing interpolation with expression coefficients. Finally, the geometry attributes of the splats, the canonical appearance, and the dynamic details are combined to render the face image.
  • Figure 3: Qualitative comparison with baseline methods on novel view synthesis task.
  • Figure 4: Qualitative comparison with baseline methods on self-reenactment task.
  • Figure 5: Cross-identity reenactment of head avatars. We use the expression and pose of the source subject on the far right to drive the character on the left.
  • ...and 4 more figures
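The Figure 2 caption describes 1D feature lines stored per blendshape, interpolated with expression coefficients and decoded into opacity offsets relative to the neutral face. The NumPy sketch below illustrates that idea under several assumptions not stated in this summary: each blendshape owns three (C, L) lines along the x, y, and z axes of canonical space, the three line samples are summed, the blended feature is decoded by a single linear layer (a stand-in for whatever decoder the authors use), and the offset is added to the neutral opacity in logit space before a sigmoid. All function and parameter names are hypothetical.

```python
import numpy as np

def sample_line(line, t):
    """Linearly interpolate a (C, L) feature line at normalized coords t in [-1, 1]; returns (N, C)."""
    length = line.shape[1]
    s = (np.clip(t, -1.0, 1.0) + 1.0) * 0.5 * (length - 1)
    i0 = np.clip(np.floor(s).astype(int), 0, length - 2)
    w = (s - i0)[:, None]
    return line[:, i0].T * (1 - w) + line[:, i0 + 1].T * w

def dynamic_opacity(lines, expr, xyz, dec_w, dec_b, neutral_logit):
    """Hypothetical sketch of expression-driven opacity.

    lines:         (K, 3, C, L) three axis-aligned 1D lines per blendshape
    expr:          (K,) expression coefficients for the current frame
    xyz:           (N, 3) canonical Gaussian positions in [-1, 1]^3
    dec_w, dec_b:  (C,) weights and scalar bias of a stand-in linear decoder
    neutral_logit: (N,) pre-sigmoid opacity of each Gaussian on the neutral face
    """
    feats = np.zeros((xyz.shape[0], lines.shape[2]))
    for k in range(lines.shape[0]):
        # Feature of blendshape k: sum of its three axis-aligned line samples.
        f_k = sum(sample_line(lines[k, a], xyz[:, a]) for a in range(3))
        # Interpolate across blendshapes with the expression coefficients.
        feats += expr[k] * f_k
    delta = feats @ dec_w + dec_b  # (N,) opacity offset in logit space
    return 1.0 / (1.0 + np.exp(-(neutral_logit + delta)))

# Usage with illustrative sizes: K blendshapes, C channels, L samples per line, N Gaussians.
rng = np.random.default_rng(0)
K, C, L, N = 4, 6, 32, 10
lines = 0.1 * rng.standard_normal((K, 3, C, L))
xyz = rng.uniform(-1.0, 1.0, size=(N, 3))
dec_w = rng.standard_normal(C)
neutral_logit = rng.standard_normal(N)
expr = rng.uniform(0.0, 1.0, size=K)
alpha = dynamic_opacity(lines, expr, xyz, dec_w, 0.0, neutral_logit)  # shape (N,)
```

Because the blend is linear in the expression coefficients, an all-zero coefficient vector with a zero decoder bias returns exactly the neutral opacity. Storage in this sketch grows as K × 3 × C × L values, which illustrates why line-based storage stays compact compared with storing a full plane or volume per blendshape.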