3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma
TL;DR
This work tackles the challenge of producing photorealistic, animatable 3D head avatars that capture detailed, expression-dependent textures while remaining computationally efficient. It introduces a compact tensorial representation. Static appearance is stored in a tri-plane in canonical space, and expression-driven texture changes are captured with lightweight 1D feature lines. These lines are decoded into opacity offsets, which adds dynamic detail at minimal storage cost. Two training strategies, an adaptive truncated opacity penalty and class-balanced sampling, improve generalization to unseen expressions. On the NeRSemble dataset, the approach achieves high-fidelity rendering at around 300 FPS with about 10 MB of storage per subject. It outperforms state-of-the-art baselines in both novel view synthesis and self-reenactment. This makes real-time head-avatar rendering practical for applications such as streaming, AR/VR, and mobile video conferencing.
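The following minimal NumPy sketch illustrates how a static tri-plane and expression-weighted 1D feature lines could combine into a per-Gaussian opacity. It is not the authors' implementation, and several choices are hypothetical and made only for illustration. The three planes (xy, xz, yz) are summed. One 1D line exists per expression basis and per canonical axis, and these are blended by the expression coefficients. The decoders `w_static` and `w_dyn` are simple linear maps. The offset is added to the neutral opacity in logit space before a sigmoid.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) plane at normalized coords u, v in [-1, 1]; returns (N, C)."""
    C, H, W = plane.shape
    x = (u + 1) * 0.5 * (W - 1)
    y = (v + 1) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    f00, f01 = plane[:, y0, x0], plane[:, y0, x0 + 1]
    f10, f11 = plane[:, y0 + 1, x0], plane[:, y0 + 1, x0 + 1]
    top = (1 - wx) * f00 + wx * f01
    bottom = (1 - wx) * f10 + wx * f11
    return ((1 - wy) * top + wy * bottom).T

def linear_sample(line, t):
    """Linearly sample a (C, L) 1D feature line at normalized coords t in [-1, 1]; returns (N, C)."""
    C, L = line.shape
    x = (t + 1) * 0.5 * (L - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, L - 2)
    w = x - x0
    return ((1 - w) * line[:, x0] + w * line[:, x0 + 1]).T

def gaussian_opacity(xyz, triplane, lines, expr, w_static, w_dyn):
    """
    xyz:      (N, 3) canonical Gaussian centers in [-1, 1]
    triplane: (3, C, R, R) static planes for (xy, xz, yz)
    lines:    (K, 3, C, L) hypothetical 1D lines, one per expression basis and axis
    expr:     (K,) expression coefficients (all zeros = neutral face)
    w_static, w_dyn: (C,) hypothetical linear decoders
    Returns (opacity, delta), each of shape (N,).
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    # Static (neutral) appearance features from the tri-plane.
    static = (bilinear_sample(triplane[0], x, y)
              + bilinear_sample(triplane[1], x, z)
              + bilinear_sample(triplane[2], y, z))
    # Dynamic features: expression-weighted sum of 1D line lookups.
    dyn = np.zeros_like(static)
    for k, e_k in enumerate(expr):
        for a in range(3):
            dyn += e_k * linear_sample(lines[k, a], xyz[:, a])
    neutral_logit = static @ w_static
    delta = dyn @ w_dyn  # opacity offset relative to the neutral face
    opacity = 1.0 / (1.0 + np.exp(-(neutral_logit + delta)))
    return opacity, delta
```

With `expr` set to all zeros, `delta` vanishes and the output reduces to the neutral-face opacity. This mirrors the "offset relative to the neutral face" formulation. Under these assumptions, storage grows with the tri-plane resolution plus K short lines, not with the number of frames.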
Abstract
Recent studies have combined 3D Gaussians with 3D Morphable Models (3DMM) to construct high-quality 3D head avatars. However, existing methods in this line of research either fail to capture dynamic textures or incur significant overhead in runtime speed or storage. To this end, we propose a novel method that captures dynamic textures while keeping both rendering speed and storage cost low. Specifically, we introduce an expressive and compact representation that encodes the texture-related attributes of the 3D Gaussians in a tensorial format. We store the appearance of the neutral expression in static tri-planes and represent dynamic texture details for different expressions with lightweight 1D feature lines. These lines are then decoded into opacity offsets relative to the neutral face. We further propose an adaptive truncated opacity penalty and class-balanced sampling to improve generalization across different expressions. Experiments show that this design accurately captures dynamic facial details while maintaining real-time rendering and significantly reducing storage costs, which broadens its applicability to more scenarios.
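Class-balanced sampling can be sketched as inverse-frequency frame sampling. The abstract does not say how classes are defined. Here we assume, hypothetically, that each training frame carries a discrete label, for example the expression sequence it belongs to. Each frame is then drawn with probability inversely proportional to the size of its class, so rare expressions are seen about as often as common ones. The function names below are illustrative and not taken from the paper.

```python
import numpy as np

def class_balanced_weights(labels):
    """Per-frame sampling probabilities that give every class the same total mass.

    `labels` is a hypothetical per-frame class label (e.g. expression sequence id).
    """
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]  # frames in small classes get larger weight
    return w / w.sum()

def sample_frames(labels, n, rng):
    """Draw n frame indices (with replacement) using class-balanced probabilities."""
    p = class_balanced_weights(labels)
    return rng.choice(len(p), size=n, replace=True, p=p)
```

For a dataset with 8 neutral frames and 2 smiling frames, each class receives half of the sampling mass. A uniform sampler would pick smiling frames only 20% of the time.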
