GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation

Jie Wang, Jiu-Cheng Xie, Xianyan Li, Feng Xu, Chi-Man Pun, Hao Gao

TL;DR

GaussianHead tackles monocular, dynamic head avatar reconstruction by representing head geometry with anisotropic 3D Gaussians and storing appearance in a compact single-resolution tri-plane. A motion deformation field aligns Gaussians to expression-driven poses, while a novel learnable Gaussian derivation generates multiple doppelgangers per core Gaussian, mitigating feature dilution and enabling high-fidelity texture capture. Hierarchical radiance decoding and inherited derivation initialization provide accurate rendering with efficient training, yielding superior reconstruction, cross-identity reenactment, and novel-view synthesis at a notably smaller model size. The approach offers practical potential for real-time or resource-constrained applications and sets a new direction for compact, expressive head avatars.

Abstract

Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and multi-resolution tri-plane are constructed respectively to deal with the head's dynamic geometry and complex texture. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. Our code is available at: https://github.com/chiehwangs/gaussian-head.

Paper Structure

This paper contains 23 sections, 12 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Axis-aligned mapping in a tri-plane feature representation suffers from a severe "feature dilution" problem, illustrated in (a); see the third paragraph of the Introduction for a detailed explanation. In contrast, (b) shows our derivation strategy for addressing this problem. Taking the blue Gaussian primitive as an example, we first obtain multiple derivatives of it through a specific set of learnable transformations and then perform feature projection and aggregation. All other 3D Gaussians undergo the same process.
  • Figure 2: Method overview. GaussianHead models the subject's head with a set of 3D Gaussians whose learnable attributes control their shape and appearance. A motion deformation field first represents the dynamic head geometry, converting structureless Gaussians $G_R$ into structured core Gaussians $G_P$ in a posed space, conditioned on pre-acquired expression parameters $\bm{e}$. Next, a single-resolution tri-plane serves as the feature container that stores appearance-related attributes. Notably, a derivation mechanism based on learnable rotations is applied to each core Gaussian, yielding several doppelgangers of it. The sub-features obtained by projecting those doppelgangers onto the planes are integrated into the final feature $\bm{f}$ of the core Gaussian. Two separate tiny MLPs then decode opacity $\alpha$ and spherical harmonic coefficients (SHs), from which we generate the final rendering via differentiable rasterization.
  • Figure 3: Qualitative comparisons of the reconstruction task. All competitors are run under the configurations specified by their respective works. Our method achieves superior visual results, particularly in aspects such as wrinkles, teeth, eyebrows, and even reflections on glasses.
  • Figure 4: Error maps for the reconstruction task. We compare with (a) NeRFBlendShape, (b) GaussianBlendShape, (c) SplattingAvatar, (d) INSTA, (e) PointAvatar and (f) MonoGaussianAvatar. Note that methods (a-d) only model the head, while (e-f), as well as ours, also include the torso. In each map, brighter areas indicate larger errors.
  • Figure 5: Qualitative comparisons of the cross-identity reenactment task. Comparison methods include (a) MonoGaussianAvatar, (b) PointAvatar, (c) INSTA, (d) SplattingAvatar, (e) GaussianBlendShape and (f) NeRFBlendShape. Our GaussianHead achieves the best reenactment results, even in conveying extreme expressions. For more intuitive comparisons of motion sequences reenacted by these methods, please refer to the supplementary video.
  • ...and 8 more figures
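
The derivation, projection and aggregation steps described in the Figure 1 and Figure 2 captions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`quat_rotate`, `bilinear_sample`, `derived_feature`), the use of one rotation per plane, the element-wise mean as the aggregation, and the fixed quaternions standing in for learnable parameters are all assumptions made for illustration.

```python
import math

def quat_rotate(q, p):
    """Rotate 3D point p by quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    px, py, pz = p
    # Rows of the rotation matrix built from the unit quaternion.
    return (
        (1 - 2 * (y * y + z * z)) * px + 2 * (x * y - w * z) * py + 2 * (x * z + w * y) * pz,
        2 * (x * y + w * z) * px + (1 - 2 * (x * x + z * z)) * py + 2 * (y * z - w * x) * pz,
        2 * (x * z - w * y) * px + 2 * (y * z + w * x) * py + (1 - 2 * (x * x + y * y)) * pz,
    )

def bilinear_sample(plane, u, v):
    """Sample an R x R grid of feature vectors at (u, v) in [-1, 1]^2."""
    res = len(plane)
    # Map [-1, 1] to a continuous grid index in [0, res - 1]; clamp outside points.
    gx = min(max((u + 1) / 2 * (res - 1), 0.0), res - 1.0)
    gy = min(max((v + 1) / 2 * (res - 1), 0.0), res - 1.0)
    x0, y0 = int(gx), int(gy)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = gx - x0, gy - y0
    dim = len(plane[0][0])
    return [
        (1 - tx) * (1 - ty) * plane[y0][x0][c]
        + tx * (1 - ty) * plane[y0][x1][c]
        + (1 - tx) * ty * plane[y1][x0][c]
        + tx * ty * plane[y1][x1][c]
        for c in range(dim)
    ]

def derived_feature(position, quats, planes):
    """Aggregate tri-plane sub-features of one core Gaussian via its doppelgangers.

    quats:  three quaternions (assumption: one learnable rotation per plane).
    planes: three feature grids for the xy, xz and yz planes, in that order.
    """
    axes = [(0, 1), (0, 2), (1, 2)]  # coordinates kept by each axis-aligned projection
    sub_features = []
    for q, plane, (a, b) in zip(quats, planes, axes):
        d = quat_rotate(q, position)  # doppelganger position
        sub_features.append(bilinear_sample(plane, d[a], d[b]))
    dim = len(sub_features[0])
    # Assumed aggregation: element-wise mean of the sub-features.
    return [sum(f[c] for f in sub_features) / len(sub_features) for c in range(dim)]

def constant_plane(value, res=4, dim=2):
    """Hypothetical helper: a plane whose every cell holds the same feature."""
    return [[[value] * dim for _ in range(res)] for _ in range(res)]

# Toy usage: fixed quaternions stand in for parameters that would be learned.
half = math.sqrt(0.5)
planes = [constant_plane(1.0), constant_plane(2.0), constant_plane(3.0)]
quats = [(1.0, 0.0, 0.0, 0.0), (half, 0.0, 0.0, half), (half, half, 0.0, 0.0)]
feature = derived_feature((0.3, -0.2, 0.5), quats, planes)
print(feature)
```

Because each doppelganger is rotated before projection, two Gaussians that share the same axis-aligned projection on a plane can still sample different locations, which is the intuition behind mitigating feature dilution. In a real training setup the quaternions and plane features would be optimized jointly with the other Gaussian attributes.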