HRAvatar: High-Quality and Relightable Gaussian Head Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang

TL;DR

HRAvatar reconstructs head avatars from monocular video by combining 3D Gaussian Splatting with learnable per-point deformation and an expression encoder optimized end to end, yielding high-fidelity geometry and expressive motion. The appearance model decomposes the head into intrinsic properties (albedo, roughness, Fresnel reflectance) and applies physically-based shading with environment maps, which enables realistic relighting even though the capture lighting is unknown. Experiments on INSTA, HDTF, and self-captured data show state-of-the-art quality and real-time relighting (~155 FPS), and ablations confirm the importance of the encoder, the deformation strategy, and the shading terms. The approach remains bounded by FLAME priors and by the difficulty of disentangling intrinsic albedo, pointing to future work in semantic material guidance and GPU-accelerated inference.

Abstract

Reconstructing animatable and high-quality 3D head avatars from monocular videos, especially with realistic relighting, is a valuable task. However, the limited information from single-view input, combined with the complex head poses and facial movements, makes this challenging. Previous methods achieve real-time performance by combining 3D Gaussian Splatting with a parametric head model, but the resulting head quality suffers from inaccurate face tracking and limited expressiveness of the deformation model. These methods also fail to produce realistic effects under novel lighting conditions. To address these issues, we propose HRAvatar, a 3DGS-based method that reconstructs high-fidelity, relightable 3D head avatars. HRAvatar reduces tracking errors through end-to-end optimization and better captures individual facial deformations using learnable blendshapes and learnable linear blend skinning. Additionally, it decomposes head appearance into several physical properties and incorporates physically-based shading to account for environmental lighting. Extensive experiments demonstrate that HRAvatar not only reconstructs superior-quality heads but also achieves realistic visual effects under varying lighting conditions.
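To make the deformation idea in the abstract concrete, here is a minimal sketch of how a single Gaussian center could be moved by linear blendshapes followed by linear blend skinning. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy joint setup, and the choice to treat each joint independently (no kinematic chain, no pose-corrective terms) are all hypothetical. In a learnable variant, `blendshape_basis` and `skin_weights` would be per-point parameters updated during training.

```python
import math


def rotation_from_axis_angle(axis_angle):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix (nested lists)."""
    ax, ay, az = axis_angle
    angle = math.sqrt(ax * ax + ay * ay + az * az)
    if angle < 1e-8:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = ax / angle, ay / angle, az / angle
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def deform_point(canonical, blendshape_basis, expression,
                 skin_weights, joint_rotations, joint_positions):
    """Move one canonical point into posed space (hypothetical helper).

    canonical        : [x, y, z] position of the point in canonical space
    blendshape_basis : one [dx, dy, dz] offset per expression coefficient
    expression       : expression coefficients (psi in the Figure 2 caption)
    skin_weights     : one weight per joint, assumed to sum to 1
    joint_rotations  : one 3x3 rotation per joint
    joint_positions  : one [x, y, z] pivot per joint
    """
    # Step 1: linear blendshapes add an expression-dependent offset.
    shaped = [
        canonical[d] + sum(psi * basis[d]
                           for psi, basis in zip(expression, blendshape_basis))
        for d in range(3)
    ]
    # Step 2: linear blend skinning averages the per-joint rigid transforms.
    posed = [0.0, 0.0, 0.0]
    for weight, rot, pivot in zip(skin_weights, joint_rotations, joint_positions):
        local = [shaped[d] - pivot[d] for d in range(3)]
        moved = [sum(rot[r][c] * local[c] for c in range(3)) + pivot[r]
                 for r in range(3)]
        for d in range(3):
            posed[d] += weight * moved[d]
    return posed


# Toy example: one expression offset, two joints (identity and a 90 degree turn about z).
posed = deform_point(
    canonical=[1.0, 0.0, 0.0],
    blendshape_basis=[[0.0, 1.0, 0.0]],
    expression=[0.5],
    skin_weights=[0.5, 0.5],
    joint_rotations=[rotation_from_axis_angle([0.0, 0.0, 0.0]),
                     rotation_from_axis_angle([0.0, 0.0, math.pi / 2])],
    joint_positions=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
)
print(posed)
```

Because both steps are linear in the basis and the weights, gradients from an image loss can flow back to them, which is what makes per-point learning of these quantities plausible in an end-to-end setup.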

Paper Structure

This paper contains 34 sections, 17 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: With monocular video input, HRAvatar reconstructs a high-quality, animatable 3D head avatar that enables realistic relighting effects and simple material editing.
  • Figure 2: Given a monocular video with unknown lighting and $M$ frames, we first track fixed shape parameter $\beta$ and pose parameters $\{\theta_j\}^M$ through iterative optimization before training. Expression parameters $\{\psi_j\}^M$ and jaw poses $\theta^{jaw}$ are estimated via an expression encoder, which is optimized during training. With these parameters, we transform the Gaussian points into pose space using learnable linear blendshapes $\mathcal{BS}$ and linear blend skinning $\mathcal{LBS}$. We then render the Gaussian points to obtain albedo, roughness, reflectance, and normal maps. Finally, we compute pixel colors using physically-based shading with optimizable environment maps.
  • Figure 3: Qualitative comparison results on self-reenactment. Compared to others, ours captures finer texture details and renders high-fidelity images. Ours also achieves more accurate expression deformations and reconstructs better geometric details.
  • Figure 4: Visual comparison with FLARE on relighting. "Spec. int." denotes the specular intensity coefficient. FLARE exhibits some artifacts due to partially corrupted normals, while our method learns smoother normals, enabling more reasonable and consistent relighting. Notably, due to differences in pre-filtering environment maps, our method and FLARE exhibit variations in lighting brightness.
  • Figure 5: Visual comparison on cross-reenactment. HRAvatar accurately simulates actors' poses and expressions, preserving textures and geometric details, while others exhibit artifacts.
  • ...and 7 more figures
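
The final step in the Figure 2 caption, computing pixel colors from rendered albedo, roughness, reflectance, and normal maps, can be sketched for a single pixel as follows. This is a simplified illustration and not the paper's exact shading model: it uses the widely known Schlick approximation for the Fresnel term, omits the BRDF lookup table used in full split-sum shading, and treats the environment lookups (`diffuse_light`, `specular_light`) as hypothetical callables standing in for pre-filtered environment maps.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(view, normal):
    """Mirror the unit view vector (surface -> camera) about the unit normal."""
    scale = 2.0 * dot(view, normal)
    return [scale * n - v for n, v in zip(normal, view)]


def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation of Fresnel reflectance at a given view angle."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def shade_pixel(albedo, roughness, f0, normal, view,
                diffuse_light, specular_light):
    """Combine diffuse and specular terms for one pixel (hypothetical helper).

    diffuse_light(normal)               -> RGB irradiance for that normal
    specular_light(direction, roughness) -> RGB radiance, blurrier when rough
    """
    n_dot_v = max(dot(normal, view), 0.0)
    fresnel = schlick_fresnel(f0, n_dot_v)
    irradiance = diffuse_light(normal)
    radiance = specular_light(reflect(view, normal), roughness)
    # Energy not reflected specularly is assumed to go to the diffuse term.
    return [(1.0 - fresnel) * albedo[c] * irradiance[c] + fresnel * radiance[c]
            for c in range(3)]


# Toy example: a surface facing the camera under uniform white light.
material = dict(albedo=[0.5, 0.5, 0.5], roughness=0.4, f0=0.04,
                normal=[0.0, 0.0, 1.0], view=[0.0, 0.0, 1.0])
color = shade_pixel(**material,
                    diffuse_light=lambda n: [1.0, 1.0, 1.0],
                    specular_light=lambda r, rough: [1.0, 1.0, 1.0])

# Relighting swaps only the lighting callables; the material stays fixed.
relit = shade_pixel(**material,
                    diffuse_light=lambda n: [0.5, 0.5, 0.5],
                    specular_light=lambda r, rough: [0.5, 0.5, 0.5])
print(color, relit)
```

The point of the decomposition is visible in the last call: because albedo, roughness, and reflectance are stored separately from the lighting, relighting only requires replacing the environment lookups.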