Relightable Gaussian Codec Avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam

TL;DR

This work tackles real-time relighting of photorealistic head avatars by combining a geometry representation based on 3D Gaussian Splatting with a learnable radiance transfer appearance model that supports all-frequency reflections. The appearance model pairs spherical harmonics for the diffuse component with spherical Gaussians for the specular component, which accounts for global illumination and high-frequency highlights, while an explicit eye model provides accurate corneal reflections and gaze control. A latent space based on a conditional variational autoencoder (CVAE) encodes expressions, and an explicit UV-aligned Gaussian decomposition enables drivable, animatable avatars captured from multi-view data. The approach outperforms existing methods in fidelity while running in real time, including on a tethered consumer VR headset, and ablations validate the key design choices for geometry, appearance, and eye modeling.

Abstract

The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.

Paper Structure

This paper contains 18 sections, 21 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Overview. Given an expression latent code $\mathbf{z}$, gaze $\mathbf{e}_{\{l,r\}}$, and view direction $\boldsymbol{\omega}_o$, our model decodes the parameters of 3D Gaussians (rotation $\mathbf{R}_k$, translation $\mathbf{t}_k$, scale $\mathbf{s}_k$, and opacity $o_k$) and learnable radiance transfer functions (colored and monochrome diffuse SH coefficients $\mathbf{d}^c_k$, $\mathbf{d}^m_k$, roughness $\sigma_k$, normal $\mathbf{n}_k$, and visibility $v_k$). We integrate the radiance transfer functions with the input light to compute the final color $\mathbf{c}_k$, which we then render via splatting and supervise in image space. The coarse vertex decoder $\mathcal{D}_{v}$ and geometry decoder $\mathcal{D}_{g}$ are described in Sec. \ref{sec:geom}, the appearance decoders $\mathcal{D}_{\{ci,cv\}}$ in Sec. \ref{sec:color}, and eyeball decoders $\mathcal{D}_{\{ei,ev\}}$ in Sec. \ref{sec:eye}.
  • Figure 2: Intrinsics decomposition. The full render (a) is the sum of a diffuse component (b) and a specular component (c) (intensity multiplied by 2 for clarity). The diffuse component is obtained by multiplying a learned albedo (d) with shading computed by SH-based radiance transfer (e). The direction of the specular lobe is computed using a per-Gaussian normal (f).
  • Figure 3: Geometric representation comparison. Compared to a held-out frame (a), our geometry decoded with Gaussian splatting (b, c) shows improved resolution over MVP \cite{lombardi2021mixture} (d), especially in fine details like eyelashes and pores. The explicit eyeball model (b) additionally improves realism in eye glints. All methods use the appearance model described in Sec. \ref{sec:color}.
  • Figure 4: Appearance representation comparison. Compared to a held-out frame (a), our appearance model (Sec. \ref{sec:color}) shows sharper pore-level specularities than methods using only a linear neural network \cite{yang2023towards} or the spherical harmonics-only method EyeNeRF \cite{li2022eyenerf}. All methods use the geometric representation described in Sec. \ref{sec:geom} (without explicit eyeballs).
  • Figure 5: Ablation Study: Monochrome SH. Compared to a held-out frame (a), using higher-order monochrome SH coefficients (b) produces sharper shadows than a model without them (c).
  • ...and 4 more figures
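
The decomposition described in the Figure 1 and Figure 2 captions can be sketched for a single Gaussian lit by one point light. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses monochrome SH transfer coefficients truncated to bands 0 and 1 (the captions mention both colored and monochrome coefficients, including higher orders), an unnormalized Gaussian in angle as a stand-in for the specular lobe, and hypothetical names such as `shade_point_light`, `transfer`, and `angular_gaussian`.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

def sh_basis_l1(d):
    """Real SH basis for bands 0 and 1 at unit direction d (one common sign convention)."""
    x, y, z = d
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

def reflect(view_dir, normal):
    """Mirror the unit direction toward the viewer about the unit normal."""
    s = 2.0 * dot(view_dir, normal)
    return normalize(tuple(s * n - w for n, w in zip(normal, view_dir)))

def angular_gaussian(p, q, sigma):
    """Unnormalized Gaussian in the angle between unit vectors p and q (assumed lobe shape)."""
    cos_a = max(-1.0, min(1.0, dot(p, q)))
    return math.exp(-0.5 * (math.acos(cos_a) / sigma) ** 2)

def shade_point_light(albedo, transfer, normal, roughness, visibility,
                      view_dir, light_dir, intensity):
    """Return (color, diffuse, specular) for one Gaussian under one point light."""
    # A point light projects onto SH as intensity * Y_i(light_dir).
    light_sh = [intensity * y for y in sh_basis_l1(light_dir)]
    # Diffuse: albedo times the inner product of light and transfer coefficients.
    shading = dot(light_sh, transfer)
    diffuse = tuple(a * shading for a in albedo)
    # Specular: lobe centered on the reflection of the view direction about the normal.
    lobe = angular_gaussian(light_dir, reflect(view_dir, normal), roughness)
    specular = visibility * intensity * lobe
    # Full render is the sum of both components.
    color = tuple(d + specular for d in diffuse)
    return color, diffuse, specular
```

Calling `shade_point_light((0.5, 0.4, 0.3), [1.0, 0.0, 0.0, 0.0], (0, 0, 1), 0.1, 1.0, (0, 0, 1), (0, 0, 1), 2.0)` places the light exactly on the mirror direction, so the lobe evaluates to 1 and the specular term equals the visibility times the light intensity. Moving the light away from the mirror direction shrinks the highlight at a rate set by `roughness`.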