Hybrid Explicit Representation for Ultra-Realistic Head Avatars

Hongrui Cai, Yuting Xiao, Xuan Wang, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang

TL;DR

HERA is a hybrid explicit head-avatar representation that combines UV-mapped meshes for high-fidelity textures with 3D Gaussian splats for intricate geometry, rendered in real time through a differentiable hybrid pipeline. A stable depth-sorting strategy prevents the artifacts that arise when splats appear to cross mesh facets, enabling clean novel-view synthesis and expressive animation. The method optimizes the texture map, the opacity map, and a set of Gaussian splats anchored to mesh facets, outperforming neural implicit and single-primitive baselines across view synthesis and expression tasks while using substantially fewer splats than prior 3DGS approaches. The approach offers practical benefits for high-quality avatar creation and UV-space editing, with relighting and broader material manipulation left to future work.

Abstract

We introduce a novel approach to creating ultra-realistic head avatars and rendering them in real time (>30 fps at $2048 \times 1334$ resolution). First, we propose a hybrid explicit representation that combines the advantages of two efficient primitive-based rendering techniques. A UV-mapped 3D mesh captures sharp and rich textures on smooth surfaces, while 3D Gaussian Splatting (3DGS) represents complex geometric structures. When modeling an avatar, after tracking parametric models from captured multi-view RGB videos, our goal is to simultaneously optimize the texture and opacity maps of the mesh, as well as a set of 3D Gaussian splats localized and rigged onto the mesh facets. Specifically, we perform $\alpha$-blending on the color and opacity values based on the merged and re-ordered z-buffer from the rasterization results of the mesh and 3DGS. In this process, the mesh and 3DGS adaptively fit the captured visual information to produce a high-fidelity digital avatar. To avoid artifacts caused by Gaussian splats crossing the mesh facets, we design a stable hybrid depth sorting strategy. Experiments show that our results exceed those of state-of-the-art approaches.
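
The blending step described above can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: it assumes each pixel receives a list of mesh fragments and a list of splat fragments, merges them into one depth-ordered list, and composites them front to back. The names `Fragment` and `composite_pixel`, the early-termination threshold, and the background handling are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]


@dataclass
class Fragment:
    depth: float   # sort key for this pixel (smaller means closer to the camera)
    color: Color   # RGB in [0, 1]
    alpha: float   # opacity in [0, 1]
    source: str    # "mesh" or "splat"; kept only for readability


def composite_pixel(
    mesh_frags: List[Fragment],
    splat_frags: List[Fragment],
    background: Color = (0.0, 0.0, 0.0),
) -> Tuple[Color, float]:
    """Merge both fragment lists by depth and alpha-blend front to back."""
    # Merged and re-ordered z-buffer for this pixel (assumed sort key: depth).
    merged = sorted(mesh_frags + splat_frags, key=lambda f: f.depth)

    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for frag in merged:
        weight = transmittance * frag.alpha
        for c in range(3):
            color[c] += weight * frag.color[c]
        transmittance *= 1.0 - frag.alpha
        if transmittance < 1e-4:  # assumed cutoff: pixel is effectively opaque
            break

    # Whatever light remains shows the background.
    for c in range(3):
        color[c] += transmittance * background[c]
    return (color[0], color[1], color[2]), 1.0 - transmittance


if __name__ == "__main__":
    mesh = [Fragment(depth=2.0, color=(0.0, 0.0, 1.0), alpha=1.0, source="mesh")]
    splats = [Fragment(depth=1.0, color=(1.0, 0.0, 0.0), alpha=0.5, source="splat")]
    print(composite_pixel(mesh, splats))
```

In the usage example, a half-transparent red splat sits in front of an opaque blue mesh fragment, so the pixel comes out as an equal mix of red and blue with full accumulated opacity. The outcome depends entirely on the sort order, which is why the abstract pairs this blending step with a dedicated depth sorting strategy.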

Paper Structure (30 sections, 9 equations, 9 figures, 4 tables)

This paper contains 30 sections, 9 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: The overall pipeline of the proposed HERA. In the canonical space, there is a mesh with a texture UV map $\mathbf{T}$ (visualized in RGB format) and an opacity UV map $\mathbf{A}$, along with several Gaussian splats defined in the local coordinate systems of the mesh facets. During animation, the positions of the mesh vertices change, causing the rigged splats to move accordingly. Under the camera view, both the mesh and the Gaussian splats are rasterized using the proposed hybrid approach, and the image is rendered through $\alpha$-blending. The entire pipeline is fully differentiable. Guided by the captured image, the texture map $\mathbf{T}$ and the opacity map $\mathbf{A}$ are optimized while the rigged Gaussian splats are updated and densified simultaneously.
  • Figure 2: An example illustrating the stable depth sorting strategy. 3DGS records the depth of a 3D splat (denoted by its mean $\bm{\mu}$) by calculating the distance between its projection onto the $\textbf{z}$-axis and the camera center $O$, i.e., $l_{OA}$, which is not a per-pixel value. During the rasterization of the mesh, individual rays $\textbf{r}_1$ and $\textbf{r}_2$ from $O$ intersect the facet $\textbf{f}$ at two points, and the depths of these points are recorded as $l_{OB}$ and $l_{OC}$, respectively. (a) If we directly compare the per-pixel mesh depths $l_{OB}$ and $l_{OC}$ with the splat depth $l_{OA}$, we will place $\bm{\mu}$ in front of $\textbf{f}$ when rendering the pixel on $\textbf{r}_1$, but this order is reversed when rendering the pixel on $\textbf{r}_2$. This inconsistency can make the splat appear as if it is crossing the mesh facet, leading to artifacts when rendering from novel viewpoints. (b) Our proposed strategy. Whether sorting along ray $\textbf{r}_1$ or ray $\textbf{r}_2$, we project the splat onto the mesh facet and compare the depth of the projected point, i.e., $l_{OD}$, with $l_{OA}$.
  • Figure 3: Ablation study on depth sorting strategy. It demonstrates that our proposed stable sorting strategy effectively eliminates artifacts in novel view synthesis caused by Gaussian splats crossing the mesh. Zoom in for better views.
  • Figure 4: Novel view synthesis on the Multiface dataset (wuu2022multiface). From left to right, we display the results of NeRFBlendShape (gao2022reconstructing), PointAvatar (zheng2023pointavatar), GaussianBlendshapes (ma20243d), GaussianAvatars (qian2024gaussianavatars), ours, and the ground-truth images, respectively. Zoom in for better views.
  • Figure 5: Novel expression animation on the Multiface dataset (wuu2022multiface). From left to right, we display the results of NeRFBlendShape (gao2022reconstructing), PointAvatar (zheng2023pointavatar), GaussianBlendshapes (ma20243d), GaussianAvatars (qian2024gaussianavatars), ours, and the ground-truth images, respectively. Zoom in for better views.
  • ...and 4 more figures
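
The depth sorting issue in the Figure 2 caption can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the paper's code: the camera sits at the origin looking along $+\textbf{z}$, the facet is treated as an infinite plane given by a point and a unit normal, and "projecting the splat onto the mesh facet" is assumed to mean orthogonal projection onto that plane. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_plane_depth(direction: Vec3, plane_point: Vec3, normal: Vec3) -> float:
    """z-depth where a ray from the origin hits the plane (l_OB or l_OC)."""
    t = dot(plane_point, normal) / dot(direction, normal)
    return t * direction[2]


def projected_depth(mu: Vec3, plane_point: Vec3, unit_normal: Vec3) -> float:
    """z-depth of mu projected orthogonally onto the plane (l_OD, assumed)."""
    offset = (mu[0] - plane_point[0], mu[1] - plane_point[1], mu[2] - plane_point[2])
    signed_dist = dot(offset, unit_normal)
    return mu[2] - signed_dist * unit_normal[2]


def splat_in_front_naive(mu: Vec3, ray_dir: Vec3, plane_point: Vec3, normal: Vec3) -> bool:
    # Case (a): compare the splat depth l_OA = mu.z with the per-pixel mesh depth.
    return mu[2] < ray_plane_depth(ray_dir, plane_point, normal)


def splat_in_front_stable(mu: Vec3, plane_point: Vec3, unit_normal: Vec3) -> bool:
    # Case (b): compare l_OA with l_OD; the result does not depend on the ray.
    return mu[2] < projected_depth(mu, plane_point, unit_normal)


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    plane_point = (0.0, 0.0, 2.0)
    normal = (s, 0.0, -s)        # tilted facet: z = x + 2
    mu = (0.0, 0.0, 3.0)         # splat mean, located behind the facet
    r1 = (0.5, 0.0, 1.0)
    r2 = (-0.5, 0.0, 1.0)
    print(splat_in_front_naive(mu, r1, plane_point, normal))   # True
    print(splat_in_front_naive(mu, r2, plane_point, normal))   # False
    print(splat_in_front_stable(mu, plane_point, normal))      # False
```

With this tilted facet, the naive comparison places the splat in front of the mesh along $\textbf{r}_1$ and behind it along $\textbf{r}_2$, which is the inconsistency described in case (a). The projected-depth comparison gives a single answer for every pixel the splat covers, and under the orthogonal-projection assumption that answer matches the side of the facet plane on which the splat mean lies.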