MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

TL;DR

MeGA addresses the challenge of high-fidelity head avatars by decoupling head components into suitable representations: an enhanced FLAME-based facial mesh augmented with a UV displacement map and three-part neural textures for facial rendering via deferred neural rendering, and a static 3D Gaussian Splatting hair model with rigid and MLP-driven non-rigid deformation for dynamic hair. An occlusion-aware blending module ensures seamless face-hair fusion. The system enables downstream editing such as hairstyle alterations and texture edits, and experiments on the NeRSemble dataset show state-of-the-art performance in novel-view and novel-expression synthesis. Overall, MeGA demonstrates that a targeted hybrid representation can achieve superior photorealism and practical editability for full-head avatars in AR/VR contexts.

Abstract

Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.
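The per-vertex offsets mentioned above can be illustrated with a minimal sketch. It assumes the displacement map is an (H, W, 3) array of 3D offsets, that each mesh vertex has a UV coordinate in [0, 1], and that offsets are looked up by bilinear sampling. The function names `sample_uv_displacement` and `displace_vertices` are hypothetical, and none of these details are confirmed by the paper.

```python
import numpy as np


def sample_uv_displacement(disp_map, uv):
    """Bilinearly sample an (H, W, 3) displacement map at (N, 2) UV coords.

    Assumption: u indexes columns, v indexes rows, both in [0, 1].
    """
    h, w, _ = disp_map.shape
    # Convert UV coordinates to continuous pixel coordinates.
    x = np.clip(uv[:, 0], 0.0, 1.0) * (w - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (h - 1)
    # Integer corners of the texel cell that contains each sample.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional position inside the cell, used as interpolation weights.
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1.0 - wx) * disp_map[y0, x0] + wx * disp_map[y0, x1]
    bottom = (1.0 - wx) * disp_map[y1, x0] + wx * disp_map[y1, x1]
    return (1.0 - wy) * top + wy * bottom


def displace_vertices(vertices, uv, disp_map):
    """Add the sampled per-vertex offsets to (N, 3) base mesh vertices."""
    return vertices + sample_uv_displacement(disp_map, uv)
```

In practice the base vertices would come from the FLAME mesh and the map from a learned predictor; here both are plain arrays, so the sketch only shows how a UV-space map can yield one offset per vertex.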

Paper Structure (15 sections, 15 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of our Hybrid Mesh-Gaussian Head Avatar. MeGA models different head components with more suitable representations. For facial modeling, we propose a neural mesh-based representation comprising a UV displacement map $\hat{\bm{G}}_d$ for geometric details and a disentangled neural texture map composed of $\hat{\bm{T}}_{di}$, $\hat{\bm{T}}_{dy}$, and $\hat{\bm{T}}_v$, which learn the diffuse colors, dynamic textures, and view-dependent colors, respectively. For hair modeling, a canonical 3D Gaussian Splatting model is reconstructed and then animated using a global rigid transformation and an MLP-based non-rigid deformation field. A mesh occlusion-aware blending is proposed to properly blend the face and hair images. MeGA naturally supports hair alteration and texture editing due to the disentangled representations. Learnable parameters are highlighted using green boxes.
  • Figure 2: Mesh Occlusion-Aware Blending. By comparing the "near-z" depth map $\bm{D}_{nz}$ of the hair with the depth map of the head, we find the pixels that should use hair renderings (white regions in $\bm{M}_o$). Combining this with the mesh occlusion-aware hair opacity map $\bm{A}_g$, which only accumulates the opacities of visible Gaussians (i.e., those in front of the mesh), we obtain the blending mask for the final renderings.
  • Figure 3: Hairstyle Alteration and Texture Editing. MeGA naturally supports hairstyle alteration and texture editing. The edited head avatar can be rendered in different views and expressions.
  • Figure 4: Qualitative Comparisons with State-of-the-Art Methods. MeGA generates more realistic facial renditions compared to previous state-of-the-art methods, especially in terms of expression matching and detailed skin textures (e.g., wrinkles).
  • Figure 5: Effects of disentangled texture maps and the UV displacement map. Disabling any of the texture maps (i.e., MeGA-noview and MeGA-nodyn) results in a worse appearance in the facial region. Removing the UV displacement map leads to a lack of geometric details and produces unrealistic renderings.
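
The blending described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the authors' implementation: it assumes the blending mask is the elementwise product of $\bm{M}_o$ and $\bm{A}_g$, that the final image is a linear mix of the hair and face renderings, and that pixels not covered by the mesh carry an infinite mesh depth. The function and argument names are hypothetical.

```python
import numpy as np


def blend_face_and_hair(face_rgb, hair_rgb, mesh_depth, hair_near_depth, hair_opacity):
    """Blend (H, W, 3) face and hair images using depth and opacity maps.

    mesh_depth:      (H, W) depth of the head mesh (np.inf where no mesh).
    hair_near_depth: (H, W) "near-z" depth of the hair, D_nz in the caption.
    hair_opacity:    (H, W) opacity accumulated from Gaussians in front of
                     the mesh, A_g in the caption.
    """
    # M_o: 1 where the nearest hair surface lies in front of the mesh.
    occlusion_mask = (hair_near_depth < mesh_depth).astype(face_rgb.dtype)
    # Assumed combination: product of the occlusion mask and hair opacity.
    blend_mask = (occlusion_mask * hair_opacity)[..., None]
    # Linear mix: hair where the mask is high, face elsewhere.
    return blend_mask * hair_rgb + (1.0 - blend_mask) * face_rgb
```

With this formulation, a pixel whose hair lies behind the mesh always shows the face rendering, while a pixel with semi-transparent hair in front of the mesh shows a proportional mix of both.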