GGAvatar: Geometric Adjustment of Gaussian Head Avatar

Xinyang Li, Jiaxin Wang, Yixin Xuan, Gongxin Yao, Yu Pan

TL;DR

GGAvatar tackles robust, high-fidelity 3D head avatar reconstruction from monocular video by combining neutral Gaussian initialization with a Geometry Morph Adjuster. The method binds Gaussian primitives to a FLAME mesh for coarse geometry and uses a multi-resolution tri-plane with an MLP to learn per-Gaussian deformation bases, addressing limitations of linear blend skinning. It achieves state-of-the-art visual quality and quantitative metrics on public datasets, excelling in novel-view synthesis and cross-identity reenactment. The approach enables expressive, detailed head avatars suitable for immersive telepresence and metaverse applications.

Abstract

We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure with two core modules: the Neutral Gaussian Initialization Module and the Geometry Morph Adjuster. The Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes and employs an adaptive density control strategy to model the geometric structure of the target subject under a neutral expression. The Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained, low-dimensional representations of deformation behavior that address the limitations of the Linear Blend Skinning (LBS) formula. Extensive experiments show that GGAvatar produces high-fidelity renderings and outperforms state-of-the-art methods in both visual quality and quantitative metrics.
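
The idea of per-Gaussian deformation bases can be illustrated with a small NumPy sketch. It is not the authors' implementation: the array layout, the function name `adjust_positions`, and the choice to combine the bases linearly with a single per-frame latent code are assumptions made for illustration. In the paper the bases and the latent code are learned (the latter from expression and pose parameters), whereas here they are random placeholders.

```python
import numpy as np

def adjust_positions(coarse_xyz, bases, latent):
    """Add a per-Gaussian offset built from K deformation bases.

    coarse_xyz: (N, 3) positions after the coarse, mesh-driven deformation.
    bases:      (N, K, 3) deformation bases, one set per Gaussian
                (hypothetical layout, assumed for this sketch).
    latent:     (K,) low-dimensional code shared by all Gaussians in a frame.
    """
    # Each Gaussian's offset is a weighted sum of its own K basis vectors.
    offsets = np.einsum("nkc,k->nc", bases, latent)
    return coarse_xyz + offsets

# Placeholder data: 4 Gaussians, 3 bases each.
rng = np.random.default_rng(0)
N, K = 4, 3
coarse_xyz = rng.normal(size=(N, 3))
bases = rng.normal(size=(N, K, 3))
latent = np.array([0.5, -1.0, 0.0])

fine_xyz = adjust_positions(coarse_xyz, bases, latent)
```

Under this formulation a zero latent code leaves the coarse positions unchanged, so the adjustment acts purely as a residual on top of the mesh-driven deformation.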

Paper Structure (18 sections, 12 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The initialization process for a neutral expression (e.g., ID1) uses a densification strategy to add Gaussian primitives to non-head regions, accelerating training convergence. The result shows that, even without corresponding neutral-expression images, the neutral Gaussian geometry can be reconstructed using the Gaussian binding strategy.
  • Figure 2: Overview of GGAvatar. A mesh-embedded Gaussian initialization strategy is proposed to model the geometry of neutral Gaussians. The neutral Gaussians are then coarsely deformed with FLAME mesh. To capture high-frequency dynamic details, we introduce the Geometry Morph Adjuster. To further enhance the representation capability of the deformation bases, we generate a latent vector from expression and pose parameters using an MLP. The deformed Gaussians are then splatted to render the image with a given pose.
  • Figure 3: Qualitative comparisons with state-of-the-art methods. From top to bottom: ID1, ID2, ID3, and ID4. GGAvatar generates more realistic face reconstructions, especially when capturing high-frequency dynamic details and reconstructing extreme expressions.
  • Figure 4: Novel view synthesis results of GGAvatar. We demonstrate multi-view geometric consistency across both x- and y-axis rotations.
  • Figure 5: Cross-Identity Reenactment results comparison.
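
The coarse stage in the Figure 2 overview, in which mesh-bound Gaussians follow the FLAME mesh, can be sketched as follows. The source only states that Gaussians are paired with deformable triangular meshes. This sketch assumes a binding in which each Gaussian stores a position in the local frame of a parent triangle and is mapped to global space by that triangle's rotation, scale, and centroid. The helper names `triangle_frames` and `to_global`, and the choice of the square root of the triangle area as scale, are hypothetical.

```python
import numpy as np

def triangle_frames(vertices, faces):
    """Return a centroid, rotation matrix, and scalar scale for each triangle."""
    tri = vertices[faces]                                # (F, 3, 3)
    centroid = tri.mean(axis=1)
    e1 = tri[:, 1] - tri[:, 0]
    e2 = tri[:, 2] - tri[:, 0]
    normal = np.cross(e1, e2)
    x = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    z = normal / np.linalg.norm(normal, axis=1, keepdims=True)
    y = np.cross(z, x)
    rotation = np.stack([x, y, z], axis=-1)              # columns are the axes
    # Assumed scale: square root of the triangle area.
    scale = np.sqrt(0.5 * np.linalg.norm(normal, axis=1))
    return centroid, rotation, scale

def to_global(local_xyz, parent, centroid, rotation, scale):
    """Map Gaussian positions from their parent triangle's frame to global space."""
    rotated = np.einsum("nij,nj->ni", rotation[parent], local_xyz)
    return scale[parent, None] * rotated + centroid[parent]

# One triangle in the z = 0 plane, with one Gaussian slightly above it.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
local_xyz = np.array([[0.0, 0.0, 0.1]])
parent = np.array([0])

centroid, rotation, scale = triangle_frames(vertices, faces)
neutral_xyz = to_global(local_xyz, parent, centroid, rotation, scale)

# Moving the mesh moves the bound Gaussian with it.
moved_vertices = vertices + np.array([0.0, 0.0, 2.0])
moved_xyz = to_global(local_xyz, parent, *triangle_frames(moved_vertices, faces))
```

Because the Gaussian is stored relative to its parent triangle, any mesh deformation driven by expression or pose parameters carries it along without re-optimizing its local parameters.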