GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians

Xiaobao Wei, Peng Chen, Ming Lu, Hui Chen, Feng Tian

TL;DR

This paper introduces GraphAvatar, a method that uses Graph Neural Networks (GNNs) to generate 3D Gaussians for head avatars. It surpasses existing methods in visual fidelity and storage consumption, and its ablation study sheds light on the trade-offs between rendering quality and model size.

Abstract

Rendering photorealistic head avatars from arbitrary viewpoints is crucial for various applications like virtual reality. Although previous methods based on Neural Radiance Fields (NeRF) can achieve impressive results, they lack fidelity and efficiency. Recent methods using 3D Gaussian Splatting (3DGS) have improved rendering quality and real-time performance but still require significant storage overhead. In this paper, we introduce a method called GraphAvatar that utilizes Graph Neural Networks (GNN) to generate 3D Gaussians for the head avatar. Specifically, GraphAvatar trains a geometric GNN and an appearance GNN to generate the attributes of the 3D Gaussians from the tracked mesh. Our method can therefore store the GNN models instead of the 3D Gaussians, significantly reducing the storage overhead to just 10 MB. To reduce the impact of face-tracking errors, we also present a novel graph-guided optimization module to refine face-tracking parameters during training. Finally, we introduce a 3D-aware enhancer for post-processing to enhance the rendering quality. We conduct comprehensive experiments to demonstrate the advantages of GraphAvatar, showing that it surpasses existing methods in visual fidelity and storage consumption. The ablation study sheds light on the trade-offs between rendering quality and model size. The code will be released at: https://github.com/ucwxb/GraphAvatar
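The storage argument in the abstract can be illustrated with a rough back-of-the-envelope calculation. The sketch below assumes the standard 3DGS parameterization (position, scale, rotation quaternion, opacity, and degree-3 spherical-harmonic color, i.e. 59 float32 values per Gaussian). The Gaussian count and the network parameter count are hypothetical values chosen only for illustration; they are not figures reported by the paper.

```python
# Back-of-the-envelope storage comparison (illustrative assumptions only).
# Standard 3DGS layout per Gaussian: 3 position + 3 scale + 4 rotation
# (quaternion) + 1 opacity + 48 spherical-harmonic coefficients (degree 3, RGB).
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # = 59
BYTES_PER_FLOAT32 = 4


def explicit_gaussians_mb(num_gaussians: int) -> float:
    """Storage in MB (10^6 bytes) when every Gaussian is saved explicitly."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT32 / 1e6


def network_mb(num_parameters: int) -> float:
    """Storage in MB (10^6 bytes) when only float32 network weights are saved."""
    return num_parameters * BYTES_PER_FLOAT32 / 1e6


# Hypothetical numbers: 100,000 Gaussians versus a 2.5M-parameter network.
explicit_mb = explicit_gaussians_mb(100_000)
generator_mb = network_mb(2_500_000)
print(f"explicit Gaussians: {explicit_mb:.1f} MB, generator network: {generator_mb:.1f} MB")
```

Under these assumptions the explicit representation grows linearly with the number of Gaussians, while the generator's size is fixed by its weight count, which is the trade-off the abstract points to.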

Paper Structure

This paper contains 19 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: GraphAvatar leverages graph neural networks to generate 3D Gaussians, which are then rasterized into high-fidelity images based on tracked meshes. Compared to contemporary approaches, GraphAvatar not only delivers superior rendering performance but also features the most compact model size, substantially minimizing storage requirements.
  • Figure 2: Pipeline of GraphAvatar. Our method takes the tracked meshes from source videos as input and first utilizes a geometric Graph Unet and an appearance Graph Unet to generate corresponding 3D Gaussian attributes. These Gaussians are then established as anchors to predict view-dependent attributes as neural Gaussians. To minimize errors from the tracked mesh, we introduce a graph-guided optimization module that utilizes time series and bottleneck features from Graph Unet to refine the tracked camera pose and expression coefficients. All Gaussians are combined and splatted into 2D images and depths using a differentiable rasterizer. Conditioned on the predicted depth map, a 3D-aware enhancer post-processes the rendered images to produce the final high-quality images.
  • Figure 3: Qualitative comparison on the INSTA dataset.
  • Figure 4: Qualitative comparison on the NBS dataset.
  • Figure 5: Qualitative ablation study on the INSTA dataset.
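
The Figure 2 caption describes graph networks that operate on the tracked mesh. As a minimal sketch of the underlying idea, the code below performs a single message-passing step over a triangle mesh: it builds vertex adjacency from the faces and averages each vertex feature with those of its neighbors. This is not the authors' Graph Unet, which per the caption also has a bottleneck and feeds later stages; learned weights, nonlinearities, and pooling are omitted here, and the function names and toy mesh are hypothetical.

```python
from collections import defaultdict


def mesh_adjacency(faces):
    """Build an undirected vertex adjacency map from triangle faces."""
    neighbors = defaultdict(set)
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def mean_aggregate(features, neighbors):
    """One message-passing step: average each vertex feature with its neighbors'."""
    out = []
    for v, feat in enumerate(features):
        group = [feat] + [features[n] for n in sorted(neighbors[v])]
        # Average the group column by column (one column per feature channel).
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Toy mesh (hypothetical): two triangles sharing the edge between vertices 1 and 2.
faces = [(0, 1, 2), (1, 2, 3)]
features = [[0.0], [1.0], [2.0], [3.0]]  # one scalar feature per vertex
smoothed = mean_aggregate(features, mesh_adjacency(faces))
print(smoothed)  # [[1.0], [1.5], [1.5], [2.0]]
```

In a learned graph layer, the averaged features would additionally pass through trainable weights before being decoded into per-vertex outputs such as Gaussian attributes.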