
PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image

Peng Li, Yisheng He, Yingdong Hu, Yuan Dong, Weihao Yuan, Yuan Liu, Siyu Zhu, Gang Cheng, Zilong Dong, Yike Guo

TL;DR

PanoLAM tackles the problem of producing high-fidelity Gaussian full-head avatars from a single unposed image without expensive optimization. It introduces a feed-forward, dual-branch architecture that combines a coarse-to-fine, point-based head-generation branch with a spherical-triplane branch that distills priors from pretrained 3D GANs. The model is trained on a large-scale synthetic dataset generated by those GANs. The approach demonstrates superior reconstruction quality and significant speedups for inference and rendering, validated through extensive ablations and comparisons against prior art. This work enables practical, fast 3D avatar creation from a single photo and highlights the value of synthetic 3D priors and topology-aware densification in 3D head reconstruction.

Abstract

We present a feed-forward framework for Gaussian full-head synthesis from a single unposed image. Unlike previous work that relies on time-consuming GAN inversion and test-time optimization, our framework reconstructs the Gaussian full-head model from a single unposed image in a single forward pass, enabling fast reconstruction and rendering at inference time. To mitigate the lack of large-scale 3D head assets, we build a large-scale synthetic dataset with pretrained 3D GANs and train our framework on synthetic data only. For efficient high-fidelity generation, we introduce a coarse-to-fine Gaussian head generation pipeline: sparse points from the FLAME model interact with image features through transformer blocks for feature extraction and coarse shape reconstruction, and these points are then densified for high-fidelity reconstruction. To fully leverage the prior knowledge residing in pretrained 3D GANs, we propose a dual-branch framework that aggregates structured spherical-triplane features with unstructured point-based features for more effective Gaussian head reconstruction. Experimental results demonstrate the effectiveness of our framework compared with existing work. Project page: https://panolam.github.io/.
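The abstract's coarse-to-fine step densifies sparse FLAME points, and the TL;DR calls this densification topology-aware. Assuming, for illustration only, that "topology-aware" means new points are placed on the FLAME mesh's edges so they inherit its connectivity, one step of edge-midpoint (1-to-4) subdivision can be sketched in pure Python as below. The function name `subdivide` is hypothetical, and the paper's actual densification rule may differ.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]        # xyz, optionally followed by feature channels
Face = Tuple[int, int, int]    # indices into the vertex list


def subdivide(vertices: Sequence[Vec], faces: Sequence[Face]) -> Tuple[List[Vec], List[Face]]:
    """One step of edge-midpoint (1-to-4) subdivision of a triangle mesh.

    Every channel of a new vertex (position and any appended features)
    is the average of the two endpoints of the edge it splits.
    """
    verts: List[Vec] = [tuple(v) for v in vertices]
    midpoint: Dict[Tuple[int, int], int] = {}

    def mid(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))  # an edge shared by two faces gets one new vertex
        if key not in midpoint:
            verts.append(tuple((x + y) / 2 for x, y in zip(verts[a], verts[b])))
            midpoint[key] = len(verts) - 1
        return midpoint[key]

    new_faces: List[Face] = []
    for a, b, c in faces:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        # Three corner triangles plus the central one keep the original topology.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return verts, new_faces
```

Each step quadruples the face count and adds one vertex per unique edge. Because a vertex tuple can carry feature channels after its xyz coordinates, the same averaging also interpolates per-point features from the coarse stage.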

Paper Structure

This paper contains 43 sections, 8 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: PanoLAM creates high-fidelity Gaussian full-heads with one-shot unposed images in seconds.
  • Figure 2: Overall framework. Given an unposed head image as input, PanoLAM uses two branches to achieve single-pass 3D Gaussian head reconstruction: a point-based transformer for coarse-to-fine point shape reconstruction and point feature extraction, and a spherical triplane transformer that distills prior knowledge from a 3D GAN. Features from the two branches are concatenated for high-fidelity Gaussian head regression.
  • Figure 3: The proposed feature aggregation mechanism samples and aggregates multi-layer features from the spherical triplane for each point.
  • Figure 4: Visualization of reconstruction and novel view synthesis of different methods.
  • Figure 5: Analysis of the ray-marching aggregation weights in SphereHead's original Spherical Triplane.
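Figures 2, 3, and 5 together outline how each point picks up GAN-prior features: multi-layer features are sampled from the spherical triplane, aggregated, and concatenated with the point-branch features. The sketch below illustrates that flow under several assumptions that this summary does not state. Each "layer" is a hypothetical feature grid indexed by the point's spherical angles (θ, φ), and a nearest-cell lookup stands in for real triplane sampling. The layers are combined with standard volume-rendering weights of the kind Figure 5 analyzes, w_i = T_i α_i with α_i = 1 − exp(−σ_i δ_i) and T_i = ∏_{j<i}(1 − α_j). The names `sample_layer`, `compositing_weights`, and `point_feature` are illustrative, not PanoLAM's API, and the paper may instead learn the aggregation.

```python
import math
from typing import List, Sequence, Tuple

Feature = List[float]
Grid = Sequence[Sequence[Sequence[float]]]  # H x W cells, each a feature vector


def to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Return (radius, polar angle theta in [0, pi], azimuth phi in [-pi, pi])."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def sample_layer(grid: Grid, theta: float, phi: float) -> Feature:
    """Nearest-cell lookup in one hypothetical (theta, phi)-indexed feature layer."""
    h, w = len(grid), len(grid[0])
    i = min(int(theta / math.pi * h), h - 1)
    j = min(int((phi + math.pi) / (2 * math.pi) * w), w - 1)
    return list(grid[i][j])


def compositing_weights(sigmas: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """Volume-rendering weights w_i = T_i * alpha_i along one ray."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of segment i
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # fraction of light surviving past segment i
    return weights


def point_feature(point, point_feat, layer_grids, sigmas, deltas) -> Feature:
    """Fuse a point-branch feature with weighted multi-layer triplane features."""
    _, theta, phi = to_spherical(*point)
    layer_feats = [sample_layer(g, theta, phi) for g in layer_grids]
    weights = compositing_weights(sigmas, deltas)
    dim = len(layer_feats[0])
    aggregated = [sum(wt * f[k] for wt, f in zip(weights, layer_feats)) for k in range(dim)]
    return list(point_feat) + aggregated  # channel concatenation, as in Figure 2
```

For example, with two layers of equal density ln 2 and unit spacing, the weights are 0.5 and 0.25, so the second layer contributes half as much as the first.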