OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars

Zehao Xia, Yiqun Wang, Zhengda Lu, Kai Liu, Jun Xiao, Peter Wonka

TL;DR

OMEGA-Avatar produces animatable, full-head 3D avatars from a single image by combining diffusion-guided, semantic-aware FLAME mesh deformation with a dual Gaussian head decoded from a canonical UV representation. It introduces semantic-aware, topology-preserving deformation to capture hair and unseen regions, and a multi-view feature splatting module that fuses features from diffusion-generated multi-view images into a shared UV map for stable, view-consistent Gaussian decoding. The method reports state-of-the-art 360° full-head completeness and identity preservation on NeRSemble and Ava-256 while remaining fully feed-forward, with no per-subject optimization. This enables efficient, one-shot creation of high-fidelity, animatable avatars suitable for real-time rendering.

Abstract

Creating high-fidelity, animatable 3D avatars from a single image remains a formidable challenge. We identify three desirable attributes of an avatar generation method: it should 1) be feed-forward, 2) model a full 360° head, and 3) be animation-ready. However, existing work addresses only two of these three attributes at once. To address this limitation, we propose OMEGA-Avatar, the first feed-forward framework that simultaneously generates a generalizable, 360°-complete, and animatable 3D Gaussian head from a single image. Starting from a feed-forward and animatable framework, we address the 360° full-head avatar generation problem with two novel components. First, to overcome poor hair modeling in full-head avatar generation, we introduce a semantic-aware mesh deformation module that integrates multi-view normals to optimize a FLAME head with hair while preserving its topology. Second, to enable effective feed-forward decoding of full-head features, we propose a multi-view feature splatting module that constructs a shared canonical UV representation from features across multiple views through differentiable bilinear splatting, hierarchical UV mapping, and visibility-aware fusion. This approach preserves both global structural coherence and local high-frequency details across all viewpoints, ensuring 360° consistency without per-instance optimization. Extensive experiments demonstrate that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360° full-head completeness while robustly preserving identity across different viewpoints.

Paper Structure (17 sections, 9 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Pipeline Overview. Given the source and target images, we leverage diffusion models to synthesize multi-view RGB images and corresponding normal maps. The normal maps guide semantic-aware mesh deformation, while pixel-wise features are extracted from the multi-view RGB images. The multi-view features are then aggregated into a canonical UV feature map by the multi-view feature splatting module. The UV features and the vertex features extracted from the deformed mesh are decoded and anchored to the mesh via UV mapping. For animation, the expression and pose derived from the target image are injected into the deformed mesh. Finally, the rendered output is enhanced by a neural refiner to generate the final full-head avatar.
  • Figure 2: Semantic-aware Mesh Deformation. Direct optimization with normal guidance disrupts the parametric structure of FLAME, causing severe surface irregularities and topological artifacts (left). Note the holes in the cranial region and the degeneration of facial features such as the eyes and ears (highlighted in red boxes). Our approach (right) mitigates these issues by incorporating semantic-aware topology preservation and a semantic-aware Laplacian. We preserve the clean topology of FLAME and ensure 360° geometric consistency, enabling the generation of fine details without compromising facial structural integrity.
  • Figure 3: Multi-view Feature Splatting. Taking a frontal view as an example, we first obtain per-pixel UV coordinates and normals via rasterization, and map features to UV space using differentiable bilinear splatting. We employ hierarchical UV mapping, which builds a multi-resolution pyramid to fill missing regions in a coarse-to-fine manner. Simultaneously, we calculate a fusion weight map by combining view-dependent confidence and UV sampling density. Finally, the visibility-aware fusion module aggregates the weighted features from all views to generate the final UV feature map.
  • Figure 4: Novel view synthesis from a single image on the Ava-256 dataset. Compared to state-of-the-art methods, our approach better preserves identity and produces higher-quality renderings, even at unseen and extreme side-view angles. Note that PanoHead and SphereHead require inputs aligned to the FFHQ canonical space, which leads to differences in apparent scale. Red boxes highlight visual artifacts.
  • Figure 5: Qualitative Ablation on the NeRSemble dataset. We compare results for models trained: (1) without multi-view diffusion, (2) without the semantic-aware Laplacian term, (3) without differentiable bilinear splatting, (4) without hierarchical UV mapping, (5) without visibility-aware fusion.
  • ...and 4 more figures
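
The splatting and fusion steps described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `splat_bilinear` and `fuse_views`, the single per-pixel confidence weight, and the plain weighted average are all assumptions made for illustration, and hierarchical UV mapping is omitted. The sketch only shows the general idea of scattering per-pixel features into a UV grid with bilinear weights and then averaging the views by accumulated weight. In an autodiff framework the same scatter-add would be differentiable with respect to the features; the NumPy version here computes the forward pass only.

```python
import numpy as np

def splat_bilinear(uv, feats, weights, size):
    """Scatter per-pixel features into a size x size UV grid (size >= 2).

    uv:      (N, 2) continuous UV coordinates in [0, 1], as (u, v)
    feats:   (N, C) per-pixel features
    weights: (N,)   per-pixel confidence (hypothetical, e.g. a view-angle cosine)
    Returns weighted feature sums (size, size, C) and weight sums (size, size).
    """
    acc = np.zeros((size, size, feats.shape[1]))
    wsum = np.zeros((size, size))
    xy = uv * (size - 1)  # continuous texel coordinates
    # Top-left texel of the 2x2 neighbourhood, clipped so x0 + 1 stays in range.
    x0 = np.clip(np.floor(xy[:, 0]).astype(int), 0, size - 2)
    y0 = np.clip(np.floor(xy[:, 1]).astype(int), 0, size - 2)
    fx = xy[:, 0] - x0
    fy = xy[:, 1] - y0
    corners = (
        (0, 0, (1 - fx) * (1 - fy)),
        (1, 0, fx * (1 - fy)),
        (0, 1, (1 - fx) * fy),
        (1, 1, fx * fy),
    )
    for dx, dy, w in corners:
        w = w * weights
        # np.add.at accumulates correctly when several pixels hit the same texel.
        np.add.at(acc, (y0 + dy, x0 + dx), w[:, None] * feats)
        np.add.at(wsum, (y0 + dy, x0 + dx), w)
    return acc, wsum

def fuse_views(views, size, eps=1e-8):
    """Weighted average of several splatted views (assumed fusion rule).

    views: iterable of (uv, feats, weights) tuples, one per view.
    Returns the fused UV feature map and a boolean mask of covered texels.
    """
    views = list(views)
    channels = views[0][1].shape[1]
    total = np.zeros((size, size, channels))
    total_w = np.zeros((size, size))
    for uv, feats, weights in views:
        acc, wsum = splat_bilinear(uv, feats, weights, size)
        total += acc
        total_w += wsum
    fused = total / np.maximum(total_w, eps)[..., None]
    return fused, total_w > eps
```

Texels that no view reaches keep a zero feature and are flagged as uncovered in the returned mask; in the pipeline described above, such gaps are what the coarse-to-fine hierarchical UV mapping is meant to fill.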