
GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions

Salvatore Esposito, Qingshan Xu, Kacper Kania, Charlie Hewitt, Octave Mariotti, Lohit Petikam, Julien Valentin, Arno Onken, Oisin Mac Aodha

TL;DR

This paper addresses the challenge of generating high-quality 3D geometry from 2D images without extensive multi-view supervision. It introduces GeoGen, an SDF-based generative model that replaces direct volumetric-density prediction with a learned Signed Distance Function (SDF). GeoGen augments a triplane-based GAN architecture (inspired by EG3D) with an SDF network and a depth-consistency constraint, which enforces alignment between the rendered depth and the SDF surface via a Laplace-based density transform. The approach yields more accurate surfaces and more detailed meshes. It is validated against baselines on synthetic head datasets and ShapeNet Cars, and is supported by quantitative 3D metrics and qualitative inversions. The authors also present a synthetic 360-degree head dataset to enable robust 3D evaluation, underscoring GeoGen's potential for realistic 3D content creation in animation, gaming, and VR.
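The Laplace-based density transform has the form popularized by VolSDF: the density at a point is σ(x) = α Ψ_β(−d(x)), where d is the signed distance (negative inside) and Ψ_β is the cumulative distribution function of a zero-mean Laplace distribution with scale β. The minimal NumPy sketch below assumes the VolSDF convention α = 1/β; it is an illustration of that transform, not GeoGen's code. Since GeoGen describes the transform as learnable, β would be a trained parameter there rather than the plain function argument used here.

```python
import numpy as np

def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    s = np.asarray(s, dtype=float)
    # Clamp each branch's exponent so np.where never evaluates exp of a large positive number.
    left = 0.5 * np.exp(np.minimum(s, 0.0) / beta)          # used where s <= 0
    right = 1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta)  # used where s > 0
    return np.where(s <= 0.0, left, right)

def sdf_to_density(sdf, beta):
    """Map signed distance (negative inside) to volume density.

    sigma(x) = alpha * Psi_beta(-d(x)) with alpha = 1 / beta (VolSDF convention, assumed here).
    In a trainable model beta would be a learnable scalar; here it is a plain argument.
    """
    alpha = 1.0 / beta
    return alpha * laplace_cdf(-np.asarray(sdf, dtype=float), beta)

# Densities for a point inside, on, and outside the surface at two sharpness levels.
sdf = np.array([-1.0, 0.0, 1.0])
for beta in (0.1, 0.01):
    print(beta, sdf_to_density(sdf, beta))
```

On the zero-level set the density is always α/2, while a smaller β concentrates the inside-to-outside transition more tightly around the surface. Letting β be learned is one way to read the abstract's point about making the transformation learnable so the model is not locked into overly smooth geometry.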

Abstract

We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained end-to-end. First, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, those priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered depth map to be consistent with the zero-level set of the SDF. Through adversarial training, we encourage the network to produce higher-fidelity details on the output meshes. For evaluation, we introduce a synthetic dataset of human avatars captured from 360-degree camera angles, to overcome the limitations of real-world datasets, which often lack 3D consistency and do not cover all camera angles. Our experiments on multiple datasets show that GeoGen produces visually and quantitatively better geometry than previous generative models based on neural radiance fields.
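The constraint that the rendered depth map be consistent with the SDF's zero-level set can be illustrated on a toy scene. The NumPy sketch below is a minimal version under stated assumptions: an analytic unit-sphere SDF stands in for the generator's SDF network, density comes from a VolSDF-style Laplace transform with α = 1/β, depth is the normalized expected ray-termination distance from standard volume-rendering weights, and the penalty is the absolute SDF value at the point that depth implies. The function names and sampling settings (`render_depth`, `depth_consistency_loss`, `n_samples`, `near`, `far`) are hypothetical, and the paper's exact loss (norm, weighting, normalization) may differ.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere at the origin (negative inside); toy stand-in for an SDF network."""
    return np.linalg.norm(points, axis=-1) - radius

def sdf_to_density(sdf, beta):
    """Laplace-CDF transform (VolSDF-style, assumed): sigma = (1 / beta) * Psi_beta(-sdf)."""
    s = -sdf
    psi = np.where(s <= 0.0,
                   0.5 * np.exp(np.minimum(s, 0.0) / beta),
                   1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta))
    return psi / beta

def render_depth(origin, direction, sdf_fn, beta, near=0.0, far=4.0, n_samples=512):
    """Expected ray-termination depth from alpha-compositing weights along one ray."""
    t = np.linspace(near, far, n_samples)
    points = origin[None, :] + t[:, None] * direction[None, :]
    sigma = sdf_to_density(sdf_fn(points), beta)
    delta = np.append(np.diff(t), 1e10)  # treat the last interval as unbounded
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j).
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = trans * alpha
    return np.sum(weights * t) / (np.sum(weights) + 1e-8)

def depth_consistency_loss(origin, direction, sdf_fn, beta):
    """|SDF| at the 3D point implied by the rendered depth; zero iff it lies on the zero-level set."""
    depth = render_depth(origin, direction, sdf_fn, beta)
    surface_point = origin + depth * direction
    return abs(float(sdf_fn(surface_point[None, :])[0]))

# A ray from z = -3 toward the origin hits the unit sphere at depth 2.
o = np.array([0.0, 0.0, -3.0])
v = np.array([0.0, 0.0, 1.0])
print(depth_consistency_loss(o, v, sphere_sdf, beta=0.05))
print(depth_consistency_loss(o, v, sphere_sdf, beta=0.2))
```

In this toy setting the rendered depth lands slightly past the true surface, by an offset that grows with β, so the penalty is nonzero whenever the density transform smears opacity away from the zero-level set. In a generator, minimizing such a term would push the rendered geometry and the SDF surface to agree.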

Paper Structure

This paper contains 27 sections, 7 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: GeoGen, our 3D-aware generator, is trained solely from 2D images. Noise sampling is followed by a StyleGAN2 generator that produces triplane features, similar to EG3D (chan2022efficient). However, we enhance these features with positional information and an SDF network for refined geometry. GeoGen is trained end-to-end with a GAN objective together with our SDF depth consistency loss.
  • Figure 2: Examples from our synthetic human dataset. We display rendered images on top and pseudo 3D ground-truth below.
  • Figure 3: Sampled images and meshes from EG3D, StyleSDF, and our GeoGen approach on FFHQ. GeoGen meshes are smooth and anatomically accurate, with detailed facial features. In contrast to EG3D and StyleSDF, GeoGen synthesizes finer geometric detail.
  • Figure 4: Sampled images and meshes from EG3D, StyleSDF, and our GeoGen approach trained on our synthetic human head dataset. GeoGen produces fewer overt visual artefacts and more faithfully captures the backs of objects (e.g., see the second-to-last column). While the 2D images from the competing methods look plausible, the underlying 3D mesh is not always consistent.
  • Figure 5: Inversion results for the EG3D and GeoGen models. The figure compares reconstructions at 0°, 90°, and 270° to highlight how the two models differ in reconstructing facial features.
  • ...and 7 more figures
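The Figure 1 caption describes triplane features that are enhanced with positional information before they reach the SDF network. The NumPy sketch below shows one plausible way to look up such features for a batch of 3D points; it is not the authors' implementation. Each point in [-1, 1]^3 is projected onto the xy, xz, and yz planes and bilinearly sampled; the three features are summed, as EG3D does; and a sinusoidal positional encoding of the point is concatenated. The plane ordering, the sinusoidal encoding, and the hypothetical `n_freqs` parameter are assumptions, since the caption does not specify what form GeoGen's positional information takes. The SDF MLP that would consume these features is omitted.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a (C, H, W) plane at (N, 2) coords in [-1, 1]; returns (N, C)."""
    C, H, W = plane.shape
    # Map [-1, 1] to continuous pixel coordinates (u -> width, v -> height), corners aligned.
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)
    # Clamp the lower corner so x0 + 1 and y0 + 1 stay in range at the upper border.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    f00 = plane[:, y0, x0]          # each has shape (C, N)
    f01 = plane[:, y0, x0 + 1]
    f10 = plane[:, y0 + 1, x0]
    f11 = plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - wx) * (1 - wy) + f01 * wx * (1 - wy)
           + f10 * (1 - wx) * wy + f11 * wx * wy)
    return out.T

def positional_encoding(p, n_freqs=4):
    """Concatenate p with sin/cos of p at n_freqs octave frequencies: (N, 3) -> (N, 3 + 6 * n_freqs)."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi
    angles = p[:, :, None] * freqs                      # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([p, enc.reshape(p.shape[0], -1)], axis=-1)

def triplane_features(planes, points, n_freqs=4):
    """planes: (3, C, H, W) ordered xy, xz, yz; points: (N, 3) in [-1, 1]^3."""
    xy = bilinear_sample(planes[0], points[:, [0, 1]])
    xz = bilinear_sample(planes[1], points[:, [0, 2]])
    yz = bilinear_sample(planes[2], points[:, [1, 2]])
    feat = xy + xz + yz  # sum aggregation across the three planes
    return np.concatenate([feat, positional_encoding(points, n_freqs)], axis=-1)

rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 8, 16, 16))   # stand-in for generator output
points = rng.uniform(-1.0, 1.0, size=(5, 3))
print(triplane_features(planes, points).shape)
```

Summation keeps the feature width independent of the number of planes. Concatenating an explicit encoding of the point gives the downstream SDF network direct access to position, which is one reading of "enhanced with positional info".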