GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions
Salvatore Esposito, Qingshan Xu, Kacper Kania, Charlie Hewitt, Octave Mariotti, Lohit Petikam, Julien Valentin, Arno Onken, Oisin Mac Aodha
TL;DR
This paper addresses the challenge of generating high-quality 3D geometry from 2D images without extensive multi-view supervision. It introduces GeoGen, an SDF-based generative model that replaces volumetric density with a learned Signed Distance Function (SDF). GeoGen augments a triplane-based GAN architecture (inspired by EG3D) with an SDF network and a depth-consistency constraint, which enforces alignment between the rendered depth and the SDF surface via a Laplace-based density transform. The approach yields more accurate surfaces and more detailed meshes than the baselines on synthetic head datasets and ShapeNet Cars, as shown by quantitative 3D metrics and qualitative inversion results. The paper also presents a synthetic 360-degree head dataset to enable robust 3D evaluation, underscoring GeoGen's potential for realistic 3D content creation in animation, gaming, and VR.
Abstract
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner. First, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, these priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered depth map to be consistent with the zero-level set of the SDF. Through adversarial training, we encourage the network to produce higher-fidelity details on the output meshes. For evaluation, we introduce a synthetic dataset of human avatars captured from 360-degree camera angles, to overcome the challenges presented by real-world datasets, which often lack 3D consistency and do not cover all camera angles. Our experiments on multiple datasets show that GeoGen produces visually and quantitatively better geometry than previous generative models based on neural radiance fields.
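To make the SDF-to-density transform and the depth-consistency idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the VolSDF-style form of a Laplace-based transform, sigma = alpha * Psi_beta(-sdf), where Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta; the exact parameterization used by GeoGen is not given in this section. In GeoGen the transform is learnable, whereas here `alpha` and `beta` are fixed for illustration. The function names, the toy unit-sphere scene, and the sampling range are all hypothetical.

```python
import numpy as np

def laplace_density(sdf, alpha, beta):
    """Assumed VolSDF-style transform: sigma = alpha * LaplaceCDF(-sdf; scale=beta)."""
    s = -np.asarray(sdf, dtype=np.float64)
    # exp(-|s|/beta) is used in both branches to avoid overflow.
    tail = 0.5 * np.exp(-np.abs(s) / beta)
    cdf = np.where(s <= 0, tail, 1.0 - tail)
    return alpha * cdf

def render_depth(sdf_along_ray, t, alpha, beta):
    """Expected ray-termination depth from standard volume rendering weights."""
    sigma = laplace_density(sdf_along_ray, alpha, beta)[:-1]
    delta = np.diff(t)                        # spacing between samples
    opacity = 1.0 - np.exp(-sigma * delta)    # per-interval opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - opacity)[:-1]])
    weights = trans * opacity
    return float((weights * t[:-1]).sum() / max(weights.sum(), 1e-8))

# Toy scene (hypothetical): unit sphere at the origin, camera at z = -3
# looking along +z, so the SDF along the ray is |t - 3| - 1 and the
# first surface crossing is at t = 2.
def sphere_sdf_on_ray(t):
    return np.abs(t - 3.0) - 1.0

t = np.linspace(0.5, 4.0, 2048)
beta = 0.01
alpha = 1.0 / beta
depth = render_depth(sphere_sdf_on_ray(t), t, alpha, beta)

# Depth-consistency residual: the SDF evaluated at the rendered depth
# should be close to zero, i.e. the depth lies on the zero-level set.
residual = float(np.abs(sphere_sdf_on_ray(np.array(depth))))
```

In a training setup, a residual of this kind would be averaged over rays and added to the loss as a penalty, so that the rendered depth map and the SDF surface cannot drift apart. A smaller `beta` concentrates the density near the surface, which tightens the agreement between the two.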
