
Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data

Sebastian Lunz, Yingzhen Li, Andrew Fitzgibbon, Nate Kushman

TL;DR

The paper tackles learning 3D shape distributions from unstructured 2D images by introducing IG-GAN, which couples an off-the-shelf non-differentiable renderer with a neural proxy renderer and a discriminator output matching (DOM) loss to enable gradient-based training on voxel representations. By jointly training a 3D voxel generator, a neural renderer, and a discriminator, IG-GAN can exploit the realistic shading, lighting, and textures of industrial renderers while circumventing their non-differentiability. The approach yields better 2D image realism, measured by FID, across ShapeNet categories and natural images, with ablations showing the importance of DOM and of the choice of pretraining data for the neural renderer. This method demonstrates a scalable path to high-quality 3D generation from abundant 2D data and points toward incorporating richer rendering effects in the future.
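
The gradient routing described above can be illustrated with a one-voxel toy. This is a minimal sketch under loudly stated assumptions: a hard threshold stands in for the off-the-shelf renderer, a sigmoid stands in for the learned neural proxy renderer, and a fixed, hypothetical "discriminator" simply prefers bright pixels; none of these functions come from the paper. The hard render has a zero finite-difference gradient almost everywhere, so the generator's update is taken through the smooth proxy instead, and the non-differentiable render improves as a result.

```python
import math


def opengl_like_render(occupancy: float) -> float:
    """Toy stand-in (hypothetical) for a non-differentiable rasterizer:
    the pixel is lit only when the voxel occupancy exceeds 0.5."""
    return 1.0 if occupancy > 0.5 else 0.0


def proxy_render(occupancy: float, sharpness: float = 10.0) -> float:
    """Toy stand-in (hypothetical) for the neural proxy renderer:
    a smooth sigmoid around the same 0.5 threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (occupancy - 0.5)))


def proxy_render_grad(occupancy: float, sharpness: float = 10.0) -> float:
    """Analytic derivative of proxy_render with respect to occupancy."""
    s = proxy_render(occupancy, sharpness)
    return sharpness * s * (1.0 - s)


def discriminator_score(pixel: float) -> float:
    """Toy fixed 'discriminator' (hypothetical): brighter looks more real."""
    return pixel


def discriminator_score_grad(pixel: float) -> float:
    """Derivative of the toy discriminator with respect to the pixel value."""
    return 1.0


def generator_step(theta: float, lr: float = 0.05) -> float:
    """One gradient-ascent step on D(proxy_render(theta)).
    The chain rule runs through the proxy, never through the hard renderer."""
    pixel = proxy_render(theta)
    grad = discriminator_score_grad(pixel) * proxy_render_grad(theta)
    return theta + lr * grad


theta = 0.2  # toy "generator output": occupancy of a single voxel
for _ in range(50):
    theta = generator_step(theta)

# The discriminator scores the hard render, which has now switched on.
print(theta, discriminator_score(opengl_like_render(theta)))
```

In IG-GAN itself the proxy is a trained network, the discriminator is learned jointly, and the generator outputs full voxel grids; the toy only shows why a smooth proxy makes a gradient signal available at all.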

Abstract

Recent work has shown the ability to learn generative models for 3D shapes from only unstructured 2D images. However, training such models requires differentiating through the rasterization step of the rendering process; therefore, past work has focused on developing bespoke rendering models which smooth over this non-differentiable process in various ways. Such models are thus unable to take advantage of the photo-realistic, fully featured, industrial renderers built by the gaming and graphics industry. In this paper we introduce the first scalable training technique for 3D generative models from 2D data which utilizes an off-the-shelf non-differentiable renderer. To account for the non-differentiability, we introduce a proxy neural renderer to match the output of the non-differentiable renderer. We further propose discriminator output matching to ensure that the neural renderer learns to smooth over the rasterization appropriately. We evaluate our model on images rendered from our generated 3D shapes, and show that our model can consistently learn to generate better shapes than existing models when trained with exclusively unstructured 2D images.
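
The abstract names two training signals for the proxy renderer: matching the non-differentiable renderer's output, and discriminator output matching. A minimal sketch of such a combined objective follows, assuming squared-error terms for both, flattened lists of floats as images, any callable as the discriminator, and a hypothetical `dom_weight`; it illustrates the idea and is not the authors' loss code.

```python
from typing import Callable, Sequence

# Hypothetical representation: a rendered image as a flat sequence of pixels.
Image = Sequence[float]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def neural_renderer_loss(
    proxy_image: Image,
    reference_image: Image,
    discriminator: Callable[[Image], float],
    dom_weight: float = 1.0,
) -> float:
    """Pixel matching plus discriminator output matching (DOM).

    proxy_image:     output of the neural proxy renderer for a shape
    reference_image: output of the non-differentiable renderer for the same shape
    discriminator:   image -> score callable standing in for the GAN discriminator
    dom_weight:      hypothetical weighting, not a value from the paper
    """
    pixel_term = mse(proxy_image, reference_image)
    # DOM term: the discriminator should score both renders identically.
    dom_term = (discriminator(proxy_image) - discriminator(reference_image)) ** 2
    return pixel_term + dom_weight * dom_term


# Usage with a toy discriminator that sums pixel intensities.
loss = neural_renderer_loss([0.0, 1.0], [0.0, 0.5], discriminator=sum)
print(loss)
```

The pixel term alone treats every pixel error equally; the DOM term additionally penalizes whatever differences change the discriminator's verdict, which is the quantity the generator's gradient actually depends on.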

Paper Structure

This paper contains 18 sections, 9 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: 3D shapes generated by training IG-GAN on unstructured 2D images rendered from three ShapeNet classes.
  • Figure 2: The architecture and training setup for IG-GAN.
  • Figure 2: Ablation results without discriminator output matching (DOM) when training on the chairs/couches "one per model" datasets. We either fix the pre-trained neural renderer ("Fixed") or continue to train it during GAN training ("Retrained"). The generator samples fed to the discriminator are rendered using either OpenGL or the neural renderer. For reference, our model is equivalent to the Retrained OpenGL setup with the addition of the DOM loss and achieves FID scores of 20.7/35.8 (chairs/couches).
  • Figure 3: Normal maps of objects generated by IG-GAN on the 'Unlimited' datasets. The left panel shows a single sample rendered from different viewpoints, and the right panel shows multiple samples rendered from a canonical viewpoint.
  • Figure 3: Comparison of pre-training the neural renderer on different sets of 3D shapes. FIDs are reported for the 'One per model' chairs.
  • ...and 4 more figures