Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data
Sebastian Lunz, Yingzhen Li, Andrew Fitzgibbon, Nate Kushman
TL;DR
The paper tackles learning 3D shape distributions from unstructured 2D images by introducing IG-GAN, which uses an off-the-shelf non-differentiable renderer coupled with a neural proxy renderer and a discriminator-output matching loss to enable gradient-based training on voxel representations. By jointly training a 3D voxel generator, a neural renderer, and a discriminator, IG-GAN can exploit realistic shading, lighting, and textures from industrial renderers while circumventing non-differentiability. The approach yields superior 2D image realism (FID) across ShapeNet categories and natural images, with ablations showing the importance of DOM and thoughtful pretraining. This method demonstrates a scalable path to high-quality 3D generation from abundant 2D data and points toward incorporating richer rendering effects in the future.
Abstract
Recent work has shown the ability to learn generative models for 3D shapes from only unstructured 2D images. However, training such models requires differentiating through the rasterization step of the rendering process, therefore past work has focused on developing bespoke rendering models which smooth over this non-differentiable process in various ways. Such models are thus unable to take advantage of the photo-realistic, fully featured, industrial renderers built by the gaming and graphics industry. In this paper we introduce the first scalable training technique for 3D generative models from 2D data which utilizes an off-the-shelf non-differentiable renderer. To account for the non-differentiability, we introduce a proxy neural renderer to match the output of the non-differentiable renderer. We further propose discriminator output matching to ensure that the neural renderer learns to smooth over the rasterization appropriately. We evaluate our model on images rendered from our generated 3D shapes, and show that our model can consistently learn to generate better shapes than existing models when trained with exclusively unstructured 2D images.
