Deep Generative Models on 3D Representations: A Survey

Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, Yujun Shen

TL;DR

This survey comprehensively maps 3D generative modeling by organizing work around 3D representations (voxels, point clouds, meshes, neural fields, depth maps) and supervision signals (2D vs 3D). It contrasts major generative-model families (GANs, VAEs, normalizing flows, diffusion models) and reviews how each representation pairs with these models. The paper details learning-from-3D-data approaches, learning-from-2D-data methods, and a spectrum of applications from shape editing to 3D reconstruction and representation learning. It also discusses persistent challenges—universality, controllability, efficiency, and stability—and outlines future directions to accelerate progress in 3D generation and rendering. Overall, the work serves as a foundational reference for researchers seeking to understand and advance 3D generative modeling across representations and supervision regimes.

Abstract

Generative models aim to learn the distribution of observed data by generating new instances. With the advent of neural networks, deep generative models, including variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models (DMs), have progressed remarkably in synthesizing 2D images. Recently, researchers started to shift focus from 2D to 3D space, considering that 3D data is more closely aligned with our physical world and holds immense practical potential. However, unlike 2D images, which possess an inherent and efficient representation (i.e., a pixel grid), representing 3D data poses significantly greater challenges. Ideally, a robust 3D representation should be capable of accurately modeling complex shapes and appearances while being highly efficient in handling high-resolution data with high processing speeds and low memory requirements. Regrettably, existing 3D representations, such as point clouds, meshes, and neural fields, often fail to satisfy all of these requirements simultaneously. In this survey, we thoroughly review the ongoing developments of 3D generative models, including methods that employ 2D and 3D supervision. Our analysis centers on generative models, with a particular focus on the representations utilized in this context. We believe our survey will help the community to track the field's evolution and to spark innovative ideas to propel progress towards solving this challenging task.

Paper Structure

This paper contains 22 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: 3D generative model pipeline. To synthesize 3D data from random noise or a conditioning signal, previous methods employ various types of generative models, such as GANs, normalizing flows, VAEs, diffusion models, and energy-based models. Popular representations of 3D data include point clouds, voxels, meshes, depth maps, neural fields, and hybrid representations. The generative models are optimized under either 2D supervision, through differentiable rendering, or 3D supervision.
  • Figure 2: 3D generative model timeline. We show representative methods trained with 3D supervision (top) and 2D supervision (bottom), respectively. Each method is illustrated with its 3D representation and the generative model.
  • Figure 3: Representative 3D generative models. We present some classical pipelines for generating (a) point clouds, (b) voxel grids, (c) neural fields, (d) meshes, and (e) hybrid representations. Some figures are taken from [li2021sp, yang2019pointflow, luo2021diffusion, wu2016learning, ben2018multi, Hanocka2020p2m, peng2021neural, Gao2022NeurIPS]. We present only some representative methods. Please refer to the section on shape generation for more variants.
  • Figure 4: The general pipeline of a 3D-aware GAN. The 3D-aware GAN framework generates 3D representations, including tri-planes [eg3d, epigraf], voxels [xu2021volumegan, schwarz2022voxgraf], and meshes [Gao2022NeurIPS]. These representations are then used to predict the color and density for volume rendering. The discriminator is omitted since it follows an approach similar to that of conventional 2D GANs.
  • Figure 5: FID vs. resolution of representative 3D synthesis methods trained on FFHQ [stylegan]. We annotate each method with its FID score and GigaFLOPs.
  • ...and 1 more figures