Recent Advances in 3D Object and Scene Generation: A Survey

Xiang Tang, Ruotong Li, Xiaopeng Fan

TL;DR

The paper surveys static 3D content generation by dissecting 3D representations (explicit, implicit, hybrid) and mapping them to four generative-model families (VAEs, GANs, autoregressive, diffusion). It then analyzes scene generation through three paradigms: layout-guided, 2D-prior lifting, and rule-driven modeling, highlighting advances in diffusion-based geometry and texture synthesis, along with neural rendering strategies. Key contributions include a taxonomy linking representations to generation methods, a synthesis of object- and scene-level approaches, and a discussion of datasets, evaluation metrics, and future directions for scalable, controllable, and physically plausible 3D content. The survey underscores the shift from pure geometry to joint geometry-appearance modeling, the rise of diffusion priors and LLM/VLM-assisted scene understanding, and the need for standardized benchmarks and efficient, scalable pipelines for real-world deployment.

Abstract

In recent years, demand for 3D content has grown rapidly as interactive media, extended reality (XR), and Metaverse industries have matured. To overcome the limitations of traditional manual modeling, such as labor-intensive workflows and long production cycles, researchers have combined novel 3D representations with generative AI techniques, yielding substantial advances. In this survey, we systematically review recent work on static 3D object and scene generation and organize it into a comprehensive technical framework. We begin with mainstream 3D object representations. We then examine 3D object generation based on four mainstream deep generative models: Variational Autoencoders, Generative Adversarial Networks, Autoregressive Models, and Diffusion Models. For scene generation, we focus on three dominant paradigms: layout-guided generation, lifting based on 2D priors, and rule-driven modeling. Finally, we critically examine persistent challenges in 3D generation and propose directions for future research. This survey aims to give readers a structured understanding of state-of-the-art 3D generation technologies and to encourage further exploration in this domain.

Paper Structure

This paper contains 33 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Structure of this survey.
  • Figure 2: Qualitative comparison of deep generative models across different dimensions. More ✔ symbols indicate better performance.
  • Figure 3: Overview of 3D object generation methods. The upper panel presents the general workflow and key application categories of data-driven generative models. The lower panel enumerates four mainstream generative model frameworks.
  • Figure 4: Qualitative comparison of generation methods for object appearance and material.
  • Figure 5: 3D scene generation methods. Layout-guided: Holodeck [yang2024holodeck], LayoutVLM [sun2025layoutvlm]; 2D prior-based: SceneDreamer360 [li2024scenedreamer360], SAM 3D [chen2025sam]; Rule-driven: Feng et al. [feng2025text], Infinigen [raistrick2023infinite]. Moreover, WorldGrow [li2025worldgrow] can generate infinite scenes.
  • ...and 2 more figures