Recent Advances in 3D Object and Scene Generation: A Survey
Xiang Tang, Ruotong Li, Xiaopeng Fan
TL;DR
The paper surveys static 3D content generation by categorizing 3D representations (explicit, implicit, hybrid) and mapping them to four generative-model families (VAEs, GANs, autoregressive models, and diffusion models). It then analyzes scene generation through three paradigms: layout-guided generation, lifting based on 2D priors, and rule-driven modeling, highlighting advances in diffusion-based geometry and texture synthesis as well as neural rendering strategies. Key contributions include a taxonomy linking representations to generation methods, a synthesis of object- and scene-level approaches, and a discussion of datasets, evaluation metrics, and future directions for scalable, controllable, and physically plausible 3D content. The survey underscores the shift from pure geometry to joint geometry-appearance modeling, the rise of diffusion priors and LLM/VLM-assisted scene understanding, and the need for standardized benchmarks and efficient, scalable pipelines for real-world deployment.
Abstract
In recent years, demand for 3D content has grown rapidly alongside the intelligent upgrading of interactive media, extended reality (XR), and Metaverse industries. Traditional manual modeling is limited by labor-intensive workflows and prolonged production cycles, and the convergence of novel 3D representation paradigms with generative artificial intelligence has produced major advances in overcoming these limitations. In this survey, we systematically review recent achievements in static 3D object and scene generation and organize them into a comprehensive technical framework. We begin with mainstream 3D object representations. We then examine the technical pathways of 3D object generation based on four mainstream deep generative models: Variational Autoencoders, Generative Adversarial Networks, Autoregressive Models, and Diffusion Models. For scene generation, we focus on three dominant paradigms: layout-guided generation, lifting based on 2D priors, and rule-driven modeling. Finally, we critically examine persistent challenges in 3D generation and propose potential directions for future research. This survey aims to give readers a structured understanding of state-of-the-art 3D generation technologies and to encourage further exploration in this domain.
