
Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

Song Bai, Jie Li

TL;DR

This paper surveys recent progress in AI-generated 3D content, organized around single 3D objects, 3D human models, 3D scenes, and human motion synthesis, with emphasis on papers from 2023. It argues that diffusion-based 2D-to-3D pipelines (built on NeRF, 3DGS, and related representations), combined with control mechanisms (ControlNet, DreamBooth, LoRA) and parametric human-body priors such as SMPL(-X), are driving rapid gains in fidelity and consistency. It highlights high-profile methods (e.g., One-2-3-45++, Direct2.5, RichDreamer, SceneDreamer, Story2Motion) and datasets (Objaverse-XL) that enable high-quality, view-consistent outputs, sometimes at a substantial computational cost. It also discusses persistent challenges in scene fidelity, background handling, and evaluation metrics, as well as the need for scalable 3D scene datasets, while underscoring broad applicability to gaming, education, advertising, and AR/VR.
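As a rough illustration of one of the control mechanisms named above: LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so an adapted linear layer computes W x + (alpha / rank) * B A x. The pure-Python sketch below assumes a single linear layer with list-based matrices; the helper names (`matvec`, `lora_forward`) are hypothetical and are not taken from any LoRA library or from the surveyed papers.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Forward pass of a LoRA-adapted linear layer (hypothetical helper).

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (rank, d_in)
    b: trainable up-projection, shape (d_out, rank); LoRA initializes it to
       zero, so the adapted layer initially matches the frozen one.
    """
    base = matvec(w, x)                 # frozen path: W x
    low_rank = matvec(b, matvec(a, x))  # trainable path: B (A x)
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, low_rank)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]   # identity as a stand-in "pretrained" weight
    a = [[1.0, 1.0]]               # rank-1 down-projection
    b_zero = [[0.0], [0.0]]        # zero-initialized B
    b_tuned = [[0.5], [-0.5]]      # pretend B after some fine-tuning
    x = [2.0, 3.0]
    print(lora_forward(w, a, b_zero, x, alpha=1.0, rank=1))   # [2.0, 3.0]
    print(lora_forward(w, a, b_tuned, x, alpha=1.0, rank=1))  # [4.5, 0.5]
```

Because only A and B are trained, the number of trainable parameters for such a layer drops from d_out × d_in to rank × (d_in + d_out).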

Abstract

While AI-generated text and 2D images continue to expand their reach, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, an abundance of research papers has appeared in the domain of 3D generation. This growth covers not only the creation of 3D objects but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The improved fidelity of Stable Diffusion, control methods that enforce multi-view consistency, and realistic human models such as SMPL-X work together to produce 3D models with remarkable consistency and near-realistic appearance. Advances in neural 3D representation and rendering methods, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have improved both the efficiency and the realism of neurally rendered models. Furthermore, the multimodal capabilities of large language models have made it possible to translate language inputs into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers, most of them published in the second half of 2023. It begins with AI-generated 3D object models, followed by generated 3D human models and, finally, generated 3D human motions, concluding with a summary and a vision for the future.
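To make "neural rendering" with NeRF more concrete, the sketch below implements the discrete volume-rendering sum NeRF uses to turn per-sample densities sigma_i and colors c_i along one camera ray into a pixel color: C = sum_i T_i (1 - exp(-sigma_i * delta_i)) c_i, with T_i = exp(-sum_{j<i} sigma_j * delta_j). The neural network that would predict sigma_i and c_i is omitted; the sample values are assumed inputs, and the function name `render_ray` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(t_vals: Sequence[float], sigmas: Sequence[float],
               colors: Sequence[RGB]) -> Tuple[RGB, List[float]]:
    """Composite samples along one ray with the NeRF quadrature rule.

    t_vals: sorted sample depths t_1 < ... < t_N along the ray
    sigmas: volume density sigma_i at each sample (>= 0)
    colors: RGB color c_i at each sample
    Returns the pixel color and the per-sample weights w_i = T_i * alpha_i.
    """
    n = len(t_vals)
    # delta_i = t_{i+1} - t_i; the last interval is treated as effectively
    # infinite (an assumed, commonly used convention).
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [1e10]
    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    weights: List[float] = []
    pixel = [0.0, 0.0, 0.0]
    for sigma, delta, color in zip(sigmas, deltas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            pixel[k] += w * color[k]
        transmittance *= 1.0 - alpha  # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    return (pixel[0], pixel[1], pixel[2]), weights


if __name__ == "__main__":
    # An empty sample, then a nearly opaque green sample that hides the blue one.
    print(render_ray([0.0, 1.0, 2.0], [0.0, 1000.0, 5.0],
                     [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]))
    # ((0.0, 1.0, 0.0), [0.0, 1.0, 0.0])
```

The weights sum to 1 - T_{N+1}, so whatever is left over is the probability that the ray passes through the sampled volume without being absorbed.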

Paper Structure

This paper contains 10 sections and 14 figures.

Figures (14)

  • Figure 1: Structure of NeRF (image from mildenhall2020nerf)
  • Figure 2: Structure of 3D Gaussian Splatting (image from kerbl20233d)
  • Figure 3: Adaptive Gaussian densification scheme of 3DGS (image from kerbl20233d)
  • Figure 4: Examples of high-quality 3D models with 4K detail generated by RichDreamer (qiu2023richdreamer)
  • Figure 5: Pipeline of Direct2.5, annotated with the time required by each step, for a total of 10 seconds (image from lu2023direct25)
  • ...and 9 more figures
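The adaptive densification scheme shown in Figure 3 can be sketched roughly as follows. As described for 3DGS, Gaussians whose averaged view-space positional gradient exceeds a threshold are densified: small ones (under-reconstruction) are cloned, and large ones (over-reconstruction) are replaced by two Gaussians whose positions are sampled from the original and whose scales are divided by phi = 1.6; nearly transparent Gaussians are pruned. The `Gaussian` layout, the function name, and the threshold defaults below are illustrative assumptions, not the reference implementation, and the sketch ignores rotation (axis-aligned scales only).

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, ...]  # three components (x, y, z)


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical, simplified 3D Gaussian: axis-aligned, rotation and color omitted."""
    mean: Vec3
    scale: Vec3        # per-axis standard deviation
    opacity: float
    grad_norm: float   # averaged view-space positional gradient magnitude


def densify_and_prune(gaussians: List[Gaussian], grad_threshold: float = 0.0002,
                      scale_threshold: float = 0.01, phi: float = 1.6,
                      min_opacity: float = 0.005, seed: int = 0) -> List[Gaussian]:
    """One densification/pruning pass in the spirit of 3DGS (illustrative only)."""
    rng = random.Random(seed)
    out: List[Gaussian] = []
    for g in gaussians:
        if g.opacity < min_opacity:
            continue  # prune nearly transparent Gaussians
        if g.grad_norm < grad_threshold:
            out.append(g)  # small gradient: region already well reconstructed
            continue
        if max(g.scale) <= scale_threshold:
            # Under-reconstruction: clone. The 3DGS paper also moves the copy
            # along the positional gradient; this sketch omits that step and
            # simply duplicates g (with the gradient statistic reset).
            out.extend([replace(g, grad_norm=0.0), replace(g, grad_norm=0.0)])
        else:
            # Over-reconstruction: replace g with two smaller Gaussians whose
            # means are sampled from g itself and whose scales are divided by phi.
            for _ in range(2):
                mean = tuple(rng.gauss(m, s) for m, s in zip(g.mean, g.scale))
                scale = tuple(s / phi for s in g.scale)
                out.append(Gaussian(mean, scale, g.opacity, 0.0))
    return out


if __name__ == "__main__":
    scene = [
        Gaussian((0.0, 0.0, 0.0), (0.001, 0.001, 0.001), 0.9, 0.001),  # cloned
        Gaussian((1.0, 1.0, 1.0), (0.5, 0.5, 0.5), 0.9, 0.001),        # split
        Gaussian((2.0, 2.0, 2.0), (0.5, 0.5, 0.5), 0.9, 0.0),          # kept
        Gaussian((3.0, 3.0, 3.0), (0.5, 0.5, 0.5), 0.001, 0.01),       # pruned
    ]
    print(len(densify_and_prune(scene)))  # 5
```

In the original method a pass like this runs periodically during optimization, so the number of Gaussians adapts to the scene rather than being fixed up front.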