A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

TL;DR

This survey catalogs the field of 3D human avatar modeling, spanning NeRF-based reconstruction and animation, 3D Gaussian Splatting, GAN-based generation, and diffusion- and LLM-driven approaches. It contrasts implicit (NeRF, PIFu-like) and explicit (mesh- and point-based) representations, details how SMPL and pose priors are incorporated, and highlights how diffusion models and CLIP enable text- and image-guided 3D avatar creation and editing. It also discusses practical trade-offs, including data requirements, training efficiency, topology fidelity, and the role of 3D priors in improving realism and controllability. It concludes by reflecting on optimization-based versus feed-forward pipelines and identifies open challenges in pose control, clothing topology, real-time generation, and robust editing. Overall, the survey maps the techniques that push toward flexible, high-fidelity, and controllable 3D human avatars for entertainment, AR/VR, and interactive applications.

Abstract

3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to breakthroughs in neural representations and generative models, we have witnessed rapid development in 3D modeling. 3D human modeling, which lies at the core of many real-world applications such as gaming and animation, has attracted significant attention. Over the past few years, a large body of work on creating 3D human avatars has been introduced, forming a new and abundant knowledge base for 3D human modeling. The scale of the literature makes it difficult for individuals to keep track of all the works. This survey aims to provide a comprehensive overview of these emerging techniques for 3D human avatar modeling, from both reconstruction and generation perspectives. First, we review representative methods for 3D human reconstruction, including methods based on pixel-aligned implicit functions, neural radiance fields, and 3D Gaussian Splatting. We then summarize representative methods for 3D human generation, especially those using vision-language models like CLIP, diffusion models, and various 3D representations, which demonstrate state-of-the-art performance. Finally, we reflect on existing methods and discuss open challenges for 3D human avatar modeling, shedding light on future research.

Paper Structure (33 sections, 50 equations, 15 figures, 1 table)

Figures (15)

  • Figure 7: NeRF overview. Figure obtained from mildenhall2021nerf.
  • Figure 8: Input and output of NeRF-based reconstruction methods from static cameras. Figure obtained from weng2022humannerf.
  • Figure 9: Input and output of HOSNeRF. Figure obtained from liu2023hosnerf.
  • Figure 10: Input and output of NeRF-based animation methods. Figure obtained from peng2021animatable.
  • Figure 11: Input and output of PersonNeRF. Figure obtained from weng2023personnerf.
  • ...and 10 more figures