Survey on Fundamental Deep Learning 3D Reconstruction Techniques
Yonge Bai, LikHang Wong, TszYin Twan
TL;DR
The survey analyzes three fundamental deep learning (DL)-driven 3D reconstruction paradigms (NeRFs, approaches based on latent diffusion models, and 3D Gaussian Splatting), detailing their scene representations, rendering pipelines, and optimization strategies. It highlights efficiency advances such as Instant-NGP's hash encoding, as well as zero-shot, single-image view synthesis via Zero-1-to-3, while also outlining limitations in data requirements, generalizability, and editing. The work discusses practical tradeoffs between implicit and explicit representations and underscores future directions in semantic guidance, dynamic scenes, and single-view reconstruction. Together, these insights provide a cohesive roadmap for researchers and practitioners pursuing photo-realistic and efficient 3D reconstruction with DL methods.
Abstract
This survey investigates fundamental deep learning (DL)-based 3D reconstruction techniques that produce photo-realistic 3D models and scenes, focusing on Neural Radiance Fields (NeRFs), Latent Diffusion Models (LDMs), and 3D Gaussian Splatting. We dissect the underlying algorithms, evaluate their strengths and tradeoffs, and project future research trajectories in this rapidly evolving field. We provide a comprehensive overview of the fundamentals of DL-driven 3D scene reconstruction, offering insights into the potential applications and limitations of these techniques.
