Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review

Thang-Anh-Quan Nguyen, Amine Bourki, Mátyás Macudzinski, Anthony Brunel, Mohammed Bennamoun

TL;DR

This survey presents the first comprehensive, taxonomy-driven review of semantically-aware Neural Radiance Fields (SRFs), synthesizing insights from over 250 papers to show how semantic information enhances 3D reconstruction, segmentation, editing, and language-guided interactions. It details core NeRF fundamentals (radiance fields, volumetric rendering, depth, and positional encoding) and then maps a diverse set of SRF approaches into six categories: 3D geometry enhancement, segmentation, editable NeRFs, object detection/pose estimation, holistic decomposition, and language-enabled SRFs. The authors discuss datasets, evaluation metrics, and practical challenges such as generalization, data efficiency, and real-time performance, and they propose directions for future work including cross-dataset generalization, multi-modal integration, and collaborative tooling. By clarifying how semantic cues can be incorporated into neural radiance fields, the paper highlights SRFs as a promising path toward robust, interactive, and semantically grounded 3D scene understanding with broad impact on AR/VR, robotics, and beyond.

Abstract

This review thoroughly examines the role of semantically-aware Neural Radiance Fields (NeRFs) in visual scene understanding, covering an analysis of over 250 scholarly papers. It explores how NeRFs adeptly infer 3D representations for both stationary and dynamic objects in a scene. This capability is pivotal for generating high-quality new viewpoints, completing missing scene details (inpainting), conducting comprehensive scene segmentation (panoptic segmentation), predicting 3D bounding boxes, editing 3D scenes, and extracting object-centric 3D models. A significant aspect of this study is the application of semantic labels as viewpoint-invariant functions, which effectively map spatial coordinates to a spectrum of semantic labels, thus facilitating the recognition of distinct objects within the scene. Overall, this survey highlights the progression and diverse applications of semantically-aware neural radiance fields in the context of visual scene interpretation.

Paper Structure (46 sections, 13 equations, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Taxonomy of our study on Semantically-aware Neural Radiance Fields (SRFs).
  • Figure 2: Overview of the NeRF [mildenhall2020nerf] scene representation and differentiable rendering. (a) Images are synthesized by sampling 5D coordinates (location and viewing direction) along camera rays, (b) an MLP produces a color and a volume density at each sampled point, (c) volume rendering reconstructs the final image from those values, and (d) the whole pipeline is end-to-end differentiable.
  • Figure 3: Semantic NeRFs [zhi2021place]. 3D positions $(x, y, z)$ and viewing directions $(\theta, \phi)$ are fed into the network after positional encoding (PE). Volume densities $\sigma$ and semantic logits $\mathbf{s}$ are functions of $(x, y, z)$ only, while colors $\mathbf{c}$ additionally depend on $(\theta, \phi)$.
  • Figure 4: Different approaches to conditional 3D representation, which can be effectively used for 3D-aware object manipulation: (a) conditional surface or volumetric representations [kanazawa2018learning, bhattad2021view], (b) image-conditional NeRFs [yu2021pixelnerf, chibane2021stereo, trevithick2021grf, sharma2022neural] that train a feature encoder with the NeRF as decoder, (c) generative NeRFs [schwarz2020graf, chan2021pi, xue2022giraffe] that render images from randomly sampled, disentangled 3D attributes, and (d) auto-encoding NeRFs [jang2021codenerf, liu2021editing, kim2022ae] that extract disentangled 3D latent codes from the input and render images from these attributes.
  • Figure 5: Chronological overview of the most relevant semantically-aware NeRFs spanning all 6 categories covered by our study: 3D geometry enhancement, segmentation, editable NeRFs, object detection and 6D pose, holistic decomposition, NeRFs and language.
  • ...and 9 more figures
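
The rendering pipeline described in the captions of Figures 2 and 3 can be illustrated with a minimal NumPy sketch. It assumes the standard quadrature form of volume rendering, in which per-sample densities are converted to opacities and accumulated along a ray, and it composites semantic logits with the same weights as color. The function names (`positional_encoding`, `render_ray`), the use of interval boundaries `t`, and the toy inputs are illustrative assumptions, not code from the paper; the MLP that would produce `sigma`, `color`, and `sem_logits` is omitted.

```python
import numpy as np


def positional_encoding(p, num_freqs):
    """Map coordinates to sin/cos features at frequencies 2^k * pi (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (L,)
    angles = p[..., None] * freqs                        # (..., D, L)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*p.shape[:-1], -1)              # (..., D * 2L)


def render_ray(sigma, color, sem_logits, t):
    """Composite per-sample values along one ray.

    sigma:      (N,)   volume densities (position-dependent only)
    color:      (N, 3) RGB values (position- and view-dependent)
    sem_logits: (N, C) semantic logits (position-dependent only)
    t:          (N+1,) interval boundaries along the ray
    """
    delta = np.diff(t)                                   # interval lengths
    alpha = 1.0 - np.exp(-sigma * delta)                 # opacity per interval
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = trans * alpha
    rgb = weights @ color                                # rendered pixel color
    logits = weights @ sem_logits                        # rendered semantic logits
    depth = weights @ (0.5 * (t[:-1] + t[1:]))           # expected ray depth
    return rgb, logits, depth, weights
```

Because density and semantics depend only on position in this formulation, the composited logits for a surface point do not change with the viewing direction, which is the sense in which semantic labels act as viewpoint-invariant functions.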