BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields

AKM Shahariar Azad Rabby, Chengcui Zhang

TL;DR

This survey reviews recent advances in NeRF and categorizes them according to their architectural designs, especially in the field of novel view synthesis.

Abstract

Neural rendering combines ideas from classical computer graphics and machine learning to synthesize images from real-world observations. Neural Radiance Fields (NeRF) is a recent technique that learns a 3D scene representation from 2D images. By interpolating between the observed views, NeRF can render novel views of complex scenes. Rather than directly reconstructing the whole 3D scene geometry, NeRF builds a volumetric representation called a "radiance field," which yields a color and a density for every point within the relevant 3D space. The broad appeal and prominence of NeRF make it imperative to examine the existing research on the topic comprehensively. Previous surveys on 3D rendering have primarily focused on traditional computer vision-based or deep learning-based approaches; only a handful discuss NeRF, and those have predominantly covered its early contributions without exploring its full potential. NeRF is a relatively new technique whose capabilities and limitations are still under active investigation. This survey reviews recent advances in NeRF and categorizes them according to their architectural designs, especially in the field of novel view synthesis.

Paper Structure (21 sections, 8 equations, 7 figures, 5 tables)


Figures (7)

  • Figure 1: Example of aliasing artifacts
  • Figure 2: An overview of the process of rendering and training NeRF. The first step (a) involves selecting sampling points for each pixel of the image to be synthesized. The next step (b) involves using NeRF's MLP(s) to generate densities and colors at the selected sampling points, followed by volume rendering techniques to composite these values into an image (c). Since this rendering function is differentiable, the scene representation is optimized by minimizing the difference between the synthesized and observed ground truth images (d).
  • Figure 3: Comparing the NeRF and Mip-NeRF neural rendering methods. (a) NeRF samples points along rays traced from the camera center through each pixel. These points are encoded with a positional encoding to produce features. (b) Mip-NeRF instead reasons about the 3D conical frustum defined by a camera pixel. These frustums are featurized using an integrated positional encoding, which approximates the frustum as a multivariate Gaussian and computes the integral of the positional encodings within the Gaussian. This figure is adapted from the Mip-NeRF paper [mipNerf].
  • Figure 4: The pipeline consists of five key stages. First, a multi-camera rig is used to capture images of a scene from varied viewpoints. The camera pose is estimated for each image view. These images and poses are then utilized to train a FastNeRF model that represents the scene as an implicit neural representation. Next, the positions and directions are densely sampled from the trained model, and the radiance field outputs are cached in a sparse 3D grid structure. At render time, rays are traced from the camera through pixels, and the radiance field is queried from the cache at intersection points. Finally, optimizations such as parallel GPU execution and accelerated data structures enhance performance.
  • Figure 5: LOLNeRF simultaneously learns a table of latent codes for each image and foreground and background neural radiance fields (NeRFs). The volumetric rendering output is evaluated against each training pixel using a per-ray RGB loss and against an image segmenter using an alpha value loss. Camera alignments are determined by fitting the 2D landmark outputs to class-specific canonical 3D keypoints using a least-squares approach.
  • ...and 2 more figures
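
The rendering and training loop described in the Figure 2 caption can be illustrated with a small sketch. It is only a minimal illustration under stated assumptions, not the survey's or any library's implementation: the function names `composite_ray` and `mse_loss` are hypothetical, the densities and colors are hard-coded toy values standing in for the MLP outputs of step (b), and the compositing rule is the standard discrete volume-rendering quadrature, in which each sample contributes its color weighted by its opacity and by the transmittance accumulated in front of it.

```python
import math


def composite_ray(densities, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray (step c).

    densities: non-negative density (sigma) at each sample point
    colors:    (r, g, b) tuple at each sample point
    deltas:    distance between consecutive sample points
    Returns the rendered pixel color and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # light remaining behind this sample
    return pixel, weights


def mse_loss(rendered, observed):
    """Mean squared error between a rendered and an observed pixel (step d)."""
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(rendered)


# Toy ray: empty space first, then a dense green surface (assumed values).
toy_densities = [0.0, 0.0, 50.0, 50.0]
toy_colors = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
toy_deltas = [0.25, 0.25, 0.25, 0.25]

pixel, weights = composite_ray(toy_densities, toy_colors, toy_deltas)
loss = mse_loss(pixel, (0.0, 1.0, 0.0))
print(pixel, weights, loss)
```

In an actual NeRF, the densities and colors would come from the MLP evaluated at the sampled points, and the loss would be backpropagated through the compositing step, which is possible because every operation above is differentiable.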