
Sampling for View Synthesis: From Local Light Field Fusion to Neural Radiance Fields and Beyond

Ravi Ramamoorthi

TL;DR

The paper addresses how to sample views for reliable, high-quality view synthesis. It champions a principled, frequency-domain approach (plenoptic sampling) extended to scenes with occlusions, deriving explicit bounds on view sampling density and showing that local light field fusion with multiplane images (MPIs) can match the perceptual quality of Nyquist-rate view sampling with up to ~4000× fewer views. It provides concrete prescriptive guidelines, including the relationship between the number of depth layers, scene disparity, and camera sampling interval, and demonstrates practical viability with a smartphone app. The discussion places these results in the context of neural radiance fields (NeRFs) and related representations, highlighting progress toward extremely sparse or even single-image view synthesis, and identifies formal sampling guarantees for modern methods as a direction for future work.

Abstract

Capturing and rendering novel views of complex real-world scenes is a long-standing problem in computer graphics and vision, with applications in augmented and virtual reality, immersive experiences and 3D photography. The advent of deep learning has enabled revolutionary advances in this area, classically known as image-based rendering. However, previous approaches require intractably dense view sampling or provide little or no guidance for how users should sample views of a scene to reliably render high-quality novel views. Local light field fusion proposes an algorithm for practical view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image scene representation, then renders novel views by blending adjacent local light fields. Crucially, we extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. We achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. Subsequent developments have led to new scene representations for deep learning with view synthesis, notably neural radiance fields, but the problem of sparse view synthesis from a small number of images has only grown in importance. We reprise some of the recent results on sparse and even single image view synthesis, while posing the question of whether prescriptive sampling guidelines are feasible for the new generation of image-based rendering algorithms.
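
The "up to 4000x fewer views" figure can be reproduced with simple arithmetic from the Figure 3 caption: a $D$-plane MPI handles up to $D$ pixels of disparity, and views are sampled on a two-dimensional grid, so the saving is $D^2$. The sketch below is only illustrative. The function names are hypothetical, and it assumes that the camera spacing can be stretched by a factor of $D$ along each axis relative to Nyquist-rate sampling.

```python
import math


def view_reduction_factor(num_planes: int, grid_dims: int = 2) -> int:
    """Factor by which the view count drops relative to Nyquist-rate sampling.

    Assumption: a D-plane MPI lets the camera spacing grow by a factor of D
    along each axis of the camera grid, so the saving is D ** grid_dims.
    """
    return num_planes ** grid_dims


def views_needed(nyquist_views_per_axis: int, num_planes: int, grid_dims: int = 2) -> int:
    """Views required on a regular grid when each view is lifted to an MPI.

    nyquist_views_per_axis is a hypothetical input: the number of views per
    axis that Nyquist-rate sampling would demand for the scene.
    """
    per_axis = math.ceil(nyquist_views_per_axis / num_planes)
    return per_axis ** grid_dims


# D = 64 planes on a 2D camera grid gives the 64^2 saving quoted in Figure 3.
reduction = view_reduction_factor(64)

# A scene that would need 640 x 640 Nyquist-rate views needs 10 x 10 with 64 planes.
sparse_views = views_needed(640, 64)
```

The Figure 3 caption reports that this relationship holds at least up to $D=64$, so the sketch should not be extrapolated to larger plane counts.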

Paper Structure (4 sections, 7 equations, 5 figures)

This paper contains 4 sections, 7 equations, 5 figures.

Figures (5)

  • Figure 1: Some basic results from the plenoptic sampling paper [plenoptic]. On the left is the basic form of the double wedge spectrum. In the middle we show the packing of spectral replicas when sampling is just sparse enough that the central double wedge can still be isolated with a parallelogram reconstruction filter. On the right is the geometry-image sampling curve, showing that fewer images are needed with more depth layers. Figures taken from Chai et al. Local light field fusion extends these results to account for occlusions, and shows how to apply the theory to derive prescriptive view sampling guidelines with rigorous bounds in the context of modern deep learning multiplane image prediction methods.
  • Figure 2: Local light field fusion extends traditional plenoptic sampling to consider occlusions when reconstructing a continuous light field from MPIs. (a) Considering occlusions expands the Fourier support to a parallelogram (the Fourier support without occlusions is shown in blue and occlusions expand the Fourier support to additionally include the purple region) and doubles the Nyquist view sampling rate. (b) As in the no-occlusions case, separately reconstructing the light field for D layers decreases the Nyquist rate by a factor of D. (c) With occlusions, the full light field spectrum cannot be reconstructed by summing the individual layer spectra because the union of their supports is smaller than the support of the full light field spectrum (a). Instead, we compute the full light field by alpha compositing the individual light field layers from back to front in the primal domain.
  • Figure 3: Left: The basic idea of lifting an input sampled view to a multiplane image with RGB color and opacity. Right: Validation of the method and theory, showing that with a $D$-plane MPI we can reconstruct scenes with up to $D$ pixels of disparity, at least until $D=64$, with the same perceptual quality as light field interpolation with Nyquist rate sampling (black dotted line). Note that sampling is in two dimensions, so we achieve the same results as Nyquist rate view sampling with $64^2=4096\times$ fewer views. For higher numbers of planes, the overlap between adjacent views decreases and errors increase. The colored dots indicate the point on each line where the number of planes equals the maximum scene disparity, while the shaded region indicates 1 standard deviation over all 8 test scenes.
  • Figure 4: Results on two scenes (from Fig. 9 of the original paper [llff]). These datasets were captured with a standard cellphone. We render a sequence of new views and show both a crop from a single rendered output and an epipolar slice of the sequence. We show 2D projections of the input camera poses (blue dots) and new view path (red line) along the z and y axes of the new view camera in the lower left of each row. Comparison is made to prior methods, showcasing the quality of results from local light field fusion.
  • Figure 5: Sheared and Multiple Axis-Aligned Filtering for sampling and reconstruction in Monte Carlo Rendering and Denoising. On the left, we show the sheared Fourier filter and the corresponding parallelogram filter in the primal domain (from fast 4D sheared filtering [yan2015fast]). On the right, we show the approximation with multiple axis-aligned filters (from MAAF [maaf]), and a comparison to axis-aligned and sheared filtering.
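
The Figure 2 caption notes that, with occlusions, the full light field is obtained by alpha compositing the individual layers from back to front in the primal domain. A minimal per-pixel sketch of that step is shown below, using the standard "over" operator. The function name and the plane ordering convention (index 0 is the farthest plane) are assumptions made for illustration, not the paper's implementation.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_back_to_front(colors: Sequence[RGB], alphas: Sequence[float]) -> RGB:
    """Alpha-composite one pixel of an MPI with the standard 'over' operator.

    Assumption: planes are ordered from back (index 0) to front (last index),
    colors are not premultiplied, and alphas lie in [0, 1].
    """
    out: RGB = (0.0, 0.0, 0.0)
    for color, alpha in zip(colors, alphas):
        # The nearer plane covers what has been accumulated behind it.
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(color, out))
    return out


# A red back plane behind a half-transparent blue front plane.
pixel = composite_back_to_front(
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    alphas=[1.0, 0.5],
)
```

Because each nearer plane attenuates everything behind it, the result depends on the plane order, which is consistent with the caption's point that the composited light field cannot be obtained by simply summing the layer spectra.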