Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, Abhishek Kar

TL;DR

This work synthesizes novel views of real-world scenes from an irregular, sparse set of input views. Each input view is promoted to a local light field in the form of a multiplane image (MPI), and renderings from neighboring MPIs are blended to produce continuous viewpoints. The authors extend plenoptic sampling theory to derive a prescriptive sampling bound: an MPI with D depth planes lowers the required view sampling rate by a factor of D along each view axis, which compounds to D^2 for light fields sampled on a 2D grid of views and allows results with the perceptual quality of Nyquist rate sampling from up to 4000× fewer input views. A 3D CNN predicts each MPI from plane-sweep volumes, and new views are rendered by alpha compositing within each MPI and then blending the neighboring renderings with occlusion-aware weights. The method outperforms the baselines LFI, ULR, Soft3D, and BW Deep on both synthetic and real datasets. The authors also demonstrate practical deployment through an AR smartphone app that guides capture and through real-time desktop and mobile viewers.
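The arithmetic behind the D^2 figure can be checked with a few lines of Python. This is only an illustrative sketch: the function name `view_reduction` and the plane counts tried below are assumptions, not values taken from the summary. It shows that a plane count of 64, chosen here purely as an example, gives a reduction of the same order as the "up to 4000×" claim.

```python
def view_reduction(num_planes: int, view_dims: int = 2) -> int:
    """Factor by which the required number of views drops.

    Assumes the sampling rate falls by a factor of `num_planes` (D)
    along each of `view_dims` view axes, so the factors multiply.
    """
    if num_planes < 1 or view_dims < 1:
        raise ValueError("num_planes and view_dims must be positive")
    return num_planes ** view_dims


# Hypothetical plane counts, for illustration only.
for d in (8, 32, 64):
    per_axis = view_reduction(d, view_dims=1)
    grid = view_reduction(d, view_dims=2)
    print(f"D={d}: {per_axis}x fewer views per axis, {grid}x fewer on a 2D grid")
```

The per-axis factor applies when views are sampled along a single line; the squared factor applies when views are sampled on a 2D grid.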

Abstract

We present a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image (MPI) scene representation, then renders novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we apply this bound to capture and render views of real world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. We demonstrate our approach's practicality with an augmented reality smartphone app that guides users to capture input images of a scene and viewers that enable realtime virtual exploration on desktop and mobile platforms.

Paper Structure

This paper contains 40 sections, 13 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Traditional plenoptic sampling without occlusions, as derived in [chai00]. (a) The Fourier support of a light field without occlusions lies within a double-wedge, shown in blue. Nyquist rate view sampling is set by the double-wedge width, which is determined by the minimum and maximum scene depths $[z_{\min},z_{\max}]$ and the maximum spatial frequency $K_x$. The ideal reconstruction filter is shown in orange. (b) Splitting the light field into $D$ non-overlapping layers with equal disparity width decreases the Nyquist rate by a factor of $D$. (c) Without occlusions, the full light field spectrum is the sum of the spectra from each layer.
  • Figure 2: We extend traditional plenoptic sampling to consider occlusions when reconstructing a continuous light field from MPIs. (a) Considering occlusions expands the Fourier support to a parallelogram (the Fourier support without occlusions is shown in blue and occlusions expand the Fourier support to additionally include the purple region) and doubles the Nyquist view sampling rate. (b) As in the no-occlusions case, separately reconstructing the light field for $D$ layers decreases the Nyquist rate by a factor of $D$. (c) With occlusions, the full light field spectrum cannot be reconstructed by summing the individual layer spectra because the union of their supports is smaller than the support of the full light field spectrum (a). Instead, we compute the full light field by alpha compositing the individual light field layers from back to front in the primal domain.
  • Figure 3: We promote each input view sample to an MPI scene representation [zhou18], consisting of $D$ RGB$\alpha$ planes at regularly sampled disparities within the input view's camera frustum. Each MPI can render continuously-valued novel views within a local neighborhood by alpha compositing color along rays into the novel view's camera.
  • Figure 4: We render novel views as a weighted combination of renderings from neighboring MPIs, modulated by the corresponding accumulated alphas.
  • Figure 5: An example illustrating the benefits of using accumulated alpha to blend MPI renderings. We render two MPIs at the same new camera pose. In the top row, we display the RGB outputs $C_{t,i}$ from each MPI as well as the accumulated alphas $\alpha_{t,i}$, normalized so that they sum to one at each pixel. In the bottom row, we see that a simple average of the RGB images $C_{t,i}$ retains the stretching artifacts from both MPI renderings, whereas the alpha weighted blending combines only the non-occluded pixels from each input to produce a clean output $C_t$.
  • ...and 5 more figures
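
The compositing and blending described in the captions for Figures 3 to 5 can be sketched for a single ray (one output pixel). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-MPI scalar weights in `weights`, and the toy color values are all hypothetical. It assumes each MPI's planes have already been resampled along the ray of the novel view. It composites the planes back to front with the standard "over" operator, tracks the accumulated alpha, and then compares a simple average of two MPI renderings with a blend weighted by accumulated alpha.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]
Plane = Tuple[RGB, float]   # (color, alpha) of one MPI plane where the ray crosses it
Render = Tuple[RGB, float]  # (composited color, accumulated alpha) from one MPI


def composite_ray(planes_back_to_front: Sequence[Plane]) -> Render:
    """Alpha-composite plane samples along one ray with the 'over' operator."""
    color = (0.0, 0.0, 0.0)
    acc_alpha = 0.0
    for rgb, alpha in planes_back_to_front:
        # A nearer plane covers a fraction `alpha` of everything behind it.
        color = tuple(alpha * c + (1.0 - alpha) * prev for c, prev in zip(rgb, color))
        acc_alpha = alpha + (1.0 - alpha) * acc_alpha
    return color, acc_alpha


def blend_alpha_weighted(renders: Sequence[Render],
                         weights: Sequence[float],
                         eps: float = 1e-8) -> RGB:
    """Blend MPI renderings, weighting each by weight * accumulated alpha."""
    scaled = [w * acc for (_, acc), w in zip(renders, weights)]
    total = sum(scaled)
    if total < eps:
        return (0.0, 0.0, 0.0)  # no MPI has content along this ray
    return tuple(
        sum(s * rgb[k] for s, (rgb, _) in zip(scaled, renders)) / total
        for k in range(3)
    )


def blend_simple_average(renders: Sequence[Render]) -> RGB:
    """Baseline that ignores accumulated alpha."""
    n = len(renders)
    return tuple(sum(rgb[k] for rgb, _ in renders) / n for k in range(3))


# Toy data: MPI A sees an opaque surface along the ray; in MPI B the same
# ray falls in a region that B's input view could not observe.
render_a = composite_ray([((0.2, 0.4, 0.8), 1.0), ((0.0, 0.0, 0.0), 0.0)])
render_b = composite_ray([((0.9, 0.9, 0.9), 0.1)])

print("simple average:", blend_simple_average([render_a, render_b]))
print("alpha weighted:", blend_alpha_weighted([render_a, render_b], [0.5, 0.5]))
```

In this toy case the alpha-weighted result stays close to the color from MPI A, because MPI B contributes in proportion to its small accumulated alpha, whereas the simple average is pulled toward B's unreliable rendering.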