NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis

Zinuo You, Andreas Geiger, Anpei Chen

TL;DR

NeLF-Pro tackles scalable, high-fidelity novel view synthesis across scenes of varying scale by representing a 3D scene as local light field probes and applying a Vector-Matrix-Matrix (VMM) factorization that shares a core representation while keeping probe-specific bases. The method queries a small set of camera-adjacent probes via soft blending and mipmap-like hierarchical sampling to render density and radiance with occlusion awareness, using a differentiable decoder to map fused features to $\sigma$ and $\mathcal{L}(\mathbf{x},\mathbf{d})$. It combines volumetric light field rendering with continuous factorization, local coordinate transforms, and permutation-invariant blending to achieve fast optimization and high fidelity across small to large-scale datasets, outperforming many grid-based baselines while maintaining compact models. The approach demonstrates strong performance on mip-NeRF360, Free, KITTI-360, and large-scale Google Earth–level scenes, with notably faster training times for large reconstructions and robust handling of multi-scale geometry.

Abstract

We present NeLF-Pro, a novel representation to model and reconstruct light fields in diverse natural scenes that vary in extent and spatial granularity. In contrast to previous fast reconstruction methods that represent the 3D scene globally, we model the light field of a scene as a set of local light field feature probes, parameterized with position and multi-channel 2D feature maps. Our central idea is to bake the scene's light field into spatially varying learnable representations and to query point features by weighted blending of probes close to the camera - allowing for mipmap representation and rendering. We introduce a novel vector-matrix-matrix (VMM) factorization technique that effectively represents the light field feature probes as products of core factors (i.e., VM) shared among local feature probes, and a basis factor (i.e., M) - efficiently encoding internal relationships and patterns within the scene. Experimentally, we demonstrate that NeLF-Pro significantly boosts the performance of feature grid-based representations, and achieves fast reconstruction with better rendering quality while maintaining compact modeling. Project webpage https://sinoyou.github.io/nelf-pro/.

Paper Structure (17 sections, 10 equations, 11 figures, 6 tables)

This paper contains 17 sections, 10 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: NeLF-Pro represents a scene as spatially distributed light field probes for faithful novel view synthesis in diverse and spatially inhomogeneous natural scenes.
  • Figure 2: NeLF-Pro pipeline. Our approach models a scene using light field probes represented by a set of core vectors $\bar{\mathbf{V}}$, matrices $\bar{\mathbf{M}}$ as well as a set of basis matrices $\mathbf{M}$. We project the sample points onto local probes to obtain local spherical coordinates and then query factors from probes near the camera. These factors are aggregated using a self-attention mechanism and combined using Hadamard products. The resulting factor $\mathcal{\tau}_l(\mathbf{x})$ and the viewing direction $\mathbf{d}$ are used to calculate the density $\sigma$ and the light field radiance $\mathcal{L}(\mathbf{x},\mathbf{d})$.
  • Figure 3: Datasets and trajectories used for our evaluation.
  • Figure 4: Qualitative results on the mip-NeRF360 [Barron2022CVPR] and Free [Wang2023CVPR] datasets. Our approach reconstructs thin structures and appearance details better than the baseline methods (Instant-NGP [Mueller2022SIGGRAPH], F$^2$-NeRF [Wang2023CVPR]).
  • Figure 5: Results on the KITTI-360 NVS task. Our approach provides sharper renderings than the baselines.
  • ...and 6 more figures
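
The query step described in the Figure 2 caption (project a sample point into each nearby probe's local spherical coordinates, look up factors, aggregate across probes, combine with Hadamard products) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the function names, the array layouts, indexing the core vector by radial distance and the matrices by the two angles, the single shared core, the nearest-neighbor lookup, and the distance-based softmax weights (the caption describes a self-attention mechanism for aggregation). The decoder that maps the resulting feature and the viewing direction $\mathbf{d}$ to $\sigma$ and $\mathcal{L}(\mathbf{x},\mathbf{d})$ is omitted.

```python
import numpy as np


def local_spherical(x, center):
    """Spherical coordinates (theta, phi, r) of point x relative to a probe center."""
    v = x - center
    r = float(np.linalg.norm(v))
    theta = float(np.arccos(np.clip(v[2] / max(r, 1e-8), -1.0, 1.0)))  # polar angle in [0, pi]
    phi = float(np.arctan2(v[1], v[0]))  # azimuth in [-pi, pi]
    return theta, phi, r


def grid_index(value, lo, hi, size):
    """Nearest grid index for a value in [lo, hi]; out-of-range values are clamped."""
    t = (value - lo) / (hi - lo)
    return int(np.clip(np.rint(t * (size - 1)), 0, size - 1))


def query_feature(x, cam_pos, probe_pos, core_vec, core_mat, basis_mats, k=2, r_max=10.0):
    """Blend factors from the k probes nearest the camera and combine them.

    Hypothetical layouts (assumptions, not taken from the paper):
      core_vec:   (n_r, C)                shared core vector, indexed by radius
      core_mat:   (n_theta, n_phi, C)     shared core matrix, indexed by angles
      basis_mats: (P, n_theta, n_phi, C)  one basis matrix per probe
    """
    # 1. Select the k probes closest to the camera.
    cam_dist = np.linalg.norm(probe_pos - cam_pos, axis=1)
    nearest = np.argsort(cam_dist)[:k]

    # 2. Soft blending weights: softmax over negative camera-to-probe distance
    #    (an assumed stand-in for the learned aggregation).
    logits = -cam_dist[nearest]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    n_r, channels = core_vec.shape
    n_theta, n_phi = core_mat.shape[:2]
    v = np.zeros(channels)
    m_core = np.zeros(channels)
    m_basis = np.zeros(channels)

    # 3. Look up each factor in the probe's local frame and accumulate the blend.
    for w, p in zip(weights, nearest):
        theta, phi, r = local_spherical(x, probe_pos[p])
        i = grid_index(theta, 0.0, np.pi, n_theta)
        j = grid_index(phi, -np.pi, np.pi, n_phi)
        q = grid_index(r, 0.0, r_max, n_r)
        v += w * core_vec[q]
        m_core += w * core_mat[i, j]
        m_basis += w * basis_mats[p, i, j]

    # 4. Hadamard (element-wise) product of the aggregated factors.
    return v * m_core * m_basis, weights


rng = np.random.default_rng(0)
probes = rng.uniform(-1.0, 1.0, size=(4, 3))
feature, blend = query_feature(
    x=np.array([0.3, 0.2, 0.1]),
    cam_pos=np.zeros(3),
    probe_pos=probes,
    core_vec=rng.normal(size=(8, 4)),
    core_mat=rng.normal(size=(6, 12, 4)),
    basis_mats=rng.normal(size=(4, 6, 12, 4)),
)
print(feature.shape, blend)
```

Because only the per-probe basis matrices scale with the number of probes while the core factors are shared, storage in this sketch grows by one basis matrix per added probe, which matches the compactness argument made in the abstract.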