Geometry-Aware Scene Configurations for Novel View Synthesis

Minkwan Kim, Changwoon Choi, Young Min Kim

TL;DR

The paper tackles novel-view synthesis for large indoor environments with incomplete observations by introducing geometry-aware scene configurations. It builds a geometric scaffold from implicit surface reconstructions, computes coverage weights $w_i$ on that scaffold, and optimizes basis centers $\mathbf{p}_j$ to minimize a coverage energy $\mathcal{L}_{\text{cov}}$, enabling adaptive placement of NeRF blocks or NeLF probes. A scene-adaptive geometric regularization pipeline then applies robust depth losses $\mathcal{L}_{\text{robust}}$ and virtual-view depth supervision to stabilize training and improve extrapolation; the total objective combines $\mathcal{L}_{\text{rgb}}$, $\mathcal{L}_{\text{robust}}$, and $\mathcal{L}_{\text{reg}}$. Experiments on ScanNet++ and Zip-NeRF show consistent gains in PSNR/SSIM (and competitive LPIPS) under equal memory budgets, supporting geometry-guided basis distribution and regularization for challenging indoor scenes.
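
As a reading aid, the combined objective can be written as a weighted sum. This is only a sketch: the summary says the three terms are combined but does not say how, so the additive form and the weights $\lambda_{\text{robust}}$ and $\lambda_{\text{reg}}$ are assumptions, not values taken from the paper.

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{rgb}} + \lambda_{\text{robust}}\,\mathcal{L}_{\text{robust}} + \lambda_{\text{reg}}\,\mathcal{L}_{\text{reg}}
$$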

Abstract

We propose scene-adaptive strategies to efficiently allocate representation capacity for generating immersive experiences of indoor environments from incomplete observations. Indoor scenes with multiple rooms often exhibit irregular layouts with varying complexity, containing clutter, occlusion, and flat walls. We maximize the utilization of limited resources with guidance from geometric priors, which are often readily available after pre-processing stages. We record observation statistics on the estimated geometric scaffold and guide the optimal placement of bases, which greatly improves upon the uniform basis arrangements adopted by previous scalable Neural Radiance Field (NeRF) representations. We also suggest scene-adaptive virtual viewpoints to compensate for geometric deficiencies inherent in view configurations in the input trajectory and impose the necessary regularization. We present a comprehensive analysis and discussion regarding rendering quality and memory requirements in several large-scale indoor scenes, demonstrating significant enhancements compared to baselines that employ regular placements.

Paper Structure

This paper contains 46 sections, 11 equations, 22 figures, 9 tables, 3 algorithms.

Figures (22)

  • Figure 1: 2D toy examples of basis placement. We evaluate basis placement using $\mathcal{L}_{\text{cov}}$ (a) with and (b) without the denominator. Initial basis (circle outlines), optimized basis (solid green circles), and observation views (blue frustums) are visualized.
  • Figure 2: Scene-splitting strategies. (a) The original NeRF represents the entire scene with a single block. Scalable scene representations place multiple bases (b) evenly along the camera trajectory or (c) by uniformly dividing the scene's spatial extent. (d) We propose an adaptive approach based on scene configurations. Red curves denote camera trajectories and green dots mark the centers of the bases.
  • Figure 2: Coverage weights and optimized basis positions. (a) Given training (blue) and test (red) cameras, initial bases (yellow spheres) are obtained using FPS on cameras. (b) Optimized bases are distributed at positions balanced with respect to the coverage weights. Initial basis locations are indicated with sphere outlines, while the optimized bases are solid spheres in bright green.
  • Figure 3: Method overview. (a) We first extract the geometric scaffold from multi-view image observations. (b) Then we define coverage weights $w_i$ for surface points $\mathbf{x}_i$ on the obtained mesh surface, which consider both scene geometry and observation statistics. Starting from bases sampled along the camera trajectory using the FPS algorithm, we optimize their positions by minimizing the energy function $\mathcal{L}_{\text{cov}}$. (c) Finally, we optimize radiance fields with RGB supervision, guided by geometric regularization on training and virtual viewpoints.
  • Figure 3: Shifting optimized basis positions. (a) PSNR comparisons with different levels of perturbation noise scales. (b) Enlargements of rendered images for visual comparison of sampled noise levels.
  • ...and 17 more figures
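
The captions above mention that initial bases are sampled from the camera positions with FPS (farthest point sampling) before their positions are optimized. The following is a minimal sketch of standard greedy FPS, not the authors' code: the function name `farthest_point_sampling`, the `start` parameter, and the toy camera positions are hypothetical, and the subsequent $\mathcal{L}_{\text{cov}}$ optimization is not shown.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices so each new point is farthest from those already picked.

    points: (n, d) array of positions (e.g. camera centers; an assumption here).
    k:      number of points to select, 1 <= k <= n.
    start:  index of the first selected point (hypothetical parameter).
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)

    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)

    return np.array(selected)


# Toy usage: four camera centers on a line, pick three initial basis locations.
cameras = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0],
                    [10.0, 0.0, 0.0]])
initial_bases = cameras[farthest_point_sampling(cameras, k=3)]
```

Greedy FPS spreads the initial bases along the trajectory but ignores scene geometry, which is consistent with the captions describing it only as a starting point for the coverage-based optimization.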