Geometry-Aware Scene Configurations for Novel View Synthesis
Minkwan Kim, Changwoon Choi, Young Min Kim
TL;DR
The paper tackles novel-view synthesis for large indoor environments with incomplete observations by introducing geometry-aware scene configurations. It builds a geometric scaffold from implicit surface reconstructions, computes coverage weights $w_i$ on that scaffold, and optimizes basis centers $\mathbf{p}_j$ to minimize a coverage loss $\mathcal{L}_{\text{cov}}$, which enables adaptive placement of NeRF blocks or NeLF probes. A scene-adaptive geometric regularization pipeline then applies a robust depth loss $\mathcal{L}_{\text{robust}}$ and virtual-view depth supervision to stabilize training and improve extrapolation; the total objective combines $\mathcal{L}_{\text{rgb}}$, $\mathcal{L}_{\text{robust}}$, and $\mathcal{L}_{\text{reg}}$. Experiments on the ScanNet++ and Zip-NeRF datasets show consistent gains in PSNR and SSIM, with competitive LPIPS, under equal memory budgets, supporting the effectiveness of geometry-guided basis distribution and regularization for challenging indoor scenes.
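The coverage-driven placement of basis centers can be illustrated with a small sketch. The summary does not give the form of $\mathcal{L}_{\text{cov}}$, so the code below assumes a weighted k-means objective, $\sum_i w_i \min_j \lVert \mathbf{x}_i - \mathbf{p}_j \rVert^2 / \sum_i w_i$, over scaffold points $\mathbf{x}_i$, and minimizes it with weighted Lloyd iterations. The function names `place_bases` and `coverage_loss`, the normalization, and the requirement of strictly positive weights are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def coverage_loss(points, weights, centers):
    """Assumed coverage loss: weighted mean squared distance to the nearest center."""
    # Pairwise squared distances between points and centers, shape (N, K).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float((weights * d2.min(axis=1)).sum() / weights.sum())


def place_bases(points, weights, num_bases, iters=50, seed=0):
    """Weighted Lloyd iterations (a stand-in optimizer, not the paper's).

    points:  (N, 3) samples on the geometric scaffold.
    weights: (N,) strictly positive coverage weights w_i.
    Returns: (num_bases, 3) basis centers p_j.
    """
    rng = np.random.default_rng(seed)
    # Initialise centers by sampling scaffold points in proportion to their weight.
    probs = weights / weights.sum()
    idx = rng.choice(len(points), size=num_bases, replace=False, p=probs)
    centers = points[idx].astype(float)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # Update step: each center moves to the weighted mean of its points.
        for j in range(num_bases):
            mask = assign == j
            if mask.any():
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centers
```

Under this assumed objective, regions of the scaffold with larger $w_i$ pull more centers toward them, which is the qualitative behavior the summary describes in contrast to a uniform arrangement.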
Abstract
We propose scene-adaptive strategies to efficiently allocate representation capacity for generating immersive experiences of indoor environments from incomplete observations. Indoor scenes with multiple rooms often exhibit irregular layouts with varying complexity, containing clutter, occlusion, and flat walls. We maximize the utilization of limited resources with guidance from geometric priors, which are often readily available after pre-processing stages. We record observation statistics on the estimated geometric scaffold and guide the optimal placement of bases, which greatly improves upon the uniform basis arrangements adopted by previous scalable Neural Radiance Field (NeRF) representations. We also suggest scene-adaptive virtual viewpoints to compensate for geometric deficiencies inherent in view configurations in the input trajectory and impose the necessary regularization. We present a comprehensive analysis and discussion regarding rendering quality and memory requirements in several large-scale indoor scenes, demonstrating significant enhancements compared to baselines that employ regular placements.
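To make the regularization concrete, the following sketch combines the three terms named in the TL;DR, $\mathcal{L}_{\text{rgb}}$, $\mathcal{L}_{\text{robust}}$, and $\mathcal{L}_{\text{reg}}$, into one scalar. The summary does not specify the robust loss or the weighting, so the Huber penalty, the weights `lambda_depth` and `lambda_reg`, and the function names here are hypothetical choices made for illustration only.

```python
import numpy as np


def huber(residual, delta=0.1):
    """Huber penalty: quadratic near zero, linear for large residuals."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * abs_r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)


def total_loss(rgb_pred, rgb_gt, depth_pred, depth_prior, reg_term,
               lambda_depth=0.1, lambda_reg=0.01):
    """Assumed weighted sum L_rgb + lambda_depth * L_robust + lambda_reg * L_reg."""
    # Photometric term on rendered versus observed colors.
    l_rgb = np.mean((rgb_pred - rgb_gt) ** 2)
    # Robust depth term against a geometric prior; the prior depths could come
    # from input views or from virtual viewpoints rendered on the scaffold.
    l_robust = np.mean(huber(depth_pred - depth_prior))
    return float(l_rgb + lambda_depth * l_robust + lambda_reg * reg_term)
```

A robust penalty of this kind limits the influence of large depth residuals, which matters when the depth prior itself comes from an imperfect surface reconstruction.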
