ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian L. Curless, Janne Kontkanen
TL;DR
ExtraNeRF addresses the challenge of extrapolating neural radiance fields from a small set of views by integrating a base NeRF with diffusion priors guided by a visibility map. The method trains a BaseNeRF on observed views, then iteratively inpaints unseen regions with a scene-tuned diffusion model and enhances details with a second diffusion model, all while supervising with virtual views and depth information. Per-scene diffusion fine-tuning and a dedicated visibility/depth completion pipeline yield sharp, coherent disoccluded content and achieve state-of-the-art results on LLFF and Tanks & Temples benchmarks with few input views. The approach offers a practical pathway to extend NeRFs beyond observed data, enabling richer, more flexible view exploration in real-world capture scenarios.
Abstract
We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and to focus on reconstructing those regions consistently with diffusion models. Our primary contributions include a visibility-aware, diffusion-based inpainting module that is fine-tuned on the input imagery, yielding an initial NeRF with moderate-quality (often blurry) inpainted regions, followed by a second diffusion model trained on the input imagery to consistently enhance, and in particular sharpen, the inpainted imagery from the first pass. We demonstrate high-quality results when extrapolating beyond a small number of input views (typically six or fewer), effectively outpainting the NeRF as well as inpainting newly disoccluded regions inside the original viewing volume. We compare with related work both quantitatively and qualitatively and show significant gains over prior art.
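To make the idea of tracking visibility concrete, the following is a minimal sketch of one common way to decide whether a 3D point has been observed by any input view: project it into each training camera and compare its depth against that camera's depth map. This is an illustrative depth-consistency test, not necessarily the formulation used in ExtraNeRF. The function name `visibility_mask`, the shared pinhole intrinsics `K`, and the relative tolerance `tol` are all assumptions introduced for this example.

```python
import numpy as np

def visibility_mask(points_world, K, world_to_cams, train_depths, tol=0.05):
    """Return a boolean mask: True where at least one training view sees the point.

    points_world : (N, 3) points, e.g. unprojected from a virtual view's depth
    K            : (3, 3) pinhole intrinsics shared by all views (assumption)
    world_to_cams: (V, 4, 4) world-to-camera extrinsics of the training views
    train_depths : (V, H, W) depth maps rendered from the training views
    tol          : relative depth tolerance for the occlusion test (assumption)
    """
    n = points_world.shape[0]
    homog = np.concatenate([points_world, np.ones((n, 1))], axis=1)  # (N, 4)
    visible = np.zeros(n, dtype=bool)

    for w2c, depth in zip(world_to_cams, train_depths):
        h, w = depth.shape
        cam = (w2c @ homog.T).T[:, :3]           # points in this camera's frame
        z = cam[:, 2]
        in_front = z > 1e-6                      # discard points behind the camera
        safe_z = np.where(in_front, z, 1.0)      # avoid dividing by zero or negatives

        uv = (K @ cam.T).T                       # homogeneous pixel coordinates
        u = np.round(uv[:, 0] / safe_z).astype(int)
        v = np.round(uv[:, 1] / safe_z).astype(int)
        in_bounds = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

        idx = np.where(in_bounds)[0]
        seen_depth = depth[v[idx], u[idx]]
        # A point counts as observed if it is not behind the surface this view sees.
        unoccluded = z[idx] <= seen_depth * (1.0 + tol)
        visible[idx[unoccluded]] = True

    return visible
```

Pixels of a virtual view whose points come back as not visible would be the candidates for inpainting, while visible pixels can stay supervised by the observed imagery.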
