Segmentation-Guided Neural Radiance Fields for Novel Street View Synthesis

Yizhou Li, Yusuke Monno, Masatoshi Okutomi, Yuuichi Tanaka, Seiichi Kataoka, Teruaki Kosiba

TL;DR

The paper tackles novel view synthesis for outdoor street scenes with NeRF, addressing transient objects, sparse textures, and lighting variability. It extends ZipNeRF with segmentation masks from Grounded SAM and treats transient, sky, and ground regions separately, adding a sky decay loss, ground-plane regularization, and appearance embeddings for lighting consistency. Training minimizes the total loss $L_{\text{total}} = L_{\text{rgb}} + \lambda_{\text{sky}} L_{\text{sky}} + \lambda_{\text{ground}} L_{\text{ground}}$. Experiments on Kobe street data show improved novel-view quality with fewer artifacts than ZipNeRF, and diffusion-based post-processing can further enhance distant views.
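As a rough illustration of how the weighted sum in the total loss combines its terms, the following minimal sketch may help. It is not the authors' implementation: the function names `masked_mse` and `total_loss`, the weight values, and the choice to exclude transient pixels from the photometric term are all assumptions made for illustration. This summary does not give the exact forms of $L_{\text{sky}}$ and $L_{\text{ground}}$, so they are passed in as precomputed scalars.

```python
from typing import Sequence


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    keep: Sequence[bool],
) -> float:
    """Mean squared error over the pixels where keep[i] is True.

    Assumption: pixels flagged as transient by a segmentation mask are
    dropped from the RGB loss by setting keep[i] to False.
    """
    total, count = 0.0, 0
    for p, t, k in zip(pred, target, keep):
        if k:
            total += sum((pc - tc) ** 2 for pc, tc in zip(p, t))
            count += len(p)
    # Return 0.0 when every pixel is masked out, to avoid dividing by zero.
    return total / count if count else 0.0


def total_loss(
    l_rgb: float,
    l_sky: float,
    l_ground: float,
    lambda_sky: float,
    lambda_ground: float,
) -> float:
    """L_total = L_rgb + lambda_sky * L_sky + lambda_ground * L_ground."""
    return l_rgb + lambda_sky * l_sky + lambda_ground * l_ground


# Two RGB pixels; the first is treated as transient and ignored.
pred = [(1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
target = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
l_rgb = masked_mse(pred, target, keep=[False, True])

# Placeholder values for the sky and ground terms and their weights.
loss = total_loss(l_rgb, l_sky=2.0, l_ground=3.0, lambda_sky=0.5, lambda_ground=0.1)
print(loss)
```

The weights $\lambda_{\text{sky}}$ and $\lambda_{\text{ground}}$ set how strongly the sky and ground terms pull against the photometric term. The values used here are placeholders, not values reported in the paper.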

Abstract

Recent advances in Neural Radiance Fields (NeRF) have shown great potential in 3D reconstruction and novel view synthesis, particularly for indoor and small-scale scenes. However, extending NeRF to large-scale outdoor environments presents challenges such as transient objects, sparse cameras and textures, and varying lighting conditions. In this paper, we propose a segmentation-guided enhancement to NeRF for outdoor street scenes, focusing on complex urban environments. Our approach extends ZipNeRF and utilizes Grounded SAM for segmentation mask generation, enabling effective handling of transient objects, modeling of the sky, and regularization of the ground. We also introduce appearance embeddings to adapt to inconsistent lighting across view sequences. Experimental results demonstrate that our method outperforms the baseline ZipNeRF, improving novel view synthesis quality with fewer artifacts and sharper details.

Paper Structure

This paper contains 15 sections, 14 equations, 6 figures.

Figures (6)

  • Figure 1: The overview of our approaches for different segmentation regions.
  • Figure 2: The overview of our network architecture.
  • Figure 3: (Left) Sample images from our dataset for an intersection, one from each of the 12 video clips. (Right) Camera poses estimated using COLMAP.
  • Figure 4: Comparison of ZipNeRF and the proposed method for novel view synthesis.
  • Figure 5: Comparison of the cases without and with ground plane regularization.
  • ...and 1 more figure