Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

TL;DR

Entity-NeRF combines knowledge-based and statistical strategies to remove moving objects from urban scenes: it computes entity-wise statistics over an entity segmentation and classifies stationary entities using thing/stuff segmentation.

Abstract

Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics. However, this approach faces challenges in modeling scene dynamics in urban environments, where moving objects of various categories and scales are present. In such settings, it becomes crucial to effectively eliminate moving objects to accurately reconstruct static backgrounds. Our research introduces an innovative method, termed here as Entity-NeRF, which combines the strengths of knowledge-based and statistical strategies. This approach utilizes entity-wise statistics, leveraging entity segmentation and stationary entity classification through thing/stuff segmentation. To assess our methodology, we created an urban scene dataset masked with moving objects. Our comprehensive experiments demonstrate that Entity-NeRF notably outperforms existing techniques in removing moving objects and reconstructing static urban backgrounds, both quantitatively and qualitatively.

Paper Structure (26 sections, 3 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: In urban scenes, the statistical approach RobustNeRF mistakes complex backgrounds for moving objects (top) and fails to remove small moving objects (bottom). Entity-NeRF, in contrast, reconstructs complex backgrounds and removes small moving objects.
  • Figure 2: Overview of our Entity-NeRF pipeline. $D(\mathbf{r}) = 0$ if the Entity-wise Average of Residual Ranks (\ref{subsec:entity-wise-loss}) of an entity labeled 'thing' by the stationary entity classification (\ref{subsec:neural-weight-function}) is greater than a threshold value $\mathcal{T}$. The stationary entity classification gives the 'thing' label as $s(\mathbf{r})=0$ and the 'stuff' label as $s(\mathbf{r})=1$.
  • Figure 3: $D(\mathbf{r})$ of RobustNeRF and of our Entity-wise Average of Residual Ranks (EARR) at the end of training. Our EARR incorporates the background into learning more efficiently.
  • Figure 4: MovieMap Dataset. Only moving objects in the video are masked. Therefore, parked cars and stationary people are not masked.
  • Figure 5: Qualitative comparison with dynamic NeRF methods (D$^2$NeRF and RoDynRF) on MovieMap Dataset.
  • ...and 9 more figures
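The masking rule in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `entity_wise_mask`, the normalised-rank definition (rank within the batch divided by N - 1), and the threshold value used in the example are all assumptions made here. Only the rule itself comes from the caption: $D(\mathbf{r}) = 0$ when a 'thing' entity's average residual rank exceeds $\mathcal{T}$.

```python
import numpy as np


def entity_wise_mask(residuals, entity_ids, is_stuff, threshold):
    """Return D(r) per ray: 0 = excluded from the loss, 1 = kept.

    residuals  : (N,) per-ray residuals for one batch
    entity_ids : (N,) integer entity label per ray (from entity segmentation)
    is_stuff   : (N,) s(r), 1 for 'stuff' rays and 0 for 'thing' rays
    threshold  : T from the caption (its value here is an assumption)
    """
    residuals = np.asarray(residuals, dtype=float)
    entity_ids = np.asarray(entity_ids)
    is_stuff = np.asarray(is_stuff)
    n = residuals.shape[0]

    # Assumed rank definition: 0 for the smallest residual, 1 for the largest.
    order = residuals.argsort(kind="stable")
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(n) / max(n - 1, 1)

    mask = np.ones(n, dtype=int)
    for entity in np.unique(entity_ids):
        rays = entity_ids == entity
        # Average the residual ranks over all rays of this entity.
        if ranks[rays].mean() > threshold:
            # Only rays labeled 'thing' (s(r) = 0) can be masked out.
            mask[rays & (is_stuff == 0)] = 0
    return mask


# Three entities: 0 is 'stuff', 1 and 2 are 'thing'.
example = entity_wise_mask(
    residuals=[0.1, 0.2, 0.9, 0.8, 0.5, 0.05],
    entity_ids=[0, 0, 1, 1, 2, 2],
    is_stuff=[1, 1, 0, 0, 0, 0],
    threshold=0.7,
)
```

In this example only entity 1, a 'thing' whose rays all have high residual ranks, is excluded. Averaging over the whole entity rather than thresholding each ray separately is what allows a 'stuff' region with high residuals to stay in the loss, which matches the behaviour described for complex backgrounds in Figures 1 and 3.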