DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields

Nicolas Schischka, Hannah Schieber, Mert Asim Karaoglu, Melih Görgülü, Florian Grötzner, Alexander Ladikos, Daniel Roth, Nassir Navab, Benjamin Busam

TL;DR

DynaMoN tackles robust camera localization and dynamic novel-view synthesis in dynamic scenes by integrating semantic segmentation with generic motion masks to isolate static content during pose estimation, while representing motion with a HexPlane-based 4D NeRF. It introduces an iterative training scheme that alternates between refining camera poses and optimizing the NeRF, using static-focused ray sampling to improve both trajectory accuracy and rendering quality. Evaluations on the TUM RGB-D Dynamic and BONN RGB-D Dynamic datasets show state-of-the-art performance in translation accuracy and NVS PSNR/SSIM, with improved robustness under sparse trajectories. The approach offers faster training relative to comparable dynamic-camera methods and advances the practical deployment of 4D scene representations in dynamic environments.

Abstract

The accurate reconstruction of dynamic scenes with neural radiance fields is significantly dependent on the estimation of camera poses. Widely used structure-from-motion pipelines encounter difficulties in accurately tracking the camera trajectory when faced with separate dynamics of the scene content and the camera movement. To address this challenge, we propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN). DynaMoN utilizes semantic segmentation and generic motion masks to handle dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis. Our novel iterative learning scheme switches between training the NeRF and updating the pose parameters for an improved reconstruction and trajectory estimation quality. The proposed pipeline shows significant acceleration of the training process. We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset. DynaMoN improves over the state-of-the-art both in terms of reconstruction quality and trajectory accuracy. We plan to make our code public to enhance research in this area.
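The iterative scheme mentioned in the abstract, which switches between training the NeRF and updating the pose parameters, can be illustrated with a deliberately tiny toy problem. This is only a sketch of the alternating pattern, not the authors' implementation. The "scene" here is a single scalar `scale`, each "pose" is a 1D offset per frame, and the function names, step counts, and learning rate are all assumptions chosen for illustration.

```python
import numpy as np


def loss(scale, offsets, x, y):
    """Mean squared error of the toy model y[i, j] = scale * (x[j] + offsets[i])."""
    pred = scale * (x[None, :] + offsets[:, None])
    return float(np.mean((pred - y) ** 2))


def alternate(x, y, rounds=50, inner_steps=5, lr=0.1):
    """Alternate between updating the scene parameter and the per-frame offsets.

    Toy stand-in for the idea of switching between scene training and pose
    refinement; the real method optimizes a 4D NeRF and 6-DoF camera poses.
    """
    scale = 1.0                      # toy "scene" parameter (hypothetical)
    offsets = np.zeros(y.shape[0])   # toy per-frame "poses" (hypothetical)
    for _ in range(rounds):
        # Phase 1: offsets are frozen, gradient steps on the scene parameter.
        for _ in range(inner_steps):
            shifted = x[None, :] + offsets[:, None]
            residual = scale * shifted - y
            scale -= lr * 2.0 * np.mean(residual * shifted)
        # Phase 2: scene parameter is frozen, gradient steps on each frame's
        # offset using the gradient of that frame's mean squared error.
        for _ in range(inner_steps):
            residual = scale * (x[None, :] + offsets[:, None]) - y
            offsets -= lr * 2.0 * scale * residual.mean(axis=1)
    return scale, offsets
```

Freezing one block of parameters while updating the other keeps each phase a simple, well-conditioned problem, which is the general motivation for alternating schemes of this kind.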

Paper Structure (21 sections, 2 equations, 3 figures, 7 tables)

Figures (3)

  • Figure 1: NeRF combined with SLAM usually relies on static scenes (left). Approaches such as InstantNGP [muller2022instant] can be used to mask out dynamic content [rosinol2022nerf, chung2023orbeez] (left), but the resulting quality is still reduced. Fully dynamic NeRF provides an implicit 4D (3D + time) representation; however, it relies on offline SfM, which can suffer in the presence of considerable motion (center). DynaMoN accounts for dynamics at all stages using motion segmentation (MS) and semantic segmentation (SS), which enables more robust camera tracking and higher-quality NVS (ours, right).
  • Figure 2: DynaMoN takes consecutive RGB frames ($I$) and applies semantic segmentation ($M_{SS} \rightarrow S_{S_i}$) and motion masks ($M_{MS} \rightarrow S_{M_i}$) to them. Combining these masks with a logical OR enables motion-aware, fast, and robust camera pose estimation $G$. Based on $G$, a dynamic NeRF produces a time-dependent output ($\Theta$). During NeRF training, the previously computed masks are used to refine the camera poses. Dashed lines represent iterative updates of the NeRF and pose parameters. Our approach is highlighted in green.
  • Figure 3: Qualitative results for NVS on TUM RGB-D (top) and BONN RGB-D Dynamic (bottom). Ground truth (left), novel views from HexPlane + COLMAP (2nd column), improved camera poses from our backbone (3rd column), and our full model (right).
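
The mask combination described in the Figure 2 caption, and the idea of focusing ray sampling on static content, can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the paper's code: the boolean masks are taken as given (the segmentation and motion networks are not modeled), the function names `combine_masks` and `sample_static_pixels` are hypothetical, and sampling uniformly from pixels outside the combined mask is one simple interpretation of static-focused sampling that may differ from the authors' strategy.

```python
import numpy as np


def combine_masks(semantic_mask, motion_mask):
    """Pixel-wise logical OR of two boolean masks (True marks dynamic content)."""
    return np.logical_or(semantic_mask, motion_mask)


def sample_static_pixels(dynamic_mask, num_rays, rng):
    """Draw (row, col) pixel coordinates uniformly from the non-dynamic region.

    Assumption: rays are sampled only where the combined mask is False.
    """
    rows, cols = np.nonzero(~dynamic_mask)
    if rows.size == 0:
        raise ValueError("no static pixels available for sampling")
    # Sample with replacement only if fewer static pixels exist than requested.
    idx = rng.choice(rows.size, size=num_rays, replace=rows.size < num_rays)
    return np.stack([rows[idx], cols[idx]], axis=1)
```

A pixel is treated as dynamic if either mask flags it, so the OR errs on the side of excluding more content from pose estimation and sampling.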