VDNeRF: Vision-only Dynamic Neural Radiance Field for Urban Scenes

Zhengyu Zou; Jingfeng Li; Hao Li; Xiaolei Hou; Jinwen Hu; Jingkun Chen; Lechao Cheng; Dingwen Zhang

TL;DR

VDNeRF addresses the challenge of reconstructing dynamic urban scenes without camera poses by jointly learning camera trajectories and a spatiotemporal scene representation through two NeRFs for static and dynamic content. It introduces a flow-informed dynamic NeRF and a shadow-based fusion mechanism, all within a progressive sub-scene training framework that enables self-supervised static-dynamic decomposition. The approach achieves state-of-the-art results on NOTR and Pandaset for both novel view synthesis and pose estimation, demonstrating robust perception in pose-free urban scenarios. This work advances practical vision-based perception for autonomous driving and robotics by removing reliance on external pose data while delivering high-quality dynamic reconstructions.

Abstract

Neural Radiance Fields (NeRFs) implicitly model continuous three-dimensional scenes from a set of images with known camera poses, enabling the rendering of photorealistic novel views. However, existing NeRF-based methods encounter challenges in applications such as autonomous driving and robotic perception, primarily due to the difficulty of capturing accurate camera poses and limitations in handling large-scale dynamic environments. To address these issues, we propose Vision-only Dynamic NeRF (VDNeRF), a method that accurately recovers camera trajectories and learns spatiotemporal representations for dynamic urban scenes without requiring additional camera pose information or expensive sensor data. VDNeRF employs two separate NeRF models to jointly reconstruct the scene. The static NeRF model optimizes the camera poses and the static background, while the dynamic NeRF model incorporates 3D scene flow to ensure accurate and consistent reconstruction of dynamic objects. To address the ambiguity between camera motion and independent object motion, we design a training framework that achieves robust camera pose estimation and self-supervised decomposition of the static and dynamic elements in a scene. Extensive evaluations on mainstream urban driving datasets demonstrate that VDNeRF surpasses state-of-the-art NeRF-based pose-free methods in both camera pose estimation and dynamic novel view synthesis.
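
To make the two-model idea concrete, the following is a minimal sketch of how samples from a static field and a dynamic field can be composited along a single ray. It uses the density-weighted blend common in dynamic NeRF work and is not taken from the paper: the function name `composite_ray`, its arguments, and the blending rule are illustrative assumptions, and VDNeRF's shadow-based fusion mechanism is not modeled here.

```python
import math


def composite_ray(sigma_s, rgb_s, sigma_d, rgb_d, deltas):
    """Alpha-composite static and dynamic samples along one ray.

    Hypothetical helper (not from the paper). Inputs are per-sample lists:
    densities (sigma_s, sigma_d), RGB triples (rgb_s, rgb_d), and the
    distance between consecutive samples (deltas).
    Returns (pixel_rgb, remaining_transmittance).
    """
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    for ss, cs, sd, cd, dt in zip(sigma_s, rgb_s, sigma_d, rgb_d, deltas):
        sigma = ss + sd  # assumption: densities of the two fields add
        if sigma <= 0.0:
            continue  # empty space contributes nothing
        alpha = 1.0 - math.exp(-sigma * dt)
        # Assumption: colors are blended in proportion to each field's density.
        w_static = ss / sigma
        blended = [w_static * a + (1.0 - w_static) * b for a, b in zip(cs, cd)]
        weight = transmittance * alpha
        color = [c + weight * b for c, b in zip(color, blended)]
        transmittance *= 1.0 - alpha
    return color, transmittance
```

With this formulation, a sample where only the static field has density renders the static color, and a sample where both fields have equal density renders the average of the two colors, scaled by the sample's opacity.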

Paper Structure

Table of Contents

Figures (11)