RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Thang-Anh-Quan Nguyen, Luis Roldão, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou
TL;DR
RoDUS addresses the challenge of disentangling static and dynamic elements in large-scale urban scenes for NeRF-based rendering without extensive motion cues. It introduces a two-pathway NeRF with separate static and dynamic radiance fields, a 4D hash grid, and a semantic radiance field guided by a foreground-only mask to promote accurate decomposition. Training uses a robust IRLS-based loss, sky and road regularization, and a bootstrapping strategy to stabilize optimization. On KITTI-360 and Pandaset, this yields superior static background reconstruction and dynamic-object segmentation. The combination of robust initialization, semantic guidance, and targeted regularization improves decomposition quality and multi-view consistency, with implications for autonomous driving and urban scene understanding.
Abstract
The task of separating dynamic objects from static environments using NeRFs has been widely studied in recent years. However, capturing large-scale scenes still poses a challenge due to their complex geometric structures and unconstrained dynamics. Without the help of 3D motion cues, previous methods often require simplified setups with slow camera motion and only one or a few dynamic actors, leading to suboptimal solutions in most urban setups. To overcome such limitations, we present RoDUS, a pipeline for decomposing static and dynamic elements in urban scenes, with thoughtfully separated NeRF models for moving and non-moving components. Our approach utilizes a robust kernel-based initialization coupled with 4D semantic information to selectively guide the learning process. This strategy enables accurate capture of the dynamics in the scene, resulting in reduced floating artifacts in the reconstructed background, all through self-supervision. Notably, experimental evaluations on the KITTI-360 and Pandaset datasets demonstrate the effectiveness of our method in decomposing challenging urban scenes into precise static and dynamic components.
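To make the idea of separate static and dynamic NeRF models concrete, the following is a minimal sketch of how two radiance fields can be composited along a single camera ray. It uses a common density-weighted blending formulation from the dynamic-NeRF literature, which is an assumption here and not necessarily the exact rendering equation used by RoDUS. The function name `composite_ray` and all argument names are hypothetical.

```python
import numpy as np

def composite_ray(sigma_s, rgb_s, sigma_d, rgb_d, deltas):
    """Blend static and dynamic field samples along one ray (illustrative only).

    sigma_s, sigma_d: (N,) non-negative densities from each field
    rgb_s, rgb_d:     (N, 3) colors from each field
    deltas:           (N,) distances between consecutive samples
    Returns the rendered color (3,) and a scalar in [0, 1] measuring how much
    of the pixel is explained by the dynamic field.
    """
    sigma = sigma_s + sigma_d                       # total density per sample
    alpha = 1.0 - np.exp(-sigma * deltas)           # opacity per sample
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                         # volume rendering weights
    # Fraction of each sample's density owned by the dynamic field.
    frac_d = sigma_d / np.maximum(sigma, 1e-10)
    rgb = (1.0 - frac_d)[:, None] * rgb_s + frac_d[:, None] * rgb_d
    color = (weights[:, None] * rgb).sum(axis=0)
    dynamic_mask = float((weights * frac_d).sum())
    return color, dynamic_mask
```

Under this formulation, rendering with the dynamic density set to zero gives a static-only background image, and the accumulated dynamic weight per pixel can serve as a soft segmentation of moving objects.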
