DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting
Chenpeng Su, Wenhua Wu, Chensheng Peng, Tianchen Deng, Zhe Liu, Hesheng Wang
TL;DR
DIAL-GS tackles label-free urban scene reconstruction in dynamic driving environments with a two-stage framework. Stage 1 exploits appearance-position inconsistency to identify dynamic instances: it computes per-frame scores $S_{i,t}$ and applies a cubic threshold $S_i^3 > \delta$ to form a dynamic set $\mathcal{D}$. Stage 2 employs instance-aware 4D Gaussian Splatting with ID embeddings and dynamic attributes, optimized with a suite of losses including $L_{id}$, $L_{\bar v}$, $L_{\beta}$, $L_{3d}$, and $L_{consist}$; reciprocal updates between identity and dynamics improve both segmentation and motion coherence. The method supports per-instance editing by manipulating the Gaussians tied to specific IDs, and experiments show improvements over prior self-supervised baselines in image reconstruction and novel-view synthesis while maintaining efficient rendering. Overall, DIAL-GS provides a scalable, dynamic, and editable 3D representation of urban scenes that supports fine-grained editing without manual annotations, with practical implications for data synthesis and testing in autonomous driving.
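The Stage 1 thresholding and the ID-based editing can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names `dynamic_set` and `translate_instance` are hypothetical, and aggregating the per-frame scores $S_{i,t}$ into $S_i$ by taking the mean over frames is an assumption, since the summary does not state how $S_i$ is obtained. Editing is reduced here to translating the centers of all Gaussians that carry a given instance ID.

```python
from statistics import mean


def dynamic_set(scores, delta):
    """Form the dynamic set D = {i : S_i**3 > delta}.

    scores: dict mapping a (hypothetical) instance ID to its list of
    per-frame scores S_{i,t}. ASSUMPTION: S_i is the mean over frames.
    """
    dynamic = set()
    for inst_id, per_frame in scores.items():
        if not per_frame:
            continue  # instance never observed, so it cannot be scored
        s_i = mean(per_frame)  # assumed aggregation of S_{i,t} into S_i
        if s_i ** 3 > delta:   # cubic threshold as stated in the summary
            dynamic.add(inst_id)
    return dynamic


def translate_instance(positions, ids, target_id, offset):
    """Per-instance edit: shift every Gaussian center whose ID is target_id.

    positions: list of (x, y, z) Gaussian centers; ids: parallel list of IDs.
    Gaussians belonging to other instances are returned unchanged.
    """
    return [
        tuple(p + o for p, o in zip(pos, offset)) if gid == target_id else pos
        for pos, gid in zip(positions, ids)
    ]


if __name__ == "__main__":
    scores = {0: [0.1, 0.2], 1: [0.9, 0.8]}
    print(dynamic_set(scores, 0.1))  # only instance 1 exceeds the threshold
    print(translate_instance([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [0, 1], 1, (2.0, 0.0, 0.0)))
```

Because cubing is strictly increasing on the reals, the test $S_i^3 > \delta$ selects the same instances as $S_i > \delta^{1/3}$; the sketch keeps the cubed form only to mirror the notation above.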
Abstract
Urban scene reconstruction is critical for autonomous driving, enabling structured 3D representations for data synthesis and closed-loop testing. Supervised approaches rely on costly human annotations and lack scalability, while current self-supervised methods often confuse static and dynamic elements and fail to distinguish individual dynamic objects, limiting fine-grained editing. We propose DIAL-GS, a novel dynamic instance-aware reconstruction method for label-free street scenes with 4D Gaussian Splatting. We first accurately identify dynamic instances by exploiting appearance-position inconsistency between warped rendering and actual observation. Guided by instance-level dynamic perception, we employ instance-aware 4D Gaussians as the unified volumetric representation, realizing dynamic-adaptive and instance-aware reconstruction. Furthermore, we introduce a reciprocal mechanism through which identity and dynamics reinforce each other, enhancing both integrity and consistency. Experiments on urban driving scenarios show that DIAL-GS surpasses existing self-supervised baselines in reconstruction quality and instance-level editing, offering a concise yet powerful solution for urban scene modeling.
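The idea of appearance-position inconsistency can be illustrated with a simple per-instance, per-frame score. The sketch below rests on assumptions not given in the abstract: the warped rendering and the actual observation are grayscale images stored as nested lists, each instance comes with a boolean pixel mask, and the score is the mean absolute photometric difference inside that mask. The function name `inconsistency_score` is hypothetical. Intuitively, a static object warped into the current frame lines up with the observation and yields a low error, while a moving object ends up in the wrong place and yields a high one.

```python
def inconsistency_score(warped, observed, mask):
    """Hypothetical per-frame score S_{i,t} for one instance i at frame t.

    warped:   2D list of floats, rendering warped into frame t
    observed: 2D list of floats, the actual image at frame t
    mask:     2D list of bools, True where instance i is visible
    Returns the mean absolute difference over masked pixels
    (ASSUMPTION: an L1 photometric error stands in for the paper's score).
    """
    total, count = 0.0, 0
    for w_row, o_row, m_row in zip(warped, observed, mask):
        for w, o, m in zip(w_row, o_row, m_row):
            if m:
                total += abs(w - o)
                count += 1
    # An empty mask gives no evidence of motion, so report zero.
    return total / count if count else 0.0


if __name__ == "__main__":
    warped = [[0.0, 0.5], [1.0, 1.0]]
    observed = [[0.0, 0.0], [1.0, 0.0]]
    mask = [[True, True], [False, False]]
    print(inconsistency_score(warped, observed, mask))
```

Scores of this kind, computed for every frame in which an instance appears, are what a thresholding step such as the one in the TL;DR would consume.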
