
DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting

Chenpeng Su, Wenhua Wu, Chensheng Peng, Tianchen Deng, Zhe Liu, Hesheng Wang

TL;DR

DIAL-GS tackles label-free urban scene reconstruction in dynamic driving environments with a two-stage framework. Stage 1 exploits appearance-position inconsistency to identify dynamic instances: per-frame scores $S_{i,t}$ are accumulated for each instance $i$, and the instances that pass the cubic threshold $S_i^3 > \delta$ form a dynamic set $\mathcal{D}$. Stage 2 employs instance-aware 4D Gaussian Splatting with ID embeddings and dynamic attributes. It is optimized with a suite of losses including $L_{id}$, $L_{\bar v}$, $L_{\beta}$, $L_{3d}$, and $L_{consist}$, and uses reciprocal updates between identity and dynamics to improve both segmentation and motion coherence. The method enables per-instance editing by manipulating the Gaussians tied to specific IDs. Experiments show improvements over prior self-supervised baselines in image reconstruction and novel-view synthesis while maintaining efficient rendering. Overall, DIAL-GS provides a scalable, dynamic, and editable 3D representation of urban scenes that supports fine-grained editing without manual annotations, with practical implications for data synthesis and testing in autonomous driving.
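
The Stage 1 selection step can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' code. It assumes that the accumulated score $S_i$ is the mean of the per-frame scores $S_{i,t}$ over the frames in which instance $i$ appears, that the cubic threshold is applied literally as $S_i^3 > \delta$, and that pixel ID 0 denotes background. The names `select_dynamic_ids` and `dynamic_mask` are illustrative only.

```python
from statistics import mean


def select_dynamic_ids(frame_scores, delta):
    """Return the dynamic set D from per-frame scores.

    frame_scores: dict mapping instance ID i -> list of per-frame scores S_{i,t}
    delta: threshold value (chosen by the caller in this sketch)
    """
    dynamic = set()
    for inst_id, scores in frame_scores.items():
        if not scores:
            continue  # never observed: no evidence of motion, keep static
        s_i = mean(scores)  # ASSUMPTION: S_i is the average of S_{i,t} over frames
        if s_i ** 3 > delta:  # cubic threshold S_i^3 > delta
            dynamic.add(inst_id)
    return dynamic


def dynamic_mask(id_map, dynamic_ids):
    """Binary mask for one frame: 1 where the pixel's instance ID is in D, else 0.

    id_map: 2D list of per-pixel instance IDs (0 = background, an assumption)
    """
    return [[1 if pid in dynamic_ids else 0 for pid in row] for row in id_map]


if __name__ == "__main__":
    scores = {1: [0.9, 0.8, 0.85], 2: [0.1, 0.05, 0.2]}
    d = select_dynamic_ids(scores, delta=0.1)
    print(d)  # {1}
    print(dynamic_mask([[0, 1], [2, 1]], d))  # [[0, 1], [0, 1]]
```

For nonnegative scores, the cubic test in this sketch selects the same instances as $S_i > \delta^{1/3}$; how the paper actually accumulates $S_i$ may differ.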

Abstract

Urban scene reconstruction is critical for autonomous driving, enabling structured 3D representations for data synthesis and closed-loop testing. Supervised approaches rely on costly human annotations and lack scalability, while current self-supervised methods often confuse static and dynamic elements and fail to distinguish individual dynamic objects, limiting fine-grained editing. We propose DIAL-GS, a novel dynamic instance-aware reconstruction method for label-free street scenes with 4D Gaussian Splatting. We first accurately identify dynamic instances by exploiting appearance-position inconsistency between warped rendering and actual observation. Guided by instance-level dynamic perception, we employ instance-aware 4D Gaussians as the unified volumetric representation, realizing dynamic-adaptive and instance-aware reconstruction. Furthermore, we introduce a reciprocal mechanism through which identity and dynamics reinforce each other, enhancing both integrity and consistency. Experiments on urban driving scenarios show that DIAL-GS surpasses existing self-supervised baselines in reconstruction quality and instance-level editing, offering a concise yet powerful solution for urban scene modeling.
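
The intuition behind appearance-position inconsistency is that a static reconstruction cannot explain a moving object, so its rendering, once warped into frame $t$, should disagree with the real observation inside that object's pixels. A minimal sketch of a per-frame score $S_{i,t}$ under loudly stated assumptions: the warped rendering is computed upstream, each frame comes with a per-pixel instance ID map (hypothetical input), and the score is a plain mean absolute photometric error per instance. This is a simplification for illustration, not the paper's scoring function.

```python
from collections import defaultdict


def per_instance_scores(warped, observed, id_map, background_id=0):
    """Per-instance inconsistency scores S_{i,t} for a single frame t.

    warped:   2D list of intensities from the static rendering warped into frame t
    observed: 2D list of intensities from the actual frame t
    id_map:   2D list of per-pixel instance IDs for frame t (hypothetical input)
    Returns a dict mapping instance ID -> mean absolute photometric error.
    """
    err_sum = defaultdict(float)
    count = defaultdict(int)
    for w_row, o_row, id_row in zip(warped, observed, id_map):
        for w, o, inst in zip(w_row, o_row, id_row):
            if inst == background_id:
                continue  # only score pixels that belong to an instance
            err_sum[inst] += abs(w - o)
            count[inst] += 1
    return {inst: err_sum[inst] / count[inst] for inst in count}


if __name__ == "__main__":
    warped = [[128, 128], [50, 200]]
    observed = [[128, 28], [50, 80]]
    ids = [[0, 1], [2, 1]]
    print(per_instance_scores(warped, observed, ids))  # {1: 110.0, 2: 0.0}
```

Here instance 1 disagrees strongly with the warped static rendering and would be a dynamic candidate, while instance 2 is perfectly explained by it.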

Paper Structure

This paper contains 17 sections, 19 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Motivation. DIAL-GS overcomes the limits of supervised and self-supervised methods with label-free, dynamic-adaptive and instance-aware reconstruction.
  • Figure 2: Method overview. (i) Stage 1 conducts instance-level dynamic perception with static GS by exploiting the inconsistency between warped renderings and ground-truth frames. Accumulated dynamic scores, which quantify this inconsistency, are used to obtain a dynamic ID list, from which ID labels and dynamic masks are derived. (ii) Stage 2 reconstructs the scene with instance-aware 4DGS as the unified representation. Guided by the ID labels and dynamic masks, it achieves instance awareness and refines dynamic attributes. It then performs reciprocal training to enhance both the integrity of instance awareness and the consistency of dynamics. (iii) With instance awareness, DIAL-GS further enables instance-level editing, a capability not supported by previous self-supervised approaches.
  • Figure 3: Dynamic Mask Comparison. DeSiRe-GS misclassifies static regions and only partially captures dynamic parts, whereas DIAL-GS obtains accurate and sharp dynamic masks.
  • Figure 4: Qualitative results. Decomposition with DeSiRe-GS [peng2024desiregs4dstreetgaussians] suffers from severe misclassification, whereas DIAL-GS achieves accurate decomposition and clear instance separation.
  • Figure 5: Instance Editing. By realizing instance awareness, DIAL-GS supports instance-level editing within the self-supervised regime.
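
Once every Gaussian carries an instance identity, editing an object reduces to filtering or transforming the Gaussians that share its ID. A minimal sketch, assuming each Gaussian has already been assigned a discrete instance ID (for example, hypothetically, by decoding its ID embedding) and editing only static centers, so it ignores the rotations and time-dependent motion of 4D Gaussians. The `Gaussian` class and helper names are illustrative, not the authors' API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    mean: tuple       # (x, y, z) center
    opacity: float
    instance_id: int  # hypothetical: discrete ID decoded from the ID embedding


def remove_instance(gaussians, inst_id):
    """Delete an object by dropping every Gaussian tied to inst_id."""
    return [g for g in gaussians if g.instance_id != inst_id]


def translate_instance(gaussians, inst_id, offset):
    """Move an object by shifting the centers of its Gaussians; others are untouched."""
    dx, dy, dz = offset
    edited = []
    for g in gaussians:
        if g.instance_id == inst_id:
            x, y, z = g.mean
            g = replace(g, mean=(x + dx, y + dy, z + dz))
        edited.append(g)
    return edited


if __name__ == "__main__":
    scene = [Gaussian((0, 0, 0), 0.9, 1), Gaussian((5, 0, 0), 0.8, 2)]
    print(len(remove_instance(scene, 2)))  # 1
    print(translate_instance(scene, 1, (0, 2, 0))[0].mean)  # (0, 2, 0)
```

Returning new lists instead of mutating the scene keeps the original reconstruction available, so several edits can be previewed side by side.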