CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving

Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll

TL;DR

CoDa-4DGS tackles the rendering of highly dynamic driving scenes by adding context awareness and temporal deformation awareness to 4D Gaussian Splatting (4DGS). It distills semantic features from a 2D foundation model into per-Gaussian embeddings and derives a HexPlane-based deformation cue. Together with a sinusoidal time encoding, these feed a Deformation Compensation Network (DCN) that refines each Gaussian's deformation for more accurate 4D reconstruction and novel view synthesis. The method outperforms other self-supervised approaches on the Waymo Open and KITTI datasets, supports semantic-aware 4D tasks, and can be plugged into existing Gaussian-based dynamic scene models to improve them. It thereby advances photorealistic closed-loop simulation for autonomous driving and extends to scene editing and semantic reasoning.
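The sinusoidal time encoding can be illustrated with a minimal sketch. It assumes NeRF-style frequencies 2^k·π (k = 0 to L-1) applied to a timestamp normalized to [0, 1], with the raw timestamp prepended; the function name and `num_freqs` are hypothetical, and the paper's actual frequency schedule may differ.

```python
import numpy as np

def sinusoidal_time_encoding(t, num_freqs=6):
    """Encode normalized timestamps t in [0, 1] with sinusoids.

    Assumption (not from the paper): frequencies 2^k * pi for
    k = 0..num_freqs-1, with the raw timestamp prepended.
    Returns an array of shape (N, 1 + 2 * num_freqs).
    """
    t = np.atleast_1d(np.asarray(t, dtype=np.float64))[:, None]  # (N, 1)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                 # (L,)
    angles = t * freqs                                            # (N, L)
    return np.concatenate([t, np.sin(angles), np.cos(angles)], axis=-1)

enc = sinusoidal_time_encoding([0.0, 0.5])
print(enc.shape)  # (2, 13)
```

Low frequencies vary smoothly over the whole sequence, while higher ones let a downstream network tell closely spaced frames apart.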

Abstract

Dynamic scene rendering opens new avenues in autonomous driving by enabling closed-loop simulations with photorealistic data, which is crucial for validating end-to-end algorithms. However, the complex and highly dynamic nature of traffic environments presents significant challenges in accurately rendering these scenes. In this paper, we introduce a novel 4D Gaussian Splatting (4DGS) approach, which incorporates context and temporal deformation awareness to improve dynamic scene rendering. Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians, ensuring meaningful contextual embedding. Simultaneously, we track the temporal deformation of each Gaussian across adjacent frames. By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes. Experimental results show that our method improves 4DGS's ability to capture fine details in dynamic scene rendering for autonomous driving and outperforms other self-supervised methods in 4D reconstruction and novel view synthesis. Furthermore, CoDa-4DGS deforms semantic features with each Gaussian, enabling broader applications.
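To make the self-supervision of semantic features more concrete, the sketch below splats per-Gaussian features to a pixel with the same front-to-back alpha compositing used for color, then compares the result with the 2D foundation model's feature at that pixel. The L1 loss, the assumption that rendered and teacher features share a dimension, and the function names are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def composite_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian features along one ray.

    features: (K, D) features of the K Gaussians hit by the ray, sorted near to far.
    alphas:   (K,) effective opacities in [0, 1].
    Returns the (D,) feature F = sum_i f_i * a_i * prod_{j<i} (1 - a_j).
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    # Transmittance before each Gaussian: 1, (1-a_0), (1-a_0)(1-a_1), ...
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * transmittance
    return weights @ np.asarray(features, dtype=np.float64)

def distillation_loss(rendered, teacher):
    """Mean L1 distance between rendered and foundation-model features (assumed loss)."""
    rendered = np.asarray(rendered, dtype=np.float64)
    teacher = np.asarray(teacher, dtype=np.float64)
    return float(np.mean(np.abs(rendered - teacher)))

pixel_feat = composite_features([[1.0, 0.0], [0.0, 1.0]], [0.5, 1.0])
print(pixel_feat)  # [0.5 0.5]
```

Because the same compositing weights produce both color and features, a feature loss backpropagates into the same Gaussians that render the pixel, which is what lets 2D supervision shape the 4D embeddings.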

Paper Structure

This paper contains 24 sections, 12 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Our CoDa-4DGS incorporates both context awareness and deformation awareness to compensate the deformations of Gaussians in 4D. This results in more accurate dynamic scene rendering and enables a range of downstream applications, such as scene segmentation, instance segmentation, 4D reconstruction, novel view synthesis, and scene synthesis. Note that we use Principal Component Analysis (PCA) to visualize the splatted context-aware features for each camera next to the corresponding rendered RGB images. More application demonstrations are available in the supplementary material.
  • Figure 2: System overview of CoDa-4DGS. Vanilla 4DGS wu20244d encodes and decodes temporal deformation using HexPlane cao2023hexplane encoding. Building on this, our CoDa-4DGS embeds temporal information and aggregates it with context and temporal deformation awareness through a Deformation Compensation Network (DCN). This network encodes the deformation adjustments needed to compensate for the original temporal deformation, ultimately producing an enhanced set of 4D Gaussians.
  • Figure 3: Visual comparison of rendering results from CoDa-4DGS and other approaches on the Waymo Open dataset.
  • Figure 4: Visual comparison of rendering results from CoDa-4DGS and other approaches on the KITTI dataset.
  • Figure 5: Visual comparison of novel view synthesis between CoDa-4DGS and StreetGaussian yan2024street. Because it does not rely on bounding-box labels for dynamic objects, CoDa-4DGS can surpass StreetGaussian in rendering dynamic details outside these bounding boxes (e.g., moving light reflections on wet roads).
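The pipeline in Figure 2 can be sketched for a single Gaussian. Everything here is a hypothetical simplification: a single HexPlane resolution whose six plane features are combined by an element-wise product (a K-Planes-style choice), randomly initialized two-layer MLPs standing in for the trained deformation decoder and the DCN, and a deformation cue defined as the change in decoded offset relative to the previous frame. Plane sizes, feature dimensions, `dt`, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
RES, C = 8, 4                       # hypothetical plane resolution / channels
PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # xy, xz, yz, xt, yt, zt
planes = [rng.normal(1.0, 0.1, (RES, RES, C)) for _ in PAIRS]

def bilinear(plane, u, v):
    """Bilinearly sample a (R, R, C) grid at normalized coords u, v in [0, 1]."""
    r = plane.shape[0] - 1
    x, y = u * r, v * r
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, r), min(y0 + 1, r)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[x0, y0] + wx * (1 - wy) * plane[x1, y0]
            + (1 - wx) * wy * plane[x0, y1] + wx * wy * plane[x1, y1])

def hexplane_feature(xyzt):
    """Element-wise product of the six plane features at a normalized (x, y, z, t)."""
    feat = np.ones(C)
    for plane, (i, j) in zip(planes, PAIRS):
        feat = feat * bilinear(plane, xyzt[i], xyzt[j])
    return feat

def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a two-layer MLP (stand-in for trained parameters)."""
    return [(rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden)),
            (rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))]

def mlp(params, h):
    """Two-layer ReLU MLP."""
    (w1, b1), (w2, b2) = params
    return np.maximum(h @ w1 + b1, 0.0) @ w2 + b2

SEM_DIM, TIME_DIM = 8, 13           # hypothetical semantic / time-encoding sizes
DEFORM_HEAD = init_mlp(C, 16, 3)    # decodes a position offset from HexPlane features
DCN = init_mlp(SEM_DIM + 3 + TIME_DIM, 32, 3)

def deform(xyz, t):
    """Vanilla-4DGS-style position offset at time t."""
    return mlp(DEFORM_HEAD, hexplane_feature(np.append(xyz, t)))

def compensated_position(xyz, t, sem_feat, time_enc, dt=0.05):
    """Deformed position plus a DCN compensation term (all inputs hypothetical)."""
    d_t = deform(xyz, t)
    # Assumed deformation cue: change in decoded offset versus the previous frame.
    cue = d_t - deform(xyz, max(t - dt, 0.0))
    delta = mlp(DCN, np.concatenate([sem_feat, cue, time_enc]))
    return xyz + d_t + delta

xyz = np.array([0.2, 0.5, 0.7])
print(compensated_position(xyz, 0.5, np.zeros(SEM_DIM), np.zeros(TIME_DIM)))
```

Applying the compensation as an additive offset on top of the base deformation leaves the original 4DGS path intact, which is one plausible way to realize the plug-and-play property described in the TL;DR.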