CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll
TL;DR
CoDa-4DGS addresses the challenge of rendering highly dynamic driving scenes by adding context awareness and temporal deformation awareness to 4D Gaussian Splatting. It distills semantic features from a 2D foundation model to guide the Gaussians' semantic embeddings. It also derives a HexPlane-based deformation cue together with a sinusoidal time encoding. These inputs feed a Deformation Compensation Network that refines each Gaussian's deformation for more accurate 4D reconstruction and novel view synthesis. The method achieves state-of-the-art results on Waymo and KITTI and supports semantic-aware 4D tasks. It can also be plugged into existing Gaussian-based dynamic scene models to improve them. The approach advances photorealistic closed-loop simulation for autonomous driving and also applies to scene editing and semantic reasoning.
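The exact form of the sinusoidal time encoding is not given here. The following is a minimal sketch of the general technique, not the paper's configuration. It assumes a NeRF-style frequency encoding of a timestamp normalized to [0, 1], and the number of frequencies `num_freqs` is a hypothetical choice.

```python
import math
from typing import List


def sinusoidal_time_encoding(t: float, num_freqs: int = 4) -> List[float]:
    """Encode a scalar timestamp t (assumed normalized to [0, 1]) as
    [sin(2^0*pi*t), cos(2^0*pi*t), ..., sin(2^(L-1)*pi*t), cos(2^(L-1)*pi*t)].

    num_freqs (L) is a hypothetical setting, not a value from the paper.
    """
    enc: List[float] = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi  # frequencies double at each level
        enc.append(math.sin(freq * t))
        enc.append(math.cos(freq * t))
    return enc


# Usage: a timestamp of 0.25 becomes 2 * num_freqs = 8 values.
enc = sinusoidal_time_encoding(0.25, num_freqs=4)
```

Multiple frequencies let a downstream network distinguish nearby timestamps, which a raw scalar `t` does poorly. How many levels are useful would depend on the sequence length and frame rate.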
Abstract
Dynamic scene rendering opens new avenues in autonomous driving by enabling closed-loop simulations with photorealistic data, which is crucial for validating end-to-end algorithms. However, the complex and highly dynamic nature of traffic environments presents significant challenges in accurately rendering these scenes. In this paper, we introduce a novel 4D Gaussian Splatting (4DGS) approach, which incorporates context and temporal deformation awareness to improve dynamic scene rendering. Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians, ensuring meaningful contextual embedding. Simultaneously, we track the temporal deformation of each Gaussian across adjacent frames. By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes. Experimental results show that our method improves 4DGS's ability to capture fine details in dynamic scene rendering for autonomous driving and outperforms other self-supervised methods in 4D reconstruction and novel view synthesis. Furthermore, CoDa-4DGS deforms semantic features with each Gaussian, enabling broader applications.
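The pipeline in the abstract has three steps. It tracks how a Gaussian moves between adjacent frames, aggregates that cue with the Gaussian's semantic feature, and uses the result to predict a deformation compensation. The following is a toy sketch of that data flow, and it is not the paper's architecture. Several parts are hypothetical assumptions:

- the constant-velocity deformation field `toy_deform`
- the frame spacing `dt`
- the plain concatenation in `aggregate`
- the single linear layer `compensate`, which merely stands in for the learned Deformation Compensation Network

```python
from typing import Callable, List, Sequence

Vec = List[float]


def deformation_cue(deform_fn: Callable[[Sequence[float], float], Vec],
                    mu: Sequence[float], t: float, dt: float) -> Vec:
    """Displacement of one Gaussian center between adjacent frames t - dt and t."""
    prev = deform_fn(mu, t - dt)
    curr = deform_fn(mu, t)
    return [c - p for c, p in zip(curr, prev)]


def aggregate(semantic: Sequence[float], cue: Sequence[float],
              time_enc: Sequence[float]) -> Vec:
    """Concatenate semantic feature, deformation cue and time encoding
    (concatenation is an assumption; the paper's aggregation may differ)."""
    return [*semantic, *cue, *time_enc]


def compensate(features: Sequence[float], weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> Vec:
    """Hypothetical stand-in for the compensation network: one linear layer
    mapping the aggregated feature to a 3D position offset."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def toy_deform(mu: Sequence[float], t: float) -> Vec:
    """Toy deformation field (assumption): constant velocity 2.0 along x."""
    return [mu[0] + 2.0 * t, mu[1], mu[2]]


mu = [1.0, 0.0, 0.0]                                   # canonical center
cue = deformation_cue(toy_deform, mu, t=0.5, dt=0.1)   # approx [0.2, 0.0, 0.0]
feat = aggregate([0.3, 0.7], cue, [0.0, 1.0])          # 2 + 3 + 2 = 7 dims
W = [[0.0] * 7 for _ in range(3)]
W[0][2] = 0.5                                          # offset_x = 0.5 * x-displacement
offset = compensate(feat, W, [0.0, 0.0, 0.0])          # approx [0.1, 0.0, 0.0]
refined = [m + o for m, o in zip(toy_deform(mu, 0.5), offset)]  # approx [2.1, 0.0, 0.0]
```

Because the semantic feature is attached to each Gaussian, the same `deform_fn` that moves a center could also carry the feature along with it. The abstract describes this as deforming semantic features with each Gaussian.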
