GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion
Santiago Montiel-Marín, Miguel Antunes-García, Fabio Sánchez-García, Angel Llamazares, Holger Caesar, Luis M. Bergasa
TL;DR
GaussianCaR targets robust BEV perception by fusing camera and radar data through Gaussian Splatting, reframing fusion as modality→Gaussians→BEV. Two modality-specific encoders (Pixels-to-Gaussians and Points-to-Gaussians) lift features into a unified Gaussian space, which is then processed by a four-stage multi-scale fusion module and a DPT-based BEV decoder. On nuScenes BEV segmentation, the approach is on par with or better than the state of the art (57.3%, 82.9%, and 50.1% IoU for vehicles, roads, and lane dividers), outperforming some camera-only baselines and matching or surpassing rival fusion methods with roughly 3.2× faster inference. These results indicate that Gaussian-based latent fusion is practical for scalable, real-time autonomous perception in diverse weather and traffic conditions.
Abstract
Robust and accurate perception of dynamic objects and map elements is crucial for autonomous vehicles performing safe navigation in complex traffic scenarios. While vision-only methods have become the de facto standard due to their technical advances, they can benefit from effective and cost-efficient fusion with radar measurements. In this work, we advance fusion methods by repurposing Gaussian Splatting as an efficient universal view transformer that bridges the view disparity gap, mapping both image pixels and radar points into a common Bird's-Eye View (BEV) representation. Our main contribution is GaussianCaR, an end-to-end network for BEV segmentation that, unlike prior BEV fusion methods, leverages Gaussian Splatting to map raw sensor information into latent features for efficient camera-radar fusion. Our architecture combines multi-scale fusion with a transformer decoder to efficiently extract BEV features. Experimental results demonstrate that our approach achieves performance on par with, or even surpassing, the state of the art on BEV segmentation tasks (57.3%, 82.9%, and 50.1% IoU for vehicles, roads, and lane dividers) on the nuScenes dataset, while maintaining a 3.2x faster inference runtime. Code and project page are available online.
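To make the "view transformer" idea concrete, the following is a minimal NumPy sketch of how Gaussians carrying features could be accumulated onto a BEV grid. It is an illustration under simplifying assumptions, not the GaussianCaR implementation: the Gaussians are isotropic and 2D, accumulation is a plain opacity-weighted sum rather than a rasterizer with learned 3D covariances, and the function name `splat_to_bev` and the parameters `grid_size` and `extent` are hypothetical.

```python
import numpy as np


def splat_to_bev(means, scales, opacities, feats, grid_size=8, extent=4.0):
    """Accumulate per-Gaussian features onto a square BEV grid (illustrative only).

    means:     (N, 2) Gaussian centers on the ground plane, in metres
    scales:    (N,)   isotropic standard deviations, in metres (assumed > 0)
    opacities: (N,)   per-Gaussian weights in [0, 1]
    feats:     (N, C) feature vector carried by each Gaussian
    Returns a (grid_size, grid_size, C) BEV feature map covering
    [-extent, extent] metres along both axes.
    """
    # Coordinates of the BEV cell centers along one axis.
    step = 2.0 * extent / grid_size
    centers = -extent + step * (np.arange(grid_size) + 0.5)
    gx, gy = np.meshgrid(centers, centers, indexing="ij")
    cells = np.stack([gx, gy], axis=-1)  # (G, G, 2)

    # Squared distance from every cell to every Gaussian center: (G, G, N).
    d2 = ((cells[:, :, None, :] - means[None, None, :, :]) ** 2).sum(axis=-1)

    # Opacity-weighted Gaussian response of each primitive at each cell.
    weights = opacities * np.exp(-0.5 * d2 / scales**2)  # (G, G, N)

    # Weighted sum of features per cell: (G, G, N) @ (N, C) -> (G, G, C).
    return weights @ feats


# Hypothetical usage: Gaussians from both modalities share one BEV space,
# so they can simply be concatenated before splatting.
camera_means = np.array([[0.5, -1.5], [2.5, 2.5]])
radar_means = np.array([[0.5, -1.5]])
means = np.concatenate([camera_means, radar_means])
scales = np.full(len(means), 0.5)
opacities = np.ones(len(means))
feats = np.ones((len(means), 4))
bev = splat_to_bev(means, scales, opacities, feats)
```

Because pixels and radar points are both expressed as Gaussians in the same ground-plane coordinates, a single splatting step yields one BEV feature map for both sensors, which is the property the abstract describes as bridging the view disparity gap.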
