GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving
Shuai Liu, Quanmin Liang, Zefeng Li, Boyang Li, Kai Huang
TL;DR
GaussianFusion introduces a 2D Gaussian-based intermediate representation for fusing multi-sensor inputs in end-to-end autonomous driving, balancing spatial grounding, efficiency, and interpretability. It employs a dual-branch fusion architecture for local scene reconstruction and global planning, together with a cascade planning head that iteratively refines trajectories by querying the Gaussians. On NAVSIM and Bench2Drive, GaussianFusion achieves state-of-the-art planning metrics and robust performance, and ablations confirm the contributions of the explicit and implicit Gaussian features and of cascade planning. The method reduces dense bird's-eye-view (BEV) computation while maintaining accuracy, which supports scalable, planning-focused end-to-end driving. The work demonstrates that Gaussian representations are practical and robust for sensor fusion in autonomous driving, and it points to future optimization of the CUDA components for faster deployment.
Abstract
Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's-eye-view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit the rich spatial and semantic information in the Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with the Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
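
To make the representation described in the abstract more concrete, the following is a minimal NumPy sketch of a set of 2D Gaussians placed uniformly over a driving scene, each carrying physical attributes plus explicit and implicit feature vectors. It is an illustration under stated assumptions, not the released implementation: the function names (`init_gaussians`, `gaussian_weights`), the dictionary layout, the scene extent, the grid size, and the feature dimensions are all hypothetical, and the abstract does not specify which physical attributes are used (mean, scale, and rotation are assumed here).

```python
import numpy as np

def init_gaussians(n_side=4, x_range=(-32.0, 32.0), y_range=(-32.0, 32.0),
                   n_classes=3, implicit_dim=8, seed=0):
    """Place n_side * n_side 2D Gaussians on a uniform grid (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    dx = (x_range[1] - x_range[0]) / n_side   # cell width in metres
    dy = (y_range[1] - y_range[0]) / n_side   # cell height in metres
    # Cell centres: left edges of each cell shifted by half a cell.
    xs = np.linspace(x_range[0], x_range[1], n_side, endpoint=False) + dx / 2
    ys = np.linspace(y_range[0], y_range[1], n_side, endpoint=False) + dy / 2
    mean = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1).reshape(-1, 2)
    n = mean.shape[0]
    return {
        # Assumed physical attributes.
        "mean": mean,                                    # (N, 2) centre
        "scale": np.tile([dx / 2, dy / 2], (n, 1)),      # (N, 2) std dev per axis
        "rotation": np.zeros(n),                         # (N,) yaw in radians
        # Assumed feature layout.
        "explicit": np.zeros((n, n_classes)),            # (N, C) semantic logits
        "implicit": rng.normal(scale=0.02, size=(n, implicit_dim)),  # (N, D) latent
    }

def gaussian_weights(g, points):
    """Unnormalised density of every Gaussian at each query point, shape (P, N)."""
    d = points[:, None, :] - g["mean"][None, :, :]       # (P, N, 2) offsets
    c, s = np.cos(g["rotation"]), np.sin(g["rotation"])
    # Rotate each offset into the local frame of its Gaussian.
    u = c * d[..., 0] + s * d[..., 1]
    v = -s * d[..., 0] + c * d[..., 1]
    z = (u / g["scale"][:, 0]) ** 2 + (v / g["scale"][:, 1]) ** 2
    return np.exp(-0.5 * z)

gaussians = init_gaussians()
query = np.array([[0.0, 0.0], [10.0, -5.0]])             # two scene locations
weights = gaussian_weights(gaussians, query)             # (2, 16)
# Weighted sum of explicit features gives a per-point semantic readout.
semantics = weights @ gaussians["explicit"]              # (2, n_classes)
```

In a sketch like this, a planning head could query the Gaussians at candidate waypoints through `gaussian_weights` and aggregate their features, which is one plausible reading of "interactions with Gaussians"; the actual refinement and cascade mechanisms are defined in the paper and its code release.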
