GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving

Shuai Liu, Quanmin Liang, Zefeng Li, Boyang Li, Kai Huang

TL;DR

GaussianFusion introduces a 2D Gaussian-based intermediate representation for fusing multi-sensor inputs in end-to-end autonomous driving, balancing spatial grounding, efficiency, and interpretability. It uses a dual-branch fusion architecture for local scene reconstruction and global planning, plus a cascade planning head that iteratively refines trajectories by querying the Gaussians. On NAVSIM and Bench2Drive, GaussianFusion achieves state-of-the-art planning metrics and robust performance, and ablations confirm the contributions of the explicit and implicit Gaussian features and of cascade planning. The method avoids dense bird's-eye-view (BEV) computation while maintaining accuracy, which makes it suited to scalable, planning-focused end-to-end driving. The work demonstrates the practicality and robustness of Gaussian representations for sensor fusion in autonomous driving, and points to future optimization of its CUDA components for faster deployment.

Abstract

Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flattened fusion or bird's-eye-view (BEV) fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit the rich spatial and semantic information in the Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with the Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
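
To make the representation described in the abstract more concrete, the following is a minimal, illustrative sketch of initializing 2D Gaussians uniformly over a scene and splatting their explicit features onto a grid. It is not the authors' implementation. The `Gaussian2D` class, the choice of mean, scale, and rotation as the "physical attributes", the use of per-class scores as the explicit feature, and the functions `init_uniform` and `render_semantic_map` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gaussian2D:
    # Assumed "physical attributes": mean (x, y), scale (sx, sy), rotation theta.
    x: float
    y: float
    sx: float
    sy: float
    theta: float
    # Assumed explicit feature: one score per semantic class.
    semantics: list
    # Implicit feature: an opaque vector for planning (zeros in this sketch).
    implicit: list = field(default_factory=lambda: [0.0] * 4)

    def density(self, px, py):
        """Unnormalized Gaussian density at the point (px, py)."""
        dx, dy = px - self.x, py - self.y
        c, s = math.cos(self.theta), math.sin(self.theta)
        # Rotate the offset into the Gaussian's local frame.
        u = c * dx + s * dy
        v = -s * dx + c * dy
        return math.exp(-0.5 * ((u / self.sx) ** 2 + (v / self.sy) ** 2))


def init_uniform(extent, n_side, num_classes):
    """Place n_side x n_side Gaussians on a regular grid over [-extent, extent]^2."""
    step = 2.0 * extent / n_side
    return [
        Gaussian2D(
            x=-extent + (i + 0.5) * step,
            y=-extent + (j + 0.5) * step,
            sx=step, sy=step, theta=0.0,
            semantics=[0.0] * num_classes,
        )
        for i in range(n_side)
        for j in range(n_side)
    ]


def render_semantic_map(gaussians, extent, resolution, num_classes):
    """Splat per-class scores onto a resolution x resolution grid (rows = y, cols = x)."""
    cell = 2.0 * extent / resolution
    grid = [[[0.0] * num_classes for _ in range(resolution)] for _ in range(resolution)]
    for r in range(resolution):
        for c in range(resolution):
            px = -extent + (c + 0.5) * cell
            py = -extent + (r + 0.5) * cell
            for g in gaussians:
                w = g.density(px, py)
                for k in range(num_classes):
                    grid[r][c][k] += w * g.semantics[k]
    return grid
```

In a learned system the means, scales, rotations, and features would be updated by the refinement blocks from image and point features; here they stay at their initial values, so the sketch only shows how a sparse set of Gaussians can stand in for a dense grid until a map is actually needed.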

Paper Structure

This paper contains 17 sections, 10 equations, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Pipelines of different multi-sensor fusion methods.
  • Figure 2: The overall framework of our GaussianFusion. Given raw multi-sensor data as input, GaussianFusion first extracts image and point features using a backbone network. It then initializes a set of Gaussians, which are iteratively refined through Gaussian encoder blocks. Finally, the refined Gaussians are used to construct the semantic map and to iteratively adjust the anchor trajectories.
  • Figure 3: Comparison of fusion methods.
  • Figure 4: Visualization of Gaussians during the refinement process. Gaussians with different semantics are shown in different colors.
  • Figure 5: Visualization of predicted and ground-truth trajectories, shown in red and green, respectively.
  • ...and 5 more figures
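
The Figure 2 caption mentions that refined Gaussians are used to iteratively adjust anchor trajectories. A minimal sketch of that idea, under strong simplifying assumptions, is given below. It is not the paper's planning head: `query_gaussians`, `cascade_refine`, the distance-based softmax weighting, the `temperature` and `step` parameters, and the use of a 2D feature read directly as a waypoint offset (in place of a learned decoder) are all hypothetical stand-ins.

```python
import math


def query_gaussians(point, gaussians, temperature=4.0):
    """Pool Gaussian features around a waypoint with softmax distance weights.

    gaussians: list of (x, y, feature), where feature is a list of floats.
    """
    px, py = point
    logits = [-((gx - px) ** 2 + (gy - py) ** 2) / temperature
              for gx, gy, _ in gaussians]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(gaussians[0][2])
    pooled = [0.0] * dim
    for w, (_, _, feat) in zip(weights, gaussians):
        for d in range(dim):
            pooled[d] += (w / total) * feat[d]
    return pooled


def cascade_refine(anchor, gaussians, num_stages=3, step=0.5):
    """Refine an anchor trajectory over several stages.

    At each stage every waypoint queries the Gaussians and is shifted by the
    pooled 2D feature, which this sketch treats as an offset (an assumption;
    a real head would decode offsets with learned layers).
    """
    traj = [tuple(p) for p in anchor]
    history = [traj]
    for _ in range(num_stages):
        refined = []
        for x, y in traj:
            fx, fy = query_gaussians((x, y), gaussians)
            refined.append((x + step * fx, y + step * fy))
        traj = refined
        history.append(traj)
    return traj, history
```

For example, `cascade_refine([(0.0, 0.0), (1.0, 0.0)], gaussians)` returns the refined waypoints together with the per-stage history, which makes it easy to inspect how each stage moves the trajectory.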