FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception

Shubham Sonarghare, Prasad Deshpande, Ciaran Hogan, Deepika-Rani Kaliappan-Mahalingam, Ganesh Sistu

TL;DR

This work addresses BEV semantic segmentation from wide-angle fisheye surround-view cameras by introducing a distortion-aware pipeline that lifts image pixels into 3D as Gaussians with learned means and covariances, then fuses them into a BEV via differentiable splatting. The method relies on LUT-based unprojection and a Gaussian depth projection with discretized depth bins, enabling explicit depth uncertainty and continuous BEV construction without rectification. It reports strong drivable and vehicle segmentation performance on a private, high-resolution fisheye dataset (e.g., $87.75\%$ drivable IoU and $57.26\%$ vehicle IoU) and demonstrates robust qualitative behavior across diverse parking and urban environments. The approach offers a modular, efficient solution for distortion-aware BEV perception with potential for extension to additional sensors and camera rigs in low-speed driving and parking contexts.

Abstract

Accurate BEV semantic segmentation from fisheye imagery remains challenging due to extreme non-linear distortion, occlusion, and depth ambiguity inherent to wide-angle projections. We present a distortion-aware BEV segmentation framework that directly processes multi-camera high-resolution fisheye images, utilizing calibrated geometric unprojection and per-pixel depth distribution estimation. Each image pixel is lifted into 3D space via Gaussian parameterization, predicting spatial means and anisotropic covariances to explicitly model geometric uncertainty. The projected 3D Gaussians are fused into a BEV representation via differentiable splatting, producing continuous, uncertainty-aware semantic maps without requiring undistortion or perspective rectification. Extensive experiments demonstrate strong segmentation performance on complex parking and urban driving scenarios, achieving IoU scores of 87.75% for drivable regions and 57.26% for vehicles under severe fisheye distortion and diverse environmental conditions.
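To make the lifting and splatting steps described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each pixel comes with a unit viewing ray (for example, read from a calibration LUT) and a probability vector over discretized depth bins. The function names (`depth_moments`, `lift_pixel`, `splat_to_bev`), the grid layout, and the `min_sigma` floor are hypothetical, and the splat uses an isotropic 2D Gaussian as a simplification of the anisotropic covariances the paper predicts.

```python
import math

def depth_moments(probs, bin_centers):
    """Mean and variance of a discretized depth distribution."""
    total = sum(probs)
    mean = sum(p * d for p, d in zip(probs, bin_centers)) / total
    var = sum(p * (d - mean) ** 2 for p, d in zip(probs, bin_centers)) / total
    return mean, var

def lift_pixel(ray, probs, bin_centers):
    """Lift one pixel to a 3D Gaussian placed along its unit viewing ray.

    ray: unit direction (x, y, z), assumed to be given in the vehicle frame.
    Returns the 3D mean and the depth standard deviation along the ray.
    """
    mean_d, var_d = depth_moments(probs, bin_centers)
    mu = tuple(mean_d * r for r in ray)
    return mu, math.sqrt(var_d)

def splat_to_bev(gaussians, grid_size, cell_m, min_sigma=0.1):
    """Accumulate Gaussian-weighted features on a square, ego-centered BEV grid.

    gaussians: iterable of ((x, y, z), sigma, feature) with a scalar feature.
    Simplification: isotropic 2D Gaussian in the ground plane, z is ignored.
    """
    half = grid_size * cell_m / 2.0
    bev = [[0.0] * grid_size for _ in range(grid_size)]
    for (x, y, _z), sigma, feat in gaussians:
        s = max(sigma, min_sigma)  # floor avoids division by zero
        for i in range(grid_size):
            cy = -half + (i + 0.5) * cell_m  # cell-center y coordinate
            for j in range(grid_size):
                cx = -half + (j + 0.5) * cell_m  # cell-center x coordinate
                d2 = (cx - x) ** 2 + (cy - y) ** 2
                bev[i][j] += feat * math.exp(-0.5 * d2 / (s * s))
    return bev

# Example: one pixel looking along +x with depth mass centered at 2 m.
mu, sigma = lift_pixel((1.0, 0.0, 0.0), [0.25, 0.5, 0.25], [1.0, 2.0, 3.0])
bev = splat_to_bev([(mu, sigma, 1.0)], grid_size=8, cell_m=1.0)
```

Every operation here is a smooth function of the depth probabilities, which is what makes this kind of splatting usable inside a trained network. A real model would express the same computation with tensors, vector-valued features, and full covariance matrices.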

Paper Structure

This paper contains 18 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Vehicle with 4 fisheye surround-view cameras covering 360° around the vehicle (Yogamani et al., 2019, WoodScape)
  • Figure 2: Architecture of FisheyeGaussianLift
  • Figure 3: Qualitative BEV segmentation results on fisheye images. Each visualization shows 4 fisheye camera views (FV, RV, MVL, MVR), predicted BEV segmentation masks for vehicle and drivable space classes, and the intermediate BEV feature activation map.