Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System

Daniel Dworak, Mateusz Komorkiewicz, Paweł Skruch, Jerzy Baranowski

TL;DR

This work addresses robust 3D object detection for autonomous vehicles by fusing camera and radar data. It introduces Cross-Domain Spatial Matching (CDSM), a low-level fusion block that spatially aligns 2D camera feature maps with 3D radar BEV representations and fuses them into a unified 3D scene representation. The method combines an EfficientDet-inspired image network, a voxel-based radar network, and the dedicated CDSM alignment module. Evaluated on NuScenes, it shows clear gains over single-sensor baselines and is competitive with state-of-the-art fusion methods. The results suggest that cross-domain spatial alignment enables effective integration of camera and radar cues, with potential for improved robustness under partial sensor failure.
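The TL;DR mentions a voxel-based radar network operating on a BEV representation. As a rough illustration of what such an input can look like, the sketch below scatters radar points into a single-layer BEV grid holding per-cell point counts and mean point features. The helper name `rasterize_radar_bev`, the grid extents, the 0.5 m cell size, and the choice of per-point features (RCS and radial velocity in the usage example) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def rasterize_radar_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), cell=0.5):
    """Scatter radar points into a BEV grid (hypothetical helper, not the paper's network).

    points: (N, 2 + C) array; columns 0-1 are x (forward) and y (left) in metres,
            the remaining C columns are per-point features (assumed, e.g. RCS, velocity).
    Returns a (1 + C, H, W) array: channel 0 is the point count per cell and
    channels 1..C are the mean feature values of the points falling in that cell.
    """
    points = np.asarray(points, dtype=np.float64)
    n_feat = points.shape[1] - 2
    H = int(round((x_range[1] - x_range[0]) / cell))
    W = int(round((y_range[1] - y_range[0]) / cell))
    grid = np.zeros((1 + n_feat, H, W))

    # Integer cell indices: rows follow x, columns follow y; drop points outside the grid.
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)
    ix, iy, feats = ix[keep], iy[keep], points[keep, 2:]

    # Accumulate counts and feature sums; np.add.at handles repeated cell indices.
    np.add.at(grid[0], (ix, iy), 1.0)
    for c in range(n_feat):
        np.add.at(grid[1 + c], (ix, iy), feats[:, c])

    # Turn feature sums into means for occupied cells.
    occupied = grid[0] > 0
    grid[1:, occupied] /= grid[0, occupied]
    return grid


# Usage: columns are x, y, RCS, radial velocity (assumed layout).
pts = np.array([[10.2, 0.3, 5.0, -1.0],
                [10.4, 0.1, 7.0, -3.0],
                [60.0, 0.0, 1.0, 0.0]])  # last point lies outside x_range
bev = rasterize_radar_bev(pts)
# The first two points share cell (20, 50): count 2, mean RCS 6.0, mean velocity -2.0.
```

Many automotive radars provide little or no usable elevation, so this sketch ignores z entirely and builds a pillar-like, single-layer grid.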

Abstract

In this paper, we propose a novel approach to camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Specifically, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation to convert these features into 3D space. We then fuse them with features extracted from the radar data using a complementary fusion strategy to produce the final 3D object representation. We evaluate the approach on the NuScenes dataset and compare it against both single-sensor baselines and current state-of-the-art fusion methods. Our results show that the proposed approach outperforms single-sensor solutions and can compete directly with other top-performing fusion methods.
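The abstract does not describe how CDSM maps 2D camera features into 3D space. For orientation only, the sketch below shows a common, purely geometric baseline for the same problem: project the centre of every BEV cell into the image, sample the camera feature map there, and concatenate the result with radar BEV features on the same grid. This is not the authors' CDSM. The function names, the flat-ground assumption (`z_ground`), the pinhole camera model, nearest-neighbour sampling, the example calibration, and concatenation as the fusion step are all assumptions.

```python
import numpy as np


def lift_camera_features_to_bev(feat, K, T_cam_from_ego, stride,
                                x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                                cell=0.5, z_ground=0.0):
    """Sample a 2D camera feature map at the image projections of BEV cell centres.

    feat: (C, h, w) feature map computed at `stride` pixels per feature cell.
    K: (3, 3) pinhole intrinsics in pixels; T_cam_from_ego: (4, 4) extrinsics.
    Returns (C, H, W) BEV features; cells that do not project into the image stay zero.
    """
    C, h, w = feat.shape
    H = int(round((x_range[1] - x_range[0]) / cell))
    W = int(round((y_range[1] - y_range[0]) / cell))

    # Ego-frame centres of all BEV cells at an assumed constant ground height.
    xs = x_range[0] + (np.arange(H) + 0.5) * cell
    ys = y_range[0] + (np.arange(W) + 0.5) * cell
    gx, gy = np.meshgrid(xs, ys, indexing="ij")  # (H, W) each
    pts = np.stack([gx, gy, np.full_like(gx, z_ground), np.ones_like(gx)], axis=-1)

    # Ego -> camera -> pixel coordinates.
    cam = pts.reshape(-1, 4) @ T_cam_from_ego.T  # (H*W, 4)
    depth = cam[:, 2]
    in_front = depth > 1e-3
    uvw = cam[:, :3] @ K.T
    safe = np.where(in_front, depth, 1.0)  # avoid dividing by ~0; masked out below
    u = uvw[:, 0] / safe
    v = uvw[:, 1] / safe

    # Nearest feature-map cell for each projected pixel.
    fu = np.floor(u / stride).astype(int)
    fv = np.floor(v / stride).astype(int)
    valid = in_front & (fu >= 0) & (fu < w) & (fv >= 0) & (fv < h)

    bev = np.zeros((C, H * W), dtype=feat.dtype)
    bev[:, valid] = feat[:, fv[valid], fu[valid]]
    return bev.reshape(C, H, W)


def fuse_bev(camera_bev, radar_bev):
    """Channel-wise concatenation of two feature maps on the same BEV grid (assumed fusion)."""
    if camera_bev.shape[1:] != radar_bev.shape[1:]:
        raise ValueError("camera and radar BEV grids must have the same spatial size")
    return np.concatenate([camera_bev, radar_bev], axis=0)


# Usage with a hypothetical forward-facing camera 1.5 m above the ego origin.
# Ego frame: x forward, y left, z up. Camera frame: x right, y down, z forward.
T_cam_from_ego = np.array([[0.0, -1.0, 0.0, 0.0],
                           [0.0, 0.0, -1.0, 1.5],
                           [1.0, 0.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])  # 640x480 image
feat = np.arange(4 * 60 * 80, dtype=np.float64).reshape(4, 60, 80)  # stride-8 feature map
cam_bev = lift_camera_features_to_bev(feat, K, T_cam_from_ego, stride=8)
radar_bev = np.zeros((3, 100, 100))  # placeholder radar BEV features
fused = fuse_bev(cam_bev, radar_bev)  # shape (7, 100, 100)
```

Because each BEV cell samples the pixel where its ground-level centre projects, anything above the assumed ground plane is mapped to cells behind its true position; this fixed-geometry baseline is therefore only a reference point for what a dedicated alignment module has to handle.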

Paper Structure (14 sections, 11 figures, 3 tables)

Figures (11)

  • Figure 1: Example results of CDSM fusion method predictions on NuScenes test data. Predicted objects are marked in blue in both the camera view and an enhanced BEV view. Green cuboids represent matched ground-truth labels. A LiDAR point cloud is added to the BEV view for reference.
  • Figure 2: Example results of CDSM fusion method predictions on NuScenes test data. Predicted objects are marked in blue in both the camera view and an enhanced BEV view. Green cuboids represent matched ground-truth labels. A LiDAR point cloud is added to the BEV view for reference.
  • Figure 3: Full solution pipeline with camera image and point cloud list inputs: the image processing network (blue) and the point cloud processing network (yellow), each with optional outputs, and the CDSM fusion block (green), which produces the main fusion predictions.
  • Figure 4: Camera network architecture.
  • Figure 5: Radar network architecture.
  • ...and 6 more figures