DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
Felix Fent, Andras Palffy, Holger Caesar
TL;DR
This work tackles robust, cost-effective 3D object detection for autonomous driving by fusing camera data with raw 4D radar cube data. It introduces the Dual Perspective Fusion Transformer (DPFT), which projects radar cubes into two complementary views (range-azimuth and azimuth-elevation) and fuses them with image features through deformable attention, without enforcing a single BEV feature space. DPFT demonstrates state-of-the-art performance on the K-Radar dataset, notably under severe weather, while achieving fast inference (about 87 ms) and graceful degradation under sensor failure. The approach broadens multimodal fusion by leveraging high-dimensional radar data and dual-perspective querying, offering a robust and scalable solution for real-world autonomous driving perception.
Abstract
The perception of autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and the performance of radar-based perception is still inferior to that of camera- and lidar-based perception. Camera-radar fusion methods have been proposed to address this issue, but these are constrained by the typical sparsity of radar point clouds and are often designed for radars without elevation information. We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT), designed to overcome these limitations. Our method leverages lower-level radar data (the radar cube) instead of the processed point clouds to preserve as much information as possible, and employs projections in both the camera and ground planes to effectively use radars with elevation information and simplify the fusion with camera data. As a result, DPFT has demonstrated state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions and maintaining a low inference time. The code is made available as open-source software at https://github.com/TUMFTM/DPFT.
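The dual-perspective idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `project_radar_cube`, the axis order (Doppler, range, elevation, azimuth), and the use of a maximum as the reduction are all assumptions made for illustration; the source only states that the radar cube is projected onto a range-azimuth view (ground plane) and an azimuth-elevation view (camera plane).

```python
import numpy as np


def project_radar_cube(cube: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Reduce a 4D radar cube to two 2D views.

    Assumed (hypothetical) axis order: (doppler, range, elevation, azimuth).
    The max reduction is an illustrative choice, not taken from the paper.
    """
    if cube.ndim != 4:
        raise ValueError("expected a 4D cube (doppler, range, elevation, azimuth)")

    # Range-azimuth view (ground plane): collapse Doppler and elevation.
    range_azimuth = cube.max(axis=(0, 2))      # shape: (range, azimuth)

    # Azimuth-elevation view (camera plane): collapse Doppler and range.
    elevation_azimuth = cube.max(axis=(0, 1))  # shape: (elevation, azimuth)

    return range_azimuth, elevation_azimuth


if __name__ == "__main__":
    # Small synthetic cube standing in for real radar data.
    rng = np.random.default_rng(0)
    demo_cube = rng.random((4, 8, 3, 5))
    ra, ea = project_radar_cube(demo_cube)
    print(ra.shape, ea.shape)
```

Each view keeps the azimuth axis, so the two projections describe the same scene from complementary perspectives. Features extracted from them could then be queried jointly with image features rather than being forced into a single BEV grid.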
