Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection

Linhua Kong; Dongxia Chang; Lian Liu; Zisen Kong; Pengyuan Li; Yao Zhao

TL;DR

This work targets radar-camera fusion for robust 3D object detection in autonomous driving. It introduces RCAlign, a framework built around Dual-Route Alignment (DRA) to enable inter-modal feature interaction and Radar Feature Enhancement (RFE) to densify sparse radar BEV features, guided by contrastive learning and knowledge distillation. The approach achieves state-of-the-art results on the nuScenes benchmark, including substantial gains in NDS and mAP, and shows strong robustness across varying lighting and weather conditions. The combination of inter-modal alignment and radar densification offers practical improvements for reliable multi-sensor perception in real-world driving scenarios.

Abstract

Recently, 3D object detection algorithms based on radar and camera fusion have shown excellent performance, setting the stage for their application in autonomous driving perception tasks. Existing methods have focused on the feature misalignment caused by the domain gap between radar and camera. However, they either neglect inter-modal feature interaction during alignment or fail to effectively align features at the same spatial location across modalities. To alleviate these problems, we propose a new alignment model called Radar Camera Alignment (RCAlign). Specifically, we design a Dual-Route Alignment (DRA) module based on contrastive learning to align and fuse the features of radar and camera. Moreover, considering the sparsity of radar BEV features, we propose a Radar Feature Enhancement (RFE) module that densifies radar BEV features using a knowledge distillation loss. Experiments show that RCAlign achieves a new state of the art on the public nuScenes benchmark in radar-camera fusion for 3D object detection. Furthermore, RCAlign achieves a significant performance gain (4.3% NDS and 8.4% mAP) in real-time 3D detection compared to the latest state-of-the-art method (RCBEVDet).
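
The abstract does not give the form of the contrastive objective used in DRA. As a rough illustration of what aligning features at the same spatial location can mean, the sketch below computes a symmetric InfoNCE-style loss in which the radar and camera feature vectors of the same BEV cell form the positive pair and all other cells act as negatives. This is an assumption made for illustration, not the authors' implementation; the function names, the `temperature` value, and the flattening of BEV cells into a list of vectors are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def bev_contrastive_loss(radar_feats, camera_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss over N BEV cells (hypothetical sketch).

    radar_feats[i] and camera_feats[i] are the feature vectors of the same
    BEV cell in each modality; they are treated as the positive pair.
    """
    if len(radar_feats) != len(camera_feats):
        raise ValueError("both modalities must cover the same BEV cells")
    radar = [l2_normalize(v) for v in radar_feats]
    camera = [l2_normalize(v) for v in camera_feats]
    # logits[i][j]: cosine similarity of radar cell i and camera cell j,
    # divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(r, c)) / temperature for c in camera]
        for r in radar
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the radar-to-camera and camera-to-radar directions.
    return 0.5 * (
        _diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed)
    )


# Toy usage: two BEV cells with 2-dimensional features.
radar = [[1.0, 0.0], [0.0, 1.0]]
camera_aligned = [[1.0, 0.0], [0.0, 1.0]]
camera_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = bev_contrastive_loss(radar, camera_aligned)
loss_swapped = bev_contrastive_loss(radar, camera_swapped)
```

In this toy case the loss is close to zero when each radar cell matches the camera feature at the same location and much larger when the camera features are swapped between cells, which is the behavior a location-wise alignment objective is meant to reward.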
