
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning

Yiduo Hao, Sohrab Madani, Junfeng Guan, Mohammed Alloulah, Saurabh Gupta, Haitham Hassanieh

TL;DR

This paper tackles the data-label bottleneck in radar-based autonomous driving by introducing Radical, a self-supervised learning framework that pretrains radar embeddings using both intra-modal radar-to-radar and cross-modal radar-to-vision contrastive losses on unlabeled radar-vision pairs. A novel Radar MIMO Mask augmentation and a mix of vision-derived and radar-specific augmentations are used to learn robust, radar-specific representations that can be transferred to radar-only downstream tasks. Empirical results on the Radatron dataset show that Radical improves radar-only 2D car detection by 5.8% in mAP over a supervised baseline, with ablations highlighting the importance of both intra- and cross-modal objectives and the proposed augmentations. The work suggests a practical route to leveraging vast unlabeled radar data for robust perception in adverse weather and across evolving radar hardware, reducing annotation costs and enabling lifelong learning in automotive radar systems.

Abstract

The perception of autonomous vehicles using radars has attracted increased research interest due to its ability to operate in fog and bad weather. However, training radar models is hindered by the cost and difficulty of annotating large-scale radar data. To overcome this bottleneck, we propose a self-supervised learning framework that leverages the large amount of unlabeled radar data to pre-train radar-only embeddings for self-driving perception tasks. The proposed method combines radar-to-radar and radar-to-vision contrastive losses to learn a general representation from unlabeled radar heatmaps paired with their corresponding camera images. When used for downstream object detection, we demonstrate that the proposed self-supervision framework can improve the accuracy of state-of-the-art supervised baselines by $5.8\%$ in mAP. Code is available at https://github.com/yiduohao/Radical.
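
To make the idea of combining radar-to-radar and radar-to-vision contrastive losses concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the InfoNCE form of each term, the `temperature` value, the `cross_weight` factor, and all function names are hypothetical choices that the abstract does not specify. Embeddings are plain lists of floats, and row `i` of each batch is assumed to come from the same scene.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss; queries[i] and keys[i] are the positive pair,
    and every other key in the batch acts as a negative."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i against every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def composite_loss(radar_view_a, radar_view_b, vision,
                   cross_weight=1.0, temperature=0.1):
    """Assumed composite objective: an intra-modal term between two augmented
    radar views plus a weighted cross-modal term between radar and vision."""
    intra = info_nce(radar_view_a, radar_view_b, temperature)
    cross = info_nce(radar_view_a, vision, temperature)
    return intra + cross_weight * cross


if __name__ == "__main__":
    # Toy batch of two scenes with 2-D embeddings.
    radar = [[1.0, 0.0], [0.0, 1.0]]
    matched = composite_loss(radar, radar, radar)
    mismatched = composite_loss(radar, radar[::-1], radar[::-1])
    print(matched < mismatched)
```

In this toy batch, embeddings that agree across views and modalities yield a much smaller loss than embeddings paired with the wrong scene, which is the behavior a contrastive pre-training objective is meant to reward.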

Paper Structure (21 sections, 6 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Millimeter wave radar heatmaps are uninterpretable to humans and are hence difficult to annotate.
  • Figure 2: Overall network of Radical. Knowledge is distilled from a pretrained vision model into a radar model. A mini-batch of $B$ radar-vision pairs flows through the network, and their encodings interact locally within the radar branch and globally across the radar and vision branches. That is, Radical is trained using a composite contrastive loss with intra- and cross-modal terms.
  • Figure 3: Radar-specific augmentations. (a) Scene. (b) Original radar heatmap. (c) Zoomed-in region of cars. (d) Random Phase. (e) Antenna Dropout. (f) Rotation (Polar). (g) Center Cropping (Polar).
  • Figure 4: Examples from our test set: (a) Original scene. (b) Radatron (supervised) baseline. (c) Radical. Ground truth is marked in green and predictions in red.
  • Figure 5: Controlled Fog Experiment. (a) Scene. (b) Scene in fog. (c) Prediction overlaid on radar heatmap captured in fog.