Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection

Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin

TL;DR

RiCL introduces radar-specific self-supervised pre-training for object detection, exploiting temporal continuity and radar-derived proposals on range-Doppler maps. It extends SoCo and AlignDet to pre-train a full detector (backbone, neck, head) in a box-domain contrastive framework, using a multi-task loss that combines a box-level contrastive objective with a localization regression term. On CARRADA and RADDet, RiCL improves mAP@0.5, most clearly in low-data regimes: with pre-training, 20% of the labelled data is enough to reach an mAP@0.5 similar to that of a supervised model trained on the whole training set. This reduces the annotation burden and moves radar perception toward scalable foundation models.
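The multi-task loss described above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not state the exact form of either term, so the sketch assumes an InfoNCE-style contrastive term over box embeddings and a mean L1 term for box regression. The function names, the `reg_weight` and `temperature` parameters, and the toy inputs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.5):
    """Assumed contrastive term: queries[i] and keys[i] embed the same
    object seen in two frames; every other key acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]
    return total / len(queries)


def l1_box_loss(pred_boxes, target_boxes):
    """Assumed regression term: mean absolute error over box coordinates."""
    total = 0.0
    for p, t in zip(pred_boxes, target_boxes):
        total += sum(abs(a - b) for a, b in zip(p, t)) / len(p)
    return total / len(pred_boxes)


def multitask_loss(queries, keys, pred_boxes, target_boxes,
                   reg_weight=1.0, temperature=0.5):
    """Weighted sum of the contrastive and regression terms (hypothetical weighting)."""
    contrastive = info_nce(queries, keys, temperature)
    regression = l1_box_loss(pred_boxes, target_boxes)
    return contrastive + reg_weight * regression


# Toy example: two objects, embedded in frame t and frame t+1.
emb_t = [[1.0, 0.0], [0.0, 1.0]]
emb_t1 = [[0.9, 0.1], [0.1, 0.9]]
pred = [[10.0, 20.0, 14.0, 26.0], [30.0, 5.0, 33.0, 9.0]]
target = [[10.0, 21.0, 14.0, 26.0], [30.0, 5.0, 34.0, 9.0]]
loss = multitask_loss(emb_t, emb_t1, pred, target)
```

In this sketch the contrastive term falls when embeddings of the same object in different frames agree and embeddings of different objects disagree, while the regression term falls as predicted boxes approach the radar-derived proposals used as targets.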

Abstract

In recent years, driven by the need for safer and more autonomous transport systems, the automotive industry has shifted toward integrating a growing number of Advanced Driver Assistance Systems (ADAS). Among the sensors employed for object recognition tasks, radar has emerged as a strong contender owing to its reliability in adverse weather and low-light conditions and its consistent performance across diverse environments. However, the small size of radar datasets and the difficulty of labelling such data limit the performance of radar object detectors. Motivated by the promising results of self-supervised learning in computer vision, this paper presents RiCL, an instance contrastive learning framework to pre-train radar object detectors. We propose to exploit the radar's own detections together with temporal information to pre-train the radar object detection model in a self-supervised way using contrastive learning. We aim to pre-train an object detector's backbone, neck and head so that it can learn from less data. Experiments on the CARRADA and RADDet datasets show the effectiveness of our approach in learning generic representations of objects in range-Doppler maps. Notably, our pre-training strategy allows us to use only 20% of the labelled data to reach an mAP@0.5 similar to that of a supervised approach using the whole training set.

Paper Structure (25 sections, 6 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: FMCW radar overview
  • Figure 2: RiCL framework overview. Contrastive learning is performed at the object level, thus maximising the similarity between similar objects at different distances. The regression loss allows the model to learn to localise objects.
  • Figure 3: Overview of the FCOS model [fcos] using the RECORD backbone [record], without the LSTMs. IR stands for Inverted Residual bottleneck block, as proposed in MobileNetV2 [mobilenetv2]. Size and number of channels are in $C \times H \times W$ format.