
RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning

Yuan Luo, Rudolf Hoffmann, Yan Xia, Olaf Wysocki, Benedikt Schwab, Thomas H. Kolbe, Daniel Cremers

TL;DR

This work tackles radar-based object detection under noise by marrying contrastive self-supervised learning with semantic priors from CityGML. The RADLER framework learns robust radar representations through a cross-modal (radar-image) pretext task and enhances detection by fusing semantic-depth maps generated from semantic 3D city models, producing more precise confidence maps (confmaps) and fewer duplicate detections. The authors introduce RadarCity, a dataset of 54K radar-image pairs with CityGML priors, and demonstrate consistent improvements in mean average precision (mAP) and mean average recall (mAR) on RadarCity and CRUW, with notable gains when semantic-depth information is incorporated. The study highlights the practical value of semantic-map guidance for radar perception and provides both a dataset and a method as a foundation for future research on semantic-guided radar detection.
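To make the "fewer duplicates" point concrete, here is a minimal Python sketch of how detections can be read off a single-class confmap: keep cells above a score threshold that are local maxima, then greedily drop weaker peaks lying within a few bins of a stronger one. The function name `confmap_peaks`, the 0.5 threshold, and the Chebyshev-distance suppression radius are hypothetical choices for illustration; this summary does not describe RADLER's actual post-processing.

```python
from typing import List, Tuple

# Hypothetical post-processing sketch (not the authors' code): turn one class's
# confmap, a 2D grid over range bins x azimuth bins, into deduplicated detections.
Detection = Tuple[int, int, float]  # (range_bin, azimuth_bin, score)


def confmap_peaks(confmap: List[List[float]],
                  threshold: float = 0.5,
                  min_dist: int = 2) -> List[Detection]:
    """Return local-maximum peaks above `threshold`, dropping any peak within
    `min_dist` bins (Chebyshev distance) of a stronger, already kept peak."""
    rows, cols = len(confmap), len(confmap[0])
    candidates: List[Detection] = []
    for r in range(rows):
        for c in range(cols):
            v = confmap[r][c]
            if v < threshold:
                continue
            # Keep only cells that are local maxima over their 8-neighbourhood.
            neighbours = [confmap[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                candidates.append((r, c, v))
    # Greedy suppression: visit peaks strongest first, skip near-duplicates.
    candidates.sort(key=lambda d: d[2], reverse=True)
    kept: List[Detection] = []
    for r, c, v in candidates:
        if all(max(abs(r - kr), abs(c - kc)) > min_dist for kr, kc, _ in kept):
            kept.append((r, c, v))
    return kept


if __name__ == "__main__":
    cm = [[0.0] * 6 for _ in range(6)]
    cm[1][1] = 0.9
    cm[1][3] = 0.8  # two bins from the stronger peak, so treated as a duplicate
    cm[4][4] = 0.7
    print(confmap_peaks(cm))  # [(1, 1, 0.9), (4, 4, 0.7)]
```

A sharper confmap, as the TL;DR describes, leaves fewer secondary peaks for such a suppression step to resolve.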

Abstract

Semantic 3D city models are easily accessible worldwide, providing accurate, object-oriented, and semantically rich 3D priors. To date, their potential to mitigate the impact of noise on radar object detection remains under-explored. In this paper, we first introduce a unique dataset, RadarCity, comprising 54K synchronized radar-image pairs and semantic 3D city models. Moreover, we propose a novel neural network, RADLER, leveraging the effectiveness of contrastive self-supervised learning (SSL) and semantic 3D city models to enhance radar object detection of pedestrians, cyclists, and cars. Specifically, we first obtain robust radar features via an SSL network in the radar-image pretext task. We then use a simple yet effective feature fusion strategy to incorporate semantic-depth features from semantic 3D city models. With prior 3D information as guidance, RADLER captures more fine-grained details to enhance radar object detection. We extensively evaluate RADLER on the collected RadarCity dataset and demonstrate average improvements of 5.46% in mean average precision (mAP) and 3.51% in mean average recall (mAR) over previous radar object detection methods. We believe this work will foster further research on semantic-guided and map-supported radar object detection. Our project page is publicly available at https://gpp-communication.github.io/RADLER.
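The radar-image pretext task can be illustrated with a contrastive objective. The sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch: each radar embedding is paired with the image embedding from the same frame, and all other images in the batch serve as negatives (and vice versa). The loss form, the function name `info_nce`, and the temperature of 0.1 are assumptions; the abstract states only that the pretext task is contrastive.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at `idx`."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[idx] - lse


def info_nce(radar: List[Vector], image: List[Vector],
             temperature: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE: radar[i] and image[i] are the positive
    pair; every other embedding in the batch acts as a negative."""
    r = [_normalize(v) for v in radar]
    im = [_normalize(v) for v in image]
    n = len(r)
    # Cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(r[i], im[j])) / temperature
            for j in range(n)] for i in range(n)]
    radar_to_image = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    image_to_radar = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j)
                          for j in range(n)) / n
    return 0.5 * (radar_to_image + image_to_radar)


if __name__ == "__main__":
    radar = [[1.0, 0.0], [0.0, 1.0]]
    image = [[0.9, 0.1], [0.1, 0.9]]
    # Correctly paired frames should yield a lower loss than mismatched ones.
    print(info_nce(radar, image) < info_nce(radar, image[::-1]))  # True
```

Minimizing such a loss pulls each radar representation toward the image features of its own frame, which is one way to obtain the semantically aligned radar features the abstract refers to.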

Paper Structure

This paper contains 20 sections, 4 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: The workflow of RADLER. Representations of the range-azimuth (RA) maps are learned against the images in the pretext task and transferred to the radar object detection task, enhanced by prior information from semantic 3D city models.
  • Figure 2: Tracking moving objects' trajectories on RA maps: street-level camera view (left) and the bird's-eye view (BEV) with trajectories on the RA map (right).
  • Figure 3: Schematic diagram of RADLER. The pretext task involves learning representations of RA maps by contrasting them against corresponding image features to ensure semantic alignment (green). In the downstream radar object detection task, these learned representations are used by a decoder to generate confidence maps (confmaps) for object detection (blue). Additionally, the learned radar representations can be fused with features extracted from semantic-depth maps (SDMs) to further enhance detection performance (light orange).
  • Figure 4: The CityGML model used, representing TUM's city center campus in Munich.
  • Figure 5: Overview of the sensor platform used and all data-collection scenes in the RadarCity dataset.
  • ...and 9 more figures
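Figure 3 describes fusing the learned radar representations with features extracted from semantic-depth maps. As a minimal sketch, assuming the SDM features have already been resampled onto the same spatial grid as the radar features, one simple fusion is per-cell channel concatenation followed by a shared linear projection (equivalent to a 1x1 convolution). The function `fuse_concat_project` and this particular fusion operator are hypothetical; the abstract describes the strategy only as "simple yet effective".

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [height][width][channels]


def fuse_concat_project(radar: FeatureMap, sdm: FeatureMap,
                        weight: List[List[float]],
                        bias: List[float]) -> FeatureMap:
    """Hypothetical fusion: concatenate radar and SDM channels per cell, then
    apply one shared linear projection (a 1x1 convolution).
    `weight` has shape [C_out][C_radar + C_sdm]; `bias` has length C_out."""
    if len(radar) != len(sdm) or len(radar[0]) != len(sdm[0]):
        raise ValueError("radar and SDM feature maps must share spatial size")
    out: FeatureMap = []
    for radar_row, sdm_row in zip(radar, sdm):
        out_row = []
        for r_cell, s_cell in zip(radar_row, sdm_row):
            x = r_cell + s_cell  # channel-wise concatenation (list join)
            if len(x) != len(weight[0]):
                raise ValueError("weight width must equal C_radar + C_sdm")
            out_row.append([sum(w * xi for w, xi in zip(w_row, x)) + b
                            for w_row, b in zip(weight, bias)])
        out.append(out_row)
    return out


if __name__ == "__main__":
    radar = [[[1.0, 2.0], [0.0, 1.0]]]  # 1x2 grid, 2 radar channels
    sdm = [[[3.0], [4.0]]]              # 1x2 grid, 1 SDM channel
    weight = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
    bias = [0.0, 0.5]
    print(fuse_concat_project(radar, sdm, weight, bias))
    # [[[4.0, -0.5], [4.0, -2.5]]]
```

Concatenation keeps both feature sources intact and lets the learned projection decide how much weight the city-model prior receives at each cell; in a real network the projection weights would be trained jointly with the detection decoder.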