LEROjD: Lidar Extended Radar-Only Object Detection

Patrick Palmer, Martin Krüger, Stefan Schütte, Richard Altendorfer, Ganesh Adam, Torsten Bertram

TL;DR

This work tackles radar-only 3D object detection for automated driving by using lidar data during training only. It introduces two training-time strategies for transferring knowledge from lidar to radar: a multi-stage training method (MSTM) with sequential lidar point cloud thin-out, and cross-modal knowledge distillation (KD). Both are demonstrated on two different object detectors without changing their architectures. Experiments on the VoD dataset show that MSTM, especially with voxel-based thinning, yields gains of up to 4.2 percentage points in mean Average Precision (mAP) on radar data. KD yields gains of up to 3.9 percentage points, particularly when the teacher is trained on mixed radar and lidar data, feature-based KD is used, and the student is initialized with the teacher's weights. The findings indicate that lidar information used in training can narrow the gap to lidar-based detection and that the approach carries over to different 3D detectors. The code is available for replication.

Abstract

Accurate 3D object detection is vital for automated driving. While lidar sensors are well suited for this task, they are expensive and have limitations in adverse weather conditions. 3+1D imaging radar sensors offer a cost-effective, robust alternative but face challenges due to their low resolution and high measurement noise. Existing 3+1D imaging radar datasets include radar and lidar data, enabling cross-modal model improvements. Although lidar should not be used during inference, it can aid the training of radar-only object detectors. We explore two strategies to transfer knowledge from the lidar to the radar domain and radar-only object detectors: 1. multi-stage training with sequential lidar point cloud thin-out, and 2. cross-modal knowledge distillation. In the multi-stage process, three thin-out methods are examined. Our results show significant performance gains of up to 4.2 percentage points in mean Average Precision with multi-stage training and up to 3.9 percentage points with knowledge distillation by initializing the student with the teacher's weights. The main benefit of these approaches is their applicability to other 3D object detection networks without altering their architecture, as we show by analyzing it on two different object detectors. Our code is available at https://github.com/rst-tu-dortmund/lerojd

Paper Structure (22 sections, 5 equations, 5 figures, 14 tables, 2 algorithms)

Figures (5)

  • Figure 1: Architecture overview of (a) a knowledge distillation-based method and (b) a multi-stage training method (MSTM) for utilizing lidar data in the training of radar-only object detectors. The ground truth (GT) label is the same for both methods. The diagrams above the dotted line represent the training process, while the diagrams below the dotted line represent inference.
  • Figure 1: Detection results on radar-only data for selected training methods, shown in bird's-eye view. The white points are individual 3D radar measurements from a point cloud accumulated over 5 frames. Orange rectangles represent ground truth annotations for all object classes. Blue, red, and green rectangles show detections of cars, cyclists, and pedestrians, respectively.
  • Figure 2: MSTM pipeline. A 3D object detection network is trained iteratively on increasingly sparse lidar data, using three different thin-out strategies for sparsification. In the second-to-last step, the lidar point cloud is mixed with the radar point cloud; in the last step, the network is trained on radar data only. At inference, only the orange-shaded part is executed. Small points represent lidar points and large points represent radar points. Point color corresponds to the distance from the ego vehicle. The camera image is not used as an input and only aids the visualization.
  • Figure 2: Detection results on radar-only data for selected training methods, shown in bird's-eye view. The white points are individual 3D radar measurements from a point cloud accumulated over 5 frames. Orange rectangles represent ground truth annotations for all object classes. Blue, red, and green rectangles show detections of cars, cyclists, and pedestrians, respectively.
  • Figure 3: Comparison between SSTM and MSTM for different lidar sampling strategies.
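
The MSTM pipeline described in the caption above (train on increasingly sparse lidar, then on a lidar and radar mix, then on radar only) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `keep_ratios` schedule, and the two thin-out variants shown (random sampling and one-point-per-voxel) are assumptions chosen for the example, and the page does not specify which three thin-out methods the paper uses or their parameters.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def random_thin_out(points: List[Point], keep_ratio: float, seed: int = 0) -> List[Point]:
    """Hypothetical thin-out: keep a uniformly random subset of the points."""
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_keep = round(len(points) * keep_ratio)
    return rng.sample(points, n_keep)


def voxel_thin_out(points: List[Point], voxel_size: float) -> List[Point]:
    """Hypothetical thin-out: keep the first point seen in each cubic voxel."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    kept: Dict[Tuple[int, int, int], Point] = {}
    for p in points:
        key = (
            math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size),
        )
        kept.setdefault(key, p)  # later points in the same voxel are dropped
    return list(kept.values())


def build_stage_inputs(
    lidar: List[Point], radar: List[Point], keep_ratios: Sequence[float]
) -> List[List[Point]]:
    """Return one training point cloud per stage.

    Stages: thinned lidar for each ratio (assumed to be decreasing), then the
    sparsest lidar cloud mixed with radar, then radar only.
    """
    if not keep_ratios:
        raise ValueError("at least one lidar stage is required")
    stages = [random_thin_out(lidar, r) for r in keep_ratios]
    stages.append(stages[-1] + list(radar))  # second-to-last stage: lidar + radar
    stages.append(list(radar))               # last stage: radar only
    return stages


if __name__ == "__main__":
    lidar = [(float(i), float(j), 0.0) for i in range(10) for j in range(10)]
    radar = [(0.5, 0.5, 0.0), (3.5, 7.5, 0.0)]
    for idx, cloud in enumerate(build_stage_inputs(lidar, radar, [1.0, 0.5, 0.25])):
        print(f"stage {idx}: {len(cloud)} points")
```

In an actual training setup, the detector weights would be carried over from one stage to the next, so that the final radar-only stage starts from a network that has already seen denser lidar geometry. The sketch only builds the per-stage inputs and leaves the training loop out.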