LEROjD: Lidar Extended Radar-Only Object Detection
Patrick Palmer, Martin Krüger, Stefan Schütte, Richard Altendorfer, Ganesh Adam, Torsten Bertram
TL;DR
This work tackles radar-only 3D object detection for automated driving by using lidar data during training only. It introduces two training-time strategies for transferring knowledge from lidar to radar: multi-stage lidar thin-out training (MSTM) and cross-modal knowledge distillation (KD). Both are demonstrated on multiple detectors without changing their architectures. Experiments on the VoD dataset show that MSTM, especially with voxel-based thinning, can raise mAP on radar data by up to 4.2 percentage points, and that KD, particularly feature-KD with a teacher trained on mixed radar+lidar data, yields gains of up to 3.9 percentage points. The findings indicate that lidar information used during training can substantially narrow the gap to lidar-based detection and that the approach generalizes across different 3D detectors. The code is publicly available for replication.
Abstract
Accurate 3D object detection is vital for automated driving. While lidar sensors are well suited for this task, they are expensive and have limitations in adverse weather conditions. 3+1D imaging radar sensors offer a cost-effective, robust alternative but face challenges due to their low resolution and high measurement noise. Existing 3+1D imaging radar datasets include both radar and lidar data, enabling cross-modal model improvements. Although lidar should not be used during inference, it can aid the training of radar-only object detectors. We explore two strategies to transfer knowledge from the lidar domain to the radar domain and to radar-only object detectors: (1) multi-stage training with sequential lidar point cloud thin-out, and (2) cross-modal knowledge distillation. In the multi-stage process, three thin-out methods are examined. Our results show significant performance gains of up to 4.2 percentage points in mean Average Precision with multi-stage training and up to 3.9 percentage points with knowledge distillation by initializing the student with the teacher's weights. The main benefit of these approaches is their applicability to other 3D object detection networks without altering their architecture, as we show by evaluating them on two different object detectors. Our code is available at https://github.com/rst-tu-dortmund/lerojd
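
To make the idea of sequential lidar point cloud thin-out more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a point cloud stored as an (N, D) NumPy array with x, y, z in the first three columns. The function names `random_thin_out` and `voxel_thin_out`, the keep ratios, and the voxel size are hypothetical choices for illustration; the three thin-out methods actually examined and their schedules are not specified in this section.

```python
import numpy as np


def random_thin_out(points, keep_ratio, rng):
    """Keep a uniformly random subset of the points (illustrative only)."""
    n_keep = max(1, int(round(len(points) * keep_ratio)))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[np.sort(idx)]  # sort to preserve the original point order


def voxel_thin_out(points, voxel_size):
    """Keep the first point found in each occupied voxel (illustrative only)."""
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # return_index gives the first occurrence of each unique voxel coordinate
    _, first = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(first)]


rng = np.random.default_rng(0)
# Synthetic stand-in for a lidar scan: x, y, z, intensity
lidar = rng.uniform(-10.0, 10.0, size=(1000, 4))

# Hypothetical schedule: each training stage sees a sparser lidar cloud,
# moving step by step toward radar-like point density.
stages = [random_thin_out(lidar, ratio, rng) for ratio in (1.0, 0.5, 0.1)]

# Voxel-based alternative: at most one point per 5 m voxel.
sparse = voxel_thin_out(lidar, voxel_size=5.0)
```

In a multi-stage setup along these lines, a detector would be trained on the densest stage first and then fine-tuned on each sparser stage, ending with radar data. The detector architecture is untouched because only the input point clouds change.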
