Enhanced 3D Object Detection via Diverse Feature Representations of 4D Radar Tensor
Seung-Hyun Song, Dong-Hee Paek, Minh-Quan Dao, Ezio Malis, Seung-Hyun Kong
TL;DR
This work tackles robust 3D object detection with 4D Radar by addressing the variability introduced by different radar pre-processing pipelines. It introduces 4D Radar Multi-Representation (4DR-MR), a multi-teacher knowledge distillation framework in which each teacher learns from a different pre-processing pipeline applied to the 4D Radar Tensor (4DRT), and a fusion-then-distillation mechanism transfers the teachers' combined representations to a lightweight student that operates on sparse radar inputs. Key contributions include an aggregation module, comprising a dedicated representation-alignment stage and an attention-based fusion stage, and a densify module that bridges the density gap between teacher and student features. Training uses a loss that balances detection and distillation. On the K-Radar dataset, 4DR-MR improves on the RTNH baseline by 7.3% in AP_3D and 9.5% in AP_BEV with extremely sparse inputs and remains competitive with denser-input methods, while reducing input data size by a factor of about 90 and preserving runtime efficiency. These results demonstrate the practical viability of leveraging diverse 4DRT representations to improve radar-based perception in resource-constrained autonomous systems.
Abstract
Recent advances in automotive four-dimensional (4D) Radar have enabled access to the raw 4D Radar Tensor (4DRT), which offers richer spatial and Doppler information than conventional point clouds. While most existing methods rely on heavily pre-processed, sparse Radar data, recent attempts to leverage raw 4DRT face high computational costs and limited scalability. To address these limitations, we propose a novel three-dimensional (3D) object detection framework that maximizes the utility of 4DRT while preserving efficiency. Our method introduces a multi-teacher knowledge distillation (KD) scheme in which multiple teacher models are trained on point clouds derived from diverse 4DRT pre-processing techniques, each capturing complementary signal characteristics. These teacher representations are fused via a dedicated aggregation module and distilled into a lightweight student model that operates solely on a sparse Radar input. Experimental results on the K-Radar dataset demonstrate that our framework achieves improvements of 7.3% in AP_3D and 9.5% in AP_BEV over the baseline RTNH model when using extremely sparse inputs. Furthermore, it attains performance comparable to denser-input baselines while reducing the input data size by a factor of about 90, confirming the scalability and efficiency of our approach.
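
To make the fusion-then-distillation idea above more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`fuse_teachers`, `total_loss`), the use of the student feature as the attention query, the mean-squared-error distillation term, and the `kd_weight` parameter are all assumptions chosen for illustration. The representation-alignment stage and the densify module are not modeled here; each teacher is reduced to a single feature vector of the same length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(teacher_feats, query):
    """Attention-weighted average of teacher feature vectors.

    Each teacher is scored by a scaled dot product with `query`
    (assumed here to be the student feature); the scores are
    normalized with softmax and used as fusion weights.
    """
    dim = len(query)
    scores = [
        sum(f_d * q_d for f_d, q_d in zip(feat, query)) / math.sqrt(dim)
        for feat in teacher_feats
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, teacher_feats))
        for d in range(dim)
    ]
    return fused, weights


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(det_loss, student_feat, fused_feat, kd_weight=1.0):
    """Detection loss plus a weighted distillation term (hypothetical balance)."""
    return det_loss + kd_weight * mse(student_feat, fused_feat)


# Toy example: three teachers (one per hypothetical pre-processing pipeline).
teachers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [0.5, 0.5]
fused, weights = fuse_teachers(teachers, student)
loss = total_loss(det_loss=0.8, student_feat=student, fused_feat=fused, kd_weight=0.5)
```

In this toy setup the third teacher receives the largest fusion weight because its feature is most aligned with the student query. Setting `kd_weight` to zero reduces the objective to the detection loss alone, which is one simple way to see the balance between the two terms.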
