Enhanced 3D Object Detection via Diverse Feature Representations of 4D Radar Tensor

Seung-Hyun Song, Dong-Hee Paek, Minh-Quan Dao, Ezio Malis, Seung-Hyun Kong

TL;DR

This work tackles robust 3D object detection with 4D Radar by addressing the variability introduced by diverse radar pre-processing. It introduces 4D Radar Multi-Representation (4DR-MR), a multi-teacher knowledge distillation framework in which each teacher learns from a different 4DRT pre-processing pipeline and a fusion-then-distillation mechanism transfers the teachers' representations to a lightweight student that operates on sparse radar inputs. The key contributions are an aggregation module, comprising a representation-alignment stage and an attention-based fusion stage; a densify module that bridges the density gap between teacher and student features; and a loss that balances detection and distillation. On the K-Radar dataset, 4DR-MR improves on the RTNH baseline by 7.3% in AP_3D and 9.5% in AP_BEV with extremely sparse inputs, and it remains competitive with denser-input methods while reducing input data size by about 90 times and preserving runtime efficiency. These results demonstrate the practical viability of leveraging diverse 4DRT representations to improve radar-based perception in resource-constrained autonomous systems.

Abstract

Recent advances in automotive four-dimensional (4D) Radar have enabled access to the raw 4D Radar Tensor (4DRT), offering richer spatial and Doppler information than conventional point clouds. While most existing methods rely on heavily pre-processed, sparse Radar data, recent attempts to leverage raw 4DRT face high computational costs and limited scalability. To address these limitations, we propose a novel three-dimensional (3D) object detection framework that maximizes the utility of 4DRT while preserving efficiency. Our method introduces a multi-teacher knowledge distillation (KD) scheme, in which multiple teacher models are trained on point clouds derived from diverse 4DRT pre-processing techniques, each capturing complementary signal characteristics. These teacher representations are fused via a dedicated aggregation module and distilled into a lightweight student model that operates solely on a sparse Radar input. Experimental results on the K-Radar dataset demonstrate that our framework achieves improvements of 7.3% in AP_3D and 9.5% in AP_BEV over the baseline RTNH model when using extremely sparse inputs. Furthermore, it attains comparable performance to denser-input baselines while reducing the input data size by a factor of about 90, confirming the scalability and efficiency of our approach.
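The fusion-then-distillation idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a softmax over per-teacher scores stands in for the attention-based fusion, mean squared error stands in for the distillation term, and the names `fuse_teachers`, `distill_loss`, `total_loss`, and the `kd_weight` parameter are hypothetical.

```python
import math


def fuse_teachers(teacher_feats, scores):
    """Fuse teacher feature vectors with softmax weights.

    Assumption: one scalar score per teacher; the real module may compute
    attention per spatial location or per channel.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(teacher_feats[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, teacher_feats))
        for i in range(dim)
    ]
    return fused, weights


def distill_loss(student_feat, fused_feat):
    """Mean squared error between student and fused teacher features.

    Assumption: the student feature has already been densified to the
    same shape as the fused teacher feature.
    """
    n = len(fused_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, fused_feat)) / n


def total_loss(det_loss, kd_loss, kd_weight=1.0):
    """Balance the detection and distillation terms (kd_weight is hypothetical)."""
    return det_loss + kd_weight * kd_loss


# Three teachers, each contributing a 2-dimensional feature vector.
teachers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_teachers(teachers, scores=[0.0, 0.0, 0.0])
loss = total_loss(det_loss=1.0, kd_loss=distill_loss([0.5, 0.5], fused))
```

With equal scores the fusion reduces to a plain average of the teachers; unequal scores let one pre-processing variant dominate where it is more informative.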

Paper Structure

This paper contains 30 sections, 4 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of 4D Radar point clouds based on different pre-processing techniques: (a) original output from 4DRT, (b) fixed percentile filtering in the polar domain, (c) fixed percentile filtering in the Cartesian domain with interpolation, and (d) CA-CFAR filtering.
  • Figure 2: 4DR-MR: Overall architecture of the proposed 3D object detection framework utilizing multi-representations of 4D Radar. Features from multiple teacher networks, each using different Radar pre-processing, are fused and distilled into a compact student model for efficient inference.
  • Figure 3: Radar processing pipeline: Microwave signals are digitized, then transformed via two FFTs into a dense 4D tensor. A pre-processing step then converts it to a sparse point cloud.
  • Figure 4: Aggregation module: The module consists of a Representation Alignment Block that refines each teacher feature and an Attention-based Fusion Block that adaptively integrates the aligned features.
  • Figure 5: Qualitative comparison of BEV feature maps and detection results. (a) baseline RTNH$_{99.9}$; (b) teacher models trained on (1) RTNH$_{80}$, (2) RTNH with interpolation, and (3) RTNH$_{cfar}$; (c) 4DR-MR student model.
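
Two of the pre-processing styles named in Figure 1, fixed percentile filtering and CA-CFAR, can be illustrated on a one-dimensional list of power values. The sketch below is a simplification under stated assumptions: the real 4DRT is four-dimensional, the percentile is assumed to mean that cells at or below it are discarded, and the function names and the `num_train`, `num_guard`, and `scale` parameters are hypothetical rather than taken from the paper.

```python
import math


def percentile_filter(powers, percentile):
    """Return indices of cells whose power lies strictly above the percentile.

    Uses the nearest-rank definition of a percentile.
    """
    ranked = sorted(powers)
    rank = max(1, math.ceil(percentile * len(ranked) / 100))
    threshold = ranked[rank - 1]
    return [i for i, p in enumerate(powers) if p > threshold]


def ca_cfar_1d(powers, num_train=2, num_guard=1, scale=3.0):
    """Cell-averaging CFAR over a 1D power profile.

    For each cell, the noise level is the mean of the training cells on
    both sides, skipping the guard cells next to the cell under test.
    The cell is kept when its power exceeds scale * noise.
    """
    n = len(powers)
    detections = []
    for i in range(n):
        train = []
        for offset in range(num_guard + 1, num_guard + num_train + 1):
            for j in (i - offset, i + offset):
                if 0 <= j < n:  # ignore training cells outside the profile
                    train.append(powers[j])
        if not train:
            continue
        noise = sum(train) / len(train)
        if powers[i] > scale * noise:
            detections.append(i)
    return detections


profile = [1, 1, 1, 1, 10, 1, 1, 1, 1]
kept_by_percentile = percentile_filter(list(range(1, 11)), 80)
kept_by_cfar = ca_cfar_1d(profile)
```

The percentile filter applies one global threshold, so a higher percentile yields a sparser point cloud, whereas CA-CFAR adapts its threshold to the local noise level. That difference is one reason the resulting point clouds can carry complementary information.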