TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection

Lei Cheng, Siyang Cao

TL;DR

TransRAD tackles radar-only 3D object detection by introducing a lightweight Retentive Vision Transformer backbone with explicit spatial priors (MaSA), a multi-scale FPN neck, and anchor-free decoupled heads. A tailored loss function and Location-Aware NMS further address radar-specific challenges, enabling accurate 3D bounding boxes across Range, Azimuth, and Doppler. On RADDet, TransRAD achieves state-of-the-art 3D and 2D radar detection performance with faster inference and lower computational cost than prior methods, demonstrating the feasibility of transformer-based radar perception without heavy 3D CNNs. The work offers a practical, efficient radar perception solution for robust autonomous systems, especially in adverse conditions, and outlines clear ablations validating each design choice.

Abstract

Despite significant advancements in environment perception capabilities for autonomous driving and intelligent robotics, cameras and LiDARs remain notoriously unreliable in low-light conditions and adverse weather, which limits their effectiveness. Radar serves as a reliable and low-cost sensor that can effectively compensate for these limitations. However, radar-based object detection has been underexplored due to the inherent weaknesses of radar data, such as low resolution, high noise, and lack of visual information. In this paper, we present TransRAD, a novel 3D radar object detection model designed to address these challenges by leveraging the Retentive Vision Transformer (RMT) to more effectively learn features from information-dense radar Range-Azimuth-Doppler (RAD) data. Our approach leverages the Retentive Manhattan Self-Attention (MaSA) mechanism provided by RMT to incorporate explicit spatial priors, thereby enabling more accurate alignment with the spatial saliency characteristics of radar targets in RAD data and achieving precise 3D radar detection across Range-Azimuth-Doppler dimensions. Furthermore, we propose Location-Aware NMS to effectively mitigate the common issue of duplicate bounding boxes in deep radar object detection. The experimental results demonstrate that TransRAD outperforms state-of-the-art methods in both 2D and 3D radar detection tasks, achieving higher accuracy, faster inference speed, and reduced computational complexity. Code is available at https://github.com/radar-lab/TransRAD
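
The explicit spatial prior in MaSA can be illustrated with a small NumPy sketch. It is not the TransRAD implementation; it assumes the decay form used in RMT, where the attention weight between two tokens is scaled by a factor gamma raised to their Manhattan distance on the 2D token grid. The function names, the value of `gamma`, and the 1/sqrt(d) score scaling are assumptions made for illustration only.

```python
import numpy as np

def manhattan_decay_mask(height, width, gamma=0.9):
    """Return an (H*W, H*W) matrix D with D[n, m] = gamma ** d(n, m),
    where d is the Manhattan distance between tokens n and m on an
    H x W grid. gamma is a hypothetical decay rate in (0, 1)."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)        # (N, 2)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return gamma ** dist                                        # (N, N)

def masa_attention(q, k, v, mask):
    """Single-head attention whose softmax weights are multiplied
    element-wise by the spatial decay mask (illustrative sketch)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])       # scaling is an assumption
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights * mask) @ v

# Usage: a 3 x 3 token grid with 4-dimensional features.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((9, 4)) for _ in range(3))
mask = manhattan_decay_mask(3, 3, gamma=0.5)
out = masa_attention(q, k, v, mask)
```

Each token keeps full weight on itself and progressively less on tokens farther away, which matches the intuition that a radar target occupies a compact neighbourhood in the RAD tensor.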

Paper Structure

This paper contains 29 sections, 25 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Unique aspects of radar object detection.
  • Figure 2: Overall architecture of TransRAD.
  • Figure 3: Explicit spatial prior in MaSA: attention diminishes with increasing Manhattan distance from the center.
  • Figure 4: Class imbalance in RADDet dataset.
  • Figure 5: Radar object detection results comparison between the ground truth, TransRAD, and RadarResNet.
  • ...and 1 more figure