AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection

Loveneet Saini, Mirko Meuter, Hasan Tercan, Tobias Meisen

TL;DR

This paper tackles the challenge of sparse, non-deterministic radar data in BEV object detection by introducing AttentiveGRU, a temporal fusion layer that learns object-specific spatio-temporal context without requiring ego-motion information. The method combines attention-based gating on latent object queries with a recurrent state-integration step that fuses the present and memory states, so that correlated structures are fused coherently over time while complexity stays linear in space and time. It is modular and compatible with both CNN-based FPN and transformer backbones, as well as with an FCOS-style detection head. Empirical results on nuScenes radar data show up to a 21% relative improvement in car-category mAP over the prior radar-only state of the art, with additional gains on a larger proprietary dataset, demonstrating robustness to sparsity and motion variability and indicating practical potential for radar-only or radar-augmented perception systems.

Abstract

Bird's-eye view (BEV) object detection has become important for advanced automotive 3D radar-based perception systems. However, the inherently sparse and non-deterministic nature of radar data limits the effectiveness of traditional single-frame BEV paradigms. In this paper, we address this limitation by introducing AttentiveGRU, a novel attention-based recurrent approach tailored to radar constraints, which extracts individualized spatio-temporal context for objects by dynamically identifying and fusing temporally correlated structures across present and memory states. By leveraging the consistency of an object's latent representation over time, our approach exploits temporal relations to enrich feature representations for both stationary and moving objects, thereby enhancing detection performance and eliminating the need to externally provide or estimate any information about ego-vehicle motion. Our experimental results on the public nuScenes dataset show a significant increase in mAP for the car category, 21% over the best radar-only submission. Further evaluations on an additional dataset demonstrate notable improvements in object detection capabilities, underscoring the applicability and effectiveness of our method.
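
The idea of fusing present and memory states through a learned gate can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the function names (`attention_gate`, `integrate_state`, `fuse_sequence`), the parameter-free dot-product score, and the per-cell (rather than neighborhood-based) matching are all assumptions made for illustration. Each BEV frame is assumed to be a flat list of cells, each cell a small feature vector, and every step touches each cell once, so the cost grows linearly with the number of cells and frames.

```python
import math


def attention_gate(present_cell, memory_cell):
    """Hypothetical gate: sigmoid of the scaled dot product of the two feature vectors."""
    score = sum(p * m for p, m in zip(present_cell, memory_cell))
    score /= math.sqrt(len(present_cell))
    return 1.0 / (1.0 + math.exp(-score))


def integrate_state(present_cell, memory_cell, gate):
    """GRU-style convex blend: gate weights the memory, (1 - gate) the present."""
    return [gate * m + (1.0 - gate) * p for p, m in zip(present_cell, memory_cell)]


def fuse_sequence(frames):
    """Fuse a list of frames (oldest first) into one memory state.

    Assumption: the first frame initializes the memory; no ego-motion
    compensation is applied anywhere in this sketch.
    """
    memory = [list(cell) for cell in frames[0]]
    for present in frames[1:]:
        memory = [
            integrate_state(p, m, attention_gate(p, m))
            for p, m in zip(present, memory)
        ]
    return memory


# Two cells with two features each; the second cell has no return in the newer scan.
older = [[1.0, 0.0], [0.0, 2.0]]
newer = [[1.0, 0.0], [0.0, 0.0]]
fused = fuse_sequence([older, newer])
```

In this toy example the second cell is empty in the newer scan, so its score is zero, the gate is 0.5, and half of the remembered feature survives in `fused`. That only hints at how a memory state can bridge sparse radar returns; the gating described in the paper is learned and operates on latent object queries.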

Paper Structure

This paper contains 22 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Radar reflections for two consecutive scans from a truck approaching the ego vehicle in the direction of the arrow
  • Figure 2: Overall architecture with temporal fusion layer in the backbone
  • Figure 3: Per-point processing chain for input radar reflections. Blue regions in the BEV map represent sparsity, while yellow marks the peaks from reflections
  • Figure 4: Illustration of the proposed Temporal Fusion Block with Attention Gating (AG) and State Integration (SI) modules
  • Figure 5: An exemplary AttentiveGRU layer, showcasing the use of the proposed Temporal Fusion Blocks with Attention Gating (AG) and State Integration (SI) modules
  • ...and 2 more figures