AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
Loveneet Saini, Mirko Meuter, Hasan Tercan, Tobias Meisen
TL;DR
This paper tackles the challenge of sparse, non-deterministic radar data in BEV object detection by introducing AttentiveGRU, a temporal fusion layer that learns object-specific spatio-temporal context without requiring ego-motion information. The method combines attention-based gating on latent object queries with recurrent fusion of the present and memory states, so that temporally correlated structures are fused coherently while complexity stays linear in space and time. It is modular and compatible with both CNN-based FPN and transformer backbones, as well as with an FCOS-style detection head. Empirical results on nuScenes radar data show up to a 21% relative improvement in mAP for the car category over the best prior radar-only submission, with additional gains on a larger proprietary dataset. These results indicate robustness to sparsity and motion variability and point to the practical potential of the method for radar-only or radar-augmented perception systems.
Abstract
Bird's-eye view (BEV) object detection has become important for advanced automotive 3D radar-based perception systems. However, the inherently sparse and non-deterministic nature of radar data limits the effectiveness of traditional single-frame BEV paradigms. In this paper, we address this limitation by introducing AttentiveGRU, a novel attention-based recurrent approach tailored to radar constraints. It extracts individualized spatio-temporal context for objects by dynamically identifying and fusing temporally correlated structures across present and memory states. By leveraging the consistency of an object's latent representation over time, our approach exploits temporal relations to enrich feature representations for both stationary and moving objects. This enhances detection performance and eliminates the need to externally provide or estimate any information about ego-vehicle motion. Our experimental results on the public nuScenes dataset show a 21% increase in mAP for the car category over the best radar-only submission. Further evaluations on an additional dataset demonstrate notable improvements in object detection capabilities, underscoring the applicability and effectiveness of our method.
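
To make the idea of fusing present and memory states more concrete, the following is a minimal NumPy sketch of a generic attention-gated recurrent update on BEV feature maps. It is an illustration under stated assumptions, not the AttentiveGRU layer itself: the function names `attentive_gated_fusion` and `fuse_sequence`, the projection matrices `w_q`, `w_k`, `w_c`, and the per-cell dot-product gate are all hypothetical choices made for this example. The sketch only shows how a similarity score between present features and memory can act as a GRU-style gate, touching each BEV cell once per frame and using no ego-motion input.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attentive_gated_fusion(present, memory, w_q, w_k, w_c):
    """Fuse present BEV features with a memory state, cell by cell.

    Illustrative only; all weights and shapes are assumptions.
    present, memory: arrays of shape (H, W, C)
    w_q, w_k:        projection matrices of shape (C, D)
    w_c:             candidate-state weights of shape (2 * C, C)
    """
    d = w_q.shape[1]
    q = present @ w_q  # queries from the present frame, (H, W, D)
    k = memory @ w_k   # keys from the memory state, (H, W, D)

    # Scaled dot-product similarity per cell, squashed into a gate in (0, 1).
    score = np.sum(q * k, axis=-1, keepdims=True) / np.sqrt(d)
    gate = sigmoid(score)  # (H, W, 1)

    # Candidate state built from the present features and the gated memory.
    stacked = np.concatenate([present, gate * memory], axis=-1)  # (H, W, 2C)
    candidate = np.tanh(stacked @ w_c)  # (H, W, C)

    # GRU-style convex combination: a high gate keeps more of the memory.
    return (1.0 - gate) * candidate + gate * memory


def fuse_sequence(frames, w_q, w_k, w_c):
    """Run the update over a sequence of frames, starting from zero memory."""
    memory = np.zeros_like(frames[0])
    for present in frames:
        memory = attentive_gated_fusion(present, memory, w_q, w_k, w_c)
    return memory
```

Each frame costs one pass over the H x W grid, so the total work in this sketch grows linearly with the number of cells and the number of frames. A practical layer would learn the weights end to end and would typically use spatial context rather than purely per-cell projections.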
