Graph Query Networks for Object Detection with Automotive Radar

Loveneet Saini, Hasan Tercan, Tobias Meisen

TL;DR

This paper introduces Graph Query Networks (GQN), an attention-based framework that models radar-sensed objects as graphs to extract individualized relational and contextual features.

Abstract

Object detection with 3D radar is essential for 360-degree automotive perception, but radar's long wavelengths produce sparse and irregular reflections that challenge traditional grid- and sequence-based convolutional and transformer detectors. This paper introduces Graph Query Networks (GQN), an attention-based framework that models objects sensed by radar as graphs to extract individualized relational and contextual features. GQN employs a novel concept of graph queries that dynamically attend over the bird's-eye view (BEV) space, constructing object-specific graphs that are processed by two novel modules: EdgeFocus for relational reasoning and DeepContext Pooling for contextual aggregation. On the nuScenes dataset, GQN improves relative mAP by up to +53%, including a +8.2% gain over the strongest prior radar method, while reducing peak graph construction overhead by 80% at a moderate FLOPs cost.

Paper Structure

This paper contains 14 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Visualization of a graph query, with dynamic learning of nodes (red) and edges (yellow) guided by the attention map.
  • Figure 2: GQN pipeline: Graph queries are instantiated over BEV features using attention-guided sampling, updated via attention over edges (EdgeFocus), and enriched through context pooling and self-attention (DeepContext Pooling). Symbols: $V_i$: node, $e_{ij}$: edge, $\beta_{ij}$: attention weight for edge $e_{ij}$, $g_i$: pooled graph query vector, $V'_i$: updated node, $g'_i$: updated pool vector.
  • Figure 3: Unified Reasoning Architecture. BEV features from a backbone and flexible temporal fusion (e.g., ConvGRU or AttentiveGRU) are processed in parallel by a GQN and a global feature extractor (e.g., Transformer or FPN) with a CenterPoint decoder. The global vectors of the graph queries are reused as decoder queries. Outputs are fused via attention and passed to a CenterPoint detection head.
  • Figure 4: Precision-Recall curves for the secondary dataset [braun2021quantification].
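
The Figure 2 caption names the symbols of a graph query update ($V_i$, $e_{ij}$, $\beta_{ij}$, $g_i$, $V'_i$) but not the formulas behind them. The following is a minimal sketch of what one attention-over-edges step could look like, not the paper's EdgeFocus module. The edge definition (feature difference), the scaled dot-product score, the residual update, and the mean pooling are all assumptions chosen for illustration, and the function name `edge_attention_update` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_attention_update(nodes):
    """One illustrative update of a single graph query.

    nodes: list of node feature vectors V_i (lists of floats), e.g. features
    sampled from the BEV map. Returns (V'_i list, beta weights, pooled g).
    """
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes to form edges")
    d = len(nodes[0])
    updated, betas = [], []
    for i in range(n):
        neighbours = [j for j in range(n) if j != i]
        # Assumption: edge feature e_ij is the difference V_j - V_i.
        edges = [[nodes[j][k] - nodes[i][k] for k in range(d)]
                 for j in neighbours]
        # Assumption: edge score is the scaled dot product of V_i and V_j.
        scores = [sum(a * b for a, b in zip(nodes[i], nodes[j])) / math.sqrt(d)
                  for j in neighbours]
        beta = softmax(scores)  # beta_ij, normalised over the edges of node i
        # Assumption: residual update V'_i = V_i + sum_j beta_ij * e_ij.
        new = [nodes[i][k] + sum(b * e[k] for b, e in zip(beta, edges))
               for k in range(d)]
        updated.append(new)
        betas.append(beta)
    # Assumption: the query vector g is the mean of the updated nodes.
    pooled = [sum(v[k] for v in updated) / n for k in range(d)]
    return updated, betas, pooled


# Toy graph query with three 2-D node features.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated, betas, pooled = edge_attention_update(nodes)
```

Because the weights $\beta_{ij}$ sum to one for each node, this particular residual form reduces each updated node to a convex combination of its neighbours. A learned model would typically add projection matrices and nonlinearities, which are omitted here.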