RAVEN: Radar Adaptive Vision Encoders for Efficient Chirp-wise Object Detection and Segmentation

Anuvab Sen, Mir Sayeed Mohammad, Saibal Mukhopadhyay

Abstract

This paper presents RAVEN, a computationally efficient deep learning architecture for FMCW radar perception. The method processes raw ADC data in a chirp-wise streaming manner, preserves MIMO structure through independent receiver state-space encoders, and uses a learnable cross-antenna mixing module to recover compact virtual-array features. It also introduces an early-exit mechanism so the model can make decisions using only a subset of chirps when the latent state has stabilized. Across automotive radar benchmarks, the approach reports strong object detection and BEV free-space segmentation performance while substantially reducing computation and end-to-end latency compared with conventional frame-based radar pipelines.
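To make the chirp-wise streaming and early-exit ideas from the abstract concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: the module names (`per_rx_encoder`, `cross_antenna_mixer`, `chirp_state_update`) are hypothetical, a linear layer and a GRU cell stand in for the fast-time and chirp-wise state-space models, and the stopping rule (exit once the latent state changes by less than a threshold between chirps) is one plausible instantiation of "the latent state has stabilized".

```python
import torch
import torch.nn as nn

class StreamingRadarEncoder(nn.Module):
    """Sketch of chirp-wise streaming with early exit (illustrative only).

    Stand-ins: a linear layer plays the per-RX fast-time encoder,
    multi-head attention plays the cross-antenna mixer, and a GRU cell
    plays the chirp-wise state-space update.
    """

    def __init__(self, n_rx=4, samples_per_chirp=256, d_model=64, eps=1e-3):
        super().__init__()
        self.per_rx_encoder = nn.Linear(2 * samples_per_chirp, d_model)  # flattened I/Q per chirp
        self.cross_antenna_mixer = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.chirp_state_update = nn.GRUCell(d_model, d_model)
        self.eps = eps  # assumed early-exit threshold on latent-state change

    def forward(self, chirps):
        # chirps: (n_chirps, n_rx, 2 * samples_per_chirp), raw I/Q for one frame
        state = torch.zeros(1, self.chirp_state_update.hidden_size)
        for t, chirp in enumerate(chirps):
            rx_tokens = self.per_rx_encoder(chirp).unsqueeze(0)            # (1, n_rx, d)
            mixed, _ = self.cross_antenna_mixer(rx_tokens, rx_tokens, rx_tokens)
            feat = mixed.mean(dim=1)                                        # (1, d) pooled virtual-array feature
            new_state = self.chirp_state_update(feat, state)
            # Early exit once the latent state has stabilized.
            if torch.norm(new_state - state) < self.eps:
                return new_state, t + 1
            state = new_state
        return state, len(chirps)

# Usage: 64 chirps, 4 RX antennas, 256 complex (I/Q) samples per chirp.
enc = StreamingRadarEncoder()
frame = torch.randn(64, 4, 2 * 256)
latent, chirps_used = enc(frame)
print(latent.shape, chirps_used)
```

The returned latent would feed the downstream detection and segmentation heads; `chirps_used` indicates how much of the frame was actually consumed before the exit criterion fired.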

Paper Structure

This paper contains 46 sections, 40 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: (a) Comparison of traditional radar processing paradigms: frame-wise CNN encoders, chirp-wise recurrent models, and sample-wise streaming SSM pipelines. (b) Our spatial-aware hybrid architecture preserves per-RX structure, performs cross-antenna attention, and extracts chirp-wise virtual-array features for lightweight detection. (c) Runtime–performance characterization showing our method achieves higher accuracy at significantly lower latency and compute compared to existing radar perception models [rebut2022radial, giroux2023tfftradnet, sharma2024chirpnet, sen2025ssmradnetsamplewisestatespace].
  • Figure 2: MIMO radar virtual antenna formation and multiplexing. (a) $N_{tx}$ transmitters and $N_{rx}$ receivers form $N_{tx}\!\times\!N_{rx}$ virtual antennas; all RX channels receive simultaneously. (b) TDM: TX elements fire sequentially. (c) DDM: TX elements fire spectrally interleaved FMCW pulses; virtual-array information is mixed in frequency per receiver. (A minimal virtual-array indexing sketch is given after this figure list.)
  • Figure 3: RAVEN Architecture: (1) Fast-time per-RX SSMs compress I/Q into compact 2-D tokens; (2) cross-antenna attention fuses RX channels and expands to virtual-MIMO features; (3) a chirp-wise SSM updates the state online across chirps; (4) a learned projection maps features to a $T\times H\times W$ grid; (5) lightweight decoders produce detection heatmaps/boxes and segmentation.
  • Figure 4: (a) Attention Mixer: Learnable transmitter queries are used to extract Doppler-division multiplexed information from the receiver signal in the time domain; these are fused together to form the virtual antenna array for retrieving the MIMO information. (b) Early Decision Supervision: During training, decoders take outputs from multiple chirp levels, and the losses are computed simultaneously [kusupati2022matryoshka], forcing the model to converge on earlier chirps.
  • Figure 5: Qualitative ablation of the adaptive decision module across four scenarios. Each example shows the RGB view, segmentation evolution over chirps (white: true positive, green: false positive, red: false negative), detection evolution (point-level RA predictions), and the chirp-state contribution signal. Sample (a): a complex multi-vehicle scene where early-chirp hypotheses about distant objects are refined into accurate detections. Sample (b): early-chirp false positives (“hallucinated” obstacles) are suppressed as more chirps arrive. Sample (c): early hallucinations fade, but segmentation remains unreliable throughout. Sample (d): an object briefly emerges in clutter before vanishing, and the noisy chirp-similarity score reflects the irregularity of the data, resulting in poor segmentation and detection.
  • ...and 7 more figures
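To make the virtual-array construction of Figure 2 concrete, the short NumPy sketch below forms the $N_{tx}\times N_{rx}$ virtual element positions as pairwise sums of transmitter and receiver positions, which is the standard MIMO radar construction. The antenna coordinates here are purely illustrative and are not the sensor layout used in the paper.

```python
import numpy as np

# Illustrative element positions in half-wavelength units
# (not the radar layout used in the paper).
tx_pos = np.array([0.0, 2.0, 4.0])        # N_tx = 3 transmitter x-positions
rx_pos = np.array([0.0, 0.5, 1.0, 1.5])   # N_rx = 4 receiver x-positions

# Each (TX, RX) pair acts as one virtual element located at the sum of
# the transmitter and receiver positions, giving N_tx * N_rx elements.
virtual_pos = (tx_pos[:, None] + rx_pos[None, :]).reshape(-1)

print(virtual_pos)       # 12 virtual element x-positions
print(virtual_pos.size)  # 3 * 4 = 12
```

In TDM the contributions of each transmitter arrive in separate time slots, while in DDM they arrive spectrally interleaved within each receiver channel; in both cases the goal is to recover the same $N_{tx}\times N_{rx}$ virtual-array samples indexed above.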