A recurrent CNN for online object detection on raw radar frames

Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin

TL;DR

RECORD introduces a causal recurrent CNN for online radar object detection that fuses spatial 2D convolutions with efficient ConvLSTM memory to model spatio-temporal patterns in raw radar representations (RD/RA/RAD). The architecture uses an efficient bottleneck design, layer normalization, and skip connections, with both single-view and multi-view (TMVSC) variants to leverage temporal information across frames and views. Empirical results on the ROD2021 and CARRADA datasets show RECORD achieving state-of-the-art or competitive performance with lower computational cost, particularly in online mode suitable for real-time ADAS applications. The work demonstrates that online, memory-augmented convolutional-recurrent networks can effectively distinguish classes like pedestrians and cyclists and generalize across radar representations, with promising implications for onboard deployment and real-time radar perception.

Abstract

Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and its relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting temporal information (e.g., multiple frames) has been shown to better capture the dynamics of objects and, therefore, the variation in their shape. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show the relevance of this approach for detecting objects in different radar representations (range-Doppler, range-angle), and our model outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.

Paper Structure (33 sections, 7 equations, 4 figures, 5 tables)

This paper contains 33 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: FMCW radar overview
  • Figure 2: Model architecture (RECORD). Rounded arrows on Bottleneck LSTMs stand for a recurrent layer. Plus sign stands for the concatenation operation. We report the output size (left) and the number of output channels (right) for each layer.
  • Figure 3: Multi-view model architecture (MV-RECORD). We use the encoder described in Figure 2 for each view. Dashed boxes denote an optional operation applied only if the feature maps have different shapes. Gray arrows denote the same output.
  • Figure 4: Training procedures with $N=3$. (a) Buffer training procedure (many-to-one). (b) Online training procedure (many-to-many).
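
The two procedures named in the Figure 4 caption can be contrasted with a small sketch. This is only one plausible reading of the caption, not the authors' implementation: `toy_cell` is a hypothetical stand-in for a recurrent layer (a running sum instead of a ConvLSTM), and `buffer_mode` and `online_mode` are illustrative names that do not come from the paper. The assumption is that many-to-one resets the memory for every window of N frames and keeps only the last output, while many-to-many carries the memory forward and emits one output per frame.

```python
from typing import Callable, List, Tuple

# (frame, memory) -> (output, updated memory)
Cell = Callable[[float, float], Tuple[float, float]]


def toy_cell(frame: float, state: float) -> Tuple[float, float]:
    """Hypothetical recurrent cell: the memory is a running sum of the frames seen."""
    new_state = state + frame
    return new_state, new_state


def buffer_mode(frames: List[float], n: int, cell: Cell = toy_cell) -> List[float]:
    """Many-to-one: reset the memory per window of n frames, keep the last output only."""
    outputs = []
    for end in range(n, len(frames) + 1):
        state = 0.0  # memory is reset for every window
        out = 0.0
        for frame in frames[end - n:end]:
            out, state = cell(frame, state)
        outputs.append(out)  # one prediction, for the last frame of the window
    return outputs


def online_mode(frames: List[float], cell: Cell = toy_cell) -> List[float]:
    """Many-to-many: carry the memory across frames, emit one output per frame."""
    state = 0.0  # memory is initialised once and never reset
    outputs = []
    for frame in frames:
        out, state = cell(frame, state)
        outputs.append(out)
    return outputs


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
buffered = buffer_mode(frames, n=3)  # [6.0, 9.0, 12.0]
online = online_mode(frames)         # [1.0, 3.0, 6.0, 10.0, 15.0]
```

With five frames and N=3, the buffered variant makes nine cell calls to produce three outputs, whereas the online variant makes five calls to produce five outputs. This illustrates why a procedure that only relies on past memory suits online use.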