Open-RadVLAD: Fast and Robust Radar Place Recognition

Matthew Gadd, Paul Newman

TL;DR

This work tackles radar-based place recognition under viewpoint variation and sensor noise by working directly on polar radar scans and applying a 1D Fourier Transform along the radial axis to obtain radial frequency responses. A VLAD descriptor built from the residuals of these responses to the map's cluster centres yields a rotationally invariant, highly discriminative representation, enabling efficient nearest-neighbour localisation in a high-dimensional space. The method is exhaustively evaluated on 30 Oxford Radar RobotCar trajectories (870 pairings), achieving a median Recall@1 of 91.52% against 69.55% for RaPlace, the only other open implementation, while requiring fewer integral transforms. By releasing open-source code and an extensive benchmark, the work provides a practical platform for advancing radar-based place recognition in urban autonomous driving.
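As a quick check on the evaluation size, the figure of 870 pairings matches the number of ordered pairs of distinct sequences, which would correspond to each of the 30 sequences serving as the map for each of the other 29 as the query. This reading is inferred from the count alone; unordered pairs would give only 435.

```latex
\[
  30 \times (30 - 1) = 30 \times 29 = 870,
  \qquad
  \binom{30}{2} = \frac{30 \times 29}{2} = 435 .
\]
```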

Abstract

Radar place recognition often involves encoding a live scan as a vector and matching this vector to a database in order to recognise that the vehicle is in a location that it has visited before. Radar is inherently robust to lighting or weather conditions, but place recognition with this sensor is still affected by: (1) viewpoint variation, i.e. translation and rotation, and (2) sensor artefacts or noise. For 360-degree scanning radar, rotation is readily dealt with by aggregating in some way across azimuths. We also argue in this work that it is more critical to deal with the richness of representation and sensor noise than it is to deal with translational invariance, particularly in urban driving where vehicles predominantly follow the same lane when repeating a route. In our method, for computational efficiency, we use only the polar representation. For partial translation invariance and robustness to signal noise, we use only a one-dimensional Fourier Transform along radial returns. We also achieve rotational invariance and a very discriminative descriptor space by building a vector of locally aggregated descriptors. Our method is more comprehensively tested than all prior radar place recognition work, over an exhaustive combination of all 870 pairs of trajectories from 30 Oxford Radar RobotCar Dataset sequences (each approximately 10 km). Code and detailed results are provided at github.com/mttgdd/open-radvlad, as an open implementation and benchmark for future work in this area. We achieve a median of 91.52% in Recall@1, outstripping the 69.55% for the only other open implementation, RaPlace, and at a fraction of its computational cost (relying on fewer integral transforms, e.g. Radon, Fourier, and inverse Fourier).
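The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation at github.com/mttgdd/open-radvlad: the function names are hypothetical, the magnitude of the spectrum is used as the radial frequency response, the cluster centres come from a tiny hand-rolled k-means over map scans, the final descriptor is L2-normalised, and the scans are random arrays standing in for real polar radar data.

```python
import numpy as np

def radial_frequency_responses(polar_scan):
    """1D FFT along the range axis of a (num_azimuths, num_range_bins) scan.

    Assumption: only the magnitude is kept, giving one descriptor per azimuth.
    """
    return np.abs(np.fft.rfft(polar_scan, axis=1))

def nearest_centre(descriptors, centres):
    """Index of the closest centre for every descriptor (Euclidean distance)."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_centres(descriptors, k, iters=10, seed=0):
    """Very small k-means (Lloyd's algorithm), for illustration only."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[picks].copy()
    for _ in range(iters):
        labels = nearest_centre(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def vlad(descriptors, centres):
    """Sum residuals to the assigned centre per cluster, flatten, L2-normalise."""
    labels = nearest_centre(descriptors, centres)
    out = np.zeros_like(centres)
    for j in range(len(centres)):
        members = descriptors[labels == j]
        if len(members):
            out[j] = (members - centres[j]).sum(axis=0)
    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

# Stand-in data: three "map" scans to fit centres, one "query" scan.
rng = np.random.default_rng(1)
map_scans = rng.random((3, 64, 128))
map_desc = np.concatenate([radial_frequency_responses(s) for s in map_scans])
centres = fit_centres(map_desc, k=4)

query = rng.random((64, 128))
v = vlad(radial_frequency_responses(query), centres)

# A sensor rotation is a circular shift of the azimuth rows in polar form.
v_rot = vlad(radial_frequency_responses(np.roll(query, 17, axis=0)), centres)
print(np.allclose(v, v_rot))
```

Because each azimuth row is assigned to a centre independently and the residuals are summed, the order of the rows does not affect the result, which is why the shifted scan yields the same descriptor. Localisation would then amount to a nearest-neighbour search between the query descriptor and the stored map descriptors.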

Paper Structure (17 sections, 3 equations, 5 figures, 3 tables)

This paper contains 17 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Open-RadVLAD system overview. For a degree of robustness, we take the 1D Fourier Transform of the radial returns. Clustering these radial frequency responses is not specific to azimuth order and so gives us rotational invariance. Therefore, we apply a "vector of locally aggregated descriptors" (VLAD) as an informative scan descriptor with the residuals of query radial frequency responses to those cluster centres.
  • Figure 2: Top: $1000$ samples of time taken to build representations for V4, our method (blue), as compared to V2 from jang2023iros (orange). Bottom: $1000$ samples of time taken to compute representation distance. These samples were collected by running processes on an Apple M2 Pro with $12$ cores and 16 GB of LPDDR5 memory.
  • Figure 3: Left: Ground truth GPS difference matrix for the $\approx 800\times800$ possible place correspondences between sequences 2019-01-18-15-20-12 and 2019-01-15-12-01-32. Right: Corresponding embedding distance matrix for V4. Red insets show revisits in the opposite lane/direction (top) and in the same lane/direction (bottom).
  • Figure 4: Example polar scan, with maximum range limited to $1024$ pixels and then resized to $400\times512$. The zoomed-in inset (red) shows returns from near the vehicle (first few range columns).
  • Figure 5: Recall@1 to Recall@50 curves for variants of our method (V3 and V4) and two competitors (V1 and V2). As a guide to reading this result: for $\mathtt{N}=20$, Ring Key has approximately 85% Recall@20. This means that, 85% of the time, when a query has $20$ candidate nearest neighbours returned, at least $1$ of them is in fact a nearby place. Of course, for $\mathtt{N}\rightarrow\infty$ we will eventually return the entire map, where we are guaranteed to find a match (Recall@$\infty\rightarrow$100%). For all N, our V4 performs best.
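The Recall@N reading given in the Figure 5 caption, combined with the two matrices of Figure 3, can be sketched as below. This is an illustrative sketch only: `recall_at_n` is a hypothetical helper, the `radius_m` threshold that defines a "nearby place" is an assumed value that the captions do not give, and the two small matrices are made-up numbers rather than dataset results.

```python
def recall_at_n(embedding_dist, gps_dist, n, radius_m=25.0):
    """Fraction of queries with at least one true match among the n
    database entries that are nearest in embedding space.

    embedding_dist[q][j]: descriptor distance from query q to map entry j.
    gps_dist[q][j]: ground-truth distance in metres for the same pair.
    radius_m: assumed threshold below which two places count as the same.
    """
    hits = 0
    for q, row in enumerate(embedding_dist):
        # Map indices sorted by increasing embedding distance, truncated to n.
        ranked = sorted(range(len(row)), key=row.__getitem__)[:n]
        if any(gps_dist[q][j] <= radius_m for j in ranked):
            hits += 1
    return hits / len(embedding_dist)

# Toy example: 3 queries against a 3-entry map (made-up values).
embedding_dist = [
    [0.1, 0.9, 0.5],
    [0.8, 0.7, 0.2],
    [0.3, 0.6, 0.9],
]
gps_dist = [
    [5.0, 300.0, 200.0],
    [400.0, 10.0, 150.0],
    [250.0, 8.0, 500.0],
]

for n in (1, 2, 3):
    print(n, recall_at_n(embedding_dist, gps_dist, n))
```

In this toy case only the first query finds its true match at rank 1, while the other two find theirs at rank 2, so recall rises from one third to 1.0 as N grows. Queries with no true match anywhere in the map are counted as misses here, which is one possible convention among several.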