BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization

Dongmyeong Lee, Jesse Quattrociocchi, Christian Ellis, Rwik Rana, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas

TL;DR

BEV-Patch-PF is a GPS-free sequential geo-localization framework that couples a particle filter with a learned BEV-aerial feature similarity for continuous pose estimation in off-road environments. For each particle pose, the method samples the corresponding patch from an aerial feature map and weights the particle by how well that patch matches the BEV feature map built from onboard images, which yields a smooth, discriminative likelihood over SE(2) without discretizing heading. Training uses an InfoNCE loss for discriminative alignment and a confidence loss to calibrate sampling reliability, supporting robust operation under canopy and shadows. On two real-world off-road datasets, the method achieves 7.5x lower absolute trajectory error (ATE) on seen routes and 7.0x lower ATE on unseen routes than a retrieval-based baseline, and it runs in real time at 10 Hz on an NVIDIA Tesla T4. A public CDS dataset and ROS 2 integration support practical deployment.

Abstract

We propose BEV-Patch-PF, a GPS-free sequential geo-localization system that integrates a particle filter with learned bird's-eye-view (BEV) and aerial feature maps. From onboard RGB and depth images, we construct a BEV feature map. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image queried around the approximate location. BEV-Patch-PF computes a per-particle log-likelihood by matching the BEV feature to the aerial patch feature. On two real-world off-road datasets, our method achieves 7.5x lower absolute trajectory error (ATE) on seen routes and 7.0x lower ATE on unseen routes than a retrieval-based baseline, while maintaining accuracy under dense canopy and shadow. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.
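The per-particle measurement update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the nearest-neighbour patch sampling, the confidence-weighted cosine similarity, and the `temperature` parameter are all assumptions made for illustration, and poses are expressed directly in aerial-feature-map pixels. The symbols G, C, and F follow the Figure 1 caption (BEV feature map, BEV confidence map, aerial feature map).

```python
import numpy as np


def sample_patch(aerial_feat, pose, patch_size):
    """Crop a rotated square patch from F at a 3-DoF pose (hypothetical helper).

    aerial_feat: (H, W, C) aerial feature map F.
    pose: (x, y, theta), with x/y in map pixels and theta in radians.
    Returns a (patch_size, patch_size, C) array (nearest-neighbour sampling).
    """
    H, W, _ = aerial_feat.shape
    x, y, theta = pose
    half = (patch_size - 1) / 2.0
    # Cell offsets of the BEV grid, centred on the particle position.
    u, v = np.meshgrid(np.arange(patch_size) - half,
                       np.arange(patch_size) - half, indexing="xy")
    c, s = np.cos(theta), np.sin(theta)
    # Rotate the offsets by the heading, then translate to the particle.
    col = np.clip(np.rint(x + c * u - s * v).astype(int), 0, W - 1)
    row = np.clip(np.rint(y + s * u + c * v).astype(int), 0, H - 1)
    return aerial_feat[row, col]


def log_likelihoods(bev_feat, confidence, aerial_feat, particles, temperature=0.1):
    """Score every particle by matching G against its aerial patch.

    The score is an assumed stand-in: confidence-weighted mean cosine
    similarity divided by a temperature.
    """
    eps = 1e-8
    g = bev_feat / (np.linalg.norm(bev_feat, axis=-1, keepdims=True) + eps)
    scores = np.empty(len(particles))
    for i, pose in enumerate(particles):
        patch = sample_patch(aerial_feat, pose, bev_feat.shape[0])
        f = patch / (np.linalg.norm(patch, axis=-1, keepdims=True) + eps)
        sim = np.sum(g * f, axis=-1)  # per-cell cosine similarity
        scores[i] = np.sum(confidence * sim) / (np.sum(confidence) + eps) / temperature
    return scores


def update_weights(log_weights, log_lik):
    """Bayes update in log space, normalised with the max-subtraction trick."""
    log_w = log_weights + log_lik
    log_w = log_w - np.max(log_w)
    w = np.exp(log_w)
    return w / np.sum(w)
```

Because the patch is sampled at each particle's continuous heading, no heading discretization is needed; a bilinear sampler would give a smoother likelihood than the nearest-neighbour lookup used here for brevity.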

Paper Structure

This paper contains 14 sections, 6 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Visualization of BEV-Patch-PF inputs and outputs. Top (Inputs): (a) onboard RGB image $\mathcal{I}$, (b) depth image $\mathcal{D}$, and (c) a local aerial orthophoto $\mathcal{M}[\bar{\mathbf{x}}]$, where the green arrow indicates the ground-truth pose. Bottom (Outputs): (d) The predicted BEV confidence map $\mathbf{C}$, (e) the corresponding feature map $\mathbf{G}$, and (f) the aerial feature map $\mathbf{F}$. The green box on the aerial feature map highlights the patch sampled for matching against the BEV features.
  • Figure 2: Overall pipeline of the BEV-Patch-PF.
  • Figure 3: BEV-Aerial feature network architecture.
  • Figure 4: Training, validation, and test splits for TartanDrive 2.0 (Sivaprakasam et al., 2024). Satellite imagery © 2025 Airbus, Maxar Technologies; map data © 2025 Google.
  • Figure 5: Comparison of estimated trajectories from BEV-Patch-PF and all baselines on the TartanDrive 2.0 dataset.
  • ...and 3 more figures