Event-Based Visual Teach-and-Repeat via Fast Fourier-Domain Cross-Correlation

Gokul B. Nair, Alejandro Fontan, Michael Milford, Tobias Fischer

TL;DR

This work presents a novel event-camera-based VT&R system that formulates event-stream matching as frequency-domain cross-correlation, transforming spatial convolutions into efficient Fourier-space multiplications, and maintains Cross-Track Errors (XTE) below 15 cm.

Abstract

Visual teach-and-repeat (VT&R) navigation enables robots to autonomously traverse previously demonstrated paths using visual feedback. We present a novel event-camera-based VT&R system. Our system formulates event-stream matching as frequency-domain cross-correlation, transforming spatial convolutions into efficient Fourier-space multiplications. By exploiting the binary structure of event frames and applying image compression techniques, we achieve a processing latency of just 2.88 ms, about 3.5 times faster than conventional camera-based baselines that are optimised for runtime efficiency. Experiments using a Prophesee EVK4 HD event camera mounted on an AgileX Scout Mini robot demonstrate successful autonomous navigation across 3000+ meters of indoor and outdoor trajectories in daytime and nighttime conditions. Our system maintains Cross-Track Errors (XTE) below 15 cm, demonstrating the practical viability of event-based perception for real-time VT&R navigation.
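The frequency-domain matching described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fft_cross_correlate` and `peak_offset`, the frame size, and the synthetic binary frames are all assumptions made for illustration, and the compression step mentioned in the abstract is omitted. The sketch relies only on the correlation theorem, under which circular cross-correlation becomes a point-wise product of one spectrum with the complex conjugate of the other.

```python
import numpy as np


def fft_cross_correlate(teach, repeat):
    """Circular cross-correlation of two equally sized 2-D frames via FFT."""
    f_teach = np.fft.rfft2(teach)
    f_repeat = np.fft.rfft2(repeat)
    # Correlation theorem: corr[k] = sum_n teach[n] * repeat[n + k]
    # equals the inverse FFT of conj(F_teach) * F_repeat.
    return np.fft.irfft2(np.conj(f_teach) * f_repeat, s=teach.shape)


def peak_offset(corr):
    """Return the signed (dy, dx) shift at which the correlation peaks."""
    h, w = corr.shape
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint wrap around to negative shifts.
    dy = py - h if py > h // 2 else py
    dx = px - w if px > w // 2 else px
    return int(dy), int(dx)


# Hypothetical data: a sparse binary "teach" frame and a "repeat" frame
# that is the same frame shifted 5 pixels to the right.
rng = np.random.default_rng(0)
teach_frame = (rng.random((32, 64)) < 0.1).astype(np.float32)
repeat_frame = np.roll(teach_frame, shift=(0, 5), axis=(0, 1))

dy, dx = peak_offset(fft_cross_correlate(teach_frame, repeat_frame))
print("estimated shift (dy, dx):", dy, dx)
```

For an H x W frame the two forward transforms, the product, and the inverse transform cost O(HW log HW), whereas evaluating every lateral shift directly in the spatial domain costs O((HW)^2). In a VT&R setting the recovered horizontal shift would play the role of the lateral pixel offset that is later converted into a heading correction.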

Paper Structure

This paper contains 31 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Event-based visual teach-and-repeat system overview. (a): The robot records event streams during the teach phase (left) and autonomously follows the trajectory during repeat (right) using Fast Fourier Transform (FFT)-based cross-correlation of accumulated event frames. (b): Cross-correlations lead to timely navigational corrections in the form of repeatedly updating goals. (c): Our system deployed on an AgileX Scout Mini with Prophesee EVK4 achieves >300 Hz correction rates throughout autonomous navigation.
  • Figure 2: Pipeline. Left: In the teach phase, a topometric map is constructed while teleoperating the mobile robot. The map stores event frames along with corresponding robot poses derived from raw odometry at regular intervals of linear or angular displacement (Section \ref{method:teach}). Left-Centre: In the repeat phase, as the robot retraces the teach trajectory using the stored poses, incoming event frames are matched with those in the topometric map via cross-correlation. This is performed by point-wise multiplication of image pairs in the Fourier domain (Section \ref{method:correlation}). Right-Centre: Correlation results yield lateral pixel offsets, which are converted to angular corrections and issued to the robot as updated goal poses (Section \ref{method:heading}). Right: Along-path corrections are estimated by evaluating correlations across the search space (Section \ref{method:path}) and applied to the robot’s motion using Equation \ref{eq:path_corr}.
  • Figure 3: Experimental platform and environments. Top-left: AgileX Scout Mini robot with a front-mounted Prophesee EVK4 HD event camera and an onboard processing laptop. Top-right: Narrow spaces found in our indoor (top) and outdoor (bottom) trial scenarios. Bottom-left: Indoor trajectory visualized on a map from SLAM Toolbox macenski2021slam. Bottom-right: Example outdoor trajectory (223 m) over tiled and grass surfaces.
  • Figure 4: Navigation performance across indoor and outdoor trajectories. Top row: Three indoor trajectories (left) and one outdoor trajectory (right). The blue paths denote the teach trajectories, the green paths show the repeat trajectories using our event-based correction, and the red paths indicate the odometry-only repeats. Both indoor and outdoor trajectories were estimated using LiDAR SLAM. Next to each map, failure cases of the odometry-only baseline are shown. In these examples, the green arrows indicate the direction the robot is expected to traverse for a successful repeat. Bottom row: Cross-Track Error (XTE) for our method (green) and the odometry-only baseline (red). Note that the odometry-only XTE grows without bound, making the system prone to collisions and eventual failure.
  • Figure 5: Event-Accumulation Strategies under Varying Linear Velocities (Section \ref{abl1:velocity_invariance}). Left: Comparison of Repeat and corresponding Teach frames using fixed-time binning. Significant appearance divergence occurs during angular motion, leading to navigation failure. Center: Comparison of frames using fixed-event-count binning (proposed) at the same location. The representations remain consistent despite the change in linear velocity (0.33 m/s vs. 1.00 m/s). Top-Right: With fixed-time accumulation, repeats at $0.66$ and $1.00$ m/s fail for a teach traverse recorded at $0.33$ m/s. Bottom-Right: With our proposed fixed-event-count strategy, all three repeat traverses are completed with a 100% success rate.
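
The fixed-event-count accumulation contrasted with fixed-time binning in Figure 5 can be sketched as follows. This is an illustrative reading of the caption, not the paper's code: the `(x, y, t, polarity)` event layout, the function name `frames_by_event_count`, the choice to ignore polarity, and the toy event streams are all assumptions. The point is that a frame closes after a fixed number of events, so the timestamps never enter the frame content.

```python
import numpy as np


def frames_by_event_count(events, height, width, events_per_frame):
    """Yield binary frames, each accumulated from a fixed number of events.

    `events` is an iterable of (x, y, t, polarity) tuples (assumed layout).
    Timestamps and polarity are deliberately ignored.
    """
    frame = np.zeros((height, width), dtype=np.uint8)
    count = 0
    for x, y, _t, _polarity in events:
        frame[y, x] = 1  # binary frame: mark the pixel as active
        count += 1
        if count == events_per_frame:
            yield frame
            frame = np.zeros((height, width), dtype=np.uint8)
            count = 0
    # A trailing partial frame is dropped in this sketch.


# Hypothetical streams: the same spatial event pattern, generated three
# times faster in the second stream (as if the robot drove faster).
slow_events = [(i % 8, (i // 8) % 4, i * 3.0, 1) for i in range(20)]
fast_events = [(i % 8, (i // 8) % 4, i * 1.0, 1) for i in range(20)]

slow_frames = list(frames_by_event_count(slow_events, 4, 8, 6))
fast_frames = list(frames_by_event_count(fast_events, 4, 8, 6))
same = all(np.array_equal(a, b) for a, b in zip(slow_frames, fast_frames))
print("frames:", len(slow_frames), "identical across speeds:", same)
```

With fixed-time binning the faster stream would pack roughly three times as many events into each window, which is one way to understand the appearance divergence the caption reports. Real event rates also depend on scene texture and rotation, so the exact invariance in this toy example is an idealisation.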