Event-based vision for egomotion estimation using precise event timing

Hugh Greatorex, Michele Mastella, Madison Cotteret, Ole Richter, Elisabetta Chicca

TL;DR

This work addresses accurate, low-latency egomotion estimation for robotics by replacing frame-based, energy-intensive pipelines with a fully event-based approach. It combines a Time Difference Encoder (TDE) with a shallow spiking neural network that processes asynchronous event streams directly to extract local optical flow, enabling on-chip readouts of egomotion with minimal latency and power. Silicon measurements of the TDE circuit on the cognigr1 chip and on-chip network emulation, complemented by scaled-up simulations with up to 178,880 units, demonstrate low drift and a state-of-the-art average relative rotation error (ARRE) on the MVSEC dataset compared to prior methods. The approach targets real-time, power-efficient navigation for micro-drones and edge devices, and is suited to integration into neuromorphic vision stacks and future hardware accelerators.

Abstract

Egomotion estimation is crucial for applications such as autonomous navigation and robotics, where accurate and real-time motion tracking is required. However, traditional methods relying on inertial sensors are highly sensitive to external conditions, and suffer from drifts leading to large inaccuracies over long distances. Vision-based methods, particularly those utilising event-based vision sensors, provide an efficient alternative by capturing data only when changes are perceived in the scene. This approach minimises power consumption while delivering high-speed, low-latency feedback. In this work, we propose a fully event-based pipeline for egomotion estimation that processes the event stream directly within the event-based domain. This method eliminates the need for frame-based intermediaries, allowing for low-latency and energy-efficient motion estimation. We construct a shallow spiking neural network using a synaptic gating mechanism to convert precise event timing into bursts of spikes. These spikes encode local optical flow velocities, and the network provides an event-based readout of egomotion. We evaluate the network's performance on a dedicated chip, demonstrating strong potential for low-latency, low-power motion estimation. Additionally, simulations of larger networks show that the system achieves state-of-the-art accuracy in egomotion estimation tasks with event-based cameras, making it a promising solution for real-time, power-constrained robotics applications.
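The gating mechanism described above can be illustrated with a small behavioural model. This is a minimal sketch, not the authors' circuit or simulation code: it assumes a facilitatory (FAC) event starts an exponentially decaying gain, a trigger (TRG) event injects a current scaled by the gain remaining after the time difference, and a leaky integrate-and-fire neuron converts that current into a burst. The function name `tde_burst` and every parameter value are illustrative assumptions.

```python
import math


def tde_burst(delta_t, tau_fac=0.02, tau_epsc=0.01, tau_mem=0.005,
              weight=8.0, threshold=1.0, dt=1e-4, t_max=0.2):
    """Return spike times (seconds after the TRG event) for a FAC-to-TRG gap of delta_t.

    All time constants, the weight and the threshold are assumed values,
    chosen only to make the qualitative behaviour visible.
    """
    # The facilitatory gain has decayed for delta_t seconds when TRG arrives.
    gain = math.exp(-delta_t / tau_fac)
    # The trigger event is gated by whatever facilitation remains.
    epsc = weight * gain
    v_mem = 0.0
    spikes = []
    for step in range(round(t_max / dt)):
        # Forward-Euler update of the leaky membrane and the decaying current.
        v_mem += dt * (epsc - v_mem) / tau_mem
        epsc -= dt * epsc / tau_epsc
        if v_mem >= threshold:
            spikes.append((step + 1) * dt)
            v_mem = 0.0  # reset after each output spike
    return spikes


for gap_ms in (0, 20, 40, 80):
    burst = tde_burst(gap_ms / 1000.0)
    print(gap_ms, "ms ->", len(burst), "spikes")
```

In this toy model a shorter time difference yields more spikes and an earlier first spike, while a long gap yields no output at all, which is the kind of encoding the abstract attributes to the network.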

Paper Structure (20 sections, 28 equations, 9 figures, 1 table)

This paper contains 20 sections, 28 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Photograph of the cognigr1 chip, fabricated in 180 nm technology, along with the schematic of the TDE synapse. a) The relevant structures on the die are indicated. The TDE circuit measures 19$\times$56 including guard rings and is biased by an on-chip DAC. b) Schematic of the state-of-the-art CMOS TDE synapse (Greatorex et al., 2025), with facilitatory and trigger blocks labeled.
  • Figure 2: Silicon measurements of the TDE circuit. a) The time ($\Delta t$) between FAC and TRG input events was increased systematically. Each plot shows a $\Delta t$ increment of 20 ms, from 0 to 80 ms. The membrane potential, $V_{\text{mem}}$, of the neuron integrates the input current from the TDE synapse and outputs a spike when the threshold is reached. The encoding of the temporal distance between the input events is exhibited by the dynamics of $V_{\text{mem}}$ and the associated output spikes of the TDE. b) The time to first spike of the TDE spiking output for increasing time difference between input events. The response of the simulated TDE model to the same input is also shown for comparison. c) The number of spikes in the burst of the TDE with respect to the $\Delta t$ of input events.
  • Figure 3: The TDE can be applied to event-based vision tasks by "connecting" the FAC and TRG inputs to specific $(x,y)$ pixels. In this way the TDE becomes receptive to a particular direction of motion. In this case the TDE is sensitive to left-right motion, indicated by the arrow representing the TDE connectivity. We define the stride as the pixel-wise separation of the FAC and TRG inputs; in this example it is 1.
  • Figure 4: An illustration of how the FAC and TRG connections are oriented for each TDE unit in relation to the $(x, y)$ event data from the event camera. b) A single frame (generated by integrating activity over an arbitrary time step) of the MVSEC sample outdoor_day1. This event data was recorded by a DAVIS 346B DVS camera ($346 \times 260$ pixels) situated on the bonnet of a vehicle driving through a city. The two boxes illustrate the two sample areas from which events were sampled to estimate the egomotion of the vehicle. Both boxes, a) and c), referred to as left and right, are $20 \times 20$ pixels in size and contain 100 randomly placed TDEs, with equal proportions orientated with left-right and right-left polarity and a stride of 2 pixels. Both sample boxes have precisely the same placement of TDEs.
  • Figure 5: Measurements from the TDE circuit on the cognigr1 chip, implementing the egomotion estimation task. For this task the first 90 seconds of events from the outdoor_day1 MVSEC sample were used. a) The event data from the event camera mounted on the driving car. The two boxes (each $20\times20$ pixels) depict the two areas of the visual field from which events were sampled. b) The input events from the left box (blue) and the response of the TDEs in the network orientated in the right-left ($\text{R} \rightarrow \text{L}$) and left-right ($\text{L} \rightarrow \text{R}$) directions. Below each raster plot the integrated activity, normalised for each box, is shown. This activity shows the differing sensitivity of TDE orientation and the event rate propagated through the network. c) The same information for the right box (orange). Additionally, a zoomed-in section of the TDE network response is displayed to show the bursting activity of individual TDE units.
  • ...and 4 more figures
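
The connectivity described in the captions for Figures 3 and 4 can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' code: the helper names `place_tdes` and `tde_time_differences`, the alternating assignment of orientations, the seed, and the 80 ms acceptance window (borrowed from the $\Delta t$ range in Figure 2) are all hypothetical. Each unit pairs a FAC pixel with a TRG pixel one stride away, so an edge that crosses FAC first and TRG second produces a positive time difference.

```python
import random
from collections import defaultdict


def place_tdes(box_size=20, n_units=100, stride=2, seed=0):
    """Randomly place TDE units in a box; half left-right (LR), half right-left (RL)."""
    rng = random.Random(seed)
    units = []
    for i in range(n_units):
        direction = "LR" if i % 2 == 0 else "RL"
        x = rng.randrange(0, box_size - stride)  # keep both pixels inside the box
        y = rng.randrange(0, box_size)
        if direction == "LR":
            fac, trg = (x, y), (x + stride, y)   # edge reaches the left pixel first
        else:
            fac, trg = (x + stride, y), (x, y)   # edge reaches the right pixel first
        units.append((fac, trg, direction))
    return units


def tde_time_differences(events, units, max_dt=0.08):
    """Collect FAC-to-TRG time differences per direction.

    events: iterable of (t, x, y) tuples sorted by time, in box coordinates.
    max_dt: assumed window beyond which a unit no longer responds.
    """
    fac_index, trg_index = defaultdict(list), defaultdict(list)
    for idx, (fac, trg, _) in enumerate(units):
        fac_index[fac].append(idx)
        trg_index[trg].append(idx)
    last_fac = {}
    out = {"LR": [], "RL": []}
    for t, x, y in events:
        for idx in trg_index.get((x, y), ()):
            # A TRG event only counts if a FAC event preceded it recently.
            if idx in last_fac and 0 < t - last_fac[idx] <= max_dt:
                out[units[idx][2]].append(t - last_fac[idx])
        for idx in fac_index.get((x, y), ()):
            last_fac[idx] = t
    return out


# A vertical edge sweeping left to right at one pixel column per 10 ms.
units = place_tdes()
sweep = [(0.01 * x, x, y) for x in range(20) for y in range(20)]
responses = tde_time_differences(sweep, units)
```

For this synthetic sweep only the left-right units respond, each with a time difference of about 20 ms (stride of 2 pixels at 10 ms per pixel), while the right-left units see their TRG pixel fire before their FAC pixel and stay silent. Comparing the two populations' activity gives a direction-selective readout of the kind the captions describe.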