Lightweight Event-based Optical Flow Estimation via Iterative Deblurring

Yilun Wu, Federico Paredes-Vallés, Guido C. H. E. de Croon

TL;DR

Event-based optical flow methods are often bottlenecked by correlation volumes, which are expensive to compute and store. IDNet is a correlation-volume-free network that estimates flow directly from continuous event traces by iterative deblurring with a ConvGRU backbone. It offers two update schemes: ID, which iterates over the same batch of events, and TID, which iterates over time on streaming events. On DSEC-Flow, the top-performing ID model sets a new state of the art, while the base ID model stays competitive with prior work using 80% fewer parameters and a 20x smaller memory footprint. TID is a further 5x faster, with 8 ms latency, at the cost of a 9% performance drop, which enables real-time operation on embedded hardware. The work shows that iterative deblurring and temporal priors can yield efficient, scalable flow estimation, including at higher resolutions, for resource-constrained robotic systems.

Abstract

Inspired by frame-based methods, state-of-the-art event-based optical flow networks rely on the explicit construction of correlation volumes, which are expensive to compute and store, rendering them unsuitable for robotic applications with limited compute and energy budgets. Moreover, correlation volumes scale poorly with resolution, which prevents these networks from estimating high-resolution flow. We observe that the spatiotemporally continuous traces of events provide a natural search direction for seeking pixel correspondences, obviating the need to rely on gradients of explicit correlation volumes as such search directions. We introduce IDNet (Iterative Deblurring Network), a lightweight yet high-performing event-based optical flow network that estimates flow directly from event traces without using correlation volumes. We further propose two iterative update schemes: "ID", which iterates over the same batch of events, and "TID", which iterates over time with streaming events in an online fashion. Our top-performing ID model sets a new state of the art on the DSEC benchmark. Meanwhile, the base ID model is competitive with prior art while using 80% fewer parameters, occupying a 20x smaller memory footprint, and running 40% faster on the NVIDIA Jetson Xavier NX. Furthermore, the TID model is even more efficient, offering an additional 5x faster inference speed and an ultra-low latency of 8 ms at the cost of only a 9% performance drop, making it the only model in the current literature capable of real-time operation while maintaining decent performance.
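
To make the idea of deblurring events along a flow estimate concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single constant flow vector for all events and a simple per-pixel event count as the "image"; the names `deblur_events` and `event_count_image` are hypothetical. Each event is shifted back to a reference time along the flow, so a correct flow collapses a moving point's trace onto one pixel, while a zero flow leaves the trace smeared.

```python
from collections import Counter


def deblur_events(events, flow, t_ref, window):
    """Warp each event (x, y, t) back to t_ref along a constant flow (u, v).

    `flow` is the displacement in pixels over the whole `window` duration
    (an assumption of this sketch; a network would use a per-pixel flow field).
    """
    u, v = flow
    warped = []
    for x, y, t in events:
        frac = (t - t_ref) / window  # fraction of the window elapsed at this event
        warped.append((x - frac * u, y - frac * v))
    return warped


def event_count_image(points):
    """Count warped events per pixel using nearest-pixel rounding."""
    return Counter((round(x), round(y)) for x, y in points)


# Synthetic trace: one point moving 8 px right and 4 px down over a 1.0 s window.
window = 1.0
true_flow = (8.0, 4.0)
events = [
    (10 + true_flow[0] * k / 8, 20 + true_flow[1] * k / 8, k / 8)
    for k in range(9)
]

# Zero flow leaves the trace spread over many pixels (motion blur).
blurred = event_count_image(deblur_events(events, (0.0, 0.0), 0.0, window))
# The true flow collapses all events onto the starting pixel.
sharp = event_count_image(deblur_events(events, true_flow, 0.0, window))
```

With an imperfect flow the trace is only partly collapsed, and the remaining smear is the residual motion that an update step has to estimate.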

Paper Structure

This paper contains 21 sections, 3 equations, 6 figures, 6 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Illustration of the IDNet pipeline for temporal iterative deblurring (i.e. TID scheme). Raw events are first deblurred according to the initial coarse optical flow estimate $\hat{\mathcal{F}}^t$ before being processed by the backbone RNN. The RNN extracts the residual motion from the deblurred events and outputs the residual flow $\Delta \mathcal{F}^t$ which is added to the initial estimate $\hat{\mathcal{F}}^t$ to arrive at the final flow estimation $\mathcal{F}^t$. The RNN additionally proposes a coarse estimate $\hat{\mathcal{F}}^{t+1}$ for the next timestep under continuous operation.
  • Figure 2: Overall pipeline of IDNet with iterative deblurring scheme (i.e. ID scheme). Starting with a zero flow, each iteration deblurs events using prior flow. The deblurred event bins are fed into the RNN sequentially one bin at a time. A residual flow is estimated and used for deblurring in the next iteration. The final flow accumulates all residual flows throughout iterations. An L1 loss is applied between the final flow estimate and ground truth. The detailed network structure is shown on the right. The parameters ch, k, and s of the Conv2d layer refer to the output channel count, kernel size, and stride.
  • Figure 3: Initial flow $\hat{\mathcal{F}}^t$, residual flow $\Delta \mathcal{F}^t$ and final flow $\mathcal{F}^t$ as time progresses and new event bins keep arriving. The quality of $\mathcal{F}^t$ increases and the magnitude of $\Delta \mathcal{F}^t$ decreases over time, implying that the flow is iteratively refined through time.
  • Figure 4: Qualitative results of optical flow predictions on DSEC-Flow with highlighted regions of interest. Images are for visualization only, as optical flow is event-based. Test samples reveal that our ID method at 1/4 resolution produces superior results on fine details and small objects, while the TID method yields results that are comparable to those of E-RAFT.
  • Figure 5: Qualitative results on MVSEC outdoor_day_1 sequence.
  • ...and 1 more figures
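
The control flow of the two update schemes described in the captions of Figures 1 and 2 can be sketched on a one-dimensional toy problem. This is an illustration under stated assumptions, not IDNet itself: the hypothetical `residual_motion` function stands in for the RNN and simply returns a damped least-squares slope of event position over time, and the TID loop reuses the final flow as the next coarse estimate, whereas the paper's RNN proposes that estimate itself.

```python
def deblur(events, flow):
    """Shift each (x, t) event back along the current flow (px per unit time)."""
    return [(x - flow * t, t) for x, t in events]


def residual_motion(events, gain=0.5):
    """Hypothetical stand-in for the RNN: damped least-squares slope of x over t."""
    n = len(events)
    mean_x = sum(x for x, _ in events) / n
    mean_t = sum(t for _, t in events) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in events)
    var = sum((t - mean_t) ** 2 for _, t in events)
    return gain * cov / var


def make_events(velocity, n=10):
    """Events from a point moving at constant velocity for t in [0, 1)."""
    return [(5.0 + velocity * k / n, k / n) for k in range(n)]


def id_scheme(events, iterations=8):
    """ID: start from zero flow and refine repeatedly on the same batch of events."""
    flow = 0.0
    for _ in range(iterations):
        flow += residual_motion(deblur(events, flow))  # accumulate residual flows
    return flow


def tid_scheme(event_stream):
    """TID: one refinement per incoming batch, warm-started from the previous step."""
    coarse = 0.0
    history = []
    for events in event_stream:
        final = coarse + residual_motion(deblur(events, coarse))
        history.append(final)
        coarse = final  # assumption: reuse the final flow as the next coarse estimate
    return history


flow_id = id_scheme(make_events(6.0))
flows_tid = tid_scheme([make_events(6.0) for _ in range(8)])
```

In this toy setting both loops halve the flow error at every step. The difference is where the iterations are spent: ID pays for all of them on one batch of events, while TID spreads them over successive batches and performs a single update per timestep, which is the source of its lower latency.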