Secrets of Event-Based Optical Flow

Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego

TL;DR

A principled method to extend the Contrast Maximization framework to estimate optical flow from events alone, which ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark.

Abstract

Event cameras respond to scene dynamics and offer advantages for motion estimation. Following recent image-based deep-learning achievements, optical flow estimation methods for event cameras have rushed to combine those image-based methods with event data. However, this combination requires several adaptations (data conversion, loss function, etc.) because events and images have very different properties. We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone. We investigate key elements: how to design the objective function to prevent overfitting, how to warp events to deal better with occlusions, and how to improve convergence with multi-scale raw events. With these key elements, our method ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark. Moreover, our method exposes issues with the ground truth flow in those benchmarks and produces remarkable results when transferred to unsupervised learning settings. Our code is available at https://github.com/tub-rip/event_based_optical_flow
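
As a rough illustration of the Contrast Maximization idea that the abstract builds on, the sketch below warps synthetic events to a reference time with a candidate flow, accumulates them into an image of warped events (IWE), and scores the candidate by the variance of that image. It is a toy under stated assumptions, not the authors' implementation (their code is in the repository linked above): the function names, the synthetic events, the single global flow vector, the sign convention of the warp, nearest-pixel voting, and variance as the contrast measure are all choices made here for illustration.

```python
import math

def image_of_warped_events(events, flow, t_ref, width, height):
    """Count events per pixel after warping them to t_ref along one flow vector."""
    vx, vy = flow  # pixels per second, shared by all events (toy assumption)
    iwe = [[0.0] * width for _ in range(height)]
    for x, y, t in events:
        # Assumed sign convention: undo the motion accumulated since t_ref.
        xw = math.floor(x - (t - t_ref) * vx + 0.5)  # nearest-pixel voting
        yw = math.floor(y - (t - t_ref) * vy + 0.5)
        if 0 <= xw < width and 0 <= yw < height:
            iwe[yw][xw] += 1.0
    return iwe

def contrast(iwe):
    """Variance of the IWE; a sharper image scores higher."""
    values = [v for row in iwe for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Synthetic events (x, y, t): a vertical edge that starts at x = 5 and moves at 8 px/s.
TRUE_VX = 8.0
events = [(5.0 + TRUE_VX * t, float(y), t)
          for t in (0.0, 0.25, 0.5, 0.75) for y in range(4)]

def score(vx):
    """Contrast of the IWE for a horizontal candidate flow, warped to t_ref = 0."""
    return contrast(image_of_warped_events(events, (vx, 0.0), 0.0, 16, 4))

# Brute-force search over a few hypothetical candidates.
candidates = [0.0, 4.0, 8.0, 12.0]
best_vx = max(candidates, key=score)
print(best_vx, score(best_vx), score(0.0))
```

In this toy the candidate that matches the edge speed stacks all events of each row onto one pixel, so it yields the highest variance. A practical method would estimate a dense flow field with a gradient-based optimizer instead of a grid search.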

Paper Structure (25 sections, 15 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Two test sequences (interlaken_00_b, thun_01_a) from the DSEC dataset Gehrig21ral. Our optical flow estimation method produces sharp images of warped events (IWE) despite the scene complexity, the large pixel displacement and the high dynamic range. The examples utilize 500k events on an event camera with $640 \times 480$ pixels.
  • Figure 2: Multi-reference focus loss. Assume an edge moves from left to right. Flow estimation with single reference time ($t_1$) can overfit to the data, warping all events into a single pixel, which results in a maximum contrast (at $t_1$). However, the same flow would produce low contrast (i.e., a blurry image) if events were warped to time $t_{N_e}$. Instead, we favor flow fields that produce high contrast (i.e., sharp images) at any reference time (here, $t_\text{ref} = t_1$ and $t_\text{ref} = t_{N_e}$). See results in Fig. \ref{fig:suppl_abl_multiref}.
  • Figure 3: Time-aware Flow. Traditional flow \ref{eq:warp:oflow}, inherited from frame-based approaches, assumes per-pixel constant flow $\mathbf{v}(\mathbf{x}) = \text{const}$, which cannot handle occlusions properly. The proposed space-time flow assumes constancy along streamlines, $\mathbf{v}(\mathbf{x}(t),t) = \text{const}$, which allows us to handle occlusions more accurately. (See results in Fig. \ref{fig:effect_of_time_aware}.)
  • Figure 4: Multi-scale Approach using tiles (rectangles) and raw events.
  • Figure 5: MVSEC comparison ($dt=4$) of our method and two state-of-the-art baselines: ConvGRU-EV-FlowNet (USL) Paredes21neurips and EV-FlowNet (SSL) Zhu18rss. For each sequence, the upper row shows the flow masked by the input events, and the lower row shows the IWE using the flow. Our method produces the sharpest motion-compensated IWEs. Note that learning-based methods crop input events to the central $256 \times 256$ pixels, whereas our method does not. Black points in ground truth (GT) flow maps indicate the absence of LiDAR measurements. The optical flow color wheel is in Fig. \ref{fig:eye_catcher}.
  • ...and 6 more figures
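
The overfitting effect described in the Figure 2 caption can be reproduced on a toy example. The sketch below evaluates the contrast of the image of warped events at two reference times, $t_1$ and $t_{N_e}$, for a flow that follows the edge and for a contrived flow that also squeezes events toward one pixel at $t_1$. Everything here is an assumption made for illustration: the function names, the synthetic events, the hand-built overfitting flow, nearest-pixel voting, and variance as the contrast measure. How the paper combines the per-reference terms into a single loss is not stated in this excerpt, so the sketch only reports them separately.

```python
import math

def warp_and_count(events, flow_fn, t_ref, width, height):
    """Image of warped events for a per-pixel flow, warped to t_ref."""
    iwe = [[0.0] * width for _ in range(height)]
    for x, y, t in events:
        vx, vy = flow_fn(x, y)  # flow looked up at the event's pixel
        xw = math.floor(x - (t - t_ref) * vx + 0.5)  # nearest-pixel voting
        yw = math.floor(y - (t - t_ref) * vy + 0.5)
        if 0 <= xw < width and 0 <= yw < height:
            iwe[yw][xw] += 1.0
    return iwe

def contrast(iwe):
    """Variance of the IWE; a sharper image scores higher."""
    values = [v for row in iwe for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def contrast_per_reference(events, flow_fn, t_refs, width=16, height=12):
    """Contrast of the IWE at each reference time in t_refs."""
    return [contrast(warp_and_count(events, flow_fn, t_ref, width, height))
            for t_ref in t_refs]

# Synthetic events (x, y, t): a 4-pixel-tall edge starting at x = 5, moving at 8 px/s.
TIMES = (0.0, 0.25, 0.5, 0.75)
events = [(5.0 + 8.0 * t, float(y), t) for t in TIMES for y in range(4)]

def true_flow(x, y):
    """Flow that follows the edge."""
    return (8.0, 0.0)

def overfit_flow(x, y):
    """Contrived flow: follows the edge and also pulls events toward row 0 at t_1."""
    t = (x - 5.0) / 8.0  # event time implied by the column in this toy
    vy = y / t if t > 0 else 0.0
    return (8.0, vy)

t_refs = (TIMES[0], TIMES[-1])  # t_1 and t_{N_e}
true_scores = contrast_per_reference(events, true_flow, t_refs)
overfit_scores = contrast_per_reference(events, overfit_flow, t_refs)
print("true flow:   ", true_scores)
print("overfit flow:", overfit_scores)
```

In this toy the contrived flow scores higher than the edge-following flow at $t_1$ but lower at $t_{N_e}$, whereas the edge-following flow scores the same at both reference times. Evaluating only at $t_1$ would therefore reward the degenerate solution.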