Secrets of Edge-Informed Contrast Maximization for Event-Based Vision

Pritam P. Karmokar, Quan H. Nguyen, William J. Beksi

TL;DR

This work tackles dense optical-flow estimation from asynchronous event data by introducing edge-informed contrast maximization (EINCM), a bi-modal framework that jointly optimizes the contrast of the image of warped events (IWE) and its spatial correlation with a frame-derived edge image. By extending CM to multiple reference times and a multiscale handover scheme, EINCM leverages both events and edges to produce sharper, more accurate motion estimates, achieving state-of-the-art performance among model-based methods on the MVSEC, DSEC, and ECD benchmarks. The approach yields sharper IWEs and more faithful edge alignment without requiring ground-truth optical flow for training, although unreliable edges and imperfect frame registration remain practical challenges. Overall, EINCM advances real-time, edge-consistent event vision by fusing modalities and exploiting hierarchical optimization to surpass previous CM-based baselines.
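The joint objective can be illustrated with a minimal NumPy sketch. It assumes that contrast is measured as the variance of the IWE (a common choice in the CM literature) and that spatial correlation is the zero-mean normalized cross-correlation (NCC) between the IWE and the edge image; `beta` plays the role of the correlation coefficient $\beta$ mentioned in the Figure 5 caption. The function names are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def contrast(iwe: np.ndarray) -> float:
    """Variance of the image of warped events (IWE); sharper IWEs score higher."""
    return float(np.var(iwe))

def edge_correlation(iwe: np.ndarray, edges: np.ndarray, eps: float = 1e-12) -> float:
    """Zero-mean normalized cross-correlation in [-1, 1] between the IWE and an edge image."""
    a = iwe - iwe.mean()
    b = edges - edges.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def joint_objective(iwe: np.ndarray, edges: np.ndarray, beta: float = 1.0) -> float:
    """Hypothetical bi-modal score to maximize: contrast plus beta-weighted edge correlation."""
    return contrast(iwe) + beta * edge_correlation(iwe, edges)

# Toy example: a vertical edge at column 4 and two equally sharp IWEs.
edges = np.zeros((8, 8))
edges[:, 4] = 1.0
aligned = 3.0 * edges                       # events stacked on the edge
misaligned = np.roll(aligned, 2, axis=1)    # equally sharp, but two pixels off
print(contrast(aligned) == contrast(misaligned))                             # True
print(joint_objective(aligned, edges) > joint_objective(misaligned, edges))  # True
```

Both IWEs contain the same values, so contrast alone cannot tell them apart; the correlation term prefers the one that coincides with the edge, which is the intuition behind Figure 1. Because the two terms live on different scales, `beta` (or a normalization of the contrast) controls their balance.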

Abstract

Event cameras capture the motion of intensity gradients (edges) in the image plane in the form of rapid asynchronous events. When accumulated in 2D histograms, these events depict overlays of the edges in motion, consequently obscuring the spatial structure of the generating edges. Contrast maximization (CM) is an optimization framework that can reverse this effect and produce sharp spatial structures that resemble the moving intensity gradients by estimating the motion trajectories of the events. Nonetheless, CM is still an underexplored area of research with avenues for improvement. In this paper, we propose a novel hybrid approach that extends CM from uni-modal (events only) to bi-modal (events and edges). We leverage the underpinning concept that, given a reference time, optimally warped events produce sharp gradients consistent with the moving edge at that time. Specifically, we formalize a correlation-based objective to aid CM and provide key insights into the incorporation of multiscale and multireference techniques. Moreover, our edge-informed CM method yields superior sharpness scores and establishes new state-of-the-art event optical flow benchmarks on the MVSEC, DSEC, and ECD datasets.
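The warp-and-accumulate step that CM relies on can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single constant flow $\mathbf{v}$ for all events (the paper estimates dense flow), nearest-pixel voting (bilinear voting is more common), IWE variance as the contrast, and an average over reference times as one hypothetical reading of the multireference idea. Each event at $(\mathbf{x}_k, t_k)$ is transported to $t_\text{ref}$ via $\mathbf{x}'_k = \mathbf{x}_k - (t_k - t_\text{ref})\,\mathbf{v}$. All function names are hypothetical.

```python
import numpy as np

def warp_events(xs, ys, ts, flow, t_ref):
    """Transport events along constant-velocity trajectories to time t_ref."""
    dt = ts - t_ref
    return xs - dt * flow[0], ys - dt * flow[1]

def build_iwe(xw, yw, shape):
    """Accumulate warped events by nearest-pixel voting; out-of-bounds events are dropped."""
    h, w = shape
    xi = np.rint(xw).astype(int)
    yi = np.rint(yw).astype(int)
    keep = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    iwe = np.zeros(shape)
    np.add.at(iwe, (yi[keep], xi[keep]), 1.0)
    return iwe

def multiref_contrast(xs, ys, ts, flow, shape, t_refs):
    """Mean IWE variance over several reference times (hypothetical multireference score)."""
    return float(np.mean([np.var(build_iwe(*warp_events(xs, ys, ts, flow, t), shape))
                          for t in t_refs]))

# Synthetic scene (assumed): a vertical edge starts at x = 5 and moves right at 2 px per unit time.
ts, ys = np.meshgrid(np.linspace(0.0, 1.0, 11), np.arange(10.0))
ts, ys = ts.ravel(), ys.ravel()
xs = 5.0 + 2.0 * ts
shape, t_refs = (10, 20), [0.0, 0.5, 1.0]   # (height, width) and t0, t_mid, t1

# Brute-force search over horizontal velocities (a stand-in for a real optimizer).
candidates = np.arange(-4.0, 4.5, 0.5)
scores = [multiref_contrast(xs, ys, ts, np.array([vx, 0.0]), shape, t_refs) for vx in candidates]
best_vx = float(candidates[int(np.argmax(scores))])
print(best_vx)  # 2.0
```

The search recovers the simulated velocity. With zero flow, the IWE is identical at every reference time, matching the invariance described in Figure 1, whereas the optimally warped IWE shifts with the edge as $t_\text{ref}$ moves from $t_0$ to $t_1$. A practical implementation would replace the grid search with gradient-based optimization.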

Paper Structure

This paper contains 26 sections, 9 equations, 8 figures, 9 tables, and 1 algorithm.

Figures (8)

  • Figure 1: A visualization of IWEs with zero and optimal warps (left), and the corresponding event and edge images (right). Events accumulated along optimally estimated motion trajectories are not only sharp but also coincide with the edges. The left column of insets shows the edges of a toy pyramid moving to the left through times $t_0$, $t_\text{mid}$, and $t_1$, while the zero-warped IWE stays the same at every time. The right column shows that integrated optimally warped events look sharp and move covariantly with the edges.
  • Figure 2: The edge extraction pipeline. To extract a viable edge image from a grayscale image, we sequentially apply non-local means denoising, contrast-limited adaptive histogram equalization (CLAHE), Gaussian sharpening, bilateral filtering, Canny edge detection, and finally a Gaussian blur for edge smoothing (see the supplementary material for an edge smoothing sensitivity analysis). Our pipeline was tested extensively on images from the MVSEC (low-quality) and DSEC (high-quality) datasets and can be adjusted for other use cases.
  • Figure 3: Our pre-handover multiscale strategy with an optimize-handover-upsample pipeline. Notation: in ${}_{p}\boldsymbol{\Theta}^{q}_{r}$, $p \in \{0, 1, 2, 3, 4\}$ denotes the pyramid level; $q \in \{0, \ast, \downarrow\}$ indicates the version of the motion parameters, where $q=0$ and $q=\ast$ represent pre- and post-optimization, respectively, and $q=\,\downarrow$ indicates parameters downsampled from the preceding iteration; and $r \in \{i, i-1\}$ denotes the iteration. The symbols and denote handover and upsampling operations, respectively.
  • Figure 4: Qualitative comparisons ($dt=4$) of our approach against two prominent methods [zhu2018evflownet, shiba2022secrets] on MVSEC. For each sequence, two consecutive rows show the results. Column (a) shows our preprocessed edge images and the images of (original) events. Column (b) displays the available ground-truth (GT) flows and the corresponding IWEs. Columns (c-e) display the predicted flows masked by the original events and the constructed IWEs for each method.
  • Figure 5: MVSEC and DSEC ground-truth (GT) diagnosis. Events are overlaid on the corresponding image frames. (a-c) show the original events, GT-warped events, and our warped events, respectively, on the MVSEC sequence indoor_flying_2. Compared to the GT, our method yields sharper warped events that are better aligned with the image edges (see also Figure 4 (c), rows 3 and 4). (d) and (e) show the original events and our warped events, respectively, on the DSEC sequence thun_01_b, which was captured using different sensors. Note the grid-like rectification artifacts in (d). In (e), the warped events are sharp, but their alignment with the image is limited. Misalignment artifacts from imperfect frame registration become prominent at points near the camera (e.g., road markings; see row 3 of the DSEC and ECD evaluation figure). These artifacts can render the problem ill-posed. In such scenarios, assigning a higher value to $\beta$, the coefficient of the correlation objective, may hinder overall convergence.
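The optimize-handover-upsample pipeline of Figure 3 moves motion parameters between pyramid levels. Assuming the parameters are a dense per-pixel flow field in pixel units and that each level halves the resolution (the caption specifies neither), moving between levels means resizing the field and rescaling its vectors by the same factor. The sketch below shows only these two resampling steps over the five levels $p \in \{0, \dots, 4\}$; `refine` is a hypothetical placeholder for per-level contrast maximization, and the handover operation, whose details are not given here, is not modeled.

```python
import numpy as np

def downsample_flow(flow: np.ndarray) -> np.ndarray:
    """(H, W, 2) -> (H//2, W//2, 2): average 2x2 blocks and halve the vectors."""
    h, w, c = flow.shape
    blocks = flow[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2, c)
    return 0.5 * blocks.mean(axis=(1, 3))

def upsample_flow(flow: np.ndarray) -> np.ndarray:
    """(H, W, 2) -> (2H, 2W, 2): nearest-neighbor repeat and double the vectors."""
    return 2.0 * np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)

def refine(flow: np.ndarray, level: int) -> np.ndarray:
    """Hypothetical placeholder for optimizing the flow at one pyramid level."""
    return flow

# Assumed initialization: a full-resolution flow from the preceding iteration (3 px rightward).
prev = np.zeros((64, 64, 2))
prev[..., 0] = 3.0

flow = prev
for _ in range(4):                     # build levels 1..4 by downsampling (the "down" versions)
    flow = downsample_flow(flow)
flow = refine(flow, level=4)           # optimize at the coarsest level first
for level in range(3, -1, -1):         # then upsample and optimize at each finer level
    flow = refine(upsample_flow(flow), level)
print(flow.shape, float(flow[0, 0, 0]))  # (64, 64, 2) 3.0
```

Because flow is measured in pixels, a 3 px displacement at full resolution corresponds to 0.1875 px at level 4; resizing the field without rescaling its vectors would make coarse and fine estimates inconsistent.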