An Asynchronous Linear Filter Architecture for Hybrid Event-Frame Cameras

Ziwei Wang, Yonhon Ng, Cedric Scheerlinck, Robert Mahony

TL;DR

This work addresses HDR video reconstruction and spatial convolution with hybrid event-frame cameras by introducing an asynchronous linear filter architecture. It combines a Complementary Filter (CF) and an Asynchronous Kalman Filter (AKF) with per-pixel uncertainty, frame augmentation, and an event-based spatial convolution module, all operating at the temporal resolution of the events. A unified noise model and exact discretization enable fully asynchronous updates, while frame augmentation and per-pixel Kalman gains improve HDR fidelity by reducing ghosting and other artifacts. Experiments on a new HDR hybrid event-frame dataset with ground-truth HDR references demonstrate state-of-the-art performance. The framework can also directly compute spatial convolutions with kernels such as Gaussian, Sobel, and Laplacian, which highlights its practical utility for real-time robotics and embedded systems.
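The asynchronous update summarized above can be sketched for the simpler CF case. The block below is a minimal illustration, not the authors' implementation: it assumes the common first-order model in which each pixel's log-intensity estimate jumps by a contrast step at every event and otherwise relaxes toward the latest frame value at a constant rate `alpha`. The class name, method names, and default values are hypothetical. Solving that linear model in closed form between updates is what makes the discretization exact for any time gap.

```python
import numpy as np


class ComplementaryFilter:
    """Per-pixel asynchronous complementary filter (illustrative sketch)."""

    def __init__(self, height, width, alpha=2.0 * np.pi, contrast=0.1):
        self.alpha = alpha        # crossover gain in rad/s (assumed value)
        self.contrast = contrast  # event contrast threshold (assumed value)
        self.state = np.zeros((height, width))   # fused log-intensity estimate
        self.frame = np.zeros((height, width))   # latest frame log intensity
        self.t_last = np.zeros((height, width))  # per-pixel last update time

    def _decay(self, y, x, t):
        # Closed-form solution of dL/dt = -alpha * (L - L_frame) on [t_last, t]
        beta = np.exp(-self.alpha * (t - self.t_last[y, x]))
        self.state[y, x] = beta * self.state[y, x] + (1.0 - beta) * self.frame[y, x]
        self.t_last[y, x] = t

    def on_event(self, y, x, t, polarity):
        # Bring the pixel up to time t, then apply the event's intensity step
        self._decay(y, x, t)
        self.state[y, x] += self.contrast * polarity

    def on_frame(self, log_frame, t):
        # Bring every pixel up to time t using the old frame, then swap frames
        beta = np.exp(-self.alpha * (t - self.t_last))
        self.state = beta * self.state + (1.0 - beta) * self.frame
        self.t_last[:] = t
        self.frame = log_frame.copy()

    def read(self, t):
        # Evaluate the state at time t without modifying the filter
        beta = np.exp(-self.alpha * (t - self.t_last))
        return beta * self.state + (1.0 - beta) * self.frame
```

Only the pixel that fired is touched on an event, so the cost per event is constant, and `read` can be called at any time between updates. The AKF would replace the constant `alpha` with per-pixel Kalman gains; that part is not sketched here.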

Abstract

Event cameras are ideally suited to capture High Dynamic Range (HDR) visual information without blur but provide poor imaging capability for static or slowly varying scenes. Conversely, conventional image sensors measure absolute intensity of slowly changing scenes effectively but do poorly on HDR or quickly changing scenes. In this paper, we present an asynchronous linear filter architecture, fusing event and frame camera data, for HDR video reconstruction and spatial convolution that exploits the advantages of both sensor modalities. The key idea is the introduction of a state that directly encodes the integrated or convolved image information and that is updated asynchronously as each event or each frame arrives from the camera. The state can be read off as often as required, at any time, to feed into subsequent vision modules for real-time robotic systems. We evaluate our method on publicly available datasets with challenging lighting conditions and fast motions, and on a new dataset with HDR references that we provide. The proposed AKF pipeline outperforms other state-of-the-art methods in both absolute intensity error (69.4% reduction) and image similarity indexes (average 35.5% improvement). We also demonstrate the integration of image convolution with linear spatial kernels (Gaussian, Sobel, and Laplacian) as an application of our architecture.
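Because convolution is linear, a state that encodes the convolved image can be updated one event at a time: an event at pixel p adds a scaled copy of the kernel centred at p. The sketch below illustrates only this accumulation step, assuming zero padding at the borders, a square odd-sized kernel, and a fixed contrast threshold. The function names and the `contrast` default are hypothetical, and the frame and decay terms of the full filter are left out.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])


def add_event_convolved(conv_state, y, x, polarity, kernel, contrast=0.1):
    """Add one event's contribution to the convolved state, in place."""
    r = kernel.shape[0] // 2  # kernel radius (square odd-sized kernel assumed)
    h, w = conv_state.shape
    # Clip the kernel footprint to the image borders (zero padding)
    y0, y1 = max(y - r, 0), min(y + r + 1, h)
    x0, x1 = max(x - r, 0), min(x + r + 1, w)
    patch = kernel[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)]
    conv_state[y0:y1, x0:x1] += contrast * polarity * patch


def convolve_same(image, kernel):
    """Reference zero-padded 2D convolution, used only to check the sketch."""
    r = kernel.shape[0] // 2
    h, w = image.shape
    padded = np.pad(image, r)
    out = np.zeros_like(image, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[2 * r - i:2 * r - i + h,
                                         2 * r - j:2 * r - j + w]
    return out


# Accumulate a few events both as a plain image and as a convolved state
events = [(0, 0, +1), (4, 5, -1), (2, 3, +1), (2, 3, +1)]  # (y, x, polarity)
plain = np.zeros((5, 6))
convolved = np.zeros((5, 6))
for y, x, p in events:
    plain[y, x] += 0.1 * p
    add_event_convolved(convolved, y, x, p, SOBEL_X)

# Convolving the accumulated events matches the event-by-event accumulation
matches = np.allclose(convolved, convolve_same(plain, SOBEL_X))
```

Each event touches only the pixels under the kernel footprint, so the convolved output is available at event rate without reconstructing and filtering a full image first.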

Paper Structure

This paper contains 34 sections, 52 equations, 18 figures, 7 tables.

Figures (18)

  • Figure 1: An example HDR image reconstruction and Laplacian spatial convolution on a driving sequence (city night) taken from the open-source stereo event-frame dataset DSEC [Gehrig21ral]. Image (a) is a raw image from a conventional camera that is low dynamic range (LDR) and blurry. Image (b) shows the events from a co-located event camera, with red pluses for positive events and blue crosses for negative events. Image (c) is our HDR reconstruction, which clearly reconstructs sharp objects in a challenging low-light condition. Image (d) is our Laplacian spatial convolution result, which detects many edges that are not visible in the input LDR image.
  • Figure 2: Block diagram of the image processing architecture discussed in §\ref{sec:method}.
  • Figure 3: Frame augmentation. Two deblurred frames at times $\tau^k + \frac{T}{2}$ and $\tau^{k+1} - \frac{T}{2}$ are computed. The event stream is used to interpolate between the two deblurred frames to improve temporal resolution.
  • Figure 4: Comparison of state-of-the-art event-based video reconstruction methods on the newest open-source event camera driving sequences, DSEC [Gehrig21ral], with challenging city night scenes and high dynamic range outdoor scenes. E2VID [Rebecq20pami] fails on city night sequences, possibly because the DSEC dataset [Gehrig21ral] is based on a different type of event camera. ECNN [Stoffregen20eccv] achieves better HDR reconstruction but is still sensitive to event noise. Han et al. [han2020neuromorphic] generates HDR images with clear details in both dark and bright scenarios, but the images suffer from blur and washed-out artefacts. Our CF and AKF are able to compute sharp and clear objects in extreme lighting conditions and clearly outperform other methods.
  • Figure 5: Typical results from the proposed HDR and AHDR datasets. Our HDR dataset includes HDR reference images generated by fusing several images of various exposures. Our AHDR dataset is simulated by saturating the values of well-exposed real images, which removes most of the details; the original images are used as HDR references. E2VID [Rebecq20pami] and ECNN [Stoffregen20eccv] use event data only. They perform poorly on the dark trees in the HDR dataset and the road/sky in the AHDR dataset. Han et al. [han2020neuromorphic] successfully reconstructs Tree but fails to reconstruct the road in Mountain. The output image also has unrealistic artefacts appearing near the image edges. Our CF fuses event data and low dynamic range frames properly but still suffers from motion blur (on the left-hand trees) and shadows on moving object edges. Our AKF correctly computes the underexposed and overexposed trees sharply in the HDR dataset and reconstructs the mountain road clearly in the artificially saturated regions.
  • ...and 13 more figures

Theorems & Definitions (2)

  • Remark 4.1
  • Remark 4.2