An Event-Oriented Diffusion-Refinement Method for Sparse Events Completion

Bo Zhang, Yuqi Han, Jinli Suo, Qionghai Dai

TL;DR

This work treats event streams as 3D event clouds in the spatiotemporal domain, develops a diffusion-based generative model that generates dense clouds in a coarse-to-fine manner, and recovers exact timestamps to preserve the temporal resolution of the raw data.

Abstract

Event cameras, or dynamic vision sensors (DVS), record asynchronous responses to brightness changes instead of conventional intensity frames, and feature ultra-high sensitivity at low bandwidth. This mechanism offers great advantages in challenging scenarios with fast motion and large dynamic range. However, the recorded events might be highly sparse due to either limited hardware bandwidth or extreme photon starvation in harsh environments. To unlock the full potential of event cameras, we propose an event sequence completion approach that conforms to the unique characteristics of event data in both the processing stage and the output form. Specifically, we treat event streams as 3D event clouds in the spatiotemporal domain, develop a diffusion-based generative model to generate dense clouds in a coarse-to-fine manner, and recover exact timestamps to preserve the temporal resolution of the raw data. To validate the effectiveness of our method comprehensively, we perform extensive experiments on three widely used public datasets with different spatial resolutions, and additionally collect a novel event dataset covering diverse scenarios with highly dynamic motion and harsh illumination. Besides generating high-quality dense events, our method can benefit downstream applications such as object classification and intensity frame reconstruction.
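To make the "3D event cloud" view more concrete, here is a minimal sketch of how an event stream could be mapped to points in the spatiotemporal domain, sub-sampled to a sparse input, and accumulated into a 2D image like those in Figure 1. It is illustrative only and not the authors' code: the `(x, y, t, polarity)` tuple layout, the unit-cube normalisation, the uniform random sub-sampling, the function names, and the toy 34x34 grid are all assumptions.

```python
import random
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp, polarity). Hypothetical, for illustration.
Event = Tuple[int, int, float, int]
Point = Tuple[float, float, float]


def to_event_cloud(events: List[Event], width: int, height: int) -> List[Point]:
    """Map events to (x, y, t) points in the unit cube (assumed normalisation)."""
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero for a single timestamp
    sx = max(width - 1, 1)
    sy = max(height - 1, 1)
    return [(x / sx, y / sy, (t - t0) / span) for x, y, t, _ in events]


def subsample(events: List[Event], n: int, seed: int = 0) -> List[Event]:
    """Uniformly pick at most n events, then restore temporal order."""
    rng = random.Random(seed)
    picked = rng.sample(events, min(n, len(events)))
    return sorted(picked, key=lambda e: e[2])


def accumulate(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Count events per pixel, discarding timestamps and polarity."""
    image = [[0] * width for _ in range(height)]
    for x, y, _, _ in events:
        image[y][x] += 1
    return image


# Synthetic "dense" stream on a toy 34x34 grid (values chosen for illustration).
rng = random.Random(42)
dense: List[Event] = [
    (rng.randrange(34), rng.randrange(34), rng.uniform(0.0, 0.3), rng.choice((-1, 1)))
    for _ in range(1000)
]
sparse = subsample(dense, 128)
cloud = to_event_cloud(sparse, 34, 34)
image = accumulate(sparse, 34, 34)
```

Note that `accumulate` discards the timestamps, whereas the cloud keeps one point per event along the time axis. That retained temporal information is what the completion method aims to preserve.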

Paper Structure (16 sections, 11 equations, 8 figures, 3 tables)

This paper contains 16 sections, 11 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An example of our event completion performance, shown as a 3D spatiotemporal cloud (upper) and an accumulated 2D image (lower). Left: the sub-sampled sparse sequence consisting of 128 events; middle: the completed counterpart; right: the ground truth.
  • Figure 2: Overview of the diffusion-based coarse-to-fine event completion pipeline. First, an event-oriented network generates a coarse distribution of events conditioned on the sparse events. Then, a second network yields the final completed dense events.
  • Figure 3: The architecture of the EDR network. The upper branch extracts features from the conditional input, which are absorbed into the lower branch to denoise the noisy input. The proposed event-inspired cuboid query is used extensively in the three main modules: event-oriented set abstraction, feature propagation, and feature transfer.
  • Figure 4: Illustration of the original ball query (left) and the proposed cuboid query (right). The cuboid query covers more events along the temporal dimension, which is important for the 3D event cloud representation.
  • Figure 5: Event completion results on the N-MNIST dataset from inputs with 256 events (a) and 128 events (b). STCL produces overly dense events that may lose local shape, e.g. '7' in (b), while the results of PoinTr and VRCNet tend to suffer from missing entries. Our method maintains both overall event completeness and local shape.
  • ...and 3 more figures
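The contrast between the ball query and the cuboid query described in the Figure 4 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's definition: the axis-aligned box with separate spatial and temporal half-extents, the parameter names `half_xy` and `half_t`, and the brute-force search are all hypothetical choices made for clarity.

```python
from typing import List, Tuple

# Assumed point layout: (x, y, t), each normalised to [0, 1].
Point = Tuple[float, float, float]


def ball_query(points: List[Point], center: Point, radius: float) -> List[int]:
    """Indices of points within Euclidean distance `radius` of `center`."""
    r2 = radius * radius
    return [
        i
        for i, p in enumerate(points)
        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2
    ]


def cuboid_query(
    points: List[Point], center: Point, half_xy: float, half_t: float
) -> List[int]:
    """Indices of points inside an axis-aligned box around `center`.

    The box may be longer along t than along x and y, so it can gather
    more temporal neighbours than a ball of comparable spatial extent.
    """
    cx, cy, ct = center
    return [
        i
        for i, (x, y, t) in enumerate(points)
        if abs(x - cx) <= half_xy and abs(y - cy) <= half_xy and abs(t - ct) <= half_t
    ]


# Eleven events fired at one pixel over time, plus two spatial neighbours.
points: List[Point] = [(0.5, 0.5, k / 10) for k in range(11)]
points += [(0.6, 0.5, 0.5), (0.9, 0.5, 0.5)]
center: Point = (0.5, 0.5, 0.5)

ball = ball_query(points, center, radius=0.15)
cuboid = cuboid_query(points, center, half_xy=0.15, half_t=0.35)
```

With these toy values, both queries accept the near spatial neighbour and reject the far one, but the cuboid also returns four additional events along the time axis at the centre pixel.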