
Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations

Zipeng Wang, Yunfan Lu, Lin Wang

TL;DR

The paper addresses reconstructing high-temporal-resolution video from asynchronous events without labeled data or optical flow by linking the event generation PDE to an implicit neural representation of intensity. It introduces EvINR, which uses a fully connected MLP to model the continuous log-intensity and trains it via temporal derivatives derived from events, complemented by spatial regularization and a principled tone-mapping pipeline. Key contributions include (i) solving the event generation equation with an INR in a self-supervised manner, (ii) acceleration mechanisms that enable online usage, and (iii) a real-world AED dataset collected with ALPIX-Eiger to test robustness across sensors. The results show EvINR outperforms prior SSL methods and is competitive with state-of-the-art supervised approaches, offering improved interpretability, robustness to noise, and practical applicability for real-time event-to-video reconstruction.
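For reference, the standard idealized event generation model and an INR objective built on it can be written as below. This is a sketch in our own notation ($L = \log I$, contrast threshold $C$, polarity $p_k$, network $F_\theta$, event-derived change $E$, weight $\lambda$); the paper's exact equations, norms, and loss weighting may differ.

$$
\log I(\mathbf{x}_k, t_k) - \log I(\mathbf{x}_k, t_k - \Delta t_k) = p_k C, \qquad p_k \in \{-1, +1\}
$$

$$
L(\mathbf{x}, t_1) - L(\mathbf{x}, t_0) = \int_{t_0}^{t_1} \frac{\partial L(\mathbf{x}, t)}{\partial t}\, dt \;\approx\; C \sum_{k:\ \mathbf{x}_k = \mathbf{x},\ t_0 < t_k \le t_1} p_k \;=:\; E(\mathbf{x}; t_0, t_1)
$$

$$
\mathcal{L}(\theta) = \sum_{i} \sum_{\mathbf{x}} \Big( F_\theta(\mathbf{x}, t_{i+1}) - F_\theta(\mathbf{x}, t_i) - E(\mathbf{x}; t_i, t_{i+1}) \Big)^2 \;+\; \lambda \sum_{i} \sum_{\mathbf{x}} \big\| \nabla_{\mathbf{x}} F_\theta(\mathbf{x}, t_i) \big\|_1
$$

The first term is unchanged if a time-independent offset $g(\mathbf{x})$ is added to $F_\theta$, so the events alone determine log intensity only up to a per-pixel constant.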

Abstract

Reconstructing intensity frames from event data while maintaining high temporal resolution and dynamic range is crucial for bridging the gap between event-based and frame-based computer vision. Previous approaches have depended on supervised learning on synthetic data, which lacks interpretability and risks over-fitting to the settings of the event simulator. Recently, self-supervised learning (SSL) based methods, which primarily use per-frame optical flow to estimate intensity via photometric constancy, have been actively investigated. However, they are prone to errors when the optical flow is inaccurate. This paper proposes a novel SSL event-to-video reconstruction approach, dubbed EvINR, which eliminates the need for labeled data or optical flow estimation. Our core idea is to reconstruct intensity frames by directly addressing the event generation model, essentially a partial differential equation (PDE) that describes how events are generated from time-varying brightness signals. Specifically, we use an implicit neural representation (INR), which takes spatiotemporal coordinates $(x, y, t)$ as input and predicts intensity values, to represent the solution of the event generation equation. The INR, parameterized as a fully connected multi-layer perceptron (MLP), is optimized by supervising its temporal derivatives with events. To make EvINR feasible for online use, we propose several acceleration techniques that substantially speed up training. Comprehensive experiments show that EvINR surpasses previous SSL methods by 38% in terms of Mean Squared Error (MSE) and is comparable or superior to state-of-the-art (SoTA) supervised methods. Project page: https://vlislab22.github.io/EvINR/.
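The supervision signal described above comes from events alone: under the idealized event generation model, each event of polarity $p \in \{-1, +1\}$ at a pixel marks a log-intensity change of $p \cdot C$, so summing polarities over a time window gives that pixel's change in log intensity. A minimal NumPy sketch of this accumulation step, assuming events arrive as parallel arrays `xs, ys, ts, ps` and a single known contrast threshold `contrast` (the function name, argument layout, and default threshold are hypothetical, not the authors' data format):

```python
import numpy as np


def accumulate_events(xs, ys, ts, ps, t0, t1, height, width, contrast=0.2):
    """Per-pixel log-intensity change implied by events with t0 < t <= t1.

    Hypothetical helper: assumes integer pixel coordinates, polarities in
    {-1, +1}, and one global contrast threshold for every pixel.
    """
    xs, ys, ts, ps = map(np.asarray, (xs, ys, ts, ps))
    mask = (ts > t0) & (ts <= t1)  # keep only events inside the window
    change = np.zeros((height, width), dtype=np.float64)
    # np.add.at accumulates correctly when several events hit the same pixel
    np.add.at(change, (ys[mask], xs[mask]), contrast * ps[mask])
    return change


# Two positive events at (x=1, y=0), one negative at (x=0, y=1),
# and one event at t=0.6 that falls outside the (0.0, 0.5] window.
change = accumulate_events([1, 1, 0, 1], [0, 0, 1, 0],
                           [0.1, 0.2, 0.3, 0.6], [1, 1, -1, 1],
                           t0=0.0, t1=0.5, height=2, width=2)
```

The resulting map is what a temporal difference of the predicted log-intensity frames would be compared against; real sensors have per-pixel threshold variation and noise, which this sketch ignores.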

Paper Structure (17 sections, 11 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Connection between the event generation model and EvINR. The event generation model relates discrete events to continuous temporal intensity changes, described by the event generation equation (Eq. 4). EvINR uses an INR to solve Eq. 4 and recover a continuous function of intensity w.r.t. time, implicitly parameterized by a fully connected MLP.
  • Figure 2: Overview of EvINR. A fully connected MLP is used to implicitly solve the event generation equation. The temporal gradient of the MLP is supervised by temporal intensity changes of events, and the spatial gradient is penalized to reduce noise.
  • Figure 3: Overview of acceleration techniques. (a) and (b) illustrate the basic coordinate-based optimization and our frame-based optimization, respectively. (c) depicts our proposed coarse-to-fine training scheme and network ensembling technique.
  • Figure 4: Qualitative comparison with baseline methods on IJRR (Rows 1 & 2), HQF (Rows 3 & 4), and AED (Rows 5 & 6).
  • Figure 5: The impact of removing the spatial regularization and coarse-to-fine training.
  • ...and 2 more figures
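Figures 2 and 3 describe supervising the temporal change of the MLP with events, penalizing its spatial gradient, and querying whole frames rather than scattered coordinates. The NumPy sketch below evaluates such an objective for one pair of timestamps. It is an illustration under stated assumptions, not the authors' implementation: `init_mlp`, `inr_frame`, and `evinr_style_loss` are hypothetical names; the sine activation, normalization of coordinates to [-1, 1], finite-difference spatial gradients, and the weight `lam` are our own choices; and no training loop is shown, since optimizing the weights would require an autodiff framework.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights for a fully connected MLP (hypothetical init, for illustration)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def inr_frame(params, t, height, width):
    """Query the INR at every pixel of one frame at time t (frame-based querying)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    # One (x, y, t) row per pixel: shape (height * width, 3)
    coords = np.stack([xs, ys, np.full_like(xs, t)], axis=-1).reshape(-1, 3)
    h = coords
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.sin(h)  # sine activation: an assumption, not taken from the paper
    return h.reshape(height, width)  # predicted log intensity for the whole frame


def evinr_style_loss(params, t0, t1, event_change, lam=0.05):
    """Temporal-difference term plus spatial-gradient penalty (hypothetical weighting)."""
    height, width = event_change.shape
    l0 = inr_frame(params, t0, height, width)
    l1 = inr_frame(params, t1, height, width)
    # Predicted change between the two frames should match the event-derived change
    temporal = np.mean((l1 - l0 - event_change) ** 2)
    # L1 penalty on finite-difference spatial gradients of the frame at t1
    spatial = (np.mean(np.abs(np.diff(l1, axis=0)))
               + np.mean(np.abs(np.diff(l1, axis=1))))
    return temporal + lam * spatial


params = init_mlp([3, 32, 32, 1])
loss = evinr_style_loss(params, 0.0, 0.5, np.zeros((4, 5)))
```

Here `event_change` would be the per-pixel log-intensity change accumulated from events between `t0` and `t1`. Because every pixel is queried at a single timestamp, the temporal difference and the spatial penalty are computed on dense images, which is the property frame-based optimization relies on.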