EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction

Chengjie Ge, Xueyang Fu, Peng He, Kunyu Wang, Chengzhi Cao, Zheng-Jun Zha

TL;DR

EventMamba tackles event-based video reconstruction (EBVR) by addressing the loss of translation invariance and spatio-temporal locality in existing Vision Mamba models. It introduces Random Window Offset (RWO) in the spatial domain and Hilbert/Trans-Hilbert space-filling curve serialization for spatio-temporal ordering, both integrated into a Mamba-based architecture. Across HQF, IJRR, and MVSEC, it achieves state-of-the-art or competitive reconstruction quality with favorable speed, indicating practical potential for high-fidelity, real-time EBVR on resource-constrained devices.

Abstract

Leveraging its robust linear global modeling capability, Mamba has notably excelled in computer vision. Despite its success, existing Mamba-based vision models have overlooked the nuances of event-driven tasks, especially in video reconstruction. Event-based video reconstruction (EBVR) demands spatial translation invariance and close attention to local event relationships in the spatio-temporal domain. Unfortunately, conventional Mamba algorithms apply static window partitions and standard reshape scanning methods, leading to significant losses in local connectivity. To overcome these limitations, we introduce EventMamba--a specialized model designed for EBVR tasks. EventMamba innovates by incorporating random window offset (RWO) in the spatial domain, moving away from the restrictive fixed partitioning. Additionally, it features a new consistent traversal serialization approach in the spatio-temporal domain, which maintains the proximity of adjacent events both spatially and temporally. These enhancements enable EventMamba to retain Mamba's robust modeling capabilities while significantly preserving the spatio-temporal locality of event data. Comprehensive testing on multiple datasets shows that EventMamba markedly enhances video reconstruction, drastically improving computation speed while delivering superior visual quality compared to Transformer-based methods.
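
The difference between a fixed window partition and a randomly offset one can be illustrated with a small sketch. This is not the authors' implementation: the function names `window_partition` and `random_window_partition` are hypothetical, and applying the offset as a cyclic shift of the grid is an assumption made here for simplicity, since the abstract does not say how the offset is realized.

```python
import random


def window_partition(grid, win, offset=(0, 0)):
    """Split a 2D grid (list of rows) into non-overlapping win x win windows.

    Assumption of this sketch: the offset is applied as a cyclic shift of the
    grid before partitioning, so window boundaries move with the offset.
    """
    h, w = len(grid), len(grid[0])
    assert h % win == 0 and w % win == 0, "grid must tile evenly into windows"
    dy, dx = offset
    # Cyclically shift the grid by (dy, dx).
    shifted = [[grid[(y + dy) % h][(x + dx) % w] for x in range(w)]
               for y in range(h)]
    windows = []
    for y0 in range(0, h, win):
        for x0 in range(0, w, win):
            windows.append([row[x0:x0 + win] for row in shifted[y0:y0 + win]])
    return windows


def random_window_partition(grid, win, rng=random):
    """Draw a random offset in [0, win) per axis, then partition."""
    offset = (rng.randrange(win), rng.randrange(win))
    return offset, window_partition(grid, win, offset)


# 4x4 grid whose cells are labeled 0..15 in row-major order.
grid = [[y * 4 + x for x in range(4)] for y in range(4)]
fixed = window_partition(grid, 2)                    # static partition
shifted = window_partition(grid, 2, offset=(1, 1))   # one possible offset
```

With the static partition, cells 5 and 6 are neighbors in the grid but always fall into different windows; with an offset of (1, 1) they share a window. Drawing the offset at random means no pair of neighboring cells is permanently separated by a window boundary, which is the intuition behind moving away from fixed partitioning.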

Paper Structure

This paper contains 15 sections, 13 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: (a) Example illustrating the loss of locality in the fixed window strategy (spatial locality loss in the red box), and our proposed Random Window Offset solution. (b) Demonstration of the loss of spatio-temporal locality in conventional space-filling curves contrasted with our introduced Hilbert space-filling curve technique.
  • Figure 2: The EventMamba architecture is U-Net-like, processing event voxels ($V_k$) to predict intensity images. It incorporates two key components, RWOMamba and HSFCMamba, which are designed to maintain the translation invariance and spatio-temporal locality of event features, respectively. $N_1$ is set to 2 in our EventMamba architecture.
  • Figure 3: Qualitative comparisons on three benchmarks: HQF (rows 1-2), IJRR (row 3), and MVSEC (row 4).
  • Figure 4: Qualitative comparisons on sequences captured by the Prophesee EVK4 camera.
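
The locality contrast described for Figure 1(b) can be checked numerically with a small sketch. It uses the standard 2D Hilbert index-to-coordinate mapping and compares it with a row-major (reshape) scan. This is a purely spatial illustration under that assumption: it does not reproduce the paper's spatio-temporal or Trans-Hilbert serialization, and the names `hilbert_d2xy` and `max_step` are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map index d along a 2D Hilbert curve to (x, y) on a 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def max_step(path):
    """Largest Manhattan distance between consecutive points of a scan."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(path, path[1:]))


order = 3                      # 8 x 8 grid
n = 1 << order
hilbert_path = [hilbert_d2xy(order, d) for d in range(n * n)]
raster_path = [(x, y) for y in range(n) for x in range(n)]

hilbert_max = max_step(hilbert_path)   # consecutive cells are always adjacent
raster_max = max_step(raster_path)     # jumps across the grid at each row end
```

On this grid every consecutive pair in the Hilbert ordering is a direct neighbor, whereas the row-major scan jumps from the end of one row to the start of the next. That jump is the kind of locality loss a plain reshape scan introduces when a sequence model consumes the serialized tokens.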