EvRainDrop: HyperGraph-guided Completion for Effective Frame and Event Stream Aggregation

Futian Wang, Fan Zhang, Xiao Wang, Mengqi Wang, Dexing Huang, Jin Tang

TL;DR

This work addresses the challenge of sparse, asynchronous event streams by modeling event tokens as nodes in a hypergraph guided by RGB context to perform spatio-temporal completion. The EvRainDrop framework employs two-stage hypergraph propagation—dynamic self-completion and cross-modal enhancement—followed by temporal self-attention to fuse information across time. It demonstrates state-of-the-art performance on both single-label (HAR) and multi-label (PAR) tasks across four datasets, validating the effectiveness of high-order, multimodal hypergraph completion for event-based perception. The proposed approach offers a principled path to mitigate spatial sparsity while leveraging rich temporal dynamics, with practical implications for robust RGB-Event fusion in real-world scenarios.
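
As a rough illustration of the node-to-hyperedge-to-node message passing described above, the following is a minimal sketch and not the authors' implementation. The toy tokens, the hand-picked hyperedges, the mean aggregation, and the 0.5 blending weight are all assumptions made for the example. The paper's module builds its hypergraphs dynamically and uses learned parameters.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_step(nodes: List[Vector], hyperedges: List[List[int]]) -> List[Vector]:
    """One node -> hyperedge -> node propagation step with a residual blend."""
    # 1) Each hyperedge summarises all of its member nodes.
    edge_feats = [mean([nodes[i] for i in edge]) for edge in hyperedges]

    updated = []
    for i, feat in enumerate(nodes):
        # 2) Each node gathers the summaries of the hyperedges it belongs to.
        incident = [edge_feats[e] for e, edge in enumerate(hyperedges) if i in edge]
        if not incident:
            updated.append(list(feat))  # isolated node stays unchanged
            continue
        message = mean(incident)
        # 3) Blend the message with the node's own feature
        #    (the 0.5 weight is an arbitrary choice for this sketch).
        updated.append([0.5 * a + 0.5 * b for a, b in zip(feat, message)])
    return updated


# Toy tokens: indices 0 and 1 are event tokens (index 1 is an "empty" token
# from a sparse region), index 2 is an RGB token.
tokens = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
# Hypothetical hyperedges: one links the two event tokens, the other links
# the empty event token with the RGB token.
edges = [[0, 1], [1, 2]]
completed = hypergraph_step(tokens, edges)
```

In this toy case the empty event token (index 1) ends up with non-zero features drawn from both its event neighbour and the RGB token, which is the intuition behind completing sparse events with multimodal context.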

Abstract

Event cameras produce asynchronous event streams that are spatially sparse yet temporally dense. Mainstream event representation learning algorithms typically use event frames, voxels, or tensors as input. Although these approaches have achieved notable progress, they struggle to address the undersampling problem caused by spatial sparsity. In this paper, we propose a novel hypergraph-guided spatio-temporal event stream completion mechanism, which connects event tokens across different times and spatial locations via hypergraphs and uses contextual message passing to complete these sparse events. The proposed method can flexibly incorporate RGB tokens as nodes in the hypergraph, enabling multi-modal hypergraph-based information completion. Subsequently, we aggregate hypergraph node information across different time steps through self-attention, enabling effective learning and fusion of multi-modal features. Extensive experiments on both single- and multi-label event classification tasks validate the effectiveness of our proposed framework. The source code of this paper will be released at https://github.com/Event-AHU/EvRainDrop.
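
The temporal aggregation step can be pictured with a minimal single-head, scaled dot-product self-attention over per-time-step feature vectors. This is a sketch under simplifying assumptions, not the paper's module: queries, keys, and values are the inputs themselves (no learned projections), and the toy sequence is invented for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(steps: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention across time steps.

    Queries, keys, and values are the raw inputs (no learned projections),
    which is a simplification made for this sketch.
    """
    dim = len(steps[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in steps:
        # Similarity of this time step to every time step (itself included).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in steps]
        weights = softmax(scores)
        # Weighted sum of all time-step features.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, steps))
            for d in range(dim)
        ])
    return outputs


# One feature vector per time step (toy values).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = temporal_self_attention(sequence)
```

Each output vector is a convex combination of the features from all time steps, so information from one time step can inform every other.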

Paper Structure

This paper contains 19 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Illustration of the irregular raindrop, irregularly sampled event streams and RGB frames, and hypergraph-guided RGB-Event spatio-temporal completion.
  • Figure 2: An overview of our proposed HyperGraph-guided completion framework for effective frame and event stream aggregation, termed EvRainDrop. Specifically, we first partition and map the given RGB and event stream inputs, and then process them through a visual encoder to obtain RGB and Event tokens. Subsequently, the dual-modal tokens are fed into the hypergraph-guided message passing module, where dense event stream information and RGB spatial information are leveraged to compensate for the sparsity of event streams in the spatial domain. Afterward, we concatenate the RGB features with the enhanced event stream and pass it through a classification head to perform downstream pattern recognition, including pedestrian attribute recognition and human action recognition.
  • Figure 3: Hyperparameter analysis of our proposed framework on the PokerEvent dataset.
  • Figure 4: Visualization of pedestrian attributes predicted by our proposed method on DukeMTMC-VID-Attribute Dataset. The red attributes indicate incorrect predictions, blue attributes indicate missing predictions, and green attributes represent the ground truth.
  • Figure 5: The feature distribution of the Baseline (left sub-figure) and our newly proposed model (right sub-figure) on the PokerEvent dataset, visualized using t-SNE.
  • ...and 2 more figures