Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling

Hesam Araghi, Jan van Gemert, Nergis Tomen

TL;DR

The paper addresses the challenge of efficiently processing the output of high-rate event cameras by evaluating six hardware-friendly subsampling methods for CNN-based event-video classification across three benchmarks. It introduces a simple causal density-based subsampling approach, uses an EST voxel-grid representation with a consistent training setup for all methods, and compares them by normalized AUC over the number of events. Key findings show that density-based and corner-based subsampling offer the best balance between data efficiency and accuracy, with random and naive spatial subsampling serving as useful baselines; density normalization significantly improves performance in sparse regimes, and temporal subsampling tends to be more robust to offset variations. The work provides practical guidance for hardware implementations, highlighting trade-offs in memory and compute and suggesting adaptive strategies to handle large event-rate variance in real-world deployments.

Abstract

Event cameras offer high temporal resolution and power efficiency, making them well-suited for edge AI applications. However, their high event rates present challenges for data transmission and processing. Subsampling methods provide a practical solution, but their effect on downstream visual tasks remains underexplored. In this work, we systematically evaluate six hardware-friendly subsampling methods using convolutional neural networks for event video classification on various benchmark datasets. We hypothesize that events from high-density regions carry more task-relevant information and are therefore better suited for subsampling. To test this, we introduce a simple causal density-based subsampling method, demonstrating improved classification accuracy in sparse regimes. Our analysis further highlights key factors affecting subsampling performance, including sensitivity to hyperparameters and failure cases in scenarios with large event count variance. These findings offer insight into using hardware-efficient subsampling strategies that balance data efficiency and task accuracy. The code for this paper will be released at: https://github.com/hesamaraghi/event-camera-subsampling-methods.

Paper Structure

This paper contains 23 sections, 5 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Spatial, temporal, random, and density-based subsampling applied to an example video from the DVS-Gesture dataset (Amir et al., 2017), along with the original video. Positive and negative polarity events (red, blue) are plotted in 3 dimensions spanned by pixel coordinates (x, y) and time. Although all subsampled outputs contain the same number of events (3000), their structures vary due to the distinct characteristics of each subsampling method. We evaluate their impact on downstream tasks.
  • Figure 2: (a) Spatial subsampling: we keep events from every $r_y$-th row vertically and every $r_x$-th column horizontally (dark blue pixels). The horizontal and vertical offsets are denoted by $r_{x,0}$ and $r_{y,0}$, respectively. (b) Temporal subsampling: we keep the events within the sampling interval $\Delta t$ (colored) in a temporal window of size $w_t$, where $\Delta t_0$ is the time offset. In both cases, the topmost subsampling example has zero offset(s).
  • Figure 3: Causal density-based subsampling using fixed and random thresholding, retaining a similar number of events. (a) shows the original unfiltered events. Random thresholding (b) preserves the overall shape of the arm and hand movement, while fixed thresholding (c) focuses greedily on a small region near the hand.
  • Figure 4: Classification accuracy at different subsampling levels for six subsampling methods on the N-Caltech101 dataset. Each curve is the average of 18 independent runs with different seeds. The $x$-axis represents the average number of events per video, ${\langle{N}\rangle}$. Error bars show the standard deviation across runs.
  • Figure 5: Classification accuracy at different subsampling levels for six subsampling methods on the DVS-Gesture dataset. Each curve is the average of 18 independent runs with different seeds. Error bars show the standard deviation across runs.
  • ...and 6 more figures
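
The spatial and temporal subsampling rules described in the Figure 2 caption can be sketched as boolean masks over an event stream. This is a minimal illustration, not the authors' implementation: the function names `spatial_subsample` and `temporal_subsample` are hypothetical, and the modular conventions used here (a pixel is kept when its coordinate minus the offset is a multiple of the stride; an event is kept when its offset-shifted timestamp falls in the first $\Delta t$ of each window of size $w_t$) are assumptions made to match the caption.

```python
import numpy as np


def spatial_subsample(x, y, r_x, r_y, r_x0=0, r_y0=0):
    """Return a boolean mask keeping events on every r_x-th column and
    every r_y-th row, shifted by the offsets r_x0 and r_y0.

    Assumed convention: a pixel is kept when (coordinate - offset) is a
    multiple of the stride.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    return ((x - r_x0) % r_x == 0) & ((y - r_y0) % r_y == 0)


def temporal_subsample(t, w_t, dt, dt0=0):
    """Return a boolean mask keeping events inside a sampling interval of
    length dt within each temporal window of size w_t, shifted by dt0.

    Assumed convention: the sampling interval starts at the (shifted)
    beginning of each window.
    """
    t = np.asarray(t)
    return ((t - dt0) % w_t) < dt


# Toy event stream: pixel coordinates and timestamps (arbitrary units).
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 0, 2, 2, 4, 5])
t = np.array([0, 5, 10, 15, 20, 25])

# Keep every 2nd column and every 2nd row, zero offsets.
spatial_mask = spatial_subsample(x, y, r_x=2, r_y=2)

# Keep the first 5 time units of every 10-unit window, zero offset.
temporal_mask = temporal_subsample(t, w_t=10, dt=5)

kept_spatial = np.stack([x[spatial_mask], y[spatial_mask]], axis=1)
kept_temporal = t[temporal_mask]
```

Both rules depend only on the coordinates or timestamp of the current event, so they need no per-pixel state; changing the offsets selects a different subset of the same size pattern, which is the kind of offset variation the comparison between methods examines.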