Pushing the boundaries of event subsampling in event-based video classification using CNNs

Hesam Araghi; Jan van Gemert; Nergis Tomen

TL;DR

The paper studies how to reduce the data burden of event cameras in CNN-based video classification by randomly subsampling events and measuring the effect on accuracy. Events are converted to 18-channel frames with the EST algorithm and classified with a ResNet34 on several neuromorphic datasets. A new random subsample is drawn at every training epoch, and test results are averaged over 20 subsampling draws. The key finding is that the number of events can be reduced by an order of magnitude with minimal loss of accuracy. In sparse regimes, however, training becomes more sensitive to hyperparameters, which the authors quantify with a novel hyperparameter sensitivity metric, and the weight gradients show observable diversity. The work offers practical guidance for edge AI deployments, notes limitations on certain datasets, and suggests extending the study to other architectures and to sparsity-mitigation approaches.
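
The TL;DR mentions converting events to 18-channel frames with the EST algorithm. The sketch below is a simplified, non-learned stand-in, not the authors' implementation. It assumes 2 polarities x 9 temporal bins = 18 channels, a fixed triangular temporal kernel in place of EST's learned kernel, and a count of 1 per event. The function name `est_like_frames` is hypothetical.

```python
import numpy as np


def est_like_frames(x, y, t, p, height, width, num_bins=9):
    """Simplified, non-learned EST-style grid (hypothetical helper).

    Assumptions: a fixed triangular temporal kernel replaces EST's learned
    kernel, every event contributes a count of 1, and channels are laid out
    as 2 polarities x num_bins temporal bins (18 channels for num_bins=9).
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    pol = (np.asarray(p) > 0).astype(np.int64)  # negative/zero -> 0, positive -> 1

    grid = np.zeros((2, num_bins, height, width), dtype=np.float64)
    if t.size > 0:
        # Rescale timestamps to the continuous range [0, num_bins - 1].
        span = t.max() - t.min()
        t_norm = (t - t.min()) / span * (num_bins - 1) if span > 0 else np.zeros_like(t)
        for b in range(num_bins):
            # Triangular kernel: weight 1 at bin centre b, 0 one bin away,
            # so each event's weights over all bins sum to 1.
            w = np.clip(1.0 - np.abs(t_norm - b), 0.0, None)
            np.add.at(grid[:, b], (pol, y, x), w)
    return grid.reshape(2 * num_bins, height, width).astype(np.float32)


# Usage on random toy events (not real data).
rng = np.random.default_rng(0)
n = 1000
frames = est_like_frames(
    x=rng.integers(0, 32, n), y=rng.integers(0, 24, n),
    t=np.sort(rng.random(n)), p=rng.choice([-1, 1], n),
    height=24, width=32,
)
print(frames.shape)  # (18, 24, 32)
```

Because the triangular weights of each event sum to 1, the frame stack still sums to the number of events, which makes it easy to see how subsampling thins out every channel.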

Abstract

Event cameras offer low-power visual sensing capabilities that are ideal for edge-device applications. However, their high event rate, driven by their fine temporal detail, can be restrictive in terms of bandwidth and computational resources. In edge AI applications, determining the minimum number of events needed for a specific task makes it possible to reduce the event rate and thereby improve bandwidth, memory, and processing efficiency. In this paper, we study the effect of event subsampling on the accuracy of event data classification using convolutional neural network (CNN) models. Surprisingly, across various datasets, the number of events per video can be reduced by an order of magnitude with little drop in accuracy, revealing how far we can push the boundaries of the accuracy vs. event rate trade-off. We also find that the lower classification accuracy at high subsampling rates is not solely attributable to the information lost by subsampling the events: training CNNs becomes challenging in highly subsampled scenarios, where sensitivity to hyperparameters increases. We quantify this training instability across multiple event-based classification datasets using a novel metric for evaluating the hyperparameter sensitivity of CNNs in different subsampling settings. Finally, we analyze the weight gradients of the network to gain insight into this instability.
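
The subsampling and evaluation procedure summarized in the TL;DR can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes each video is an (N, 4) NumPy array of (x, y, t, p) events sorted by timestamp, and that events are kept uniformly at random without replacement. At test time, accuracy is averaged over 20 independent draws; during training, a fresh subset would be drawn at every epoch. The helper names and the toy classifier are hypothetical.

```python
import numpy as np


def subsample_events(events, num_events, rng):
    """Keep num_events events chosen uniformly at random without replacement.

    events: (N, 4) array with columns (x, y, t, p), sorted by t (assumed
    layout). Sorting the chosen indices preserves temporal order.
    """
    n = len(events)
    if num_events >= n:
        return events
    idx = np.sort(rng.choice(n, size=num_events, replace=False))
    return events[idx]


def mean_test_accuracy(classify, videos, labels, num_events, num_draws=20, seed=0):
    """Mean and std of test accuracy over num_draws independent subsamplings.

    classify is a hypothetical callable mapping an (n, 4) event array to a
    predicted label. During training, one would instead call
    subsample_events once per video at every epoch to get a fresh subset.
    """
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(num_draws):
        correct = sum(
            classify(subsample_events(v, num_events, rng)) == label
            for v, label in zip(videos, labels)
        )
        accs.append(correct / len(videos))
    return float(np.mean(accs)), float(np.std(accs))


# Toy usage: a fake video of 5000 events, and a toy "classifier" that
# predicts the majority polarity (a stand-in for the ResNet34).
rng = np.random.default_rng(1)
video = np.column_stack([
    rng.integers(0, 64, 5000), rng.integers(0, 48, 5000),
    np.sort(rng.random(5000)), rng.choice([-1, 1], 5000),
])
sub = subsample_events(video, 500, rng)
pos, neg = video.copy(), video.copy()
pos[:, 3], neg[:, 3] = 1, -1
toy = lambda ev: int(ev[:, 3].mean() > 0)
print(mean_test_accuracy(toy, [pos, neg], [1, 0], num_events=500))  # (1.0, 0.0)
```

Reporting the standard deviation alongside the mean mirrors the error bars in the accuracy-curve figure, since each draw sees a different subset of events.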

Paper Structure

This paper contains 30 sections, 1 equation, 7 figures, and 6 tables.

Introduction
Related work
Event Classification Using CNNs
Sparsity in Event Cameras
Method
The EST Algorithm for Converting Events to Frames
Event Subsampling Procedure
Training Procedure
Event Classification Datasets
N-Caltech101 [orchard_converting_2015]
N-Cars [sironi_hats_2018]
N-ASL [bi_graph-based_2019]
DVS-Gesture [amir_low_2017]
Fan1vs3
Experiments
...and 15 more sections

Figures (7)

Figure 1: Illustration of two classes ('V' and 'W') from the Neuromorphic American Sign Language (N-ASL) dataset, which includes 24 classes, for different numbers of events per video. The events are accumulated into a single frame, with red and blue indicating the two polarities. Even with a significantly reduced number of events, a CNN-based classifier can still achieve very high accuracies.
Figure 1: Classification accuracy as the number of events per video decreases, across various datasets. The error bars represent the standard deviation of accuracies across different runs. For many datasets, accuracy does not drop significantly compared to the dense-input case.
Figure 2: Parallel coordinate plots showing hyperparameter (HP) tuning results for the Fan1vs3 dataset using dense and sparse inputs. HPs: learning rate, batch size, and weight decay. In the dense setting, test accuracies are concentrated near the maximum accuracy, while in the sparse setting only a small number of runs reach the maximum accuracy. The plots are from the Weights & Biases website (wandb).
Figure 2: Events from one video of each speed class of the Fan1vs3 dataset, shown in spatiotemporal space. The second row displays the same events subsampled to 1024 events.
Figure 3: Histograms of all 300 accuracies attained during hyperparameter tuning for both dense and sparse input scenarios (first column), and boxplots of test accuracies obtained using each individual hyperparameter: learning rate, batch size, and weight decay (second to fourth columns). The blue curve represents the mean test accuracies.
...and 2 more figures
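
The hyperparameter-tuning captions above (parallel coordinate plots and accuracy histograms) contrast how test accuracies from an HP sweep are distributed in dense and sparse settings. The paper's own sensitivity metric is not defined on this page, so the snippet below does not reproduce it. It is a hypothetical, illustrative statistic, the fraction of runs within a tolerance of the best run, that captures the same intuition. The function name and the accuracy lists are made up for illustration and are not results from the paper.

```python
import numpy as np


def near_best_fraction(accuracies, tolerance=0.05):
    """Fraction of HP-sweep runs within `tolerance` of the best test accuracy.

    Illustrative only: this is NOT the paper's sensitivity metric. A value
    near 1 means most HP settings work well; a value near 0 means only a
    few settings reach the best accuracy.
    """
    acc = np.asarray(accuracies, dtype=float)
    return float(np.mean(acc >= acc.max() - tolerance))


# Made-up toy numbers, not results from the paper.
dense_runs = [0.93, 0.95, 0.96, 0.94]
sparse_runs = [0.90, 0.60, 0.55, 0.70]
print(near_best_fraction(dense_runs))   # 1.0
print(near_best_fraction(sparse_runs))  # 0.25
```

A single threshold-based fraction is easy to read but depends on the chosen tolerance; a histogram of all runs, as in the paper's figure, shows the full distribution.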
