EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning

Yukun Tian, Hao Chen, Yongjian Deng, Feihong Shen, Kepan Liu, Wei You, Ziyang Zhang

TL;DR

This work introduces EventAug, a systematic augmentation scheme that enriches the spatio-temporal diversity of event data. It proposes Multi-scale Temporal Integration (MSTI) to diversify the motion speed of objects, and Spatial-salient Event Mask (SSEM) and Temporal-salient Event Mask (TSEM) to enrich object variants.

Abstract

The event camera has demonstrated significant success across a wide range of areas due to its low latency and high dynamic range. However, the community faces challenges such as data deficiency and limited diversity, which often result in over-fitting and inadequate feature learning. Notably, data augmentation techniques remain underexplored in the event community. This work aims to address this gap by introducing a systematic augmentation scheme named EventAug to enrich spatio-temporal diversity. In particular, we first propose Multi-scale Temporal Integration (MSTI) to diversify the motion speed of objects, then introduce Spatial-salient Event Mask (SSEM) and Temporal-salient Event Mask (TSEM) to enrich object variants. EventAug helps models learn richer motion patterns, object variants, and local spatio-temporal relations, thus improving model robustness to varied moving speeds, occlusions, and action disruptions. Experimental results show that our augmentation method consistently yields significant improvements across different tasks and backbones (e.g., a 4.87% accuracy gain on DVS128 Gesture). Our code will be made publicly available.
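
To make the idea of temporal integration at several scales concrete, here is a minimal sketch and not the authors' implementation. It assumes events arrive as an (N, 4) NumPy array of [x, y, t, polarity] and that "multi-scale" can be approximated by randomly choosing how many time bins the stream is integrated into. The function names, the `bin_choices` values, and the two-channel count-frame representation are all hypothetical choices made for illustration.

```python
import numpy as np


def events_to_frames(events, num_bins, height, width):
    """Integrate events [x, y, t, p] into `num_bins` two-channel count frames."""
    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    if len(events) == 0:
        return frames
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = (events[:, 3] > 0).astype(np.int64)  # channel 1 = positive polarity
    # Map each timestamp to a bin index in [0, num_bins - 1].
    span = max(t.max() - t.min(), 1e-9)
    bins = np.minimum(((t - t.min()) / span * num_bins).astype(np.int64),
                      num_bins - 1)
    # Unbuffered accumulation so repeated pixels are counted correctly.
    np.add.at(frames, (bins, p, y, x), 1.0)
    return frames


def multi_scale_integration(events, height, width,
                            bin_choices=(4, 8, 16), rng=None):
    """Assumed augmentation: pick a random temporal scale per sample.

    Fewer bins mean a longer integration window per frame, so the same
    motion covers more pixels in each frame.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_bins = int(rng.choice(bin_choices))
    return events_to_frames(events, num_bins, height, width)
```

Under these assumptions, a training loop would call `multi_scale_integration` once per sample. A model with a fixed input length would additionally need the resulting frames resampled to a common number of time steps, which this sketch leaves out.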

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of our EventAug and other state-of-the-art augmentation methods on different tasks and kinds of backbones.
  • Figure 2: An example of augmented events with our EventAug, including the visualization of the original and augmented event stream and event frame. Our methods greatly enhance the diversity of the original dataset, which improves the model’s generalization abilities.
  • Figure 3: Illustration of our EventAug methods. Left: Multi-scale Temporal Integration. Frames generated at short-term and long-term temporal scales reveal different motion patterns, so a multi-scale integration strategy enables the model to better learn motion information together with edge features. Right: Spatial- and Temporal-salient Event Masks. Guided by saliency information, we selectively mask events in salient spatial patches and salient temporal slices.
  • Figure 4: Examples of consecutive frames in the CIFAR10-DVS dataset.
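
The Figure 3 caption describes masking events in salient spatial patches and salient temporal slices. The sketch below is one possible reading and not the paper's algorithm. It assumes events are an (N, 4) NumPy array of [x, y, t, polarity], and it assumes saliency can be approximated by event density, so that a patch or slice is chosen for masking with probability proportional to its event count. The function names, `patch`, and `num_slices` are hypothetical, and the saliency measure used in the paper may differ.

```python
import numpy as np


def spatial_salient_mask(events, height, width, patch=8, rng=None):
    """Drop all events in one spatial patch, sampled by event density."""
    rng = np.random.default_rng() if rng is None else rng
    if len(events) == 0:
        return events
    cols = -(-width // patch)   # ceiling division
    rows = -(-height // patch)
    px = events[:, 0].astype(np.int64) // patch
    py = events[:, 1].astype(np.int64) // patch
    patch_id = py * cols + px
    counts = np.bincount(patch_id, minlength=rows * cols).astype(np.float64)
    # Assumed saliency: denser patches are more likely to be masked.
    chosen = rng.choice(rows * cols, p=counts / counts.sum())
    return events[patch_id != chosen]


def temporal_salient_mask(events, num_slices=10, rng=None):
    """Drop all events in one temporal slice, sampled by event density."""
    rng = np.random.default_rng() if rng is None else rng
    if len(events) == 0:
        return events
    t = events[:, 2].astype(np.float64)
    span = max(t.max() - t.min(), 1e-9)
    slice_id = np.minimum(((t - t.min()) / span * num_slices).astype(np.int64),
                          num_slices - 1)
    counts = np.bincount(slice_id, minlength=num_slices).astype(np.float64)
    chosen = rng.choice(num_slices, p=counts / counts.sum())
    return events[slice_id != chosen]
```

Sampling in proportion to density, instead of always masking the single densest region, keeps the augmentation stochastic across epochs. That is a design choice of this sketch and not something the source states.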