DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

TL;DR

This work introduces DailyDVS-200, a large-scale real-world neuromorphic dataset for event-based action recognition, comprising 200 action categories, 47 subjects, and over 22k event sequences with 14 per-sample attributes. It details data modalities from a DVS sensor and synchronized RGB, diverse collection setups, meticulous annotations, and two standardized benchmarks (cross-subject and multi-group) to enable robust model evaluation. The authors benchmark 12 diverse architectures spanning frame-based, token-based, and spike-based approaches, revealing that current event-based methods lag behind frame-based baselines on this dataset and that performance is highly sensitive to camera motion, lighting, distance, and action granularity. By providing extensive analyses, confusion patterns, and frame-settings experiments, the paper argues for new architectures and representations tailored to dynamic, real-world neuromorphic data, positioning DailyDVS-200 as a challenging catalyst for future progress in event-based action recognition. The dataset and code are publicly available to foster broad adoption and fair comparisons.

Abstract

Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in event-based action recognition, given their vast potential for advancement. However, development in this field is currently slowed by the lack of comprehensive, large-scale datasets, which are critical for developing robust recognition frameworks. To bridge this gap, we introduce DailyDVS-200, a meticulously curated benchmark dataset tailored for the event-based action recognition community. DailyDVS-200 is extensive, covering 200 action categories across real-world scenarios, recorded by 47 participants, and comprising more than 22,000 event sequences. This dataset is designed to reflect a broad spectrum of action types, scene complexities, and data acquisition diversity. Each sequence in the dataset is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions. Moreover, DailyDVS-200 is structured to facilitate a wide range of research paths, offering a solid foundation for both validating existing approaches and inspiring novel methodologies. By setting a new benchmark in the field, we challenge the current limitations of neuromorphic data processing and invite a surge of new approaches in event-based action recognition techniques, paving the way for future explorations in neuromorphic computing and beyond. The dataset and source code are available at https://github.com/QiWang233/DailyDVS-200.
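
The cross-subject benchmark mentioned in the TL;DR implies that the participants seen in training are kept separate from those used for testing. The following is a minimal sketch of such a split, not the authors' protocol: the metadata fields (`subject`, `action`, `path`), the file-naming pattern, and the choice of held-out subject IDs are all hypothetical and used only for illustration.

```python
def cross_subject_split(samples, test_subjects):
    """Partition samples so that no subject appears in both train and test."""
    held_out = set(test_subjects)
    train, test = [], []
    for sample in samples:
        if sample["subject"] in held_out:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Toy metadata: 5 subjects x 3 actions. All values are illustrative
# assumptions, not taken from the DailyDVS-200 release.
samples = [
    {"subject": s, "action": a, "path": f"S{s:02d}_A{a:03d}.npy"}
    for s in range(1, 6)
    for a in range(3)
]

# Hold out subjects 4 and 5 (an arbitrary choice for this sketch).
train, test = cross_subject_split(samples, test_subjects=[4, 5])

train_subjects = {s["subject"] for s in train}
test_subjects = {s["subject"] for s in test}
```

Splitting by subject rather than by sample keeps a model from scoring well simply by recognizing a participant's appearance or movement style, which is the usual motivation for cross-subject evaluation.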

Paper Structure (24 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The flow of our data acquisition process. We use both an RGB camera (above) and a DVS camera (below). Upon completion of the recording, the DVS camera generates event flow data, while the RGB camera captures the synchronized video stream. Subsequently, the data is processed to remove noise, and each sample is categorized based on its motion characteristics.
  • Figure 2: (Left) A comparison between existing datasets and our proposed DailyDVS-200 dataset for event-based action classification. (Right) Summary of Characteristics of DailyDVS-200. The nomenclature is PR: Props, PO: Posture, DR: Duration, AR: Action Range, PN: Person Num, CM: Camera Motion, IL: Illumination Direction, PE: Perspective, DI: Diurnality, LO: Location, DT: Distance, HE: Height, SH: Shadow, BC: Background Complexity.
  • Figure 3: A preview of our proposed DailyDVS-200 dataset and examples of our attribute annotations.
  • Figure 4: Statistical data and analysis of DailyDVS-200. (a) Data proportions for the 14 attributes. (b) Distribution of data volumes for different time durations in seconds. (c) Number of images per class. (d) Distribution of Event Count compared between the moving and static. (e) Distribution of Event Count for all categories.
  • Figure 5: (a) Evaluation of using different sizes of Moving camera set for action recognition. (b) Confusion matrix of Swin-T [liu2022video].