Descriptor: Face Detection Dataset for Programmable Threshold-Based Sparse-Vision

Riadul Islam, Sri Ranga Sai Krishna Tummala, Joey Mulé, Rohith Kankipati, Suraj Jalapally, Dhandeep Challagundla, Chad Howard, Ryan Robucci

TL;DR

An annotated, temporal-threshold-based vision dataset designed specifically for face detection tasks and derived from the same videos used for Aff-Wild2. We anticipate that this resource will support the development of robust vision systems based on smart sensors that process data using temporal-difference thresholds.

Abstract

Smart focal-plane and in-chip image processing has emerged as a crucial technology for energy-efficient, privacy-preserving vision-enabled embedded systems. However, the lack of dedicated datasets providing examples of the data that these neuromorphic sensors compute to convey visual information has hindered the adoption of these promising technologies. Neuromorphic imager variants, including event-based sensors, produce various representations, such as streams of pixel addresses representing the times and locations of intensity changes in the focal plane, temporal-difference data, data sifted or thresholded by temporal differences, image data after spatial transformations, optical flow data, and statistical representations. To address this barrier to entry, we provide an annotated, temporal-threshold-based vision dataset designed specifically for face detection tasks and derived from the same videos used for Aff-Wild2. By offering multiple threshold levels (e.g., 4, 8, 12, and 16), the dataset allows comprehensive evaluation and optimization of state-of-the-art neural architectures under varying conditions and settings, and comparison with traditional methods. The accompanying tool flow for generating event data from raw videos further improves accessibility and usability. We anticipate that this resource will support the development of robust vision systems based on smart sensors that process data using temporal-difference thresholds, enabling more accurate and efficient object detection and localization and ultimately promoting broader adoption of low-power neuromorphic imaging technologies. To support further research, we have publicly released the dataset at https://dx.doi.org/10.21227/bw2e-dj78.

Paper Structure (1 section, 8 figures, 2 tables, 1 algorithm)

Table of Contents

  1. Records and Storage

Figures (8)

  • Figure 1: The proposed tool flow converts raw video data into image frames; the binary event generator then takes the difference between temporally separated image frames and compares it against a set of reference threshold values to generate thresholded images.
  • Figure 2: Face annotation involved creating a rectangle over the face, with the rectangle's orientation varying by pose: parallel and horizontally aligned for frontal, vertically aligned for profile, and angled for angular, depending on the face's tilt.
  • Figure 3: For a 333 ms frame time difference, lower thresholds yield higher average active pixel rates: (a) $T_h = 4$ yields 25.39% activity, (b) $T_h = 8$ yields 17.86%, (c) $T_h = 12$ yields 14.27%, and (d) $T_h = 16$ yields 11.94%.
  • Figure 4: For a constant $T_h$ of 4, the average active pixel rate increases with the interframe time difference: (a) 19.88% average pixel activity at 166 ms, (b) 25.40% at 333 ms, and (c) 29.86% at 500 ms. Conversely, (d) the average active pixel rate decreases as $T_h$ increases.
  • Figure 5: Outline of dataset directory structure
  • ...and 3 more figures
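
The thresholding step described in the Figure 1 caption, and the active-pixel percentages reported for Figures 3 and 4, can be illustrated with a minimal sketch. This is not the authors' tool flow: the function names `threshold_events` and `active_pixel_rate`, the use of an absolute difference, and the `>=` comparison against the threshold are all assumptions made for illustration, and the pixel values are made up.

```python
from typing import List

# A grayscale frame as rows of integer intensities (assumed range 0..255).
Frame = List[List[int]]


def threshold_events(prev: Frame, curr: Frame, threshold: int) -> Frame:
    """Return a binary map with 1 where |curr - prev| >= threshold, else 0.

    Assumption: an event fires when the absolute temporal difference
    reaches the threshold; the source does not state the exact rule.
    """
    same_rows = len(prev) == len(curr)
    if not same_rows or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [1 if abs(c - p) >= threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def active_pixel_rate(events: Frame) -> float:
    """Percentage of pixels marked active in a binary event map."""
    total = sum(len(row) for row in events)
    active = sum(sum(row) for row in events)
    return 100.0 * active / total if total else 0.0


if __name__ == "__main__":
    # Two tiny made-up frames; real inputs would be decoded video frames.
    prev = [[10, 10, 10, 10], [10, 10, 10, 10]]
    curr = [[10, 15, 19, 23], [27, 10, 12, 40]]
    for t_h in (4, 8, 12, 16):
        events = threshold_events(prev, curr, t_h)
        print(f"T_h = {t_h}: {active_pixel_rate(events):.2f}% active")
```

On this toy input the active-pixel rate falls as the threshold rises, which matches the qualitative trend the Figure 3 caption reports for the dataset.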