A Recurrent YOLOv8-based framework for Event-Based Object Detection

Diego A. Silva, Kamilya Smagulova, Ahmed Elsheikh, Mohammed E. Fouda, Ahmed M. Eltawil

TL;DR

This paper tackles the limitations of frame-based object detectors in challenging conditions by leveraging event-based cameras. It introduces ReYOLOv8, a recurrent extension of YOLOv8 that processes a lightweight Volume of Ternary Event Images (VTEI) encoding and employs ConvLSTM-based temporal modeling to achieve enhanced accuracy with fewer parameters. A novel Random Polarity Suppression (RPS) augmentation is proposed to mitigate polarity biases in event data. Evaluations on GEN1 and PEDRo show consistent mAP improvements (up to 5% on GEN1 and up to 18% on PEDRo) with real-time inference (9.2–15.5 ms on GEN1) and smaller models (4.43% fewer trainable parameters on average on GEN1; 14.5x and 3.8x smaller models on PEDRo), highlighting the practical potential for robotics and autonomous driving applications where latency and robustness are critical.

Abstract

Object detection is crucial in various cutting-edge applications, such as autonomous vehicles and advanced robotics systems, primarily relying on data from conventional frame-based RGB sensors. However, these sensors often struggle with issues like motion blur and poor performance in challenging lighting conditions. In response to these challenges, event-based cameras have emerged as an innovative paradigm. These cameras, mimicking the human eye, demonstrate superior performance in environments with fast motion and extreme lighting conditions while consuming less power. This study introduces ReYOLOv8, an advanced object detection framework that enhances a leading frame-based detection system with spatiotemporal modeling capabilities. We implemented a low-latency, memory-efficient method for encoding event data to boost the system's performance. We also developed a novel data augmentation technique tailored to leverage the unique attributes of event data, thus improving detection accuracy. Our models outperformed all comparable approaches on the GEN1 dataset, which focuses on automotive applications, achieving mean Average Precision (mAP) improvements of 5%, 2.8%, and 2.5% across nano, small, and medium scales, respectively. These enhancements were achieved while reducing the number of trainable parameters by an average of 4.43% and maintaining real-time processing speeds between 9.2 ms and 15.5 ms. On the PEDRo dataset, which targets robotics applications, our models showed mAP improvements ranging from 9% to 18%, with 14.5x and 3.8x smaller models and an average speed enhancement of 1.67x.
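
To make the idea of a memory-efficient ternary event encoding more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a time window is split into equal temporal bins and that each pixel in a bin stores +1 or -1 according to the polarity of the last event it received there, or 0 if it received none. The function name `ternary_event_volume`, its arguments, and the "last event wins" rule are illustrative assumptions.

```python
import numpy as np


def ternary_event_volume(x, y, t, p, height, width, num_bins, t_start, t_end):
    """Encode events as an int8 volume of shape (num_bins, height, width).

    Hypothetical sketch: values are in {-1, 0, +1}. Assumes that the last
    event hitting a pixel within a bin decides that pixel's value.
    """
    volume = np.zeros((num_bins, height, width), dtype=np.int8)
    if len(t) == 0:
        return volume

    # Sort by timestamp so that later events overwrite earlier ones.
    order = np.argsort(t, kind="stable")
    x, y, t, p = x[order], y[order], t[order], p[order]

    # Map each timestamp to a temporal bin; clip events on the window edge.
    bins = (t - t_start) * num_bins // (t_end - t_start)
    bins = np.clip(bins, 0, num_bins - 1).astype(np.int64)

    # Polarity 1 becomes +1; polarity 0 (or negative) becomes -1.
    signs = np.where(p > 0, 1, -1).astype(np.int8)

    # Explicit loop: NumPy does not guarantee write order for repeated
    # indices in fancy assignment, so "last event wins" is made explicit.
    for b, row, col, s in zip(bins, y, x, signs):
        volume[b, row, col] = s
    return volume


# Three events on a 2x2 sensor, window [0, 100), two temporal bins.
x = np.array([0, 0, 1])
y = np.array([0, 0, 1])
t = np.array([0, 10, 60])
p = np.array([1, 0, 1])
vtei = ternary_event_volume(x, y, t, p, height=2, width=2,
                            num_bins=2, t_start=0, t_end=100)
```

Storing the volume as `int8` keeps the memory footprint at one byte per pixel per bin, which is one plausible reading of "memory-efficient" here.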

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Working principle behind the Volume of Ternary Event Images (VTEI) encoding.
  • Figure 2: Overview of the Recurrent YOLOv8 architecture.
  • Figure 3: Structure of the CSP Bottleneck block with 2 convolutions from YOLOv8.
  • Figure 4: Example of Random Polarity Suppression's transformation on VTEI tensors. The grayscale corresponding images are also shown.
  • Figure 5: Comparison of the mAP of ReYOLOv8s on the PEDRo validation set when sweeping the suppression polarity, $s$, for fixed probabilities, $p$, of suppressing the positive polarity rather than the negative one.
  • ...and 1 more figure
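
The captions of Figures 4 and 5 suggest how Random Polarity Suppression might act on a ternary tensor. The sketch below is an illustration under stated assumptions, not the paper's code: it reads $s$ as the probability that suppression is applied to a sample at all, and $p$ as the probability that the positive polarity (rather than the negative one) is the one suppressed. The function name `random_polarity_suppression` and the choice to suppress by zeroing entries are hypothetical.

```python
import numpy as np


def random_polarity_suppression(volume, s, p, rng):
    """Return a copy of a ternary volume with one polarity possibly zeroed.

    Hypothetical sketch: `s` is assumed to be the chance of applying the
    augmentation, `p` the chance of picking the positive polarity.
    """
    out = volume.copy()
    if rng.random() < s:
        # Choose which polarity to remove from this sample.
        target = 1 if rng.random() < p else -1
        out[out == target] = 0
    return out


rng = np.random.default_rng(0)
sample = np.array([[1, -1, 0],
                   [-1, 1, 1]], dtype=np.int8)

unchanged = random_polarity_suppression(sample, s=0.0, p=0.5, rng=rng)
no_positive = random_polarity_suppression(sample, s=1.0, p=1.0, rng=rng)
no_negative = random_polarity_suppression(sample, s=1.0, p=0.0, rng=rng)
```

With `s=0.0` the sample passes through untouched, while `s=1.0` with `p=1.0` or `p=0.0` always removes the positive or the negative events, respectively, which gives the two extremes of the sweep that Figure 5 describes.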