Benchmarking Recurrent Event-Based Object Detection for Industrial Multi-Class Recognition on MTEvent

Lokeshwaran Manohar, Moritz Roidl

Abstract

Event cameras are attractive for industrial robotics because they provide high temporal resolution, high dynamic range, and reduced motion blur. However, most event-based object detection studies focus on outdoor driving scenarios or limited class settings. In this work, we benchmark recurrent ReYOLOv8s on MTEvent for industrial multi-class recognition and use a non-recurrent YOLOv8s variant as a baseline to analyze the effect of temporal memory. On the MTEvent validation split, the best recurrent model trained from scratch (clip length 21, C21) reaches 0.285 mAP50, a 9.6% relative improvement over the non-recurrent YOLOv8s baseline (0.260). Event-domain pretraining has a stronger effect: GEN1-initialized fine-tuning yields the best overall result of 0.329 mAP50 at clip length 21, and unlike scratch training, GEN1-pretrained models improve consistently with clip length. PEDRo initialization drops to 0.251 mAP50, indicating that mismatched source-domain pretraining can be less effective than training from scratch. Persistent failure modes are dominated by class imbalance and human-object interaction. Overall, we position this work as a focused benchmarking and analysis study of recurrent event-based detection in industrial environments.
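
The relative improvement quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `relative_improvement` and the dictionary labels are assumptions made for this example, and only the mAP50 values themselves come from the abstract. Comparing the GEN1 and PEDRo results against the non-recurrent baseline is a choice made here for illustration; apart from the 9.6% figure, the abstract does not report these percentages.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative change of `score` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline * 100.0


# mAP50 values quoted in the abstract (MTEvent validation split).
BASELINE_YOLOV8S = 0.260  # non-recurrent YOLOv8s baseline

# Labels are hypothetical names for this example, not identifiers from the paper.
reported_map50 = {
    "ReYOLOv8s from scratch (C21)": 0.285,
    "GEN1-initialized (C21)": 0.329,
    "PEDRo-initialized": 0.251,
}

for name, score in reported_map50.items():
    change = relative_improvement(score, BASELINE_YOLOV8S)
    print(f"{name}: {change:+.1f}% relative to the baseline")
```

For the scratch-trained C21 model this gives (0.285 - 0.260) / 0.260, which rounds to the 9.6% stated in the abstract. A negative value means the configuration scores below the baseline.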

Paper Structure (17 sections, 2 figures, 1 table)

This paper contains 17 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Validation performance on MTEvent across non-recurrent and recurrent settings, temporal clip lengths, and pretraining configurations. YOLO denotes the non-recurrent YOLOv8s baseline, ReY-C$n$ denotes ReYOLOv8s trained from scratch with clip length $n$, and GEN1/PEDRo-C$n$ denote ReYOLOv8s fine-tuned from pretrained weights at clip length $n$. GEN1 initialization yields the best overall result at C21; numerical values are reported in Table \ref{tab:ablation}.
  • Figure 2: Qualitative zero-shot transfer examples on MTEvent. The GEN1-pretrained model does not transfer reliably to MTEvent humans, whereas the PEDRo-pretrained model produces a stronger human detection. The fine-tuned MTEvent model detects the human with the correct target label.