FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies

Dongyue Lu, Lingdong Kong, Gim Hee Lee, Camille Simon Chane, Wei Tsang Ooi

TL;DR

FlexEvent tackles object detection with event cameras at varying operational frequencies. It introduces FlexFuse, an adaptive event-frame fusion module, and FlexTune, a frequency-adaptive fine-tuning strategy, which together enable robust detection from low to extremely high frequencies (e.g., $20$ Hz to $180$ Hz). The approach uses a dual-branch architecture (RVT for events and ResNet-50 for frames) with learnable gating to balance the two modalities across frequencies, and it uses self-training with pseudo-labels to generalize to unseen temporal resolutions. Empirical results on large-scale DSEC variants show significant mAP gains over state-of-the-art methods and demonstrate robustness across frequency shifts, supporting real-time deployment in dynamic environments.
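To make the idea of learnable gating between two modalities concrete, the following is a minimal sketch in plain Python. It is not the FlexFuse module itself: the function name `gated_fuse`, the per-channel scalar weights, and the sigmoid gate are assumptions chosen for illustration, and real features would be multi-dimensional tensors rather than short lists.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(event_feat, frame_feat, w_event, w_frame, bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        gate  = sigmoid(w_event * e + w_frame * f + bias)
        fused = gate * e + (1 - gate) * f
    A gate near 1 trusts the event feature; near 0 trusts the frame feature.
    """
    lengths = {len(event_feat), len(frame_feat), len(w_event), len(w_frame), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for e, f, we, wf, b in zip(event_feat, frame_feat, w_event, w_frame, bias):
        gate = sigmoid(we * e + wf * f + b)
        fused.append(gate * e + (1.0 - gate) * f)
    return fused


# With all gate parameters at zero the gate is 0.5, so the result is a plain average.
event_feat = [0.9, 0.1, 0.5]
frame_feat = [0.2, 0.8, 0.5]
zeros = [0.0, 0.0, 0.0]
print(gated_fuse(event_feat, frame_feat, zeros, zeros, zeros))
```

In a trained model the gate parameters would be learned, so the balance between event and frame features could shift with the input rather than staying fixed at an even split.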

Abstract

Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to their microsecond-level temporal resolution and asynchronous operation. Existing event detectors, however, are limited by fixed-frequency paradigms and fail to fully exploit the high temporal resolution and adaptability of event data. To address these limitations, we propose FlexEvent, a novel framework that enables detection at varying frequencies. Our approach consists of two key components: FlexFuse, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FlexTune, a frequency-adaptive fine-tuning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.
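The need for frequency-adjusted labels can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that ground-truth boxes exist only at a 20 Hz rate while the detector runs on 180 Hz event windows. It marks which windows end on a labeled timestamp and filters detections by confidence for the remaining ones. The function names, the dictionary layout of a detection, and the threshold value are hypothetical and do not describe the FlexTune procedure itself.

```python
def labeled_window_mask(num_windows: int, label_hz: int, target_hz: int) -> list:
    """Return True for each window whose end time falls on a label timestamp.

    Window i ends at (i + 1) / target_hz seconds. That instant is a multiple of
    1 / label_hz exactly when (i + 1) * label_hz is divisible by target_hz.
    """
    if label_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return [((i + 1) * label_hz) % target_hz == 0 for i in range(num_windows)]


def select_pseudo_labels(detections: list, score_threshold: float) -> list:
    """Keep only detections confident enough to reuse as training targets."""
    return [d for d in detections if d["score"] >= score_threshold]


# Assumed setup: labels at 20 Hz, event windows at 180 Hz, 18 windows (0.1 s).
mask = labeled_window_mask(num_windows=18, label_hz=20, target_hz=180)
print([i for i, has_label in enumerate(mask) if has_label])  # [8, 17]

# Hypothetical detector outputs for one unlabeled window.
detections = [
    {"box": (10, 20, 50, 80), "cls": "car", "score": 0.91},
    {"box": (200, 40, 230, 90), "cls": "pedestrian", "score": 0.32},
]
print(select_pseudo_labels(detections, score_threshold=0.6))
```

Under these assumptions only one window in nine carries a ground-truth label, which is why some form of generated labels is needed to supervise the windows in between.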

Paper Structure

This paper contains 30 sections, 10 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Illustrative examples of event camera object detection at varying frequencies. The detection performance of the classic RVT detector [gehrig2023recurrent] tends to drop significantly at higher event operational frequencies. Motivated by this observation, we propose FlexEvent, a robust and flexible event-frame detector that maintains high detection accuracy across a wide range of frequencies (low, middle, high), ensuring strong adaptability in real-world, dynamic sensing environments.
  • Figure 2: Framework Overview. The proposed FlexEvent consists of two branches: Event and Frame. The event branch captures high-temporal resolution data, while the frame branch leverages the rich semantic information from frames (cf. Sec. \ref{sec:preliminaries}). These branches are fused dynamically through FlexFuse, allowing adaptive integration of event and frame data (cf. Sec. \ref{sec:flexfuser}). Additionally, the FlexTune learning mechanism ensures robust detection performance across varying operational frequencies (cf. Sec. \ref{sec:fal}). Together, these components enable the model to handle diverse motion dynamics and maintain high detection accuracy in varying frequency scenarios.
  • Figure 3: Illustration of the FlexFuse module. We show a general example of event and frame features under frequency $a$ at the $i$-th stage.
  • Figure 4: Illustration of the FlexTune learning mechanism. We first train on high-frequency events with sparse low-frequency labels, then we generate and refine high-frequency labels for cyclic self-training across frequencies.
  • Figure 5: Qualitative comparisons of state-of-the-art event camera detectors. We compare FlexEvent with RVT [gehrig2023recurrent], SAST [peng2024sast], and DAGr [gehrig2024low] on the test set of DSEC-Det. Best viewed in color.
  • ...and 9 more figures