Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera

Jiahang Cao, Xu Zheng, Yuanhuiyi Lyu, Jiaxu Wang, Renjing Xu, Lin Wang

TL;DR

This paper proposes EOLO, a novel object detection framework that achieves robust and efficient all-day detection by fusing the RGB and event modalities. It is built on a lightweight spiking neural network to efficiently leverage the asynchronous property of events.

Abstract

The ability to detect objects in all lighting (i.e., normal-, over-, and under-exposed) conditions is crucial for real-world applications, such as self-driving. Traditional RGB-based detectors often fail under such varying lighting conditions. Therefore, recent works utilize novel event cameras to supplement or guide the RGB modality; however, these methods typically adopt asymmetric network structures that rely predominantly on the RGB modality, resulting in limited robustness for all-day detection. In this paper, we propose EOLO, a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities. Our EOLO framework is built on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events. On this basis, we first introduce an Event Temporal Attention (ETA) module to learn the high-temporal-resolution information in events while preserving crucial edge information. Secondly, as different modalities exhibit varying levels of importance under diverse lighting conditions, we propose a novel Symmetric RGB-Event Fusion (SREF) module to effectively fuse RGB-Event features without relying on a specific modality, thus ensuring a balanced and adaptive fusion for all-day detection. In addition, to compensate for the lack of paired RGB-Event datasets for all-day training and evaluation, we propose an event synthesis approach based on randomized optical flow that directly generates the event frame from a single exposure image. We further build two new datasets, E-MSCOCO and E-VOC, based on the popular benchmarks MSCOCO and PASCAL VOC. Extensive experiments demonstrate that our EOLO outperforms state-of-the-art detectors, e.g., RENet, by a substantial margin (+3.74% mAP50) in all lighting conditions. Our code and datasets will be available at https://vlislab22.github.io/EOLO/

Paper Structure

This paper contains 14 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison with the SOTA baselines in PASCAL VOC val set under extreme overexposure scenarios. Our model demonstrates a noticeable improvement compared to other methods in terms of average precision metrics and also exhibits accurate qualitative results.
  • Figure 2: (a) The overall framework of our proposed EOLO; (b) the Event Temporal Attention module (ETA, Sec. \ref{subsec:ETA}); and (c) the Symmetric RGB-Event Fusion module (SREF, Sec. \ref{subsec:SREF}), which includes Cross-modality Alignment (CMA) and Symmetric Modality Fusion (SMF). The RGB inputs and event inputs are first processed by CSPDarknet-Tiny and the spiking neural network, respectively, to obtain the features of the corresponding modalities, $F_r^i$ and $F_e^i$. Subsequently, the ETA module extracts and refines the temporal attributes of events, yielding $F_{ETA}^i$. The SREF module then integrates the RGB-Event features without relying on a specific modality, for a balanced and adaptive fusion. Finally, the fused features $F_{f,out}^i$ are passed through the detection head to obtain the prediction results. The detection head and the loss function are adapted from YOLOv3 \cite{redmon2018yolov3}.
  • Figure 3: Overview of our randomized optical flow-based event synthesis algorithm. '$\otimes$' denotes the dot product.
  • Figure 4: Qualitative comparison of our EOLO on the PASCAL VOC dataset under all-day exposure conditions.
  • Figure 5: Visualization of real-world detection under extreme (a) underexposure scenarios and (b) overexposure scenarios. ① and ② denote the real RGB image and the real events, respectively, captured by a DAVIS-346 event-based camera. Using the event synthesis method (Sec. \ref{subsec:extreme_trans}), we obtain the randomized optical flow-based event frame ③ and its fixed-flow counterpart ④. In real-world scenes, EOLO yields excellent detection results with inputs of paired RGB and real events (⑤) or paired RGB and synthetic events (⑥).
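
The abstract and the Figure 3 caption describe generating an event frame from a single image via randomized optical flow, using a dot product. The summary above does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard brightness-constancy approximation (the log-intensity change at a pixel is roughly the negative dot product of the spatial gradient and the flow vector), a single global flow vector with random direction and speed, and a fixed contrast threshold. The function name `synthesize_event_frame` and the parameters `max_speed`, `threshold`, and `flow` are hypothetical.

```python
import numpy as np


def synthesize_event_frame(image, max_speed=2.0, threshold=0.1, flow=None, rng=None):
    """Approximate an event-polarity frame from one grayscale image in [0, 1].

    Illustrative sketch only: all parameter names and defaults are assumptions.
    Returns an int8 array with +1 (brighter), -1 (darker), or 0 (no event).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Event cameras respond to changes in log intensity; the small offset
    # avoids log(0) in fully dark pixels.
    log_img = np.log(image.astype(np.float64) + 1e-3)

    # Spatial gradients along rows (y) and columns (x).
    grad_y, grad_x = np.gradient(log_img)

    if flow is None:
        # Assumed randomization: one global flow vector with a random
        # direction and a random speed in pixels per frame.
        angle = rng.uniform(0.0, 2.0 * np.pi)
        speed = rng.uniform(0.0, max_speed)
        flow = (speed * np.cos(angle), speed * np.sin(angle))
    flow_x, flow_y = flow

    # Brightness constancy: temporal change is about -(gradient . flow).
    delta = -(grad_x * flow_x + grad_y * flow_y)

    # Threshold the change into event polarities.
    events = np.zeros(delta.shape, dtype=np.int8)
    events[delta >= threshold] = 1
    events[delta <= -threshold] = -1
    return events
```

With a flat image the gradient is zero everywhere, so no events are produced; events appear only at edges whose gradient has a component along the sampled flow direction. Passing `flow` explicitly gives a fixed-flow variant, which may be useful for comparison against the randomized one.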