Event-assisted Low-Light Video Object Segmentation

Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun

TL;DR

This work addresses video object segmentation under severe low-light conditions by fusing frame-based and event-based information. It introduces two novel components, Adaptive Cross-Modal Fusion (ACMF) and Event-Guided Memory Matching (EGMM), within an end-to-end VOS framework and provides two dedicated datasets, LLE-DAVIS (synthetic) and LLE-VOS (real-world), to evaluate performance. Experiments demonstrate that the proposed method surpasses state-of-the-art baselines on both synthetic and real low-light datasets, highlighting the practical value of event data for robust segmentation when illumination is limited. The research advances the field by enabling reliable VOS in challenging lighting, with potential applications in surveillance, autonomous systems, and night-time scene analysis.

Abstract

In the realm of video object segmentation (VOS), the challenge of operating under low-light conditions persists, resulting in notably degraded image quality and compromised accuracy when comparing query and memory frames for similarity computation. Event cameras, characterized by their high dynamic range and ability to capture motion information of objects, offer promise in enhancing object visibility and aiding VOS methods under such low-light conditions. This paper introduces a pioneering framework tailored for low-light VOS, leveraging event camera data to elevate segmentation accuracy. Our approach hinges on two pivotal components: the Adaptive Cross-Modal Fusion (ACMF) module, aimed at extracting pertinent features while fusing image and event modalities to mitigate noise interference, and the Event-Guided Memory Matching (EGMM) module, designed to rectify the issue of inaccurate matching prevalent in low-light settings. Additionally, we present the creation of a synthetic LLE-DAVIS dataset and the curation of a real-world LLE-VOS dataset, encompassing frames and events. Experimental evaluations corroborate the efficacy of our method across both datasets, affirming its effectiveness in low-light scenarios.

Paper Structure

This paper contains 23 sections, 6 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Examples from our LLE-VOS dataset. The dataset contains paired normal-light/low-light APS images, event streams, and annotations.
  • Figure 2: (a) A hybrid camera system for building the real-world dataset. We configure two identical cameras with different exposure times to generate normal-light (b) and low-light (c) pairs.
  • Figure 3: (a) Overview of the proposed method for event-assisted low-light video object segmentation. (b) The structure of Adaptive Cross-Modal Fusion (ACMF) module. (c) The structure of Event-Guided Memory Matching (EGMM) module.
  • Figure 4: Qualitative comparisons with other methods on the synthetic LLE-DAVIS dataset.
  • Figure 5: Qualitative comparisons with other methods on the real-world LLE-VOS dataset.
  • ...and 1 more figure