Event-assisted Low-Light Video Object Segmentation
Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun
TL;DR
This work addresses video object segmentation (VOS) under severe low-light conditions by fusing frame-based and event-based information. It introduces two components, Adaptive Cross-Modal Fusion (ACMF) and Event-Guided Memory Matching (EGMM), within an end-to-end VOS framework, and provides two dedicated datasets for evaluation: LLE-DAVIS (synthetic) and LLE-VOS (real-world). Experiments show that the proposed method surpasses state-of-the-art baselines on both datasets, highlighting the practical value of event data for robust segmentation when illumination is limited. Potential applications include surveillance, autonomous systems, and night-time scene analysis.
Abstract
Video object segmentation (VOS) remains challenging under low-light conditions, where degraded image quality compromises the similarity computation between query and memory frames. Event cameras, with their high dynamic range and ability to capture object motion, can improve object visibility and assist VOS methods in such conditions. This paper introduces a framework tailored for low-light VOS that leverages event camera data to improve segmentation accuracy. Our approach relies on two key components: the Adaptive Cross-Modal Fusion (ACMF) module, which extracts relevant features while fusing the image and event modalities to mitigate noise interference, and the Event-Guided Memory Matching (EGMM) module, which corrects the inaccurate matching prevalent in low-light settings. In addition, we construct a synthetic dataset, LLE-DAVIS, and collect a real-world dataset, LLE-VOS, both containing frames and events. Experiments on both datasets demonstrate the effectiveness of our method in low-light scenarios.
