EvRWKV: A Continuous Interactive RWKV Framework for Effective Event-Guided Low-Light Image Enhancement

Wenjie Cai, Qingguo Meng, Zhenyu Wang, Xingbo Dong, Zhe Jin

TL;DR

EvRWKV tackles low-light image enhancement by enabling continuous cross-modal interaction between event streams and RGB frames. It introduces a Cross-RWKV backbone for fine-grained spatiotemporal fusion and an EISFE module that denoises and aligns features in both frequency and spatial domains. The approach achieves state-of-the-art results on real-world LLIE datasets (SDE, SDSD, RELED) and substantially improves downstream semantic segmentation performance. The work demonstrates the practical impact of dual-domain, continuous fusion for robust perception in challenging lighting, motion, and noise conditions, and points to future work in real-time video and extreme sparsity scenarios.

Abstract

Event cameras offer significant potential for Low-light Image Enhancement (LLIE), yet existing fusion approaches are constrained by a fundamental dilemma: early fusion struggles with modality heterogeneity, while late fusion severs crucial feature correlations. To address these limitations, we propose EvRWKV, a novel framework that enables continuous cross-modal interaction through dual-domain processing. Its two main components are a Cross-RWKV Module, which captures fine-grained temporal and cross-modal dependencies, and an Event Image Spectral Fusion Enhancer (EISFE) module, which performs joint adaptive frequency-domain denoising and spatial-domain alignment. This continuous interaction maintains feature consistency from low-level textures to high-level semantics. Extensive experiments on the real-world SDE and SDSD datasets demonstrate that EvRWKV outperforms image-only methods by 1.79 dB and 1.85 dB in PSNR, respectively. To further validate the practical utility of our method for downstream applications, we evaluated its impact on semantic segmentation. Experiments demonstrate that images enhanced by EvRWKV lead to a 35.44% improvement in mIoU.
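
To put the reported PSNR gains in perspective, the short sketch below converts a dB gain into the equivalent reduction in mean squared error, using the standard definition PSNR = 10 log10(MAX^2 / MSE). It assumes both methods are scored against the same peak value and treats the gain as if it applied to a single image; the function names are illustrative and not from the paper.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_from_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a PSNR gain of gain_db (same peak value)."""
    return 10.0 ** (gain_db / 10.0)


# Gains quoted in the abstract (assumed here to be per-image for illustration).
for name, gain in (("SDE", 1.79), ("SDSD", 1.85)):
    ratio = mse_ratio_from_gain(gain)
    print(f"{name}: +{gain} dB PSNR corresponds to MSE divided by about {ratio:.2f}")
```

Under these assumptions, a gain of just under 2 dB corresponds to roughly a one-third drop in squared error. Dataset-level PSNR is usually averaged over images, so the mapping is only approximate at that level.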

Paper Structure

This paper contains 31 sections, 28 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Overall framework of event-guided LLIE approaches. (a) Early fusion: combining image and event data at the input. (b) Late fusion: processing image and event data separately and merging features. (c) Continuous interaction fusion: enabling ongoing interaction between image and event data.
  • Figure 2: A challenging example from our dataset containing an extremely low-light image (a) and sparse events (b). Compared with the result from the state-of-the-art event-guided method EvLight (c), our EvRWKV (d) not only recovers details in the dark areas (e.g., the wheel) but also preserves edge details (e.g., the white line on the floor).
  • Figure 3: Overall Architecture of the proposed EvRWKV. Our method consists of three parts: (a) Feature Initialization, (b) Cross-modal Feature Restoration, and (c) Frequency-aware Feature Fusion. Specifically, Cross-modal Feature Restoration contains multiple Cross-RWKV blocks for feature alignment, and Frequency-aware Feature Fusion integrates image and event features for final output.
  • Figure 4: Architecture of the Cross-RWKV Module, which includes Spatial Mix for spatial feature processing and Channel Mix for channel-wise interaction.
  • Figure 5: Illustration of Re-WKV mechanism. Bi-WKV attention is applied recurrently along alternating horizontal and vertical scan directions, with each iteration taking the previous attention result as input.
  • ...and 8 more figures
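
The scan pattern described in the Figure 5 caption (recurrent passes that alternate between horizontal and vertical scan directions, each taking the previous result as input) can be illustrated with a minimal NumPy sketch. This is not the paper's Bi-WKV operator: `bidirectional_decay_scan` is a hypothetical stand-in that mixes tokens with a simple normalized exponential decay in both directions, and `num_iters` and `decay` are assumed parameters chosen only for illustration.

```python
import numpy as np


def _two_way_sum(seq: np.ndarray, decay: float) -> np.ndarray:
    """Sum of decayed context from both directions along axis 0; seq is (T, C)."""
    fwd = np.zeros_like(seq)
    bwd = np.zeros_like(seq)
    acc = np.zeros(seq.shape[1], dtype=seq.dtype)
    for t in range(seq.shape[0]):
        acc = decay * acc + seq[t]
        fwd[t] = acc
    acc = np.zeros(seq.shape[1], dtype=seq.dtype)
    for t in reversed(range(seq.shape[0])):
        acc = decay * acc + seq[t]
        bwd[t] = acc
    # The current token appears in both passes, so subtract it once.
    return fwd + bwd - seq


def bidirectional_decay_scan(seq: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Hypothetical stand-in for Bi-WKV: normalized bidirectional decay mixing."""
    weights = _two_way_sum(np.ones((seq.shape[0], 1), dtype=seq.dtype), decay)
    return _two_way_sum(seq, decay) / weights


def recurrent_alternating_scan(feat: np.ndarray, num_iters: int = 2,
                               decay: float = 0.9) -> np.ndarray:
    """Apply the stand-in mixer recurrently on an (H, W, C) feature map.

    Even iterations flatten row by row (horizontal scan), odd iterations
    flatten column by column (vertical scan). Each iteration consumes the
    output of the previous one.
    """
    h, w, c = feat.shape
    out = feat.astype(np.float64)
    for i in range(num_iters):
        if i % 2 == 0:
            seq = out.reshape(h * w, c)
            out = bidirectional_decay_scan(seq, decay).reshape(h, w, c)
        else:
            seq = out.transpose(1, 0, 2).reshape(w * h, c)
            out = bidirectional_decay_scan(seq, decay).reshape(w, h, c).transpose(1, 0, 2)
    return out


feat = np.random.default_rng(0).random((4, 5, 3))
mixed = recurrent_alternating_scan(feat, num_iters=4)
print(mixed.shape)
```

The point of alternating the flattening order is that pixels adjacent in a column are far apart in a row-major sequence, so a vertical pass lets them influence each other directly; the recurrence then propagates that context through later passes.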