CompEvent: Complex-valued Event-RGB Fusion for Low-light Video Enhancement and Deblurring

Mingchen Zhong, Xin Lu, Dong Li, Senyan Xu, Ruixuan Jiang, Xueyang Fu, Baocai Yin

TL;DR

This work tackles the challenge of restoring videos degraded by simultaneous low-light conditions and motion blur. It introduces CompEvent, a complex-valued neural network framework that fuses event data and RGB frames throughout processing via two main components: the Complex Temporal Alignment Gated Recurrent Unit (CTA-GRU) and the Complex Space-Frequency Learning (CSFL) backbone. The method enables full-process spatiotemporal fusion in the complex domain, outperforming state-of-the-art methods on real-world (RELED) and synthetic (LOL-Blur) benchmarks. Results demonstrate the effectiveness of holistic complex fusion for robust low-light video enhancement and deblurring, with notable gains in PSNR and SSIM and solid ablation evidence for each component.

Abstract

Low-light video deblurring poses significant challenges in applications like nighttime surveillance and autonomous driving due to dim lighting and long exposures. While event cameras offer potential solutions with superior low-light sensitivity and high temporal resolution, existing fusion methods typically employ staged strategies, limiting their effectiveness against combined low-light and motion blur degradations. To overcome this, we propose CompEvent, a complex neural network framework enabling holistic full-process fusion of event data and RGB frames for enhanced joint restoration. CompEvent features two core components: 1) Complex Temporal Alignment GRU, which utilizes complex-valued convolutions and processes video and event streams iteratively via GRU to achieve temporal alignment and continuous fusion; and 2) Complex Space-Frequency Learning module, which performs unified complex-valued signal processing in both spatial and frequency domains, facilitating deep fusion through spatial structures and system-level characteristics. By leveraging the holistic representation capability of complex-valued neural networks, CompEvent achieves full-process spatiotemporal fusion, maximizes complementary learning between modalities, and significantly strengthens low-light video deblurring capability. Extensive experiments demonstrate that CompEvent outperforms SOTA methods in addressing this challenging task.
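
The complex-valued convolution mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 1-D setting, the function name `complex_conv1d`, and the choice to put toy RGB features in the real part and toy event features in the imaginary part are all assumptions made for illustration. The sketch only shows that a single complex convolution mixes both parts in every output value, following (x_r + i x_i) * (w_r + i w_i) = (x_r*w_r - x_i*w_i) + i (x_r*w_i + x_i*w_r).

```python
import numpy as np


def complex_conv1d(x, w):
    """Complex 1-D convolution assembled from four real convolutions.

    Hypothetical helper for illustration only (not from the paper).
    """
    x_r, x_i = x.real, x.imag
    w_r, w_i = w.real, w.imag
    # Real output: both input parts contribute.
    real = np.convolve(x_r, w_r, mode="valid") - np.convolve(x_i, w_i, mode="valid")
    # Imaginary output: both input parts contribute again.
    imag = np.convolve(x_r, w_i, mode="valid") + np.convolve(x_i, w_r, mode="valid")
    return real + 1j * imag


rng = np.random.default_rng(0)

# Assumption: toy "RGB" features in the real part, toy "event" features
# in the imaginary part. The summary does not state the actual mapping.
rgb_feat = rng.standard_normal(16)
event_feat = rng.standard_normal(16)
x = rgb_feat + 1j * event_feat

# A random complex kernel with three taps.
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

y = complex_conv1d(x, w)

# NumPy's own convolution on complex arrays serves as a reference.
reference = np.convolve(x, w, mode="valid")
matches_reference = np.allclose(y, reference)
```

By contrast, two independent real convolutions (one per modality) would leave the real and imaginary outputs uncoupled, which is one way to read the difference between the real-valued and complex operations compared in Figure 2(b).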

Paper Structure

This paper contains 21 sections, 14 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of our method with previous methods. Previous methods perform fusion in a staged manner and split complex features into real-valued components before convolution in the frequency domain. Our method uses complex representations for both modalities, enabling full-process fusion by interacting features during processing. Moreover, our method directly applies complex convolutions to frequency features without separating them.
  • Figure 2: (a) Overall architecture of the CompEvent framework. The Complex Temporal Alignment Gated Recurrent Unit (CTA-GRU) is shown in Figure 3. (b) Comparison between the operations of complex convolution and real-valued convolution.
  • Figure 3: Complex Temporal Alignment Gated Recurrent Unit (CTA-GRU). It consists of multiple cascaded Complex GRU Modules (CGM) with a bidirectional architecture, enabling the fusion of temporal information from both past and future frames.
  • Figure 4: Qualitative comparisons on the RELED dataset. Zoom in for better view.
  • Figure 5: Qualitative comparisons on the LOL-Blur dataset. Zoom in for better view.
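
The contrast drawn in the Figure 1 caption, between splitting frequency features into real-valued components and operating on them directly as complex numbers, can be sketched on a toy spectrum. The example below is an assumption-laden illustration, not the CSFL module: a single scalar weight stands in for a pointwise (1x1) convolution, and the weight values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))   # toy single-channel spatial feature map
spec = np.fft.fft2(feat)             # complex spectrum, shape (8, 8)

# Split-style processing: real and imaginary parts are treated as two
# unrelated real channels, each with its own (arbitrary) real weight.
a, b = 0.5, 2.0
split_out = a * spec.real + 1j * (b * spec.imag)

# Complex processing: one (arbitrary) complex weight acts on the whole value.
w = 0.5 * np.exp(1j * np.pi / 4)
complex_out = w * spec

# The complex weight scales every magnitude by |w| and shifts every phase
# by angle(w), so amplitude and phase are handled jointly and consistently.
mag_ratio_complex = np.abs(complex_out) / np.abs(spec)
phase_shift_complex = np.angle(complex_out * np.conj(spec))

# The split weights change the magnitude by a different factor in each
# frequency bin, depending on how that bin divides between real and imaginary.
mag_ratio_split = np.abs(split_out) / np.abs(spec)
```

In this toy setting the complex weight acts as a uniform scale-and-rotate on the spectrum, while the split weights distort amplitude and phase in a bin-dependent way. Whether this is the main reason for the gains reported in the paper is not something this summary establishes.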