URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration

Rui Xu, Yuzhen Niu, Yuezhou Li, Huangbiao Xu, Wenxi Liu, Yuzhong Chen

TL;DR

The paper tackles the challenge of restoring low-light images under dynamically coupled degradations by introducing URWKV, a unified RWKV model with a multi-state perspective. It employs luminance-adaptive normalization (LAN) for scene-aware luminance modulation, exponential moving average (EMA) aggregation of intra-stage states, and a state-aware selective fusion (SSF) module that aligns and fuses multi-state features across encoder stages. Together, these components yield effective and efficient restoration with fewer parameters and FLOPs, and the model outperforms state-of-the-art LLIE, LLIE-deblur, and unified models across multiple benchmarks. The approach is relevant to nocturnal imaging tasks in surveillance, photography, and remote sensing, where degradations vary spatially and temporally.
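The EMA aggregation of intra-stage states can be illustrated with a minimal sketch. It assumes the states are same-shaped NumPy arrays ordered from oldest to newest and uses a hypothetical fixed `decay` hyperparameter. The paper's exact weighting scheme (for example, whether the decay is learned) is not stated here, so this is an illustration of the EMA idea, not the authors' implementation.

```python
import numpy as np


def ema_aggregate(states, decay=0.5):
    """Fold intra-stage states into one array with an exponential moving average.

    states: sequence of same-shaped arrays, ordered oldest to newest
            (the last entry can be the current input state).
    decay:  hypothetical smoothing factor in [0, 1); larger values keep
            more of the older states.
    """
    if len(states) == 0:
        raise ValueError("need at least one state")
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    agg = np.asarray(states[0], dtype=np.float64)
    for h in states[1:]:
        # New state enters with weight (1 - decay); history is scaled by decay.
        agg = decay * agg + (1.0 - decay) * np.asarray(h, dtype=np.float64)
    return agg


# Usage: three 2x2 states with constant values 0, 4 and 8.
states = [np.zeros((2, 2)), np.full((2, 2), 4.0), np.full((2, 2), 8.0)]
print(ema_aggregate(states, decay=0.5))  # every entry is 5.0
```

With `decay=0.5` and three states, the effective weights are 0.25, 0.25 and 0.5: recent states dominate, but older ones still contribute instead of being discarded, which is the information-loss argument made against a single-state mechanism.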

Abstract

Existing low-light image enhancement (LLIE) and joint LLIE and deblurring (LLIE-deblur) models have made strides in addressing predefined degradations, yet they often struggle with dynamically coupled degradations. To address these challenges, we introduce a Unified Receptance Weighted Key Value (URWKV) model with a multi-state perspective, enabling flexible and effective degradation restoration for low-light images. Specifically, we customize the core URWKV block to perceive and analyze complex degradations by leveraging multiple intra- and inter-stage states. First, inspired by the pupil mechanism in the human visual system, we propose Luminance-adaptive Normalization (LAN), which adjusts normalization parameters based on rich inter-stage states, allowing for adaptive, scene-aware luminance modulation. Second, we aggregate multiple intra-stage states through an exponential moving average approach, effectively capturing subtle variations while mitigating the information loss inherent in the single-state mechanism. Third, to reduce the degradation effects commonly associated with conventional skip connections, we propose the State-aware Selective Fusion (SSF) module, which dynamically aligns and integrates multi-state features across encoder stages, selectively fusing contextual information. Compared with state-of-the-art models, URWKV achieves superior performance on various benchmarks while requiring significantly fewer parameters and computational resources.
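One way to read LAN is as a normalization layer whose scale and shift are predicted from the inter-stage states $M_i$ rather than fixed learned constants. The sketch below assumes channel-wise (LayerNorm-style) normalization of $X_t$, global average pooling of each $M_i$, and hypothetical linear projections `w_gamma` and `w_beta`. The paper's actual parameterization may differ.

```python
import numpy as np


def luminance_adaptive_norm(x, inter_states, w_gamma, w_beta, eps=1e-5):
    """Hypothetical LAN-style layer.

    x:            (H, W, C) current input state X_t
    inter_states: list of (H_i, W_i, C_i) arrays standing in for the M_i
    w_gamma, w_beta: (sum_i C_i, C) projection matrices (would be learned)
    """
    # Normalize each pixel across channels (LayerNorm without affine terms).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)

    # Summarize every inter-stage state by its per-channel spatial mean,
    # so states from stages with different resolutions can be combined.
    desc = np.concatenate([m.mean(axis=(0, 1)) for m in inter_states])

    # Scene-dependent per-channel scale and shift predicted from the states.
    gamma = 1.0 + desc @ w_gamma
    beta = desc @ w_beta
    return x_hat * gamma + beta


# Usage: with zero projections LAN reduces to plain per-pixel normalization.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
states = [rng.normal(size=(8, 8, 3)), rng.normal(size=(2, 2, 5))]
w = np.zeros((8, 8))
out = luminance_adaptive_norm(x, states, w, w)
print(out.shape)  # (4, 4, 8)
```

With nonzero projections, a brighter or darker set of states $M_i$ shifts the modulation applied to $X_t$, loosely mirroring the pupil analogy in the abstract.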

Paper Structure

This paper contains 13 sections, 8 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Compared with existing solutions (BiFormer for LLIE, PDHAT for joint LLIE and deblurring, and Restormer as a unified model), our proposed URWKV achieves consistently superior PSNR across various degradation scenarios. For better visualization, PSNR values are normalized by the per-dataset maximum.
  • Figure 2: Overview of the URWKV model. The core URWKV block is integrated into an encoder-decoder framework, featuring a multi-state spatial mixing sub-block and a multi-state channel mixing sub-block. The luminance-adaptive normalization (LAN) incorporates both the current input state $X_t$ and multiple inter-stage states $M_i$. Additionally, the multi-state quad-directional token shift (SQ-Shift) aggregates both the current input state and multiple intra-stage states $H_i$ ($H_{S_i}$ for spatial mixing and $H_{C_i}$ for channel mixing, respectively). The state-aware selective fusion (SSF) module further aggregates rich contextual information across encoder stages.
  • Figure 3: Illustration of luminance-adaptive normalization (LAN). LAN integrates inter-stage states $M_i$ throughout the restoration process, facilitating scene-aware luminance modulation.
  • Figure 4: Illustration of the state-aware selective fusion (SSF) module.
  • Figure 5: Visual comparison of state-of-the-art models on the SID dataset (top) and the SMID dataset (bottom).
  • ...and 3 more figures
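Figure 2 mentions a multi-state quad-directional token shift (SQ-Shift). The sketch below shows the common quad-directional shift used in vision RWKV models, where four channel groups each take a neighbouring pixel's value from a different direction. It also includes a hypothetical `multi_state_shift` that interpolates the current state with a shifted aggregated state (for example, the EMA of intra-stage states). How URWKV combines these exactly is an assumption here.

```python
import numpy as np


def quad_shift(x, shift=1):
    """Quad-directional token shift on an (H, W, C) feature map.

    Channels are split into four equal groups; each group takes the value of
    the pixel `shift` steps away in a different direction (left, right, up,
    down). Vacated border positions are zero-filled and any leftover
    channels (C not divisible by 4) pass through unchanged.
    """
    h, w, c = x.shape
    if not 1 <= shift < min(h, w):
        raise ValueError("shift must be in [1, min(H, W))")
    g = c // 4
    out = np.zeros_like(x)
    out[:, shift:, 0:g] = x[:, :w - shift, 0:g]                  # from left neighbour
    out[:, :w - shift, g:2 * g] = x[:, shift:, g:2 * g]          # from right neighbour
    out[shift:, :, 2 * g:3 * g] = x[:h - shift, :, 2 * g:3 * g]  # from pixel above
    out[:h - shift, :, 3 * g:4 * g] = x[shift:, :, 3 * g:4 * g]  # from pixel below
    out[:, :, 4 * g:] = x[:, :, 4 * g:]
    return out


def multi_state_shift(x, agg_state, mix):
    """Hypothetical SQ-Shift-style step: per-channel interpolation between the
    current state x and the quad-shifted aggregated state (e.g. an EMA of
    intra-stage states). `mix` has shape (C,) and would be learned."""
    return mix * x + (1.0 - mix) * quad_shift(agg_state)


# Usage on a 3x3 map with 4 channels (one channel per direction group).
x = np.arange(36, dtype=np.float64).reshape(3, 3, 4)
shifted = quad_shift(x)
print(shifted[1, 1])  # neighbour values 12, 21, 6, 31
```

Shifting only a quarter of the channels per direction gives each position a view of all four neighbours at no extra parameter cost, and the learned `mix` decides how much of that shifted context to blend in.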