
Towards Controllable Low-Light Image Enhancement: A Continuous Multi-illumination Dataset and Efficient State Space Framework

Hongru Han, Tingrui Guo, Liming Zhang, Yan Su, Qiwen Xu, Zhuohua Ye

Abstract

Low-light image enhancement (LLIE) has traditionally been formulated as a deterministic mapping. However, this paradigm often struggles to account for the ill-posed nature of the task, where unknown ambient conditions and sensor parameters create a multimodal solution space. Consequently, state-of-the-art methods frequently encounter luminance discrepancies between predictions and labels, often necessitating "gt-mean" post-processing to align output luminance for evaluation. To address this fundamental limitation, we propose a transition toward Controllable Low-light Enhancement (CLE), explicitly reformulating the task as a well-posed conditional problem. To this end, we introduce CLE-RWKV, a holistic framework supported by Light100, a new benchmark featuring continuous real-world illumination transitions. To resolve the conflict between luminance control and chromatic fidelity, a noise-decoupled supervision strategy in the HVI color space is employed, effectively separating illumination modulation from texture restoration. Architecturally, to adapt efficient State Space Models (SSMs) for dense prediction, we leverage a Space-to-Depth (S2D) strategy. By folding spatial neighborhoods into channel dimensions, this design allows the model to recover local inductive biases and effectively bridge the "scanning gap" inherent in flattened visual sequences without sacrificing linear complexity. Experiments across seven benchmarks demonstrate that our approach achieves competitive performance and robust controllability, providing a real-world multi-illumination alternative that significantly reduces the reliance on gt-mean post-processing.
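As a concrete illustration of the folding the abstract describes, the sketch below realizes a Space-to-Depth embedding with a standard pixel-unshuffle followed by a $1\times 1$ convolution. This is a minimal sketch of the general mechanism only; the downscale factor $r=2$ and the channel width are illustrative choices, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class S2DEmbedding(nn.Module):
    """Space-to-Depth embedding: fold each r x r spatial neighborhood into the
    channel axis, then mix channels with a 1x1 conv. A sketch of the folding
    idea; r and out_ch are assumptions, not the paper's settings."""
    def __init__(self, in_ch: int = 3, out_ch: int = 48, r: int = 2):
        super().__init__()
        self.fold = nn.PixelUnshuffle(r)  # (B, C, H, W) -> (B, C*r*r, H/r, W/r)
        self.proj = nn.Conv2d(in_ch * r * r, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # After folding, each position on the low-resolution grid carries its
        # full r x r neighborhood, so a subsequent 1D scan over flattened
        # tokens sees local context directly, narrowing the "scanning gap".
        return self.proj(self.fold(x))

# Usage: spatial resolution drops by r while local detail moves into channels.
x = torch.randn(1, 3, 256, 256)
y = S2DEmbedding()(x)  # -> (1, 48, 128, 128)
```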



Figures (8)

  • Figure 1: Comparison between static "one-to-one" benchmarks (top) and our Light100 dataset (bottom). While traditional datasets are limited to a single ground-truth mapping, Light100 captures 100 progressive lighting levels. This allows for a flexible mapping governed by the target luminance level $\beta$, accommodating various user preferences and ambient conditions.
  • Figure 2: Visual comparison of state-of-the-art deterministic enhancement methods on the LOL-v2-Real dataset. Most advanced models produce outputs with a higher average luminance than the Test Ground Truth (GT), demonstrating a systematic luminance mismatch that stems from the deterministic mapping paradigm.
  • Figure 3: Quantitative analysis of evaluation bias on the LOL-v2-Real dataset. (a) Luminance distribution analysis: outputs of SOTA methods tend to align with the global Training GT mean rather than adapting to the specific illumination of individual test samples. (b) The "gt-mean" heuristic is commonly employed to evaluate the fidelity of texture and color by neutralizing the global luminance gap, leading to substantial PSNR gains. Here, w/o and w/ denote evaluations without and with this post-hoc adjustment, respectively.
  • Figure 4: Overview of the Proposed CLE-RWKV Framework. (a) The overall pipeline operates in the decoupled HVI color space, where target luminance commands ($\beta$) modulate features via FiLM (a sketch of this modulation is given after the figure list). (b) The S2D-Net Backbone employs a split-transform-merge strategy, where the L-RWKV Block serves as the core processor. (c) Analysis of the Periodic Shuffle Strategy. Left: The PS-DS Embedding layer (Pixel Shuffle Downsampling with $r$ followed by $1\times 1$ Conv) demonstrates that semantic topology and features are preserved even at lower resolutions. Right: Visualization of the implicit 2D Token Shift. Compared to conventional 1D flattening, our periodic shuffle strategy folds spatial neighborhoods into the channel dimension, effectively bridging the "scanning gap" and enabling the L-RWKV block to capture local details while mitigating long-range forgetting.
  • Figure 5: Resolving the Intensity-Chroma Conflict. (Left) Real physical captures at low illumination suffer from diminished SNR, manifesting as both stochastic noise and severe color shifts. (Middle) The hybrid target synthesizes the target-level intensity ($\mathbf{I}_{max}$) with the pristine chromaticity and texture ($\hat{\mathbf{H}}, \hat{\mathbf{V}}$) extracted from the high-quality reference (Right). This strategy provides noise-free, color-accurate guidance that decouples illumination control from intrinsic degradation restoration (a sketch of the recombination step follows the figure list).
  • ...and 3 more figures
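The luminance conditioning in Figure 4(a) is described as FiLM, i.e. feature-wise linear modulation (Perez et al., 2018), in which a conditioning signal predicts a per-channel scale and shift. The sketch below shows one way such a module could condition features on the scalar target luminance $\beta$; the MLP shape, hidden width, and the $(1+\gamma)$ residual form are our illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LuminanceFiLM(nn.Module):
    """FiLM-style conditioning on a scalar target luminance command beta.
    Illustrative sketch; layer sizes are assumptions, not the paper's config."""
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * channels),  # per-channel (scale, shift)
        )

    def forward(self, feats: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); beta: (B, 1), target luminance level in [0, 1]
        gamma, shift = self.mlp(beta).chunk(2, dim=-1)       # each (B, C)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)            # (B, C, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        # (1 + gamma) keeps the modulation near identity when the MLP
        # output is small, so conditioning perturbs rather than replaces.
        return (1 + gamma) * feats + shift

# Usage: the same feature map is modulated differently per luminance command.
feats = torch.randn(2, 32, 64, 64)
beta = torch.full((2, 1), 0.7)
out = LuminanceFiLM(32)(feats, beta)  # -> (2, 32, 64, 64)
```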
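The recombination step behind Figure 5's hybrid target can be sketched as follows. The HVI transform itself is defined in the paper; here we only show how the planes could be reassembled, assuming the intensity axis is $I_{max} = \max(R, G, B)$ (an assumption motivated by the caption's $\mathbf{I}_{max}$ notation) and taking the reference's $(\hat{H}, \hat{V})$ planes as given inputs.

```python
import torch

def intensity_plane(rgb: torch.Tensor) -> torch.Tensor:
    """Per-pixel intensity I_max = max(R, G, B). Assumed form of the HVI
    intensity axis, inferred from the caption's notation."""
    return rgb.max(dim=1, keepdim=True).values  # (B, 1, H, W)

def hybrid_target(hv_reference: torch.Tensor,
                  rgb_target_level: torch.Tensor) -> torch.Tensor:
    """Combine pristine chromaticity/texture planes (H, V) from the clean
    reference with the intensity of the capture at the commanded luminance.
    hv_reference: (B, 2, H, W); rgb_target_level: (B, 3, H, W).
    Plane ordering in the output is an illustrative choice."""
    i_max = intensity_plane(rgb_target_level)
    return torch.cat([hv_reference, i_max], dim=1)  # (B, 3, H, W)
```

This separation is what lets one supervision signal carry the commanded luminance while the other carries noise-free color and texture, matching the decoupling the caption describes.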