TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement

Miao Zhang, Jun Yin, Pengyu Zeng, Yiqing Shen, Shuai Lu, Xueqian Wang

TL;DR

The paper tackles the rigidity of traditional low-light image enhancement (LLIE) methods by introducing a text-driven, semantic-level lighting control framework. It combines LLM-based interpretation of natural language prompts with a Retinex-based Reasoning Segment (RRS) module for target localization, a Text-based Brightness Controllable (TBC) module for quantitative region-level adjustments, and an Adaptive Contextual Compensation (ACC) module that fuses multimodal cues; together these guide a diffusion-based synthesizer that produces the final enhancement. Empirical results on LOL and MIT-Adobe FiveK show superior PSNR, SSIM, and perceptual quality, along with robust open-world semantic controllability via prompts. The approach enables personalized, region-specific lighting adjustments in diverse scenes and improves downstream tasks such as low-light face detection, which suggests practical value for real-world imaging systems.

Abstract

Deep learning-based image enhancement methods show significant advantages in reducing noise and improving visibility in low-light conditions. These methods are typically based on one-to-one mapping, where the model learns a direct transformation from a low-light image to a single, specific enhanced image. As a result, they are inflexible: they do not allow personalized mapping, even though lighting preferences differ from one individual to another. To overcome these limitations, we propose a new light enhancement task and a new framework that provides customized lighting control through prompt-driven, semantic-level, and quantitative brightness adjustments. The framework begins by leveraging a Large Language Model (LLM) to understand natural language prompts, enabling it to identify target objects for brightness adjustments. To localize these target objects, the Retinex-based Reasoning Segment (RRS) module generates precise target localization masks using reflection images. Subsequently, the Text-based Brightness Controllable (TBC) module adjusts brightness levels based on the generated illumination map. Finally, an Adaptive Contextual Compensation (ACC) module integrates multi-modal inputs and controls a conditional diffusion model to adjust the lighting, ensuring seamless and precise enhancement. Experimental results on benchmark datasets demonstrate our framework's superior performance at increasing visibility, maintaining natural color balance, and amplifying fine details without creating artifacts. Furthermore, its robust generalization capabilities enable complex semantic-level lighting adjustments in diverse open-world environments through natural language interactions.
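
As a rough illustration of what "quantitative, region-level brightness adjustment on an illumination map" can mean, the sketch below splits an image into reflectance and illumination under the Retinex assumption (image = reflectance × illumination), scales the illumination inside a target mask, and recombines the two. This is not the paper's RRS, TBC, or ACC pipeline. The function names, the channel-maximum illumination estimate, the hand-supplied binary mask, and the scalar `gain` are all assumptions made for illustration; the framework itself predicts masks from prompts and drives a conditional diffusion model.

```python
import numpy as np


def retinex_decompose(image, eps=1e-4):
    """Split an RGB image with values in [0, 1] into reflectance and illumination.

    Assumption: illumination is estimated as the per-pixel maximum over the
    colour channels, a simple hand-crafted estimate used here only for
    illustration.
    """
    illumination = image.max(axis=2, keepdims=True)   # shape (H, W, 1)
    reflectance = image / (illumination + eps)        # shape (H, W, 3)
    return reflectance, illumination


def adjust_region_brightness(image, mask, gain):
    """Scale illumination by `gain` where mask == 1 and leave it unchanged elsewhere.

    `mask` is a hypothetical (H, W) binary array marking the target region;
    `gain` is a hypothetical scalar such as 1.5 for "50% brighter".
    """
    reflectance, illumination = retinex_decompose(image)
    mask = mask[..., None].astype(image.dtype)         # broadcast over channels
    scaled = illumination * (1.0 + (gain - 1.0) * mask)
    scaled = np.clip(scaled, 0.0, 1.0)
    return np.clip(reflectance * scaled, 0.0, 1.0)


# Usage: brighten the left half of a uniformly dark grey image by a factor of 2.
dark = np.full((4, 4, 3), 0.2)
region = np.zeros((4, 4))
region[:, :2] = 1
brightened = adjust_region_brightness(dark, region, gain=2.0)
```

Operating on the illumination component rather than on raw pixel values keeps the colour ratios stored in the reflectance untouched, which is the usual motivation for Retinex-style adjustment.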

Paper Structure

This paper contains 19 sections, 14 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: An overview of low-light enhancement tasks. The figure illustrates three levels of low-light image enhancement: (a) Global-level Enhancement, which applies a uniform brightness adjustment across the entire image; (b) Regional-level Enhancement, which selectively enhances certain portions of an image, such as the background or the object of interest; (c) Text-driven Semantic Enhancement, in which natural language prompts are used to adjust the brightness of targeted regions or the entire image, guided by semantic understanding.
  • Figure 2: The overview of our framework, including the RRS Module, TBC Module, ACC Module and control diffusion. The framework begins with input prompts processed through a text encoder, generating task-specific adjustments such as brightness modification. The Retinex-based Reasoning Segment (RRS) identifies target regions via multi-modal input, enhanced by a LoRA-based mechanism. The Text-based Brightness Controllable (TBC) module applies precise brightness modifications using spatial attention. Finally, the Adaptive Contextual Compensation (ACC) module integrates multi-source inputs, ensuring coherent image enhancement through adaptive weight fusion and ControlNet, driving the conditional diffusion process to achieve high-quality, personalized low-light image enhancement.
  • Figure 3: The architecture of the Adaptive Contextual Compensation (ACC) Module. The module processes three key inputs: illumination, mask, and reflection, each passed through spatial depth-wise convolution layers. Cross-attention mechanisms enhance the interaction of features between the different inputs. The resulting feature maps are then combined by channel-wise concatenation and element-wise summation, and refined by an element-wise product. The final output feature maps $F_{con1}$ and $F_{con2}$ guide the lighting adjustments in the low-light image $I_{low}$, ensuring context-aware, coherent image enhancement.
  • Figure 4: Visual comparison with other advanced approaches on LOL.
  • Figure 5: Visual comparison with other advanced approaches on LOL2.
  • ...and 10 more figures