Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

Guanlin Li, Ke Zhang, Ting Wang, Ming Li, Bin Zhao, Xuelong Li

TL;DR

This work proposes a mean-teacher-based semi-supervised low-light image enhancement (Semi-LLIE) framework that integrates unpaired data into model training, together with a novel perceptual loss based on the large-scale vision-language Recognize Anything Model (RAM) that helps generate enhanced images with richer textural details.
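A perceptual loss compares the enhanced image and a reference image in the feature space of a frozen network rather than in pixel space. The sketch below is a generic multi-layer L1 perceptual loss, not the authors' RAM-based formulation. It assumes images flattened to plain Python lists, and `perceptual_l1` and `toy_extractor` are hypothetical names: the toy extractor merely stands in for RAM's frozen image encoder. Which image pair the paper compares, and how it weights layers, is not stated here.

```python
from typing import Callable, List, Optional, Sequence

# One flattened feature vector per layer of a frozen extractor.
Features = List[List[float]]


def perceptual_l1(
    enhanced: Sequence[float],
    reference: Sequence[float],
    extractor: Callable[[Sequence[float]], Features],
    layer_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum over layers of the mean absolute feature difference."""
    feats_e = extractor(enhanced)
    feats_r = extractor(reference)
    if len(feats_e) != len(feats_r):
        raise ValueError("extractor returned a different number of layers")
    weights = list(layer_weights) if layer_weights is not None else [1.0] * len(feats_e)
    if len(weights) != len(feats_e):
        raise ValueError("need one weight per layer")
    total = 0.0
    for w, fe, fr in zip(weights, feats_e, feats_r):
        if len(fe) != len(fr):
            raise ValueError("feature size mismatch")
        # Mean L1 distance between the two images' features at this layer.
        total += w * sum(abs(a - b) for a, b in zip(fe, fr)) / len(fe)
    return total


def toy_extractor(image: Sequence[float]) -> Features:
    """Hypothetical stand-in for a frozen encoder such as RAM's image backbone."""
    shallow = [float(p) for p in image]  # "low-level" features: raw pixels
    deep = [sum(image) / len(image)]     # "high-level" feature: global mean intensity
    return [shallow, deep]


if __name__ == "__main__":
    enhanced = [0.2, 0.4, 0.6, 0.8]
    reference = [0.25, 0.4, 0.55, 0.8]
    print(perceptual_l1(enhanced, reference, toy_extractor))
```

Shallow layers mainly penalize differences in fine texture, while deeper layers penalize differences in global content; `layer_weights` trades these off.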

Abstract

Despite impressive recent advances in low-light image enhancement, the scarcity of paired data has become a significant obstacle to further progress. This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates unpaired data into model training. The mean-teacher technique is a prominent semi-supervised learning method that has been successfully adopted for both high-level and low-level vision tasks. However, two primary issues prevent the naive mean-teacher method from attaining optimal performance in low-light image enhancement. First, a pixel-wise consistency loss is insufficient for transferring a realistic illumination distribution from the teacher to the student model, which results in color casts in the enhanced images. Second, cutting-edge image enhancement approaches fail to cooperate effectively with the mean-teacher framework to restore details in dark areas, because they tend to overlook structured information within local regions. To mitigate these issues, we first introduce a semantic-aware contrastive loss that faithfully transfers the illumination distribution, yielding enhanced images with natural colors. Then, we design a Mamba-based low-light image enhancement backbone that uses a multi-scale feature learning scheme to strengthen Mamba's ability to represent pixel relationships within local regions, facilitating the generation of images with rich textural details. Further, we propose a novel perceptual loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textural details. Experimental results indicate that Semi-LLIE surpasses existing methods in both quantitative and qualitative evaluations.
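In the mean-teacher paradigm, the teacher is not trained by gradient descent; its weights track an exponential moving average (EMA) of the student's weights, θ_t ← α·θ_t + (1 − α)·θ_s. The minimal sketch below assumes parameters stored as a dict of flat float lists (a stand-in for framework tensors) and a default decay α = 0.999, a common choice in mean-teacher work but not a value taken from this paper; `ema_update` is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> Params:
    """Return new teacher weights: theta_t <- decay * theta_t + (1 - decay) * theta_s."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    updated: Params = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]  # KeyError if the student lacks this parameter
        if len(t_vals) != len(s_vals):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise blend of the old teacher weights and the current student weights.
        updated[name] = [decay * t + (1.0 - decay) * s for t, s in zip(t_vals, s_vals)]
    return updated


if __name__ == "__main__":
    teacher = {"conv.weight": [0.0, 1.0]}
    student = {"conv.weight": [1.0, 1.0]}
    # A few steps with a small decay so the change is easy to see.
    for _ in range(3):
        teacher = ema_update(teacher, student, decay=0.5)
    print(teacher)  # {'conv.weight': [0.875, 1.0]}
```

With α close to 1, the teacher changes slowly and averages out noise from individual student updates, which is why its outputs can serve as consistency targets for unpaired images.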

Paper Structure (30 sections, 10 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Comparison with the state-of-the-art (SOTA) supervised method Retinexformer [cai2023retinexformer] and the unsupervised method SCI [ma2022sci] on the Visdrone dataset. The image enhanced by our method exhibits high contrast and visually pleasing textures, whereas the other methods suffer from either textural detail distortion (b) or color deviation (c), as shown in the areas marked by green rectangles.
  • Figure 2: Architecture of our Semi-LLIE for low-light image enhancement. Semi-LLIE employs the mean-teacher paradigm and is composed of a teacher model and a student model. To faithfully transfer the illumination distribution from the teacher to the student model and reduce color casts, we design a RAM-based semantic-aware contrastive loss $\mathcal{L}_{scr}$. To facilitate generating visually pleasing enhanced images with rich textural details, we design a RAM-based perceptual loss $\mathcal{L}_{ramper}$. The teacher model's weights are updated as an exponential moving average (EMA) of the student model's weights.
  • Figure 3: The pipeline of our Mamba-based low-light image enhancement backbone (a), which mainly consists of an illumination estimation module (IEM) and an illumination-guided enhancement module (IGEM). The IGEM is built from two components: the multi-scale state space block (MSSB) (b) and the multi-scale state space module (MSSM) (c).
  • Figure 4: Visual comparison results of various methods on the unpaired test dataset of Visdrone. Best viewed zoomed in.
  • Figure 5: Visual comparison results of various methods on the paired test dataset of LSRW. Best viewed zoomed in.
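The semantic-aware contrastive loss in Figure 2 pulls the student's output toward a positive sample and pushes it away from negatives in a feature space. This summary does not give its exact form (RAM features, choice of positives and negatives, distance measure), so the sketch below substitutes a standard InfoNCE loss with cosine similarity. It hypothetically treats the teacher's pseudo-label as the positive and the low-light input as the negative; `info_nce`, `cosine`, and the temperature 0.1 are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE: -log(exp(s_pos / T) / sum_k exp(s_k / T)) over the positive and all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp of all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    student_feat = [0.9, 0.1]   # anchor: features of the student's enhanced output
    teacher_feat = [1.0, 0.0]   # positive: features of the teacher's pseudo-label
    lowlight_feat = [0.0, 1.0]  # negative: features of the low-light input
    print(info_nce(student_feat, teacher_feat, [lowlight_feat]))
```

With a single negative, this reduces to a logistic loss on the similarity gap, log(1 + exp((s_neg − s_pos)/T)); a lower temperature sharpens the penalty when the anchor drifts toward a negative.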