GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

Han Zhou, Wei Dong, Xiaohong Liu, Shuaicheng Liu, Xiongkuo Min, Guangtao Zhai, Jun Chen

TL;DR

GLARE treats low-light image enhancement (LLIE) as an ill-posed problem and addresses it by injecting a strong external prior: a normal-light (NL) codebook learned from NL images with VQGAN. It couples this prior with an invertible latent normalizing flow (I-LNF) that aligns low-light (LL) latent features with NL representations, so that the correct codes are retrieved from the codebook. An Adaptive Feature Transformation (AFT) module, built on a dual-decoder architecture with an Adaptive Mix-up Block (AMB), flexibly merges LL cues into decoding to improve fidelity while preserving realism. Across multiple datasets, GLARE achieves state-of-the-art LLIE performance, and it also improves downstream low-light object detection when used as a preprocessing step. The code is publicly released.
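Codebook retrieval in a VQGAN-style model is a nearest-neighbour lookup: each latent vector is replaced by the closest code in the learned codebook. The sketch below shows that generic lookup in plain Python on a toy 2-D codebook. The codebook values, the dimensions, and the `retrieve_codes` helper are hypothetical illustrations, not GLARE's code. It also hints at why alignment matters: a "darkened" feature can drift to the wrong code, which is the gap the I-LNF module is meant to close.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_codes(
    features: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Replace each feature vector by its nearest codebook entry (generic VQ lookup).

    Returns the chosen code indices and the corresponding quantized vectors.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]
    return indices, [codebook[k] for k in indices]


# Hypothetical toy codebook: K = 4 codes of dimension 2.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Well-aligned features retrieve the intended codes.
idx, _ = retrieve_codes([(0.9, 0.1), (0.2, 0.8)], codebook)
print(idx)  # [1, 2]

# Roughly a third of (0.9, 0.1), mimicking a darkened feature: it retrieves code 0 instead of 1.
idx_dark, _ = retrieve_codes([(0.3, 0.03)], codebook)
print(idx_dark)  # [0]
```

In a real model the lookup runs over every spatial position of the latent map and is usually vectorized; the plain loop here only makes the distance comparison explicit.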

Abstract

Most existing Low-light Image Enhancement (LLIE) methods either map Low-Light (LL) images directly to Normal-Light (NL) images or use semantic or illumination maps as guidance. However, the ill-posed nature of LLIE and the difficulty of retrieving semantics from impaired inputs limit these methods, especially under extremely low-light conditions. To address this issue, we present a new LLIE network via Generative LAtent feature based codebook REtrieval (GLARE), in which the codebook prior is derived from undegraded NL images using a Vector Quantization (VQ) strategy. More importantly, we develop a generative Invertible Latent Normalizing Flow (I-LNF) module that aligns the LL feature distribution with NL latent representations, ensuring correct code retrieval from the codebook. In addition, a novel Adaptive Feature Transformation (AFT) module, which offers a user-adjustable function and comprises an Adaptive Mix-up Block (AMB) together with a dual-decoder architecture, is devised to further enhance fidelity while preserving the realistic details provided by the codebook prior. Extensive experiments confirm the superior performance of GLARE on various benchmark datasets and real-world data. Its effectiveness as a preprocessing tool in low-light object detection tasks further validates GLARE for high-level vision applications. Code is released at https://github.com/LowLevelAI/GLARE.
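This summary does not spell out the internals of the Adaptive Mix-up Block, so the sketch below is only an assumption: it models "mix-up" as a sigmoid-gated blend of two feature vectors, $\alpha\,\mathbf{F}_c + (1-\alpha)\,\mathbf{F}_{mf}$ with $\alpha = \sigma(\theta)$, a form used in some enhancement networks. The function names, the scalar gate `theta`, and the flat-list features are hypothetical; GLARE's actual AMB may be structured differently.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function; maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_mix(f_c: Sequence[float], f_mf: Sequence[float], theta: float) -> List[float]:
    """Assumed AMB form: a convex blend of two equal-length feature vectors.

    alpha = sigmoid(theta) stays in (0, 1); in a trained network theta would be
    a learned parameter, so the network decides how much of each input to keep.
    """
    alpha = sigmoid(theta)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(f_c, f_mf)]


# theta = 0 gives alpha = 0.5, i.e. an equal mix of the two inputs.
print(adaptive_mix([1.0, 0.0], [0.0, 1.0], theta=0.0))  # [0.5, 0.5]
```

Gating through a sigmoid keeps the blend convex, so the mixed feature always lies between its two inputs regardless of how the gate parameter is trained.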

Paper Structure

This paper contains 20 sections, 10 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: (a) GLARE significantly outperforms SOTA methods on LOL [LOLv1]. (b) GLARE generates appealing results on both LOL (upper) and real-world (lower) images.
  • Figure 2: (a) t-SNE visualization of the distributions of NL features, LL features, and NF-LL features. Compared with LL features, NF-LL features are better aligned with NL features. (b) Visual results at each stage of GLARE on the LOL [LOLv1] dataset. Columns $2$-$4$ show, respectively, the results enhanced from LL-feat, the results enhanced from NF-LL-feat, and the final results of our GLARE. From column $2$ to column $4$, visibility, color preservation, and detail recovery improve noticeably, which demonstrates the effectiveness of each stage of our GLARE. [Key: NL-feat: generated by the NL encoder from NL inputs; LL-feat: LL features obtained from the fine-tuned NL encoder (Stage I); NF-LL-feat: LL features generated by our generative I-LNF module (Stage II)]
  • Figure 3: The overall architecture of our proposed GLARE. The model is trained in three stages. Stage I learns a comprehensive normal-light codebook $\bm{\mathcal{C}}$ using VQGAN. In Stage II, given $\mathbf{c}_{ll}$ and $\mathbf{z}_{ll}$, produced by the conditional encoder $E_{c}$ and a convolution layer respectively, the I-LNF module $f_{\bm{\theta}}$ learns to transform the normal-light feature $\mathbf{z}_{nl}$ into a simplified Gaussian distribution $\mathbf{v} = f_{\bm{\theta}}(\mathbf{z}_{nl}; \mathbf{c}_{ll})$ with mean $\mathbf{z}_{ll}$. We optimize $E_{c}$ and $f_{\bm{\theta}}$ by minimizing the negative log-likelihood described in Eq. (eq_nll). In Stage III, the codebook $\bm{\mathcal{C}}$, the NL decoder $D_{nl}$, the conditional encoder $E_{c}$, and the I-LNF $f_{\bm{\theta}}$ are all fixed, and GLARE transforms a Gaussian density $p_{\bm{v}}(\mathbf{v})\sim \mathcal{N}(\mathbf{z}_{ll}, \bm{\Sigma})$ into the NL feature distribution $p_{\mathbf{z}_{nl}|\mathbf{c}_{ll} }(\mathbf{z}_{nl}|\mathbf{c}_{ll}, \bm{\theta})$. To further improve enhancement performance, we propose an adaptive feature transformation strategy: by using $D_{mf}$ and AMB to flexibly incorporate LL information into decoding ($\mathbf{F}_d = DeConv(\mathbf{F}_{nl}, AMB(\mathbf{F}_c, \mathbf{F}_{mf}))$), GLARE can generate results with more refined textures and details. [Key: $\bm{\Sigma}$: unit variance]
  • Figure 4: Visual comparisons on the LOL [LOLv1] dataset. Our method effectively enhances visibility and generates visually appealing results.
  • Figure 5: Visual comparisons on the LOL-v2-real [LOLv2] (top) and LOL-v2-synthetic [LOLv2] (bottom) datasets. Previous methods suffer from either severe color distortion or loss of detail, whereas our GLARE avoids these issues.
  • ...and 4 more figures
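The Stage II objective in Figure 3 is a conditional normalizing-flow negative log-likelihood. By the standard change-of-variables rule, with $\mathbf{v} = f_{\bm{\theta}}(\mathbf{z}_{nl}; \mathbf{c}_{ll})$ and $p_{\bm{v}} = \mathcal{N}(\mathbf{z}_{ll}, \bm{\Sigma})$, it takes the generic form $-\log p(\mathbf{z}_{nl}\mid\mathbf{c}_{ll},\bm{\theta}) = -\log \mathcal{N}\big(f_{\bm{\theta}}(\mathbf{z}_{nl};\mathbf{c}_{ll});\,\mathbf{z}_{ll},\bm{\Sigma}\big) - \log\big|\det \partial f_{\bm{\theta}}/\partial \mathbf{z}_{nl}\big|$; the paper's own Eq. (eq_nll) may differ in detail. The sketch below evaluates this form for a toy, invertible, elementwise affine flow with $\bm{\Sigma} = \mathbf{I}$. The scalars `a` and `b` that turn the condition into a log-scale and shift are hypothetical stand-ins for learned conditioning networks; the real I-LNF is a deep multi-layer flow.

```python
import math
from typing import List, Sequence, Tuple


def affine_flow(z: Sequence[float], c: Sequence[float], a: float, b: float) -> Tuple[List[float], float]:
    """Toy conditional flow v = f(z; c) = z * exp(s) + t, with s = a*c and t = b*c.

    The Jacobian dv/dz is diagonal with entries exp(s_i), so log|det| = sum_i s_i.
    """
    s = [a * ci for ci in c]
    t = [b * ci for ci in c]
    v = [zi * math.exp(si) + ti for zi, si, ti in zip(z, s, t)]
    return v, sum(s)


def inverse_flow(v: Sequence[float], c: Sequence[float], a: float, b: float) -> List[float]:
    """Exact inverse z = (v - t) * exp(-s) of affine_flow for the same condition c."""
    s = [a * ci for ci in c]
    t = [b * ci for ci in c]
    return [(vi - ti) * math.exp(-si) for vi, si, ti in zip(v, s, t)]


def flow_nll(z_nl: Sequence[float], c_ll: Sequence[float], z_ll: Sequence[float], a: float, b: float) -> float:
    """-log p(z_nl | c_ll) via change of variables, assuming v ~ N(z_ll, I)."""
    v, log_det = affine_flow(z_nl, c_ll, a, b)
    d = len(v)
    log_gauss = (
        -0.5 * sum((vi - mi) ** 2 for vi, mi in zip(v, z_ll))
        - 0.5 * d * math.log(2.0 * math.pi)
    )
    return -(log_gauss + log_det)


z_nl = [0.5, -1.0]  # toy NL latent (training target)
c_ll = [0.2, 0.4]   # toy LL condition
z_ll = [0.1, 0.3]   # toy Gaussian mean
print(flow_nll(z_nl, c_ll, z_ll, a=0.5, b=1.0))

# Inference direction (Stage III): map a Gaussian sample, here simply the mean, back to an NL-like latent.
print(inverse_flow(z_ll, c_ll, a=0.5, b=1.0))
```

Training minimizes `flow_nll` over (NL, LL) pairs; at inference only the inverse direction is needed, which is why the flow must be exactly invertible.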