CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement

Wei Wang, Zhi Jin

TL;DR

This work tackles low-light image enhancement (LLIE) when JPEG compression degrades information in dark regions. It introduces CAPformer, a compression-aware pre-trained Transformer that learns lossless information from uncompressed low-light data during pre-training and uses a Brightness-Guided Self-Attention (BGSA) mechanism to suppress unreliable information from very dark regions during enhancement. The model employs a U-shaped encoder–decoder with a Transformer bottleneck and a pre-training–fine-tuning strategy, achieving state-of-the-art PSNR/SSIM on JPEG LLIE benchmarks and strong qualitative results with fewer artifacts and better color fidelity. The approach enables robust LLIE in resource-constrained settings, benefiting mobile photography, where storage and transmission constraints are common.
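The Figure 3 caption states that the Encoder downsamples with a stride-2 convolution and that the Decoder upsamples with a pixel shuffle layer. The sketch below only does the shape bookkeeping for those two resampling steps of a U-shaped network; the kernel size 4 with padding 1, the 256x256 input, and the three scale levels are hypothetical choices (a 3x3 kernel with padding 1 halves even sizes equally well), and the learned layers themselves are omitted.

```python
def conv_out_size(size, kernel=4, stride=2, padding=1):
    """Output height/width of a strided convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Uses the same sub-pixel layout as PyTorch's nn.PixelShuffle:
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for hh in range(h):
                    for ww in range(w):
                        out[c][hh * r + i][ww * r + j] = src[hh][ww]
    return out


# Encoder side: three hypothetical stride-2 downsamplings of a 256x256 input.
sizes = [256]
for _ in range(3):
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)  # [256, 128, 64, 32]

# Decoder side: four 1x1 channels become one 2x2 channel (r = 2 doubles H and W).
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Because pixel shuffle trades channels for resolution, a decoder stage typically expands its channel count by r*r (4 here) before shuffling; the exact channel widths CAPformer uses are not specified here.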

Abstract

Low-Light Image Enhancement (LLIE) has advanced with the surge in demand for phone photography, yet most existing LLIE methods neglect compression, a crucial concern for resource-constrained phone photography, which limits their effectiveness in practice. In this study, we investigate the effects of JPEG compression on low-light images and reveal that JPEG causes substantial information loss in them, because dark areas are dominated by low pixel values. Hence, we propose the Compression-Aware Pre-trained Transformer (CAPformer), which employs a novel pre-training strategy to learn lossless information from uncompressed low-light images. Additionally, the proposed Brightness-Guided Self-Attention (BGSA) mechanism guides self-attention to gather information from reliable regions rather than from very dark ones. Experiments demonstrate the superiority of our approach in mitigating compression effects on LLIE, showcasing its potential for improving LLIE in resource-constrained scenarios.
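Why low pixel values make JPEG lossy can be illustrated on a single 8x8 luminance block. This is a minimal sketch, assuming the standard JPEG luminance quantization table (ITU-T T.81, Annex K) scaled with the IJG quality formula at QF 80, and ignoring chroma, subsampling, and entropy coding. JPEG's quantization steps are fixed in absolute terms, so the small-amplitude detail of a dark block rounds to zero, and any later brightening amplifies that loss. The block contents, the 10x darkening factor, and the naive multiply-by-10 "enhancement" are illustrative assumptions, not the paper's experimental setup.

```python
import math

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
BASE_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_table(quality):
    """Scale the base table with the IJG (libjpeg) quality formula."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[min(255, max(1, (b * scale + 50) // 100)) for b in row] for row in BASE_LUMA_Q]


def _c(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def _basis(n, k):
    return math.cos((2 * n + 1) * k * math.pi / 16)


def dct2(block):
    """Orthonormal 8x8 forward DCT as defined for baseline JPEG."""
    return [[0.25 * _c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                        for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct2(coef):
    """Inverse of dct2."""
    return [[0.25 * sum(_c(u) * _c(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


def jpeg_roundtrip(block, quality=80):
    """Level-shift, DCT, quantize, dequantize, inverse DCT; returns pixels and nonzero AC count."""
    q = scaled_table(quality)
    coef = dct2([[p - 128 for p in row] for row in block])
    levels = [[round(coef[u][v] / q[u][v]) for v in range(8)] for u in range(8)]
    recon = idct2([[levels[u][v] * q[u][v] for v in range(8)] for u in range(8)])
    pixels = [[min(255, max(0, round(r + 128))) for r in row] for row in recon]
    nonzero_ac = sum(1 for u in range(8) for v in range(8)
                     if (u, v) != (0, 0) and levels[u][v] != 0)
    return pixels, nonzero_ac


def mse(a, b):
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(8) for j in range(8)) / 64


# A textured, well-exposed block and a 10x darker copy of the same scene.
bright = [[100 + 8 * ((3 * x + 5 * y) % 7) for y in range(8)] for x in range(8)]
dark = [[round(p * 0.1) for p in row] for row in bright]

bright_out, bright_nz = jpeg_roundtrip(bright)
dark_out, dark_nz = jpeg_roundtrip(dark)
enhanced = [[10 * p for p in row] for row in dark_out]  # naive brightening

print("nonzero AC coefficients (bright vs dark):", bright_nz, dark_nz)
print("MSE vs reference (bright vs enhanced dark):", mse(bright_out, bright), mse(enhanced, bright))
```

The dark block keeps far fewer nonzero AC coefficients, and once it is brightened its error against the well-exposed reference is much larger than that of the directly compressed bright block.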

Paper Structure (15 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Results of the state-of-the-art (SOTA) method SNR [snr] and our CAPformer on uncompressed and compressed low-light images, with JPEG compression applied at quality factor (QF) 80. SNR performs admirably on uncompressed images but yields unsatisfactory results on compressed ones. This trend is also observed in other SOTA methods, such as Retinexformer [retinexformer]. In contrast, our method produces superior results on compressed images. Zoom in for a better view.
  • Figure 2: Tracing information loss through the JPEG process. The loss map on the right shows how severe the loss is; denser colored areas indicate more significant information loss. Even at a relatively high QF of 80, the low-light image still suffers severe information loss, concentrated in its darker regions.
  • Figure 3: Overview of the proposed method. CAPformer is first pre-trained and then fine-tuned for enhancement. The Encoder of CAPformer downsamples with a convolution layer of stride 2; the Decoder is symmetric to the Encoder and upsamples with a pixel shuffle layer. In BGSA, the black columns denote attention scores set to -1.0e9; after the $Softmax$ operation, these columns turn gray, i.e., their weights become nearly 0.
  • Figure 4: Visual comparison on LOLv1-JPEG, LOLv2-Real-JPEG, and LOLv2-Synthetic-JPEG (top to bottom). All inputs are compressed at QF 80. Our method yields fewer artifacts and better color consistency than the compared methods.
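The BGSA masking described in the Figure 3 caption (masked columns set to -1.0e9 so that Softmax drives their weights to about 0) can be sketched as single-head attention whose key columns are masked for dark tokens. This is a minimal sketch, not the paper's implementation: the caption does not say how brightness is computed or which columns are masked, so the per-token `brightness` input, the hypothetical `dark_threshold` cut-off, and the fallback that keeps all columns when every token is dark are assumptions made here for illustration.

```python
import math

MASK_VALUE = -1.0e9  # fill value for masked columns, as in the Figure 3 caption


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def brightness_guided_attention(q, k, v, brightness, dark_threshold=0.05):
    """Single-head attention with key columns masked for dark tokens.

    q, k, v: lists of n token vectors.
    brightness: hypothetical per-token brightness in [0, 1], e.g. the mean
        luminance of the input patch a token covers.
    dark_threshold: hypothetical cut-off; keys darker than this are masked.
    """
    n, d = len(q), len(q[0])
    keep = [b >= dark_threshold for b in brightness]
    if not any(keep):  # assumption: never mask every column
        keep = [True] * n
    out, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            s = sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
            scores.append(s if keep[j] else MASK_VALUE)
        w = softmax(scores)  # masked columns underflow to exactly 0.0
        weights.append(w)
        out.append([sum(w[j] * v[j][t] for j in range(n)) for t in range(len(v[0]))])
    return out, weights


# Usage: token 1 is very dark, so no query attends to it.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
out, w = brightness_guided_attention(tokens, tokens, values, brightness=[0.5, 0.01, 0.8])
for row in w:
    print([round(x, 3) for x in row])
```

Masking by column means a dark token can still act as a query and receive information from brighter tokens, while its own unreliable content is never gathered by others.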