DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

Xiangchen Yin, Zhenda Yu, Xin Gao, Xiao Sun

TL;DR

DEFormer addresses low-light image enhancement with a frequency-guided transformer framework. Its Learnable Frequency Branch (LFB) combines DCT processing with curvature-based frequency enhancement (CFE) to represent frequency features, and its Cross Domain Fusion (CDF) module reduces the differences between RGB features and frequency features. The approach reports state-of-the-art results on the LOL and MIT-Adobe FiveK datasets and improves downstream dark object detection on ExDark when trained end-to-end with the detector. Adding frequency-domain cues to a transformer backbone is aimed at recovering details in dark regions that are hard to restore from the RGB domain alone, and the paper compares computational cost (FLOPs) against SSIM with other methods.

Abstract

Low-light image enhancement restores the colors and details of a single image and improves high-level visual tasks. However, restoring the details lost in dark areas remains a challenge when relying only on the RGB domain. In this paper, we introduce frequency as a new clue for the model and propose a DCT-driven enhancement transformer (DEFormer) framework. First, we propose a learnable frequency branch (LFB) for frequency enhancement, which contains DCT processing and curvature-based frequency enhancement (CFE) to represent frequency features. Additionally, we propose a cross domain fusion (CDF) to reduce the differences between the RGB domain and the frequency domain. DEFormer achieves superior results on the LOL and MIT-Adobe FiveK datasets and improves dark object detection performance.

Paper Structure

This paper contains 9 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (a) shows the input and (b) shows its frequency spectrum after DCT processing. The DCT coefficient matrix concentrates the energy of the image signal. (c) compares performance with other SOTA methods on the MIT-Adobe FiveK dataset; the x-axis is FLOPs and the y-axis is SSIM.
  • Figure 2: Overview of DEFormer. In the learnable frequency branch (LFB), we introduce frequency clues through DCT processing and curvature-based frequency enhancement (CFE). In the cross domain fusion (CDF), the difference between the RGB domain and the frequency domain is reduced through cross fusion.
  • Figure 3: Details of cross domain fusion (CDF). We first complement the information between the different domains through cross fusion, then control the spatial information with soft attention to reduce noise propagation.
  • Figure 4: Visualization of dark object detection on the ExDark dataset. MBLLEN first predicts the enhanced image, and the detector is then trained on it, whereas DEFormer uses end-to-end training.
  • Figure 5: Visualization of different low-light image enhancement methods on the LOL dataset. Each row is a different image sample. We recommend zooming in to observe the details.
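
The energy-concentration property mentioned in the Figure 1 caption can be checked with a small, self-contained sketch. This is not DEFormer's learnable frequency branch; it only illustrates a plain orthonormal 2-D DCT-II on a synthetic patch. The helper names (`dct_1d`, `dct_2d`, `low_freq_energy_ratio`, `make_dark_patch`), the 8×8 patch size, and the intensity range are all assumptions made for illustration.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        # Orthonormal scaling keeps the total energy unchanged.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as coeffs[u][v]
    # (u: vertical frequency, v: horizontal frequency).
    return [list(row) for row in zip(*cols)]


def low_freq_energy_ratio(coeffs, k):
    """Fraction of squared-coefficient energy in the top-left k x k corner."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2 for u in range(k) for v in range(k))
    return low / total


def make_dark_patch(size=8):
    """Hypothetical dark patch: a smooth ramp with intensities in [0.05, 0.15]."""
    return [
        [0.05 + 0.1 * (x + y) / (2 * (size - 1)) for x in range(size)]
        for y in range(size)
    ]


patch = make_dark_patch()
coeffs = dct_2d(patch)
ratio = low_freq_energy_ratio(coeffs, 2)
print(f"share of energy in the 2x2 low-frequency corner: {ratio:.4f}")
```

For a smooth patch like this one, nearly all of the energy lands in a handful of low-frequency coefficients, which is the behavior the caption describes. Real low-light images also contain noise and texture, so the concentration there will be less extreme.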