Row-Column Separated Attention Based Low-Light Image/Video Enhancement

Chengqi Dong, Zhiyuan Cao, Tuoshi Qi, Kexin Wu, Yixing Gao, Fan Tang

TL;DR

This work introduces a lightweight Row-Column Separated Attention (RCSA) module that provides global guidance by modeling row and column statistics, integrated after an improved U-Net to form the U-RCSANet for low-light image and video enhancement. The RCSA uses mean/max row/column features to compute pixel-level attention with significantly fewer parameters, enabling efficient global information fusion. Temporal consistency for video is enforced with two dedicated loss functions, addressing flicker and temporal stability without heavy optical-flow reliance. Extensive experiments on LOL, MIT FiveK, and SDSD demonstrate superior image quality metrics and competitive temporal performance, with code publicly available for reproducibility.

Abstract

The U-Net structure is widely used for low-light image/video enhancement. Without proper guidance from global information, however, the enhanced images contain regions with strong local noise and lose detail. Attention mechanisms can better focus on and use global information, but applying attention to images can significantly increase the number of parameters and the amount of computation. We propose a Row-Column Separated Attention (RCSA) module inserted after an improved U-Net. The RCSA module takes the mean and maximum of each row and column of the feature map as input, using global information to guide local information with fewer parameters. To apply the method to low-light video enhancement and maintain temporal consistency, we propose two temporal loss functions. Extensive experiments on the LOL and MIT Adobe FiveK image datasets and the SDSD video dataset demonstrate the effectiveness of our approach. The code is publicly available at https://github.com/cq-dong/URCSA.
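
To make the row/column idea concrete, here is a minimal NumPy sketch of how the mean and maximum of each row and column can be reduced to compact descriptors and broadcast back to a pixel-level attention map. It is an illustration only, not the authors' implementation: the function names `row_col_stats` and `rcsa_sketch`, the sigmoid gating, the additive combination of row and column descriptors, and the equal weighting of the mean and max branches are all assumptions, and the learned layers of the real module are omitted.

```python
import numpy as np


def row_col_stats(feat):
    """Reduce a (C, H, W) feature map to row/column mean and max descriptors.

    Each descriptor holds H or W values per channel, so the global summary
    costs O(H + W) values per channel instead of O(H * W).
    """
    row_mean = feat.mean(axis=2)  # (C, H): average over each row
    row_max = feat.max(axis=2)    # (C, H): maximum over each row
    col_mean = feat.mean(axis=1)  # (C, W): average over each column
    col_max = feat.max(axis=1)    # (C, W): maximum over each column
    return row_mean, row_max, col_mean, col_max


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rcsa_sketch(feat):
    """Hypothetical parameter-free stand-in for a row-column attention step."""
    row_mean, row_max, col_mean, col_max = row_col_stats(feat)
    # Assumption: combine row and column descriptors by broadcasting a sum,
    # which yields one attention weight per pixel, shape (C, H, W).
    avg_att = sigmoid(row_mean[:, :, None] + col_mean[:, None, :])
    max_att = sigmoid(row_max[:, :, None] + col_max[:, None, :])
    # Assumption: average the two branches and use them to gate the input.
    return feat * 0.5 * (avg_att + max_att)


if __name__ == "__main__":
    feat = np.random.default_rng(0).normal(size=(8, 32, 48))
    print(rcsa_sketch(feat).shape)
```

The sketch only shows why the statistics are cheap: the attention map is rebuilt from H + W numbers per channel by broadcasting, rather than from pairwise interactions between all H × W pixels.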

Paper Structure (14 sections, 6 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: Optical flow results for the low-light and normal-light scenes, computed with the GMFlow model (xu2022gmflow). The optical flow of the dark scene is more blurred and is missing many details.
  • Figure 2: The framework of U-RCSANet. U-RCSANet consists of three U-RCSA blocks with the same parameters. Each U-RCSA block has an improved U-Net and a Row-Column Separated Attention module.
  • Figure 3: Row-Column Separated Attention module. Average and maximum attention are obtained from the column mean and column maximum values.
  • Figure 4: Low-light image enhancement results at different image sizes. From right to left, the image sizes are 1512$\times$1036, 768$\times$512, 512$\times$352, and 384$\times$256.
  • Figure 5: Brightness visualization for a low-light image, a normal-light image, and the enhanced image from the MIT FiveK dataset. The brightness of the enhanced image is consistent with that of the normal-light image, and the degree of fluctuation of the three curves suggests that the model adaptively enhances the brightness of low-light images.
  • ...and 8 more figures