SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement

Linlin Hu, Ao Sun, Shijie Hao, Richang Hong, Meng Wang

TL;DR

The paper addresses the limited cross-view interaction in low-light stereo image enhancement by introducing SDI-Net, a dual-branch UNet framework that uses the Cross-View Sufficient Interaction Module (CSIM) to fully leverage left–right correlations. CSIM comprises CAIM for view-level attention and PCAB for channel and pixel-level refinements, enabling comprehensive dual-view feature fusion. The model is trained with a combined loss $L = L_1 + \lambda L_{fre}$, including a frequency-domain term via FFT to preserve textures, with $\lambda$ set to 0.1. Experiments on Middlebury and Synthetic Holopix50k show SDI-Net achieving state-of-the-art PSNR/SSIM, with ablations confirming the necessity of CAIM, PCAB, and the FFT-based loss for best performance, highlighting its practical impact for real-world low-light stereo imaging.
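As a rough illustration of the combined objective $L = L_1 + \lambda L_{fre}$ with $\lambda = 0.1$, the NumPy sketch below may help. The summary does not give the exact form of $L_{fre}$, so the version here (mean absolute difference between the 2-D FFTs of prediction and target) is an assumption, and the function names `l1_loss`, `frequency_loss`, and `combined_loss` are hypothetical.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error in the spatial domain (the L_1 term)."""
    return float(np.mean(np.abs(pred - target)))

def frequency_loss(pred, target):
    """Assumed form of L_fre: mean absolute difference of the 2-D FFTs.

    The FFT is taken over the last two axes (height, width), so inputs
    may be shaped (H, W) or (C, H, W).
    """
    pred_f = np.fft.fft2(pred, axes=(-2, -1))
    target_f = np.fft.fft2(target, axes=(-2, -1))
    # np.abs on complex values returns the magnitude of the difference.
    return float(np.mean(np.abs(pred_f - target_f)))

def combined_loss(pred, target, lam=0.1):
    """L = L_1 + lam * L_fre, with lam = 0.1 as stated in the summary."""
    return l1_loss(pred, target) + lam * frequency_loss(pred, target)

# Toy check on one view: an all-zero prediction against an all-one target.
pred = np.zeros((3, 4, 4))
target = np.ones((3, 4, 4))
loss_value = combined_loss(pred, target)
```

In a stereo setting the same loss would presumably be evaluated for the left and right outputs and the two values combined; how the paper combines them is not stated here.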

Abstract

Currently, most low-light image enhancement methods only consider information from a single view, neglecting the correlation between cross-view information. Therefore, the enhancement results produced by these methods are often unsatisfactory. In this context, there have been efforts to develop methods specifically for low-light stereo image enhancement. These methods take into account the cross-view disparities and enable interaction between the left and right views, leading to improved performance. However, these methods still do not fully exploit the interaction between left and right view information. To address this issue, we propose a model called Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement (SDI-Net). The backbone of SDI-Net consists of two encoder-decoder pairs, which learn the mapping function from low-light images to normal-light images. Between the encoders and the decoders, we design a module named Cross-View Sufficient Interaction Module (CSIM), which aims to fully exploit the correlations between the binocular views via the attention mechanism. The quantitative and visual results on public datasets validate the superiority of our method over other related methods. Ablation studies also demonstrate the effectiveness of the key elements in our model.
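To make the idea of attention-based interaction between the two views more concrete, here is a minimal NumPy sketch. It is not the paper's CSIM or CAIM, whose internals are not described in this summary; it is a generic row-wise cross-attention, a common choice for rectified stereo pairs in which matching pixels lie on the same image row. The function name `cross_view_attention` and the residual fusion are assumptions made for illustration.

```python
import numpy as np

def cross_view_attention(query_feat, source_feat):
    """Row-wise cross-attention from one view to the other (illustrative only).

    query_feat, source_feat: feature maps shaped (C, H, W).
    Each pixel in a row of the query view attends to every pixel in the
    same row of the source view.
    """
    c, h, w = query_feat.shape
    q = query_feat.transpose(1, 2, 0)    # (H, W, C)
    k = source_feat.transpose(1, 2, 0)   # (H, W, C)
    # Scaled dot-product similarity between positions on the same row.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)   # (H, W, W)
    # Numerically stable softmax over the source-view positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Gather source-view features for each query position.
    gathered = weights @ k                           # (H, W, C)
    # Assumed fusion: add the gathered features back to the query view.
    fused = query_feat + gathered.transpose(2, 0, 1)
    return fused, weights

rng = np.random.default_rng(0)
left = rng.standard_normal((8, 5, 6))
right = rng.standard_normal((8, 5, 6))
left_fused, left_to_right = cross_view_attention(left, right)
right_fused, right_to_left = cross_view_attention(right, left)
```

Calling the function twice with the arguments swapped gives a symmetric exchange, so each branch receives information from the other view.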

Paper Structure (14 sections, 14 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: An example of low-light image enhancement using the recently proposed monocular (single-image) low-light enhancement method MIRNet [zamir2020learning], the binocular (stereo-image) low-light enhancement method DCI-Net [zheng2023decoupled], and our SDI-Net. The error maps show the performance in restoring image details; darker pixels indicate larger errors relative to the ground-truth (GT) image. The image is from the Holopix50k dataset.
  • Figure 2: The overall framework of the proposed SDI-Net, which consists of three stages, i.e., Feature Encoder, Cross-View Sufficient Interaction, and Feature Decoder.
  • Figure 3: The architecture of the Cross-View Sufficient Interaction Module (CSIM). It contains the Cross-view Attention Interaction Module (CAIM) and the Pixel and Channel Attention Block (PCAB).
  • Figure 4: Visual comparison of the enhancement results on the Middlebury dataset. Best viewed zoomed in.
  • Figure 5: Visual comparison of the enhancement results on the Middlebury dataset. In the error maps, darker pixels indicate larger errors. Best viewed zoomed in.
  • ...and 1 more figure