SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement
Linlin Hu, Ao Sun, Shijie Hao, Richang Hong, Meng Wang
TL;DR
The paper addresses the limited cross-view interaction in existing low-light stereo image enhancement methods by introducing SDI-Net, a dual-branch UNet framework whose Cross-View Sufficient Interaction Module (CSIM) exploits left–right correlations. CSIM comprises CAIM, which applies view-level attention, and PCAB, which refines features at the channel and pixel levels, so that features from the two views are fused comprehensively. The model is trained with a combined loss $L = L_1 + \lambda L_{fre}$, where $L_{fre}$ is a frequency-domain term computed via FFT to preserve textures and $\lambda$ is set to 0.1. On the Middlebury and Synthetic Holopix50k datasets, SDI-Net achieves state-of-the-art PSNR/SSIM, and ablations confirm that CAIM, PCAB, and the FFT-based loss are all needed for the best performance, suggesting practical value for real-world low-light stereo imaging.
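The combined loss $L = L_1 + \lambda L_{fre}$ can be illustrated with a minimal NumPy sketch. The summary only states that $L_{fre}$ is computed via FFT, so the exact form used here (the mean magnitude of the difference between the 2-D FFTs of prediction and ground truth) is an assumption, as are the function names and the choice to sum the loss over the left and right views.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error in the spatial domain."""
    return float(np.mean(np.abs(pred - target)))


def frequency_loss(pred, target):
    """Assumed form of L_fre: mean magnitude of the FFT difference.

    The 2-D FFT is taken over the last two axes (H, W), so the inputs
    may be shaped (H, W) or (C, H, W).
    """
    pred_f = np.fft.fft2(pred, axes=(-2, -1))
    target_f = np.fft.fft2(target, axes=(-2, -1))
    return float(np.mean(np.abs(pred_f - target_f)))


def combined_loss(pred, target, lam=0.1):
    """L = L_1 + lam * L_fre, with lam = 0.1 as stated in the summary."""
    return l1_loss(pred, target) + lam * frequency_loss(pred, target)


def stereo_loss(pred_left, pred_right, gt_left, gt_right, lam=0.1):
    """Hypothetical stereo variant: sum the combined loss over both views."""
    return (combined_loss(pred_left, gt_left, lam)
            + combined_loss(pred_right, gt_right, lam))
```

With an unnormalized FFT, the frequency term grows with image size for localized errors, so in practice the weighting $\lambda$ interacts with the FFT normalization; a different normalization would call for a different $\lambda$.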
Abstract
Most current low-light image enhancement methods consider information from only a single view and neglect the correlation between views, so their enhancement results are often unsatisfactory. Methods designed specifically for low-light stereo image enhancement take the cross-view disparities into account and enable interaction between the left and right views, which improves performance. However, these methods still do not fully exploit the interaction between left-view and right-view information. To address this issue, we propose a model named SDI-Net (Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement). The backbone of SDI-Net consists of two encoder-decoder pairs, which learn the mapping from low-light images to normal-light images. Between the encoders and the decoders, we design a Cross-View Sufficient Interaction Module (CSIM) that aims to fully exploit the correlations between the binocular views via the attention mechanism. Quantitative and visual results on public datasets validate the superiority of our method over related methods. Ablation studies also demonstrate the effectiveness of the key elements of our model.
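To make the idea of attention-based interaction between the two views concrete, the following is a generic sketch of row-wise cross-view attention in NumPy. It is not the authors' CSIM: the abstract does not specify the module's internals, so the row-wise matching (which assumes rectified stereo pairs), the absence of learned projections, the residual fusion, and the function names are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def cross_view_attention(feat_left, feat_right):
    """Fuse two feature maps of shape (C, H, W) with row-wise attention.

    Each pixel in one view attends to all pixels in the same row of the
    other view, then the attended features are added back residually.
    """
    c = feat_left.shape[0]
    # Rearrange to (H, W, C) so every row is a sequence of W feature vectors.
    left = np.transpose(feat_left, (1, 2, 0))
    right = np.transpose(feat_right, (1, 2, 0))

    # scores[h, i, j]: similarity of left pixel (h, i) and right pixel (h, j).
    scores = left @ np.transpose(right, (0, 2, 1)) / np.sqrt(c)

    # Left pixels attend over right positions, and vice versa.
    attn_right_to_left = softmax(scores, axis=-1)
    attn_left_to_right = softmax(np.transpose(scores, (0, 2, 1)), axis=-1)

    # Residual fusion of the information gathered from the other view.
    fused_left = left + attn_right_to_left @ right
    fused_right = right + attn_left_to_right @ left

    return (np.transpose(fused_left, (2, 0, 1)),
            np.transpose(fused_right, (2, 0, 1)))
```

In a two-branch encoder-decoder network, a block like this would sit between the encoders and the decoders so that each branch decodes features already informed by the other view.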
