Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration
Alik Pramanick, Arijit Sur, V. Vijaya Saradhi
TL;DR
This work tackles underwater image degradation by proposing Lit-Net, a lightweight, multi-stage network that preserves the original resolution in the first stage, refines features in the second stage, and reconstructs the image in the final stage. It introduces MRAN and MSAN to perform multi-resolution and multi-scale analysis concurrently, using parallel 1×1 convolution branches in the encoder and attention-based skip connections to maintain spatial precision and semantic richness. A tailored loss combination (a weighted color-channel L1 loss, $cl_1$; a perceptual loss; and an SSIM loss) drives color fidelity and texture preservation, yielding state-of-the-art PSNR/SSIM on EUVP, UIEB, and SUIM-E, along with strong qualitative results and favorable perceptual metrics. The approach also benefits downstream underwater perception tasks such as semantic segmentation and object detection, suggesting practical impact for AUVs and surveillance. The code is available on GitHub for reproducibility.
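To make the weighted color-channel L1 term concrete, the following is a minimal NumPy sketch of one plausible form: a per-channel mean absolute error combined with per-channel weights. The function name `weighted_channel_l1`, the `(H, W, 3)` layout, and the weight values are assumptions for illustration only; this summary does not give the exact definition of $cl_1$, and the perceptual and SSIM terms are omitted here.

```python
import numpy as np


def weighted_channel_l1(pred, target, weights=(1.0, 1.0, 1.0)):
    """Per-channel mean absolute error combined with per-channel weights.

    pred, target: float arrays of shape (H, W, 3).
    weights: hypothetical (R, G, B) weights, NOT values taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape or pred.shape[-1] != len(weights):
        raise ValueError("pred, target, and weights have inconsistent shapes")
    # One L1 value per color channel, averaged over all pixels.
    per_channel = np.abs(pred - target).mean(axis=(0, 1))
    # Weighted sum across channels gives a single scalar loss.
    return float(np.dot(np.asarray(weights, dtype=np.float64), per_channel))


if __name__ == "__main__":
    pred = np.zeros((2, 2, 3))
    target = np.zeros((2, 2, 3))
    target[..., 0] = 0.5  # error only in the first (red) channel
    print(weighted_channel_l1(pred, target, weights=(2.0, 1.0, 1.0)))
```

Because red light attenuates fastest in water, one might weight the red channel more heavily, as in the usage above; whether Lit-Net does so is not stated in this summary.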
Abstract
Underwater imagery is often compromised by factors such as color distortion and low contrast, posing challenges for high-level vision tasks. Recent underwater image restoration (UIR) methods either analyze the input image at full resolution, resulting in spatial richness but contextual weakness, or process it progressively from high to low resolution, yielding reliable semantic information but reduced spatial accuracy. Here, we propose a lightweight multi-stage network called Lit-Net that focuses on multi-resolution and multi-scale image analysis for restoring underwater images. It retains the original resolution during the first stage, refines features in the second, and focuses on reconstruction in the final stage. Our novel encoder block utilizes parallel $1\times1$ convolution layers to capture local information and speed up operations. Further, we incorporate a modified weighted color channel-specific $l_1$ loss function ($cl_1$) to recover color and detail information. Extensive experiments on publicly available datasets suggest our model's superiority over recent state-of-the-art methods, with significant improvement in qualitative and quantitative measures, such as $29.477$ dB PSNR ($1.92\%$ improvement) and $0.851$ SSIM ($2.87\%$ improvement) on the EUVP dataset. The contributions of Lit-Net offer a more robust approach to underwater image enhancement and super-resolution, which is of considerable importance for underwater autonomous vehicles and surveillance. The code is available at: https://github.com/Alik033/Lit-Net.
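As an illustration of why parallel $1\times1$ convolutions are cheap and resolution-preserving, here is a minimal NumPy sketch. A $1\times1$ convolution is a per-pixel linear map across channels, so the spatial size of the feature map is unchanged. The function names, the ReLU activation, and the channel-wise concatenation of branches are assumptions for illustration; the abstract does not specify how Lit-Net combines its branches.

```python
import numpy as np


def conv1x1(x, kernel):
    """1x1 convolution: x is (H, W, C_in), kernel is (C_in, C_out)."""
    # Each pixel's channel vector is multiplied by the same matrix.
    return np.einsum("hwc,cd->hwd", x, kernel)


def parallel_1x1_block(x, kernels):
    """Apply several 1x1 convolutions to the same input in parallel.

    Branch outputs pass through ReLU and are concatenated on the channel
    axis. Both choices are assumptions, not the paper's stated design.
    """
    branches = [np.maximum(conv1x1(x, k), 0.0) for k in kernels]
    return np.concatenate(branches, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 64, 3))           # input image, H x W x 3
    kernels = [rng.standard_normal((3, 8)) for _ in range(2)]
    out = parallel_1x1_block(x, kernels)
    print(out.shape)  # spatial size stays 64 x 64; channels become 8 + 8
```

Each branch costs only `C_in * C_out` multiplications per pixel, which is consistent with the abstract's claim that the design speeds up operations.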
