
FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder

Zheng Cheng, Guodong Fan, Jingchun Zhou, Min Gan, C. L. Philip Chen

TL;DR

Underwater imaging suffers from brightness loss, color distortion, and texture degradation caused by light absorption and scattering. The authors introduce FDCE-Net, which pairs two components. A Frequency Spatial Network (FS-Net) decouples degradation factors in the frequency domain. A Dual Color Encoder (DCE) learns adaptive, semantically aware color queries through cross-attention. A Fusion-Net merges the two outputs. Training uses a multi-component loss with SSIM, reconstruction, color-histogram, and perceptual terms. The authors report state-of-the-art results on paired and unpaired UIE datasets, with better color fidelity, texture, and noise handling, as well as improved downstream detection performance. The approach builds on the FFT observation that amplitude carries brightness and color while phase carries texture and noise. This enables more balanced enhancement across challenging underwater scenes and offers practical benefits for robotics and vision systems in marine environments.
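The TL;DR names the loss terms but not their exact form. Here is a rough sketch of two of the four terms. It assumes an L1 reconstruction term and a color-histogram term built from Gaussian-kernel (soft) histograms. The function names, bin count, kernel width, and the weights w_rec and w_hist are hypothetical. The SSIM and perceptual terms are omitted, so this is not the authors' formulation.

```python
import numpy as np

def soft_histogram(values, bins=32, sigma=0.02):
    """Kernel-based (soft) histogram of values in [0, 1], normalized to sum to 1.

    Each value adds a Gaussian bump to every bin centre, so the histogram varies
    smoothly with the input, which is what a trainable loss term needs.
    """
    centers = (np.arange(bins) + 0.5) / bins
    diff = values.reshape(-1, 1) - centers.reshape(1, -1)
    hist = np.exp(-0.5 * (diff / sigma) ** 2).sum(axis=0)
    return hist / hist.sum()

def color_histogram_loss(pred, target, bins=32):
    """Mean over channels of the L1 distance between soft histograms (H x W x C images)."""
    per_channel = [
        np.abs(soft_histogram(pred[..., c], bins) - soft_histogram(target[..., c], bins)).sum()
        for c in range(pred.shape[-1])
    ]
    return float(np.mean(per_channel))

def combined_loss(pred, target, w_rec=1.0, w_hist=0.1):
    """Hypothetical weighted sum of an L1 reconstruction term and a histogram term."""
    reconstruction = float(np.mean(np.abs(pred - target)))
    return w_rec * reconstruction + w_hist * color_histogram_loss(pred, target)

rng = np.random.default_rng(0)
target = rng.random((16, 16, 3))
# Toy color cast: suppress the red channel, mimicking strong red absorption underwater.
pred = np.clip(target * np.array([0.4, 1.0, 1.0]), 0.0, 1.0)
print(combined_loss(target, target))  # → 0.0
print(combined_loss(pred, target) > 0.0)  # → True
```

The histogram term compares global color distributions regardless of pixel position. It therefore complements the pixel-wise reconstruction term rather than duplicating it.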

Abstract

Underwater images often suffer from low brightness, color shift, blurred details, and noise because water and suspended particles absorb and scatter light. Previous underwater image enhancement (UIE) methods have focused primarily on the spatial domain and neglected the frequency-domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Some methods do enhance images in the frequency domain, but they overlook the inherent relationship between the degradation factors and the information present in the frequency domain. As a result, these methods often improve some attributes of the image while inadequately addressing, or even exacerbating, others. Moreover, many existing methods rely heavily on prior knowledge to correct color shift, which limits their flexibility and robustness. To overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net). FDCE-Net consists of two main components. (1) The Frequency Spatial Network (FS-Net) performs initial enhancement. It uses our Frequency Spatial Residual Block (FSRB) to decouple the degradation factors in the frequency domain and enhance each attribute separately. (2) The Dual Color Encoder (DCE) tackles color shift. It establishes correlations between color and semantic representations through cross-attention and uses multi-scale image features to guide the optimization of adaptive color queries. A fusion network combines the outputs of FS-Net and DCE to produce the final enhanced images, which exhibit rich details, clear textures, low noise, and natural colors.
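The abstract says FSRB decouples degradation factors in the frequency domain and enhances each attribute separately, but it gives no layer-level details. Below is a minimal NumPy sketch of that idea. It assumes one branch that adjusts the amplitude spectrum, one that adjusts the phase spectrum, a parallel spatial branch, and a residual connection. The 1x1 channel-mixing matrices stand in for learned convolutions. The names frequency_spatial_residual_block and channel_mix and the weights w_amp, w_pha, and w_spa are hypothetical, not the paper's design.

```python
import numpy as np

def channel_mix(x, weight):
    """1x1-convolution stand-in: mixes the last (channel) axis with a C x C matrix."""
    return x @ weight

def frequency_spatial_residual_block(feat, w_amp, w_pha, w_spa):
    """Hypothetical FSRB-style block for an H x W x C feature map (not the paper's exact layers)."""
    spectrum = np.fft.rfft2(feat, axes=(0, 1))
    amp, pha = np.abs(spectrum), np.angle(spectrum)
    # Amplitude branch (brightness/color); the ReLU keeps the amplitude non-negative.
    amp = np.maximum(channel_mix(amp, w_amp), 0.0)
    # Phase branch (texture/noise), processed independently of the amplitude.
    pha = channel_mix(pha, w_pha)
    freq_out = np.fft.irfft2(amp * np.exp(1j * pha), s=feat.shape[:2], axes=(0, 1))
    # Parallel spatial branch on the raw features.
    spatial_out = np.maximum(channel_mix(feat, w_spa), 0.0)
    # Residual connection around both branches.
    return feat + freq_out + spatial_out

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 8))
c = feat.shape[-1]
out = frequency_spatial_residual_block(feat, np.eye(c), np.eye(c), np.zeros((c, c)))
print(out.shape)  # → (32, 32, 8)
print(np.allclose(out, 2 * feat))  # → True
```

Set the amplitude and phase weights to the identity and the spatial branch to zero. The frequency path then reproduces its input exactly, so the block returns twice the input. This is a quick check that the FFT round trip is lossless before any learned adjustment is applied.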

Paper Structure

This paper contains 23 sections, 11 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: Radar charts of PSNR and SSIM on the paired dataset and of UIQM, UCIQE, and UIF on the unpaired dataset. Points farther from the center indicate better performance on that metric.
  • Figure 2: Visual results of SOTA UIE methods. Existing methods do not cope well with challenging underwater images: color, brightness, noise, and texture details are not handled in a balanced manner.
  • Figure 3: (a) The amplitude and phase of the degraded and reference images are swapped in the frequency domain, and the results are reconstructed. This shows that the key factors behind underwater image degradation can be decoupled in the frequency domain. Brightness and color information is expressed in the amplitude, while texture details and noise are reflected in the phase. (b) An image is decomposed into amplitude and phase with the FFT and reconstructed from either component alone. This shows that phase mainly encodes texture and structure, while amplitude captures color and illumination.
  • Figure 4: Overview of the proposed FDCE-Net architecture. It comprises two main components, FS-Net and DCE, and enhances an underwater image $x$ end to end. $x$ is fed into FS-Net and DCE simultaneously, and FS-Net performs a preliminary enhancement to produce $\hat{y}$. Concurrently, the first color encoder extracts multi-scale features of $x$. The second color encoder then performs color queries on the visual features at each scale to learn semantic-aware color representations. Fusion-Net combines the outputs of FS-Net and DCE to generate the final enhanced image $y'$.
  • Figure 5: (a) Schematic illustration of the proposed Frequency Spatial Residual Block (FSRB). (b) Design of the transformer-based color encoder block. It takes image features and trainable color queries as input and connects semantic and color representations through cross-attention, self-attention, and feed-forward operations.
  • ...and 8 more figures
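The amplitude/phase swap described in Figure 3(a) can be reproduced with NumPy's FFT routines. This sketch uses a synthetic image and a toy degradation (per-channel attenuation plus global darkening). Both are assumptions chosen so the effect is exact: scaling a channel by a positive constant changes only its amplitude spectrum. Combining the reference amplitude with the degraded phase therefore recovers the reference image. Real underwater degradation also affects texture and noise, which the caption attributes to the phase, so swapping the amplitude alone would not fully restore a real image.

```python
import numpy as np

def amplitude_phase(img):
    """Per-channel 2-D FFT of an H x W x C image, split into amplitude and phase."""
    spectrum = np.fft.fft2(img, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Inverse FFT of amplitude * exp(i * phase), keeping the real part."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase), axes=(0, 1)))

def swap_amplitude(degraded, reference):
    """Reference amplitude combined with the degraded image's phase."""
    amp_ref, _ = amplitude_phase(reference)
    _, pha_deg = amplitude_phase(degraded)
    return recombine(amp_ref, pha_deg)

rng = np.random.default_rng(0)
reference = rng.random((64, 64, 3))
# Toy degradation: attenuate red strongly, green and blue mildly, then darken overall.
degraded = reference * np.array([0.3, 0.8, 0.9]) * 0.6

restored = swap_amplitude(degraded, reference)
print(np.allclose(restored, reference))  # → True
print(np.allclose(degraded, reference))  # → False
```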
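Figure 5(b) gives the order of operations in the color encoder block: cross-attention, then self-attention, then feed-forward. Figure 4 applies the color queries to visual features at several scales. The following single-head NumPy sketch follows that order. It assumes residual connections with parameter-free layer normalization and one block per scale. The head count, dimensions, and normalization placement are hypothetical, as are all names (color_encoder_block, init_params, and the parameter keys); none are taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention: q is (Nq, D); k and v are (Nk, D)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def layer_norm(x, eps=1e-5):
    """Layer normalization over the feature axis, without learned scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_params(dim, hidden, rng):
    """Random projections for one block (hypothetical parameter names)."""
    shapes = {"cq": (dim, dim), "ck": (dim, dim), "cv": (dim, dim),
              "sq": (dim, dim), "sk": (dim, dim), "sv": (dim, dim),
              "w1": (dim, hidden), "w2": (hidden, dim)}
    return {name: rng.standard_normal(s) / np.sqrt(s[0]) for name, s in shapes.items()}

def color_encoder_block(queries, feats, p):
    """queries: (Nq, D) color queries; feats: (H*W, D) flattened image features."""
    # Cross-attention: each color query attends over all image-feature tokens.
    x = layer_norm(queries + attention(queries @ p["cq"], feats @ p["ck"], feats @ p["cv"]))
    # Self-attention: the color queries exchange information with each other.
    x = layer_norm(x + attention(x @ p["sq"], x @ p["sk"], x @ p["sv"]))
    # Position-wise feed-forward network with a ReLU.
    return layer_norm(x + np.maximum(x @ p["w1"], 0.0) @ p["w2"])

rng = np.random.default_rng(0)
dim, n_queries = 16, 8
queries = rng.standard_normal((n_queries, dim))  # learned parameters in a real model
# Flattened features at three spatial scales (8x8, 16x16, 32x32), one block per scale.
multi_scale_feats = [rng.standard_normal((s * s, dim)) for s in (8, 16, 32)]
blocks = [init_params(dim, 4 * dim, rng) for _ in multi_scale_feats]
for feats, params in zip(multi_scale_feats, blocks):
    queries = color_encoder_block(queries, feats, params)
print(queries.shape)  # → (8, 16)
```

The number of color queries stays fixed while the feature token count changes per scale. Only the cross-attention step sees the image tokens, so each query can gather color evidence from every spatial location at each scale.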