End-To-End Underwater Video Enhancement: Dataset and Model

Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu

TL;DR

This study constructs the Synthetic Underwater Video Enhancement (SUVE) dataset, comprising 840 diverse underwater-style videos paired with ground-truth reference videos, and trains a novel underwater video enhancement model, UVENet, which utilizes inter-frame relationships to achieve better enhancement performance.

Abstract

Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, a task with significant implications for marine research and exploration. However, existing methods primarily develop image enhancement algorithms that process each frame independently, and supervised datasets and models tailored specifically to UVE are lacking. To fill this gap, we construct the Synthetic Underwater Video Enhancement (SUVE) dataset, comprising 840 diverse underwater-style videos paired with ground-truth reference videos. Based on this dataset, we train a novel underwater video enhancement model, UVENet, which exploits inter-frame relationships to achieve better enhancement performance. Extensive experiments on both synthetic and real underwater videos demonstrate the effectiveness of our approach. To our knowledge, this study is the first comprehensive exploration of UVE. The code is available at https://anonymous.4open.science/r/UVENet.

Paper Structure (31 sections, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Visual results of different enhancement methods on a real underwater video. Only five frames in each video are shown for ease of display. From top to bottom, the four rows display the frames from the raw underwater video, the frame-by-frame enhancement results of MLLE, the frame-by-frame enhancement results of UShape, and our video enhancement results, respectively.
  • Figure 2: The synthesis process of the SUVE dataset. Given a clean in-air video, the corresponding depth sequence, and a randomly sampled real underwater image, UWNR can synthesize underwater video with a similar style to the sampled underwater image.
  • Figure 3: The overall framework of our UVENet. It mainly consists of an encoder, four feature alignment and aggregation modules dealing with feature maps at different scales, a decoder, and a global restoration module. For ease of display, we only plot the case where there are three input frames, i.e., $T=3$.
  • Figure 4: The illustration of the Feature Alignment and Aggregation Module (FAAM). We only draw the feature maps of three adjacent frames and four spatial shift patterns (up, down, left, right). In practice, we have four additional patterns for diagonal shifts. The void pixels (white parts in the shifted feature maps) caused by spatial shifts are filled with zeros. DSC and CA denote Depth-wise Separable Convolution and Channel Attention, respectively.
  • Figure 5: The illustration of the Global Restoration Module (GRM). Low-quality frames and the preliminary enhanced image are concatenated along the channel dimension and fed into the attention module to generate attention weights. The attention weights correspond to scaling factors for the RGB three channels, which are applied to the preliminary enhanced image to obtain the final enhanced frame image.
  • ...and 8 more figures
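
The spatial shift with zero filling described in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' implementation: the function name `shift_zero_fill`, the constant `SHIFT_OFFSETS`, the one-pixel shift distance, and the use of a single 2D map (rather than multi-channel feature tensors from adjacent frames) are all assumptions made for illustration. Only the eight shift directions and the zero filling of void pixels come from the caption.

```python
# Illustrative sketch only; names and the 1-pixel shift size are assumptions.
# The eight directions follow the Figure 4 caption: up, down, left, right,
# plus the four diagonals. Each offset is (dy, dx) in rows and columns.
SHIFT_OFFSETS = [
    (-1, 0), (1, 0), (0, -1), (0, 1),      # up, down, left, right
    (-1, -1), (-1, 1), (1, -1), (1, 1),    # four diagonal shifts
]


def shift_zero_fill(feature, dy, dx):
    """Shift a 2D map by dy rows and dx columns, filling void cells with 0."""
    height, width = len(feature), len(feature[0])
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx  # where this output cell comes from
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = feature[src_y][src_x]
    return shifted


# A toy 3x3 "feature map" standing in for one channel of one frame.
feature = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# One shifted copy per pattern; in the caption's setting such copies come
# from neighbouring frames and are later aggregated.
shifted_maps = [shift_zero_fill(feature, dy, dx) for dy, dx in SHIFT_OFFSETS]
```

Shifting down by one row, for example, moves every value one row lower and leaves the top row as zeros, which corresponds to the white regions in the shifted feature maps of Figure 4.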