UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors

Suhas Srinath, Aditya Chandrasekar, Hemang Jamadagni, Rajiv Soundararajan, Prathosh A P

TL;DR

This work proposes a two-stage framework for underwater video enhancement. It enables real-time, computationally efficient enhancement of high-resolution underwater videos by processing them at lower resolutions, and offers efficient enhancement across diverse water types.

Abstract

With the rise of marine exploration, underwater imaging has gained significant attention as a research topic. Underwater video enhancement has become crucial for real-time computer vision tasks in marine exploration. However, most existing methods enhance individual frames and neglect the temporal dynamics of video, leading to visually poor enhancements. Furthermore, the lack of ground-truth references limits the use of the abundant underwater video data available in many applications. To address these issues, we propose a two-stage framework for enhancing underwater videos. The first stage uses a denoising diffusion probabilistic model to learn a generative prior from unlabeled data, capturing robust and descriptive feature representations. In the second stage, this prior is incorporated into a physics-based image formulation for spatial enhancement, while temporal consistency is enforced between video frames. Our method enables real-time, computationally efficient enhancement of high-resolution underwater videos by processing them at lower resolutions, and offers efficient enhancement across diverse water types. Extensive experiments on four datasets show that our approach generalizes well and outperforms existing enhancement methods. Our code is available at github.com/suhas-srinath/undive.

Paper Structure

This paper contains 16 sections, 11 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: UnDIVE (bottom row) enhances contiguous video frames (top row) from the UOT32 dataset (DeepSeaFish video), while maintaining consistent colors and illumination as opposed to image-based methods such as PhISH-Net (middle row).
  • Figure 2: Overall framework of UnDIVE. (a) The first stage learns a generative prior on underwater images, where a denoising DDPM UNet is trained with the loss $\mathcal{L}_{sim}$. (b) The second stage utilizes the trained encoder, and learns the spatial enhancement ($f_{\Theta}$) with loss $\mathcal{L}_s$. First, backscatter is removed, and the image $D_{hr}$ is processed through a guide network to capture low-level local details, while the downsampled (by two) image is passed through $\mathcal{E}$ capturing global (high-level) information. Finally, both streams are fused and upsampled to match the input resolution. (c) A temporal consistency loss $\mathcal{L}_t$ enforces uniform illumination and colors in the enhanced frames.
  • Figure 3: Results of different enhancement methods on frame $54$ of the PhuQuoc1_Jun2022.mp4 video from the MVK dataset. The blue hue in the scene is efficiently reduced by UnDIVE, while also improving the contrast in the enhanced image.
  • Figure 4: Results from the ablation study on the effect of different loss components. The model that uses $\mathcal{L}_s$ and $\mathcal{L}_t$ effectively removes the spurious reddish hue.
  • Figure 5: Effect of the generative prior and the image pre-training on enhancement.
  • ...and 1 more figure
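
The data flow described in the Figure 2 caption (a full-resolution guide stream, a stream downsampled by two, fusion, and upsampling back to the input resolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the averaging fusion rule, and the frame-difference measure are hypothetical stand-ins chosen for illustration, the two branches are passed in as arbitrary callables rather than trained networks, and backscatter removal is omitted. The frame-difference function is a simple proxy for temporal consistency and is not the paper's $\mathcal{L}_t$.

```python
import numpy as np


def downsample_by_two(frame):
    """Average non-overlapping 2x2 blocks; H and W must be even."""
    h, w, c = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample_by_two(frame):
    """Nearest-neighbour upsampling that doubles H and W."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)


def two_stream_enhance(frame, global_branch, guide_branch):
    """Combine a low-resolution global stream with a full-resolution guide stream.

    global_branch stands in for the pre-trained encoder and guide_branch for
    the guide network; both are hypothetical callables mapping an image array
    to an array of the same shape.
    """
    coarse = global_branch(downsample_by_two(frame))  # global, low-resolution stream
    detail = guide_branch(frame)                      # local, full-resolution stream
    # Assumed fusion rule: plain average of the two streams.
    fused = 0.5 * (upsample_by_two(coarse) + detail)
    return np.clip(fused, 0.0, 1.0)


def temporal_difference(prev_out, next_out):
    """Mean absolute difference between consecutive enhanced frames.

    A simple proxy with no motion compensation; an assumption for illustration.
    """
    return float(np.mean(np.abs(prev_out - next_out)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((2, 4, 6, 3))  # two tiny RGB frames with values in [0, 1]
    identity = lambda x: x
    outputs = [two_stream_enhance(f, identity, identity) for f in frames]
    print(outputs[0].shape, temporal_difference(outputs[0], outputs[1]))
```

Running the heavier branch on the half-resolution input means it sees a quarter of the pixels, which is one plausible reading of how processing at lower resolutions reduces the cost of enhancing high-resolution video.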