Towards Generating Realistic Underwater Images

Abdul-Kazeem Shamba

TL;DR

This work addresses the generation of realistic underwater imagery from synthetic scenes with uniform lighting by comparing paired and unpaired image translation methods, including the integration of depth information into a contrastive learning framework. On the VAROS dataset, pix2pix is the stronger paired method for sharp, high-frequency detail, while the autoencoder preserves structural similarity but produces blurrier outputs. Among unpaired methods, CycleGAN achieves a competitive FID, CUT improves structural fidelity via patchwise contrastive losses, and incorporating depth into CUT yields the lowest FID, albeit with a slight SSIM drop. The results highlight a practical trade-off between perceptual realism and content preservation, with depth cues and contrastive objectives offering promising gains for realistic underwater data generation in marine robotics applications.

Abstract

This paper explores the use of contrastive learning and generative adversarial networks for generating realistic underwater images from synthetic images with uniform lighting. We investigate the performance of image translation models for generating realistic underwater images using the VAROS dataset. Two key evaluation metrics, Fréchet Inception Distance (FID) and Structural Similarity Index Measure (SSIM), provide insights into the trade-offs between perceptual quality and structural preservation. For paired image translation, pix2pix achieves the best FID scores due to its paired supervision and PatchGAN discriminator, while the autoencoder model attains the highest SSIM, suggesting better structural fidelity despite producing blurrier outputs. Among unpaired methods, CycleGAN achieves a competitive FID score by leveraging cycle-consistency loss, whereas CUT, which replaces cycle-consistency with contrastive learning, attains higher SSIM, indicating improved spatial similarity retention. Notably, incorporating depth information into CUT results in the lowest overall FID score, demonstrating that depth cues enhance realism. However, the slight decrease in SSIM suggests that depth-aware learning may introduce structural variations.

Paper Structure

This paper contains 16 sections, 6 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Realistic underwater image generation. Given a synthetic image $X$ with uniform lighting and real underwater images $Y$, the unpaired image translation method learns to transform the synthetic input into a realistic underwater image.
  • Figure 2: A realistic underwater image generation learning procedure using a contrastive objective and depth (CUT + depth). The refiner is the model generator conditioned on real underwater images. The contrastive loss preserves the content of the synthetic uniform-lighting input (Park et al., 2020).
  • Figure 3: Patchwise contrastive learning for generating realistic underwater images. A generated output patch $z$ and the corresponding input patch $z^+$ should lie close together in the embedding space, while patches from random locations $z^-$ are pushed apart (Park et al., 2020).
  • Figure 4: Paired training data (left) consists of training examples $\{x_i, y_i\}$, where each $x_i$ corresponds to $y_i$; the two are drawn from folders B and A of the VAROS synthetic underwater dataset, respectively. We consider unpaired training data (right), consisting of a source set $\{x_i\}$ ($x_i \in X$) and a target set $\{y_j\}$ ($y_j \in Y$) sampled from synthetic underwater images, with no information provided as to which $x_i$ matches which $y_j$. The unpaired training examples are built by using the synthetic input $x_i$ for $i=1$ to $K$ from VAROS and a target set $y_j$ for $j=K-N$ to ensure no correspondence.
  • Figure 5: Paired training data (left) consists of training examples $\{x_i, y_i\}$, where each $x_i$ corresponds to $y_i$; the two are drawn from folders B and A of the VAROS synthetic underwater dataset, respectively, and are used to train the autoencoder and pix2pix models.
  • ...and 5 more figures
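
The patchwise contrastive objective described in the Figure 3 caption can be illustrated with a small sketch. This is not the paper's implementation: it is a minimal NumPy version of an InfoNCE-style loss, assuming patch embeddings have already been extracted. The function name `patch_nce_loss`, the array shapes, and the temperature value of 0.07 are illustrative assumptions.

```python
import numpy as np


def patch_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style patch loss (illustrative sketch, not the paper's code).

    query:     (N, D) embeddings of generated output patches z
    positive:  (N, D) embeddings of the corresponding input patches z+
    negatives: (N, K, D) embeddings of K patches from other locations z-
    temperature: softmax temperature (0.07 is an assumed value)
    """
    def l2_normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = l2_normalize(query)
    p = l2_normalize(positive)
    n = l2_normalize(negatives)

    # Cosine similarity between each query and its positive: shape (N, 1).
    pos_logit = np.sum(q * p, axis=-1, keepdims=True)
    # Cosine similarity between each query and its K negatives: shape (N, K).
    neg_logits = np.einsum("nd,nkd->nk", q, n)

    # Positive sits at column 0; the loss is cross-entropy against that column.
    logits = np.concatenate([pos_logit, neg_logits], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 8))            # 4 query patches, 8-dim embeddings
    z_neg = rng.normal(size=(4, 16, 8))    # 16 negatives per query
    print("aligned positives:   ", patch_nce_loss(z, z, z_neg))
    print("mismatched positives:", patch_nce_loss(z, -z, z_neg))
```

The loss is low when each output patch embedding is closer to its corresponding input patch than to patches from other locations, which is the behavior the caption describes as preserving the content of the synthetic input.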