LOBSTgER-enhance: an underwater image enhancement pipeline

Andreas Mentzelopoulos, Keith Ellenbogen

TL;DR

LOBSTgER-enhance tackles underwater image degradation by learning to invert synthetic, physics-inspired corruptions with a conditional latent diffusion model. It operates in a latent space via a pretrained VAE and uses a compact U-Net with about 11 million parameters, trained on roughly 2.5k high-quality underwater images at 512 by 768. The method yields strong generalization to unseen species and conditions, including inpainting and artifact removal, by conditioning on degraded inputs and employing quality-weighted sampling and classifier-free guidance. The approach offers a practical tool for conservation and educational imagery, enabling rapid, faithful enhancement without extensive manual post-processing and with robustness to distribution shifts. These results support diffusion-based restoration as viable for small, curated underwater datasets.

Abstract

Underwater photography presents significant inherent challenges, including reduced contrast, spatial blur, and wavelength-dependent color distortions. These effects can obscure the vibrancy of marine life, and awareness photographers in particular often face heavy post-processing pipelines to correct for them. We develop an image-to-image pipeline that learns to reverse underwater degradations: we introduce a synthetic corruption pipeline and train a diffusion-based generative model to invert its effects. Training and evaluation are performed on a small, high-quality dataset of awareness photography images by Keith Ellenbogen. The proposed methodology achieves high perceptual consistency and strong generalization in synthesizing 512x768 images using a model of ~11M parameters after training from scratch on ~2.5k images.

Paper Structure (20 sections, 6 equations, 30 figures, 1 table)

Figures (30)

  • Figure 1: Inference samples for image enhancement and inpainting with LOBSTgER-enhance. Generated samples are shown in the left column and the corresponding conditions in the right column. The model has never seen dolphins or seals during training. Sample dimensions are 512x768.
  • Figure 2: Artificial corruption process used to generate clean/corrupted pairs for supervised training of the conditional diffusion model. Left: clean image by Keith Ellenbogen. Right: the same image after the corruption pipeline: colors are distorted, bubbles are scattered throughout the image, and a haze effect and Gaussian blur are applied.
  • Figure 3: Illustration of forward and reverse diffusion process.
  • Figure 4: U-Net architecture adapted from Karras et al. (2024).
  • Figure 5: Normalized learning rate schedule.
  • ...and 25 more figures
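
The corruption steps named in the Figure 2 caption (color distortion, scattered bubbles, haze, blur) can be illustrated with a small sketch. This is not the authors' pipeline: the function names `corrupt` and `box_blur`, the parameter values, the single-pixel "bubbles", and the use of a box filter in place of the Gaussian blur are all assumptions made to keep the example dependency-free. Images are plain nested lists of RGB tuples with values in [0, 1].

```python
import random


def box_blur(image, radius):
    """Average each pixel with its neighbours (window truncated at the borders).

    Stand-in for the Gaussian blur mentioned in the caption.
    """
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            n = len(ys) * len(xs)
            row.append(tuple(
                sum(image[j][i][c] for j in ys for i in xs) / n
                for c in range(3)
            ))
        blurred.append(row)
    return blurred


def corrupt(image, attenuation=(0.35, 0.8, 0.9), haze=0.3,
            haze_color=(0.1, 0.4, 0.5), n_bubbles=3, blur_radius=1, seed=0):
    """Return a synthetically degraded copy of `image` (H x W list of RGB tuples).

    All default values are illustrative guesses, not values from the paper.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])

    # 1) Color distortion: attenuate each channel separately (red most strongly).
    # 2) Haze: blend every pixel toward a uniform blue-green veiling color.
    out = [
        [tuple((1 - haze) * c * a + haze * v
               for c, a, v in zip(px, attenuation, haze_color))
         for px in row]
        for row in image
    ]

    # 3) Bubbles: bright specks at random positions (hypothetical simplification).
    for _ in range(n_bubbles):
        y, x = rng.randrange(h), rng.randrange(w)
        out[y][x] = (1.0, 1.0, 1.0)

    # 4) Blur: soften the result, including the bubble specks.
    return box_blur(out, blur_radius)


# Build one (condition, target) training pair from a tiny uniform "clean" image.
clean = [[(0.8, 0.5, 0.3)] * 6 for _ in range(4)]
pair = (corrupt(clean), clean)
```

In a setup like the one described above, the corrupted image would serve as the conditioning input and the clean image as the target the diffusion model learns to recover; fixing `seed` makes a given pair reproducible.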