LOBSTgER-enhance: an underwater image enhancement pipeline

Andreas Mentzelopoulos; Keith Ellenbogen

TL;DR

LOBSTgER-enhance tackles underwater image degradation by learning to invert synthetic, physics-inspired corruptions with a conditional latent diffusion model. It operates in a latent space via a pretrained VAE and uses a compact U-Net with about 11 million parameters, trained on roughly 2.5k high-quality underwater images at 512 by 768. The method yields strong generalization to unseen species and conditions, including inpainting and artifact removal, by conditioning on degraded inputs and employing quality-weighted sampling and classifier-free guidance. The approach offers a practical tool for conservation and educational imagery, enabling rapid, faithful enhancement without extensive manual post-processing and with robustness to distribution shifts. These results support diffusion-based restoration as viable for small, curated underwater datasets.
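Two of the ingredients named above, quality-weighted sampling and classifier-free guidance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the condition-dropout probability, the choice of zeroing the condition to obtain the unconditional branch, and the assumption that sampling probability is proportional to a per-image quality score are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_indices(quality, n, rng):
    """Draw n training indices with probability proportional to a quality score.
    Assumption: 'quality-weighted sampling' means p_i = q_i / sum(q)."""
    q = np.asarray(quality, dtype=float)
    return rng.choice(len(q), size=n, p=q / q.sum())

def drop_condition(cond, rng, p_drop=0.1):
    """Training-time condition dropout for classifier-free guidance.
    Assumption: the 'null' condition is an all-zero tensor and p_drop=0.1."""
    keep = rng.random(cond.shape[0]) >= p_drop          # one flag per batch item
    return cond * keep.reshape(-1, *([1] * (cond.ndim - 1)))

def cfg_combine(pred_cond, pred_uncond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

rng = np.random.default_rng(0)
idx = sample_indices([0.0, 1.0, 3.0], n=8, rng=rng)     # index 0 is never drawn
cond = drop_condition(np.ones((4, 2, 3, 3)), rng, p_drop=0.5)
guided = cfg_combine(np.full(3, 2.0), np.full(3, 1.0), guidance_scale=1.5)
```

With `guidance_scale=1` the combination reduces to the conditional prediction, and with `guidance_scale=0` to the unconditional one; values above 1 push the sample further toward the degraded input used as the condition.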

Abstract

Underwater photography presents significant inherent challenges, including reduced contrast, spatial blur, and wavelength-dependent color distortions. These effects can obscure the vibrancy of marine life, and awareness photographers in particular often face heavy post-processing pipelines to correct for them. We develop an image-to-image pipeline that learns to reverse underwater degradations: we introduce a synthetic corruption pipeline and train a diffusion-based generative model to reverse its effects. Training and evaluation are performed on a small, high-quality dataset of awareness photography images by Keith Ellenbogen. The proposed methodology achieves high perceptual consistency and strong generalization in synthesizing 512x768 images using a model of ~11M parameters trained from scratch on ~2.5k images.

Paper Structure (20 sections, 6 equations, 30 figures, 1 table)

Introduction
Related Literature
Methodology & Experiments
Modeling Underwater Image Degradation
Conditional Latent Diffusion
Diffusion model training
Neural Architecture
Training hyperparameters
Training loop
Augmentations
Inference
Results
In-distribution generalization
In-distribution inpainting
Out-of-distribution generalization
...and 5 more sections

Figures (30)

Figure 1: Inference samples for image enhancement and inpainting with LOBSTgER-enhance. Generated samples are shown in the left column and the corresponding conditions in the right column. The model has never seen dolphins or seals during training. Sample dimensions are 512x768.
Figure 2: Artificial corruption process used to generate clean/corrupted pairs for supervised training of the conditional diffusion model. Left: clean image by Keith Ellenbogen. Right: the same image after the defined corruption pipeline: colors are distorted, bubbles are scattered throughout the image, and a hazing effect and Gaussian blur are applied.
Figure 3: Illustration of forward and reverse diffusion process.
Figure 4: U-Net architecture adapted from Karras et al. (2024).
Figure 5: Normalized learning rate schedule.
...and 25 more figures
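
The corruption steps listed in the Figure 2 caption (color distortion, scattered bubbles, haze, Gaussian blur) can be sketched as a small NumPy function that turns a clean image into a clean/corrupted training pair. This is an illustrative sketch only, not the paper's pipeline: the per-channel gains, haze color and strength, bubble model, and blur width are all assumed values, and `corrupt` and `gaussian_blur` are hypothetical helper names.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of an HxWx3 float image, with edge padding."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    tmp = np.apply_along_axis(conv, 0, padded)    # blur vertically
    return np.apply_along_axis(conv, 1, tmp)      # then horizontally

def corrupt(clean, rng, channel_gain=(0.4, 0.8, 0.9), n_bubbles=20,
            haze=0.3, haze_color=(0.1, 0.4, 0.5), sigma=2.0):
    """Apply assumed underwater-style degradations to an RGB image in [0, 1]."""
    # 1. Wavelength-dependent color distortion: red is attenuated the most.
    img = clean * np.asarray(channel_gain)
    # 2. Bubbles: small bright discs at random positions (assumed model).
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_bubbles):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        radius = rng.integers(1, 4)
        mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        img[mask] = 0.5 * img[mask] + 0.5         # blend toward white
    # 3. Haze: blend every pixel toward a blue-green veiling color.
    img = (1.0 - haze) * img + haze * np.asarray(haze_color)
    # 4. Spatial blur.
    return np.clip(gaussian_blur(img, sigma), 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((32, 48, 3))                   # stand-in for a real photograph
corrupted = corrupt(clean, rng)                   # (corrupted, clean) is one training pair
```

In a setup like the one the caption describes, the corrupted image would serve as the condition and the clean image as the target, so the model learns the inverse mapping without needing real paired underwater data.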
