LOBSTgER-enhance: an underwater image enhancement pipeline
Andreas Mentzelopoulos, Keith Ellenbogen
TL;DR
LOBSTgER-enhance tackles underwater image degradation by learning to invert synthetic, physics-inspired corruptions with a conditional latent diffusion model. It operates in the latent space of a pretrained VAE and uses a compact U-Net with about 11 million parameters, trained on roughly 2.5k high-quality underwater images at 512x768 resolution. By conditioning on degraded inputs and employing quality-weighted sampling and classifier-free guidance, the method generalizes well to unseen species and conditions and extends to inpainting and artifact removal. The approach offers a practical tool for conservation and educational imagery, enabling rapid, faithful enhancement without extensive manual post-processing and with robustness to distribution shifts. These results support diffusion-based restoration as a viable option for small, curated underwater datasets.
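The two sampling ideas named above, classifier-free guidance and quality-weighted sampling, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `cfg_combine` uses the common guidance convention in which a scale of 1 recovers the conditional prediction, and `quality_weighted_indices`, its per-image `scores`, and the `temperature` parameter are hypothetical, since the text does not say how quality weights are defined.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    1 gives the conditional one, and values > 1 extrapolate
    further toward the conditional prediction.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def quality_weighted_indices(scores, n, rng, temperature=1.0):
    """Draw n training indices with probability proportional to quality.

    scores: non-negative per-image quality scores (assumed, hypothetical).
    temperature: values > 1 flatten the distribution, values < 1 sharpen it.
    """
    scores = np.asarray(scores, dtype=np.float64)
    weights = scores ** (1.0 / temperature)
    probs = weights / weights.sum()
    return rng.choice(len(scores), size=n, replace=True, p=probs)
```

In a training loop, indices drawn this way would select which images form each batch, so higher-rated images are seen more often; at inference, `cfg_combine` would be applied to the two noise predictions at every denoising step.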
Abstract
Underwater photography presents significant inherent challenges, including reduced contrast, spatial blur, and wavelength-dependent color distortions. These effects can obscure the vibrancy of marine life, and awareness photographers in particular often face heavy post-processing pipelines to correct them. We develop an image-to-image pipeline that learns to reverse underwater degradations: we introduce a synthetic corruption pipeline and learn to invert its effects with diffusion-based generation. Training and evaluation are performed on a small, high-quality dataset of awareness photography images by Keith Ellenbogen. The proposed methodology achieves high perceptual consistency and strong generalization in synthesizing 512x768 images using a model of ~11M parameters after training from scratch on ~2.5k images.
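To make the three degradations named above concrete, the following is a minimal sketch of what a synthetic corruption step might look like. It is an illustration under stated assumptions, not the pipeline used in this work: the function names, the exponential per-channel attenuation model, the box blur, and all default parameter values are hypothetical.

```python
import numpy as np

def box_blur(img, k=5):
    """Average each pixel over a k x k neighbourhood (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def degrade(img, depth=1.0, contrast=0.6, blur_k=5,
            attenuation=(0.8, 0.2, 0.1), veil=(0.05, 0.35, 0.45)):
    """Apply an illustrative underwater corruption to an RGB image.

    img: float array in [0, 1] with shape (H, W, 3).
    attenuation: assumed per-channel (R, G, B) decay rates; red decays fastest.
    veil: assumed blue-green background colour mixed in as light is lost.
    """
    img = np.asarray(img, dtype=np.float64)
    # Wavelength-dependent colour distortion: per-channel transmission.
    t = np.exp(-np.asarray(attenuation) * depth)
    out = img * t + np.asarray(veil) * (1.0 - t)
    # Reduced contrast: pull every pixel toward the per-channel mean.
    mean = out.mean(axis=(0, 1), keepdims=True)
    out = mean + contrast * (out - mean)
    # Spatial blur.
    out = box_blur(out, blur_k)
    return np.clip(out, 0.0, 1.0)
```

Pairs of the form `(degrade(x), x)` built from clean images `x` would then serve as conditioning input and restoration target; randomizing `depth`, `contrast`, and `blur_k` per sample is one plausible way to cover a range of conditions.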
