Image-Conditional Diffusion Transformer for Underwater Image Enhancement
Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao
TL;DR
This paper tackles underwater image enhancement (UIE) with a latent diffusion model conditioned on the degraded input image. It introduces the Image-Conditional Diffusion Transformer (ICDT), which replaces the conventional U-Net with a transformer backbone in a latent-space diffusion framework and is trained with a hybrid loss that includes learned variances, which enables faster sampling. Experiments on Underwater ImageNet show that larger ICDT models, particularly ICDT-XL/2, achieve state-of-the-art performance on the full-reference metrics PSNR, SSIM, and LPIPS and on the no-reference metric UIQM. The work demonstrates ICDT’s scalability and positions it as a potentially general approach for image-to-image generation tasks beyond UIE.
Abstract
Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by recent advances in generative models, we propose a novel UIE method based on an image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and maps it into a latent space, where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which also significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare them with prior work in UIE on the Underwater ImageNet dataset. Besides showing good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.
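
The abstract does not spell out the hybrid loss. The following is a minimal sketch, assuming it follows the formulation commonly used for diffusion models with learned variances: a noise-prediction mean-squared error plus a small-weight variational-bound term, where the latter is a KL divergence between the true posterior and the model's Gaussian, and the predicted log-variance interpolates between the two analytic bounds log β_t and log β̃_t. The function names, the weight `lam=0.001`, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_kl(mean1, logvar1, mean2, logvar2):
    """Elementwise KL(N(mean1, e^logvar1) || N(mean2, e^logvar2)) in nats."""
    return 0.5 * (
        logvar2 - logvar1
        + np.exp(logvar1 - logvar2)
        + (mean1 - mean2) ** 2 * np.exp(-logvar2)
        - 1.0
    )


def interpolated_logvar(v, log_beta, log_beta_tilde):
    """Learned log-variance: v in [0, 1] blends the upper bound log(beta_t)
    and the lower bound log(beta_tilde_t). (Assumed parameterization.)"""
    return v * log_beta + (1.0 - v) * log_beta_tilde


def hybrid_loss(eps, eps_pred, q_mean, q_logvar, p_mean, p_logvar, lam=0.001):
    """Assumed hybrid objective: L_simple + lam * L_vlb.

    eps, eps_pred     : true and predicted noise
    q_mean, q_logvar  : mean / log-variance of the true posterior q(x_{t-1} | x_t, x_0)
    p_mean, p_logvar  : mean / log-variance predicted by the model
    lam               : hypothetical weight on the variational-bound term
    """
    l_simple = np.mean((eps - eps_pred) ** 2)
    l_vlb = np.mean(gaussian_kl(q_mean, q_logvar, p_mean, p_logvar))
    return l_simple + lam * l_vlb
```

In this formulation the variational-bound term is what trains the variances. Formulations of this kind usually also stop gradients from that term flowing into the predicted mean, a detail omitted here because NumPy has no autograd.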
